Publications
The publications of the Information Processing and Analysis group are listed in BibSonomy and embedded here:
2024
Bridging the Analytics Gap: Optimizing Content Performance using Actionable Knowledge Discovery.
In: Proceedings of the 35th ACM Conference on Hypertext and Social Media, series HT '24, pages 185–192. Association for Computing Machinery, New York, NY, USA, 2024.
Tom Alby.
[doi] [abstract] [BibTeX]
Web analytics tools like Google Analytics are widely available, but website owners outside the eCommerce sector struggle to extract actionable insights from their data to curate and optimize content. This difficulty often arises from challenges in identifying and aligning objectives with standard website performance metrics provided by these tools, compounded by a lack of expertise in tool configuration. This study focuses on automated approaches that generate actionable insights for owners of content-driven websites, analyzing visitor attention at the most granular level by focusing on segments of web pages. It considers both the length of the page and different device types used to access these pages. Existing research is augmented with four major contributions: First, a robust regression model to predict user behaviour based on the scroll behaviour of 850,000 visitors and more than 9 million data points from five diverse websites. Second, a dataset of measurements of web page lengths from a random sample of one million websites, for a better understanding of the relation between scroll behaviour and web page lengths. Third, an actionable knowledge discovery method for web analytics data of non-transactional websites that makes it possible to identify deviations from expected visitor behaviour, enabling content optimization for those web analytics users who find it difficult to leverage their data today. Finally, an indicator for page performance that makes it possible to compare pages based on in-page visitor engagement. This research exemplifies the intersection of web analytics and intelligent content curation, showcasing a methodological framework that facilitates the generation of automated suggestions for digital content optimization, rooted in comprehensive behavioral data analysis.
A Repository for Formal Contexts.
In: I. P. Cabrera, S. Ferré and S. Obiedkov, editors, Conceptual Knowledge Structures, series Lecture Notes in Artificial Intelligence, pages 182-197. Springer Nature Switzerland, Cham, 2024.
Tom Hanika and Robert Jäschke.
[abstract] [BibTeX]
Data is always at the center of the theoretical development and investigation of the applicability of formal concept analysis. It is therefore not surprising that a large number of data sets are repeatedly used in scholarly articles and software tools, acting as de facto standard data sets. However, the distribution of the data sets poses a problem for the sustainable development of the research field. There is a lack of a central location that provides and describes FCA data sets and links them to already known analysis results. This article analyses the current state of the dissemination of FCA data sets, presents the requirements for a central FCA repository, and highlights the challenges for this.
Literatur im Wikiversum – Eine praktische Annäherung über API-Abfragen und Wikipedia-Metriken.
In: Konferenzabstracts der DHd 2024, pages 49-53. 2024.
Viktor Illmer, Bart Soethaert, Lilly Welz, Frank Fischer and Robert Jäschke.
[doi] [abstract] [BibTeX]
The collaboratively created online encyclopedia Wikipedia currently offers more than 60 million articles in over 300 language versions, covering a wide range of fields of knowledge. Reception-oriented literary studies has meanwhile also discovered the project as an object of research and a data resource, since it gathers many encyclopedic articles and metadata on literature and literary life: on authors, literary works, genres, periods and other categories relevant to literary history. The data-analytic evaluation of various Wikipedia metrics makes it possible to assess how literature is dealt with in Wikipedia and to further diversify statements about literary canonicity, evaluation practices and popularity in the context of open encyclopedia projects. The (hands-on) workshop centers on the Wikipedia API, and participants are introduced to how it works. Query scripts are developed step by step in the form of a Jupyter notebook.
2023
Popular, but Hardly Used: Has Google Analytics Been to the Detriment of Web Analytics?
In: Proceedings of the 15th ACM Web Science Conference 2023, series WebSci '23, pages 304–311. Association for Computing Machinery, New York, NY, USA, 2023.
Tom Alby.
[doi] [abstract] [BibTeX]
Since 2005, Google has been offering a free version of Google Analytics, allowing website owners to access detailed user behavior data. However, while more and more features and tools have been added to the Google measurement suite since then, it is unclear whether the free availability of these tools has really helped users to derive actionable insights for their websites. Earlier studies based on a small number of interviews have suggested that users tend to play with the tools as they lack data literacy, but a broader analysis has been missing so far. Our contribution is a large-scale study of Google Analytics implementations to examine which advanced features are used, allowing conclusions to be drawn about the webmasters’ analysis capabilities. In addition, we detail how difficult it has become to conduct such a study due to the arrangements that website owners have to put in place to comply with the GDPR requirements, but also due to the possibility of obfuscation with the latest development of web analytics software.
A Novel Approach for Identification and Linking of Short Quotations in Scholarly Texts and Literary Works.
Journal of Computational Literary Studies, 2(1), 2023.
Frederik Arnold and Robert Jäschke.
[doi] [abstract] [BibTeX]
We present two approaches for the identification and linking of short quotations between scholarly works and literary works: ProQuo, a specialized pipeline, and ProQuoLM, a more general language model based approach. Our evaluation shows that both approaches outperform a strong baseline and the overall performance is on the same level. We compare the performance of ProQuoLM on texts with and without (page) reference information and find that reference information is not used. Based on our findings, we propose the following steps for future improvements: further analysis of the influence of a bigger context window for better handling of long distance references and the introduction of positional information of the literary work so that reference information can be utilized by ProQuoLM.
Ein Quantum Literatur. Empirische Daten zu einer Theorie des literarischen Textumfangs.
In: F. Jannidis, editor, Digitale Literaturwissenschaft, pages 777-812. J.B. Metzler, Stuttgart, 2023.
Frank Fischer and Robert Jäschke.
[doi] [abstract] [BibTeX]
The banal fact that every literary text has a certain length raises a number of questions that have hardly been asked so far, for instance about the influence of text length on processes of reception, interpretation and canonization. A theory of literary text length is still missing, yet it is all the more urgent under the conditions of digital literary studies, because without such a clarification the significance of the quantitative limits of the literary heritage cannot be discussed adequately. So far, discussion has focused on the total number of scanned books or full texts rather than on their individual lengths. In this essay we therefore attempt two things: First, we want to help open the discourse on 'literary text length' by bringing together some considerations on the significance of text lengths that have so far been found only in isolated places. Second, we describe the outlines of a framework for studying the corresponding empirical data. The latter is done using two metadata sets, one based on the world-literature-oriented canon 1001 Books You Must Read Before You Die from 2006, the other on the catalogue of the German National Library (Deutsche Nationalbibliothek, DNB), from which we filtered 180,000 books indexed as novels. To qualify the DNB catalogue data, they were linked with the free knowledge base Wikidata. Since both data sets consist of metadata rather than full texts, our results are necessarily based on the book page as the unit for measuring text length. In the future, such a framework will also have to take the number of words or characters into account in order to discuss the variable-invariable dual nature of literary text length appropriately and make it usable for comparative studies.
Preface: World Literature in an Expanding Digital Space.
Journal of Cultural Analytics, 8(2), 2023.
Frank Fischer, Jacob Blakesley, Paula Wojcik and Robert Jäschke.
[doi] [abstract] [BibTeX]
Wikipedia, the world’s largest encyclopedia, and Wikidata, the rapidly growing knowledge graph, are not yet widely used in literary studies, but their scale and multilingualism make them particularly suitable as new means for the study of world literature. This is the hypothesis at the heart of this special issue. Our preface provides a research overview of the topic, briefly summarizes the articles that constitute this issue, and focuses on overarching aspects and common challenges.
Cover Song Identification in Practice with Multimodal Co-Training.
In: M. Leyer and J. Wichmann, editors, Proceedings of the Conference on "Lernen, Wissen, Daten, Analysen", series CEUR Workshop Proceedings, pages 359-371. Aachen, 2023.
Simon Hachmeier and Robert Jäschke.
[doi] [abstract] [BibTeX]
The task of cover song identification (CSI) deals with the automatic matching of audio recordings by modeling musical similarity. CSI is of high relevance in the context of applications such as copyright infringement detection on online video platforms. Since online videos include metadata (e.g., video titles, descriptions), one could leverage it for more effective CSI in practice. In this work, we experiment with state-of-the-art models of CSI and entity matching in a Co-Training ensemble. Our results show slight improvements of the entity matching model. We further outline some suggestions for improving our approach to overcome the overfitting of CSI models that we observed.
Graph-Based Representation and Reasoning.
Lecture Notes in Computer Science. volume 14133. Springer, Cham, 2023.
Manuel Ojeda-Aciego, Kai Sauerwald and Robert Jäschke.
[doi] [abstract] [BibTeX]
This book constitutes the refereed proceedings of the 28th International Conference on Graph-Based Representation and Reasoning, ICCS 2023, held in Berlin, Germany, during September 11–13, 2023.
The 9 full papers, 5 short papers and 4 posters included in this book were carefully reviewed and selected from 32 submissions. They are organized in topical sections as follows: Complexity and Database Theory, Formal Concept Analysis: Theoretical Advances, Formal Concept Analysis: Applications, Modelling and Explanation, Semantic Web and Graphs, Posters.
Annotated Vossian Antonomasia Dataset.
2023.
Michel Schwab, Robert Jäschke and Frank Fischer.
[doi] [abstract] [BibTeX]
This dataset is a collection of Vossian Antonomasia (VA). It comprises 6,096 entries, 3,115 of which contain a VA expression in the associated sentence. When a VA expression exists, the source (`*`), target (`|`), and modifier (`/`) are tagged by surrounding the respective words with the indicated character. Each entry also contains
- a link to the New York Times article that contains the sentence,
- the Wikidata IDs for both the source and the target (if they exist),
- the full target name (if it is mentioned in the corresponding NYT article).
Creation: The dataset has been developed through a series of research papers. Initially, Schwab et al. (2019) created a dataset based on the NYT corpus by Sandhaus (2008) with binary labels, source annotations, and the corresponding Wikidata IDs for sources. The annotation of modifier and target was conducted in Schwab et al. (2022). The extraction of the full target name and the Wikidata ID of the target was performed in Schwab et al. (2023).
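The tagging scheme described above could be read with a few lines of Python. This is only a sketch under stated assumptions: it assumes each tagged span is enclosed by exactly one pair of tag characters, the function names `extract_va_parts` and `strip_tags` are hypothetical, and the example sentence is invented rather than taken from the dataset.

```python
import re

# Tag characters as named in the dataset description above.
TAG_CHARS = {"source": "*", "target": "|", "modifier": "/"}


def extract_va_parts(sentence: str) -> dict:
    """Return the tagged spans per role.

    Assumption: each span is wrapped in one pair of identical tag characters.
    """
    parts = {}
    for role, char in TAG_CHARS.items():
        esc = re.escape(char)
        # Non-greedy match between two occurrences of the same tag character.
        parts[role] = re.findall(f"{esc}(.+?){esc}", sentence)
    return parts


def strip_tags(sentence: str) -> str:
    """Remove all tag characters (naive: also drops untagged '*', '|', '/')."""
    return re.sub(r"[*|/]", "", sentence)


# Invented example sentence, not taken from the dataset.
example = "|Jane Doe| is the *Michael Jordan* of /baseball/."
print(extract_va_parts(example))
print(strip_tags(example))
```

Sentences that contain these characters for other reasons (for example a slash in "and/or") would need more careful handling than this sketch provides.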
»Die Greta Garbo der Leichtathletik« – Eine systematische Analyse der Modifier vossianischer Antonomasien mithilfe von Word Embeddings.
In: Proceedings of the DHd, series DHd'23. 2023.
Michel Schwab and Frank Fischer.
[doi] [BibTeX]
»Japan’s Answer to Mozart«: Automatic Detection of Generalized Patterns of Vossian Antonomasia.
In: M. Abbas and A. A. Freihat, editors, Proceedings of the 6th International Conference on Natural Language and Speech Processing, series ICNLSP'23, pages 99-109. Association for Computational Linguistics, 2023.
Michel Schwab, Robert Jäschke and Frank Fischer.
[doi] [abstract] [BibTeX]
Vossian Antonomasia (VA) is a rhetorical device used to describe an entity (the target) by transferring certain features and characteristics of another entity (the source) to it. The phenomenon is closely related to metaphor and metonymy. Similar to these more familiar devices, the detection of VA expressions is a challenging task. We propose novel VA detection models that center on the source to tackle this problem. The focus lies on the ability of the models to detect VA independent of the syntactic patterns they appear in. We model the problem in different scenarios and utilize a state-of-the-art metonymy resolution model that relies on word masking, and metaphor detection models, which are based on linguistic metaphor theories, and adjust them to our task. All models leverage pre-trained language models such as BERT and RoBERTa. As there is limited annotated data available, we use a data augmentation technique to create a new dataset consisting of VA with new syntactic patterns where the generalization ability of the models can be evaluated.
»Who is the Madonna of Italian-American Literature?«: Extracting and Analyzing Target Entities of Vossian Antonomasia.
In: Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pages 110-115. Association for Computational Linguistics, 2023.
Michel Schwab, Robert Jäschke and Frank Fischer.
[doi] [abstract] [BibTeX]
In this paper, we present approaches for the automated extraction and disambiguation of a part of the stylistic device Vossian Antonomasia (VA), namely the target entity described by the expression. We model the problem as a coreference resolution and a question answering task and also combine both. To tackle the tasks at hand, we utilize state-of-the-art models in these areas. In addition, we visualize the connection between source and target entities of VA in a web demo to provide a deeper understanding of their mutual relationship.
2022
Analyzing the Web: Are Top Websites Lists a Good Choice for Research?
In: Proceedings of the International Conference on Theory and Practice of Digital Libraries, series TPDL '22, pages 11-25. Springer, Cham, 2022.
Tom Alby and Robert Jäschke.
[doi] [abstract] [BibTeX]
The web has been a subject of research since its beginning, but it is difficult if not impossible to analyze the whole web, even if a database of all URLs were freely accessible. Hundreds of studies have used commercial top websites lists as a shortcut, in particular the Alexa One Million Top Sites list. However, apart from the fact that Amazon decided to terminate Alexa, we question the usefulness of such lists for research as they have several shortcomings. Our analysis shows that top sites lists miss frequently visited websites and offer little value for language-specific research. We present a heuristic-driven alternative based on the Common Crawl host-level web graph while also taking language-specific requirements into account.
A Game with Complex Rules: Literature References in Literary Studies.
In: Proceedings of the Workshop Understanding LIterature references in academic full TExt at JCDL 2022, volume 3220, series ULITE-ws '22, pages 7-15. CEUR Workshop Proceedings, 2022.
Frederik Arnold and Robert Jäschke.
[doi] [abstract] [BibTeX]
Existing systems for reference extraction and segmentation are mostly tailored towards STEM fields (science, technology, engineering, medicine) and social sciences and cannot properly handle references in literary studies. We present our annotation guidelines for literature references in literary studies and give an overview of difficult cases we encountered when creating a corpus of annotated scholarly works for literary studies. Specifically, we present challenges and requirements we identified for reference extraction and segmentation from scholarly articles in the field of literary studies.
Salience in Literary Texts: A Combined Approach to the Relevance of Passages.
In: DH2022. 2022.
Frederik Arnold, Benjamin Fiechter, Evelyn Gius, Robert Jäschke, Steffen Martus and Michael Vauth.
[BibTeX]
Graph-Based Representation and Reasoning: Proceedings of the 27th International Conference on Conceptual Structures.
Lecture Notes in Computer Science. volume 13403. Springer, Cham, 2022.
Tanya Braun, Diana Cristea and Robert Jäschke.
[doi] [abstract] [BibTeX]
This book constitutes the proceedings of the 27th International Conference on Conceptual Structures, ICCS 2022, held virtually in September 2022.
The 7 full papers and 1 short paper presented were carefully reviewed and selected from 25 submissions. The papers focus on the representation of and reasoning with conceptual structures in a variety of contexts.
Music Version Retrieval from YouTube: How to Formulate Effective Search Queries?
In: P. Reuss, V. Eisenstadt, J. Schönborn and J. Schäfer, editors, Proceedings of the Conference on "Lernen, Wissen, Daten, Analysen", series CEUR Workshop Proceedings, pages 213-226. Aachen, 2022.
Simon Hachmeier, Robert Jäschke and Hadi Saadatdoorabi.
[doi] [abstract] [BibTeX]
Various versions of musical works are published on YouTube, such as remixes or reaction videos. While some research has focused on tasks like audio-based version identification of these videos, it is still unclear how to effectively retrieve a large number of relevant versions with textual queries. In this paper, we formulate search queries with YouTube search suggestions, evaluate these based on multiple dimensions and compute optimal ranks of queries on work-level. We show that queries containing the artist string retrieve results with higher relevance, but have higher overlaps. Additionally, we demonstrate that the number of reasonable queries can be increased by applying frequently suggested expansions to works which tend to contextualize queries to the music domain.
»Der Frank Sinatra der Wettervorhersage« – Cross-Lingual Vossian Antonomasia Extraction.
In: Proceedings of the 5th International Conference on Natural Language and Speech Processing, series ICNLSP'22, pages 282-287. Association for Computational Linguistics, 2022.
Michel Schwab, Robert Jäschke and Frank Fischer.
[doi] [abstract] [BibTeX]
We present a cross-lingual approach for the extraction of Vossian Antonomasia, a stylistic device especially popular in newspaper articles. We evaluate a zero-shot transfer learning approach and two approaches that use machine-translated training and test data. We show that our proposed models achieve strong results on all test datasets in the target language. As annotated data is sparse, especially in the target language, we generate additional test data to evaluate our models and conclude with a robustness study on real-world data.
»The Rodney Dangerfield of Stylistic Devices« – End-to-End Detection and Extraction of Vossian Antonomasia Using Neural Networks.
Frontiers in Artificial Intelligence, 5, 2022.
Michel Schwab, Robert Jäschke and Frank Fischer.
[doi] [abstract] [BibTeX]
Vossian Antonomasia (VA) is a well-known stylistic device based on attributing a certain property to a person by relating them to another person who is famous for this property. Although the morphological and semantic characteristics of this phenomenon have long been the subject of linguistic research, little is known about its distribution. In this paper, we describe end-to-end approaches for detecting and extracting VA expressions from large news corpora in order to study VA more broadly. We present two types of approaches: binary sentence classifiers that detect whether or not a sentence contains a VA expression, and sequence tagging of all parts of a VA on the word level, enabling their extraction. All models are based on neural networks and outperform previous approaches; the best results are obtained with a fine-tuned BERT model. Furthermore, we study the impact of training data size and class imbalance by adding negative (and possibly noisy) instances to the training data. We also evaluate the models' performance on out-of-corpus and real-world data and analyze the ability of the sequence tagging model to generalize in terms of new entity types and syntactic patterns.
Where are the Datasets? A case study on the German Academic Web Archive.
In: Proceedings of the Web Archiving and Digital Libraries Workshop at JCDL 2022. 2022.
Yousef Younes, Sebastian Tiesler, Robert Jäschke and Brigitte Mathiak.
[abstract] [BibTeX]
The German Academic Web (GAW) is a longitudinal archive of websites from German academic institutions, mainly universities. It can support answering research questions about academia in Germany. Recent discussions about reproducible research have brought the availability and sharing of research data into focus. Collecting, linking, and providing metadata about research data is thus an important task for infrastructure facilities. In this work, we examine how existing datasets are linked and referenced on German academic web pages using the GAW archive. For that, we use the social sciences and economics datasets registered at da|ra as our case study.
The results show that academic web pages as presented in GAW are not a good foundation to answer dataset-related questions. But from the few results found, it was obvious that da|ra datasets are usually mentioned using their DOIs and not their URLs.
2021
Lotte and Annette: A Framework for Finding and Exploring Key Passages in Literary Works.
In: Proceedings of the Workshop on Natural Language Processing for Digital Humanities at ICON 2021, pages 55-63. NLP Association of India, 2021.
Frederik Arnold and Robert Jäschke.
[doi] [BibTeX]
Proximity dimensions and the emergence of collaboration: a HypTrails study on German AI research.
Scientometrics, 126:9847-9868, 2021.
Tobias Koopmann, Maximilian Stubbemann, Matthias Kapa, Michael Paris, Guido Buenstorf, Tom Hanika, Andreas Hotho, Robert Jäschke and Gerd Stumme.
[doi] [abstract] [BibTeX]
Creation and exchange of knowledge depend on collaboration. Recent work has suggested that the emergence of collaboration frequently relies on geographic proximity. However, being co-located tends to be associated with other dimensions of proximity, such as social ties or a shared organizational environment. To account for such factors, multiple dimensions of proximity have been proposed, including cognitive, institutional, organizational, social and geographical proximity. Since they strongly interrelate, disentangling these dimensions and their respective impact on collaboration is challenging. To address this issue, we propose various methods for measuring different dimensions of proximity. We then present an approach to compare and rank them with respect to the extent to which they indicate co-publications and co-inventions. We adapt the HypTrails approach, which was originally developed to explain human navigation, to co-author and co-inventor graphs. We evaluate this approach on a subset of the German research community, specifically academic authors and inventors active in research on artificial intelligence (AI). We find that social proximity and cognitive proximity are more important for the emergence of collaboration than geographic proximity.
Evaluating dataset creation heuristics for concept detection in web pages using BERT.
In: Proceedings of the 14th International Conference on Knowledge Science, Engineering and Management, volume 12816, series Lecture Notes in Artificial Intelligence, pages 1-14. Springer, 2021.
Michael Paris and Robert Jäschke.
[abstract] [BibTeX]
Dataset creation for the purpose of training natural language processing (NLP) algorithms is often accompanied by an uncertainty about how the target concept is represented in the data. Extracting such data from web pages and verifying its quality is a non-trivial task, due to the Web's unstructured and heterogeneous nature and the cost of annotation. In that situation, annotation heuristics can be employed to create a dataset that captures the target concept, but in turn may lead to an unstable downstream performance. On the one hand, a trade-off exists between cost, quality, and magnitude for annotation heuristics in tasks such as classification, leading to fluctuations in trained models' performance. On the other hand, general-purpose NLP tools like BERT are now commonly used to benchmark new models on a range of tasks on static datasets. We utilize this standardization as a means to assess dataset quality, as most applications are dataset specific. In this study, we investigate and evaluate the performance of three annotation heuristics for a classification task on extracted web data using BERT. We present multiple datasets, from which the classifier shall learn to identify web pages that are centered around an individual in the academic domain. In addition, we assess the relationship between the performance of the trained classifier and the training data size. The models are further tested on out-of-domain web pages, to assess the influence of the individuals' occupation and web page domain.
2020
To Follow Or To Unfollow: Motives For The Academic Use Of Twitter.
In: Proceedings of the 14th International Technology, Education and Development Conference, series INTED, pages 1009-1018. IATED, 2020.
S.B. Linek, C.P. Hoffmann and R. Jäschke.
[doi] [abstract] [BibTeX]
Twitter appears to be a popular social media service for academics, especially computer scientists. While some studies have begun to examine motives for academic Twitter use, little is known about academics’ considerations for following and unfollowing other users.
Our empirical study explored general motives for the academic use of the social media platform Twitter. Based on the uses and gratifications theory and prior research as well as a review of existing scales, we designed a detailed questionnaire on motives for the academic use of Twitter. Besides the general motives for the academic use of Twitter we also analyzed subjective considerations for following and unfollowing accounts. The latter questions aimed at deeper insights into the networking behavior on Twitter and a better understanding of the adoption of social media in academia and their potential influence on the research process. The online survey was presented to 54 computer scientists who were active on Twitter.
Results show that academic Twitter use is generally characterized by information motives as well as by various social considerations. As the main reasons for using Twitter, we identify dissemination and, to a lesser degree, collection of information. However, users are also motivated by community development considerations. Accordingly, when following an account, users do not only look for content that is informative, interesting, of high quality, and current. They also tend to follow an account whose owner shares similar research interests, is an important researcher in the field, and that is personally known and liked. Unfollowing, while rather ubiquitous, is largely driven by considerations of content.
To summarize, we find that academics’ subjective considerations oscillate between content and personal aspects, with content aspects driving usage, but personal aspects also shaping following decisions. These insights contribute to the current state of research on motives of academic Twitter usage, finding that information and community development motives play central roles in the ensuing communication behavior and structures. Although previous studies have found that academic hierarchies are replicated in online social networking structures, our findings imply that this influence may be mediated by information considerations: wishing to collect helpful information on Twitter, academics tend to follow well-known colleagues in the field. However, the results of our survey suggest that the academic status of an account owner per se is not an important factor in following decisions.
As this study focused on computer scientists on Twitter, it is an open question if and to what extent the findings are valid for other disciplines and other social media. A more comprehensive analysis involving other disciplines and also the simultaneous use of various social media would provide a more holistic view of the academic use of social media.
How to Assess the Exhaustiveness of Longitudinal Web Archives.
In: Proceedings of the 31st ACM Conference on Hypertext and Social Media. ACM, 2020.
Michael Paris and Robert Jäschke.
[doi] [BibTeX]
How to Assess the Exhaustiveness of Longitudinal Web Archives: A Case Study of the German Academic Web.
In: Proceedings of the 31st ACM Conference on Hypertext and Social Media, series HT ’20. ACM, New York, NY, USA, 2020.
Michael Paris and Robert Jäschke.
[abstract] [BibTeX]
Longitudinal web archives can be a foundation for investigating structural and content-based research questions. One prerequisite is that they contain a faithful representation of the relevant subset of the web. Therefore, an assessment of the authority of a given data set with respect to a research question should precede the actual investigation. Next to proper creation and curation, this requires measures for estimating the potential of a longitudinal web archive to yield information about the central objects the research question aims to investigate. In particular, content-based research questions often lack the ab-initio confidence about the integrity of the data. In this paper we focus on one particularly important aspect, namely the exhaustiveness of the data set with respect to the central objects. Therefore, we investigate the recall coverage of researcher names in a longitudinal academic web crawl over a seven-year period and the influence of our crawl method on the data set integrity. Additionally, we propose a method to estimate the amount of missing information as a means to describe the exhaustiveness of the crawl and motivate a use case for the presented corpus.
2019
»The Michael Jordan of greatness« – Extracting Vossian antonomasia from two decades of The New York Times, 1987–2007.
Digital Scholarship in the Humanities, 35(1):34–42, 2019.
Frank Fischer and Robert Jäschke.
[doi] [abstract] [BibTeX]
Vossian antonomasia is a prolific stylistic device, in use since antiquity. It can compress the introduction or description of a person or another named entity into a terse, poignant formulation and can best be explained by an example: When Norwegian world champion Magnus Carlsen is described as ‘the Mozart of chess’, it is Vossian antonomasia we are dealing with. The pattern is simple: A source (Mozart) is used to describe a target (Magnus Carlsen), the transfer of meaning is reached via a modifier (‘of chess’). This phenomenon has been discussed before (as ‘metaphorical antonomasia’ or, with special focus on the source object, as ‘paragons’), but no corpus-based approach has been undertaken as yet to explore its breadth and variety. We are looking into a full-text newspaper corpus (The New York Times, 1987–2007) and describe a new method for the automatic extraction of Vossian antonomasia based on Wikidata entities. Our analysis offers new insights into the occurrence of popular paragons and their distribution.
Proceedings of the Conference on "Lernen, Wissen, Daten, Analysen".
CEUR Workshop Proceedings. number 2454. Aachen, 2019.
Robert Jäschke and Matthias Weidlich.
[doi] [BibTeX]
»A Buster Keaton of Linguistics« – First Automated Approaches for the Extraction of Vossian Antonomasia.
2019.
Michel Schwab, Robert Jäschke, Frank Fischer and Jannik Strötgen.
[doi] [BibTeX]
»A Buster Keaton of Linguistics« – First Automated Approaches for the Extraction of Vossian Antonomasia.
In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, series EMNLP '19, pages 6239-6244. Association for Computational Linguistics, 2019.
Michel Schwab, Robert Jäschke, Frank Fischer and Jannik Strötgen.
[doi] [abstract] [BibTeX]
Attributing a particular property to a person by naming another person, who is typically well-known for the respective property, is called a Vossian antonomasia (VA). While identifying this subtype of metonymy is of particular interest in the study of stylistics, it is also a source of errors in relation and fact extraction as an explicitly mentioned entity occurs only metaphorically and should not be associated with respective contexts. Despite rather simple syntactic variations, the automatic extraction of VA has not been addressed so far as it requires a deep semantic understanding of mentioned entities and underlying relations, and is thus very challenging. In this paper, we propose the first method for the extraction of VA that works completely automatically. Our approach uses distant supervision based on Wikidata and NER, and relies on a bi-directional LSTM for postprocessing. The evaluation on 1.8 million articles of the New York Times corpus shows that our approach significantly outperforms the only existing semi-automatic approach for VA identification by more than 30 percentage points in precision.
2018
Proceedings of the International Workshop on Bias in Information, Algorithms, and Systems (BIAS).
CEUR Workshop Proceedings. number 2103. Aachen, 2018.
Jo Bates, Paul D. Clough, Robert Jäschke and Jahna Otterbacher.
[doi] [BibTeX]
Towards Bias Detection in Online Text Corpora.
In: International Workshop on Bias in Information, Algorithms, and Systems (BIAS), series CEUR Workshop Proceedings, pages 19-23. Aachen, 2018.
Christoph Hube, Besnik Fetahu and Robert Jäschke.
[doi] [BibTeX]
Datenschätze selber heben: Data Science und Bibliotheken.
2018.
Robert Jäschke.
[doi] [abstract] [BibTeX]
The processing and analysis of large amounts of data, subsumed under the term "data science", is a way to gain new insights and a basis for decisions, and not only in numerous scientific fields or in market and user research. Libraries and their user-oriented services can benefit from data science as well. The libraries' own holdings and data are an important raw material for this. The talk presents examples of projects from the humanities and social sciences that build on library data (e.g., from the Deutsche Nationalbibliothek) or study the behaviour of users. They exemplify the research programme of the new professorship for "Information Processing and Analytics" at the Institut für Bibliotheks- und Informationswissenschaft of Humboldt-Universität zu Berlin. Through the new professorship, the skills required for such analyses are also taught to students and will thus benefit libraries in many respects.
Liebe und Tod in der Deutschen Nationalbibliothek: Der DNB-Katalog als Forschungsobjekt der digitalen Literaturwissenschaft.
In: Konferenzabstracts der DHd 2018, series DHd'18, pages 261-266. 2018.
Robert Jäschke and Frank Fischer.
[doi] [BibTeX]
»Der Henry Ford des Computerzeitalters« – Ein Vossanto-Memory.
2018.
Robert Jäschke and Frank Fischer.
[doi] [abstract] [BibTeX]
The poster describes the literary stylistic device of "Vossian antonomasia" as well as a method for its semi-automatic discovery in large corpora. It also contains results from a corpus of the "New York Times" as well as a memory game based on Vossian antonomasias from the weekly newspaper "Die Zeit".
2017
World Literature According to Wikipedia: Introduction to a DBpedia-Based Framework.
2017.
Christoph Hube, Frank Fischer, Robert Jäschke, Gerhard Lauer and Mads Rosendahl Thomsen.
[doi] [abstract] [BibTeX]
Among the manifold takes on world literature, it is our goal to contribute to the discussion from a digital point of view by analyzing the representation of world literature in Wikipedia with its millions of articles in hundreds of languages. As a preliminary, we introduce and compare three different approaches to identify writers on Wikipedia using data from DBpedia, a community project with the goal of extracting and providing structured information from Wikipedia. Equipped with our basic set of writers, we analyze how they are represented throughout the 15 biggest Wikipedia language versions. We combine intrinsic measures (mostly examining the connectedness of articles) with extrinsic ones (analyzing how often articles are frequented by readers) and develop methods to evaluate our results. The better part of our findings seems to convey a rather conservative, old-fashioned version of world literature, but a version derived from reproducible facts revealing an implicit literary canon based on the editing and reading behavior of millions of people. While still having to solve some known issues, the introduced methods will help us build an observatory of world literature to further investigate its representativeness and biases.
New media, familiar dynamics: academic hierarchies influence academics' following behaviour on Twitter.
2017.
Robert Jäschke, Stephanie B. Linek and Christian P. Hoffmann.
[doi] [abstract] [BibTeX]
For what reasons do academics follow one another on Twitter? Robert Jäschke, Stephanie B. Linek and Christian P. Hoffmann analysed the Twitter activity of computer scientists and found that while the quality of information provided by a Twitter account is a key motive for following academic colleagues, there is also evidence of a career planning motive. As well as there being reciprocal following between users of the same academic status (except, remarkably, between PhD researchers), a form of strategic politeness can be observed whereby users follow those of higher academic status without necessarily being followed back. The emerging academic public sphere facilitated by Twitter is largely shaped by dynamics and hierarchies all too familiar to researchers struggling to plot their careers in academia.
»Der Helmut Kohl unter den Brotaufstrichen« – Zur Extraktion Vossianischer Antonomasien aus großen Zeitungskorpora.
In: Proceedings of the DHd 2017, series DHd '17, pages 120-124. 2017.
Robert Jäschke, Jannik Strötgen, Elena Krotova and Frank Fischer.
[BibTeX]
It’s all about information? The Following Behaviour of Professors and PhD Students on Twitter.
The Journal of Web Science, 3(1):1-15, 2017.
Stephanie Linek, Asmelash Teka Hadgu, Christian Pieter Hoffmann, Robert Jäschke and Cornelius Puschmann.
[doi] [abstract] [BibTeX]
In this paper we investigate the role of the academic status in the following behaviour of computer scientists on Twitter. Based on a uses and gratifications perspective, we focus on the activity of a Twitter account and the reciprocity of following relationships. We propose that the account activity addresses the users' information motive only, whereas the user's academic status relates to both the information motive and community development (as in peer networking or career planning).
Variables were extracted from Twitter user data. We applied a biographical approach to correctly identify the academic status (professor versus PhD student). We calculated a 2×2 MANOVA on the influence of the activity of the account and the academic status (on different groups of followers) to differentiate the influence of the information motive versus the motive for community development.
Results suggest that for computer scientists Twitter is mainly an information network. However, we found significant effects in the sense of career planning, that is, the accounts of professors had, even in the case of low activity, a relatively high number of researcher followers, both PhD followers and professor followers. Additionally, there was also some weak evidence for community development gratifications in the sense of peer-networking of professors.
Overall, we conclude that the academic use of Twitter is not only about information, but also about career planning and networking.
What do computer scientists tweet? Analyzing the link-sharing practice on Twitter.
PLoS ONE, 12(6), 2017.
Marco Schmitt and Robert Jäschke.
[doi] [abstract] [BibTeX]
Twitter communication has permeated every sphere of society. To highlight and share small pieces of information with possibly vast audiences or small circles of the interested has some value in almost any aspect of social life. But what is the value exactly for a scientific field? We perform a comprehensive study of computer scientists using Twitter and their tweeting behavior concerning the sharing of web links. Discerning the domains, hosts and individual web pages being tweeted and the differences between computer scientists and a Twitter sample enables us to look in depth at the Twitter-based information sharing practices of a scientific community. Additionally, we aim at providing a deeper understanding of the role and impact of altmetrics in computer science and give a glance at the publications mentioned on Twitter that are most relevant for the computer science community. Our results show a link sharing culture that concentrates more heavily on public and professional quality information than the Twitter sample does. The results also show a broad variety in linked sources and especially in linked publications, with some publications clearly related to community-specific interests of computer scientists, while others have a strong relation to attention mechanisms in social media. This refers to the observation that Twitter is a hybrid form of social media between an information service and a social network service. Overall, the computer scientists’ style of usage seems to be more on the information-oriented side and to some degree also on professional usage. Therefore, altmetrics are of considerable use in analyzing computer science.
2016
Tweeting in times of exposure: A mixed-methods approach for exploring patterns of communication related to business scandals on Twitter.
In: Proceedings of the Workshop on Natural Language Processing and Computational Social Science, series NLP+CSS at WebSci. Hannover, Germany, 2016.
Jens Bergmann, Asmelash Teka Hadgu and Robert Jäschke.
[abstract] [BibTeX]
Currently, three trends mutually influence each other and can be observed using social media: (a) the growing use of social media, in particular Twitter, by organizations, (b) increased expectations of transparency towards organizations, and (c) massive public response to organizational crises via social media. Getting an understanding of how customers and organizations react to crises and crisis responses as well as identifying different communication strategies is difficult, since the large number of actors and the abundance of messages cannot be handled by traditional methods from the Social Sciences. These often rely on manual work, for instance, interviews, qualitative studies, or questionnaires. Even large parts of content analysis using computer-assisted qualitative data analysis software have to be supported by manual work. At the same time, the availability and accessibility of large volumes of messages on Twitter also opens up possibilities for mixed-methods approaches to analyze this data. In particular, natural language processing can support the analysis of large sets of tweets. In this work we present first steps towards a large-scale analysis of Twitter communication during corporate crises by leveraging a mixed-methods approach. Such analyses can improve our understanding of organizational crises and their communication and can also prove beneficial to provide recommendations for successful reactions and interactions.
Cäsar Flaischlens Graphische Litteratur-Tafel digital.
Poster at 3rd DHA conference. 2016.
Ingo Börner, Frank Fischer, Angelika Hechtl, Robert Jäschke and Peer Trilcke.
[doi] [BibTeX]
The Role of Cores in Recommender Benchmarking for Social Bookmarking Systems.
Transactions on Intelligent Systems and Technology, 7(3):40:1-40:33, 2016.
Stephan Doerfel, Robert Jäschke and Gerd Stumme.
[doi] [abstract] [BibTeX]
Social bookmarking systems have established themselves as an important part in today’s web. In such systems, tag recommender systems support users during the posting of a resource by suggesting suitable tags. Tag recommender algorithms have often been evaluated in offline benchmarking experiments. Yet, the particular setup of such experiments has rarely been analyzed. In particular, since the recommendation quality usually suffers from difficulties like the sparsity of the data or the cold start problem for new resources or users, datasets have often been pruned to so-called cores (specific subsets of the original datasets) – however without much consideration of the implications on the benchmarking results.
In this paper, we generalize the notion of a core by introducing the new notion of a set-core – which is independent of any graph structure – to overcome a structural drawback in the previous constructions of cores on tagging data. We show that problems caused by some types of cores can be eliminated using set-cores. Further, we present a thorough analysis of tag recommender benchmarking setups using cores. To that end, we conduct a large-scale experiment on four real-world datasets in which we analyze the influence of different cores on the evaluation of recommendation algorithms. We can show that the results of the comparison of different recommendation approaches depend on the selection of core type and level. For the benchmarking of tag recommender algorithms, our results suggest that the evaluation must be set up more carefully and should not be based on one arbitrarily chosen core type and level.
Telling English Tweets Apart: the Case of US, GB, AU.
In: Proceedings of the Workshop on Natural Language Processing and Computational Social Science, series NLP+CSS at WebSci. Hannover, Germany, 2016.
Asmelash Teka Hadgu, Netaya Lotze and Robert Jäschke.
[abstract] [BibTeX]
In this paper, we study how to automatically tell different varieties of English apart on Twitter by taking samples from American (US), British (GB) and Australian (AU) English. We track cities and apply filters to generate ground-truth data. We perform expert evaluation to get a sense of the difficulty of the task. We then cast the problem as a classification task: given a tweet (or a set of tweets from a user) in English, the goal is to automatically identify whether the tweet (or set of tweets) is US, GB or AU English. We perform experiments to compare some linguistic features against simple statistical features and show that character n-grams are quite effective for the task.
The 8th ACM Web Science Conference 2016.
SIGWEB Newsletter (Summer):1:1-1:7, 2016.
Robert Jäschke.
[doi] [abstract] [BibTeX]
This article provides an overview of this year's ACM Web Science Conference (WebSci'16). It was located in Hannover, Germany, and organized by L3S Research Center and the Web Science Trust. WebSci'16 attracted more than 160 researchers from very different disciplines, ranging from computer science to anthropology. Celebrating 10 years of the Web Science research initiative, the conference featured six keynotes, three panels, nine paper sessions, and several side-events.
You Shall Not Pass: Detecting Malicious Users at Registration Time.
In: Proceedings of the 1st International Workshop on Online Safety, Trust and Fraud Prevention, series OnSt '16, pages 2:1-2:6. ACM, New York, NY, USA, 2016.
Christian Kater and Robert Jäschke.
[doi] [abstract] [BibTeX]
Spam is a widespread problem for many online services. The use case in this paper is the social bookmarking system BibSonomy, which received over 150 times more registrations from spam users than from normal users over the last ten years.
A common approach to fight spam is to use machine learning to classify the users into good or malicious users. Based on information the users provide to the service in the form of profile information or posts, features are created from which a classifier can make its decision. However, this often means that the accounts of the spam users are already active and can post their spam. In this work we propose an approach for deciding at registration time whether a user is malicious or not. In order to achieve this goal, we extracted 177 features from the information the users provide during the registration process, their IP address, and registration time. With these features we used state-of-the-art classifiers to identify users as spammers or regular users. With the best classifier, we could reach an AUC of 0.912.
Posted, Visited, Exported: Altmetrics in the Social Tagging System BibSonomy.
Journal of Informetrics, 10(3):732-749, 2016.
Daniel Zoller, Stephan Doerfel, Robert Jäschke, Gerd Stumme and Andreas Hotho.
[doi] [abstract] [BibTeX]
In social tagging systems, like Mendeley, CiteULike, and BibSonomy, users can post, tag, visit, or export scholarly publications. In this paper, we compare citations with metrics derived from users’ activities (altmetrics) in the popular social bookmarking system BibSonomy. Our analysis, using a corpus of more than 250,000 publications published before 2010, reveals that overall, citations and altmetrics in BibSonomy are mildly correlated. Furthermore, grouping publications by user-generated tags results in topic-homogeneous subsets that exhibit higher correlations with citations than the full corpus. We find that posts, exports, and visits of publications are correlated with citations and even bear predictive power over future impact. Machine learning classifiers predict whether the number of citations that a publication receives in a year exceeds the median number of citations in that year, based on the usage counts of the preceding year. In that setup, a Random Forest predictor outperforms the baseline on average by seven percentage points.
2015
Social Activity versus Academic Activity: A Case Study of Computer Scientists on Twitter.
In: Proceedings of the 15th International Conference on Knowledge Technologies and Data-driven Business, series i-KNOW '15. ACM, New York, NY, USA, 2015.
Subhash Chandra Pujari, Asmelash Teka Hadgu, Elisabeth Lex and Robert Jäschke.
[doi] [abstract] [BibTeX]
In this work, we study social and academic network activities of researchers from Computer Science. Using a recently proposed framework, we map the researchers to their Twitter accounts and link them to their publications. This enables us to create two types of networks: first, networks that reflect social activities on Twitter, namely the researchers’ follow, retweet and mention networks and second, networks that reflect academic activities, that is, the co-authorship and citation networks. Based on these datasets, we (i) compare the social activities of researchers with their academic activities, (ii) investigate the consistency and similarity of communities within the social and academic activity networks, and (iii) investigate the information flow between different areas of Computer Science in and between both types of networks. Our findings show that if co-authors interact on Twitter, their relationship is reciprocal, increasing with the number of papers they co-authored. In general, the social and the academic activities are not correlated. In terms of community analysis, we found that the three social activity networks are most consistent with each other, with the highest consistency between the retweet and mention network. A study of information flow revealed that in the follow network, researchers from Data Management, Human-Computer Interaction, and Artificial Intelligence act as a source of information for other areas in Computer Science.
Semantic Annotation for Microblog Topics Using Wikipedia Temporal Information.
In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 97-106. Association for Computational Linguistics, 2015.
Tuan Tran, Nam-Khanh Tran, Asmelash Teka Hadgu and Robert Jäschke.
[doi] [abstract] [BibTeX]
In this paper we study the problem of semantic annotation for a trending hashtag which is the crucial step towards analyzing user behavior in social media, yet has been largely unexplored. We tackle the problem via linking to entities from Wikipedia. We incorporate the social aspects of trending hashtags by identifying prominent entities for the annotation so as to maximize the information spreading in entity networks. We exploit temporal dynamics of entities in Wikipedia, namely Wikipedia edits and page views to improve the annotation quality. Our experiments show that we significantly outperform the established methods in tweet annotation.
On Publication Usage in a Social Bookmarking System.
In: Proceedings of the ACM Web Science Conference, series WebSci '15, pages 67:1-67:2. ACM, New York, NY, USA, 2015.
Daniel Zoller, Stephan Doerfel, Robert Jäschke, Gerd Stumme and Andreas Hotho.
[doi] [abstract] [BibTeX]
Scholarly success is traditionally measured in terms of citations to publications. With the advent of publication management and digital libraries on the web, scholarly usage data has become a target of investigation and new impact metrics computed on such usage data have been proposed – so-called altmetrics. In scholarly social bookmarking systems, scientists collect and manage publication metadata and thus reveal their interest in these publications. In this work, we investigate connections between usage metrics and citations, and find posts, exports, and page views of publications to be correlated to citations.
2014
Literatur recherchieren und verwalten.
In: CoScience - Gemeinsam forschen und publizieren mit dem Netz, chapter 1, pages 12-20. Technische Informationsbibliothek, Hannover, 2014.
Ina Blümel, Christian Hauschke and Robert Jäschke.
[doi] [abstract] [BibTeX]
Whether in research or publication projects, literature search is an essential part of the process of scholarly work, not only at the beginning of a project but repeatedly and at different stages of it. Those who do research want to know what has already been investigated, which methods are applicable to a project, which terminology is used, and which pitfalls of content, form, and method may need to be avoided. Managing the sources found is part of the search and, among other things, an important prerequisite for correct citation. In collaborative work, sharing the researched information is desirable in order to bring everyone to the same level of knowledge and to avoid duplicated effort. The particular challenge in networked projects is to carry out the search in such a way that its result, that is, the information found, is transparent to all project members.
The Quest for Research Information.
In: Proceedings of the 12th International Conference on Current Research Information Systems. 2014.
Ina Blümel, Stefan Dietze, Lambert Heller, Robert Jäschke and Martin Mehlberg.
[doi] [abstract] [BibTeX]
Research information, i.e., data about research projects, organisations, researchers or research outputs such as publications or patents, is spread across the web, usually residing on institutional and personal web pages or in semi-open databases and information systems. While there exists a wealth of unstructured information, the limited amounts of structured data often are exposed following proprietary or less-established schemas and interfaces. Therefore, a holistic view on research information across organisational and national boundaries is not feasible and information is inconsistent and incomplete. On the other hand, web crawling and information extraction techniques have matured throughout the last decade, allowing for automated approaches to harvesting, extracting and consolidating research information into a more coherent knowledge graph. In particular, the Linked Data community has provided a range of techniques, schemas and vocabularies which allow research information to be represented and interlinked in a more coherent manner. In this work, we give an overview of the current state of the art in research information sharing on the web and present initial ideas towards a more holistic approach for bootstrapping research information from available web sources.
UMAP 2014 Extended Proceedings.
volume 1181. CEUR-WS, 2014.
Iván Cantador, Min Chi, Rosta Farzan and Robert Jäschke.
[doi] [abstract] [BibTeX]
The workshops at the 22nd conference on User Modeling, Adaptation and Personalization cover broad and exciting topics related to ongoing research in the field. The workshops bring together researchers from a large number of academic institutions across the United States and Europe. Forty-three papers at six workshops at UMAP 2014 highlight the impact of different factors such as human factors and emotions on user modeling. At the same time, the workshops attempt to discuss new challenges in the field such as news recommendation in the age of social media, student modeling in the context of MOOCs and gamified learning environments, and personalization in citizen-participatory e-government services and multilingual information systems.
Proceedings of the ECML PKDD Discovery Challenge 2013 - Recommending Given Names.
CEUR-WS.org. volume 1120. 2014.
Stephan Doerfel, Andreas Hotho, Robert Jäschke, Folke Mitzlaff and Juergen Mueller.
[doi] [BibTeX]
Identifying and Analyzing Researchers on Twitter.
In: Proceedings of the 2014 ACM Conference on Web Science, series WebSci '14, pages 23-30. ACM, New York, NY, USA, 2014.
Asmelash Teka Hadgu and Robert Jäschke.
[doi] [abstract] [BibTeX]
For millions of users Twitter is an important communication platform, a social network, and a system for resource sharing. Likewise, scientists use Twitter to connect with other researchers, announce calls for papers, or share their thoughts. Filtering tweets, discovering other researchers, or finding relevant information on a topic of interest, however, is difficult since no directory of researchers on Twitter exists.
In this paper we present an approach to identify Twitter accounts of researchers and demonstrate its utility for the discipline of computer science. Based on a seed set of computer science conferences, we collect relevant Twitter users, which we can partially map to ground-truth data. The mapping is leveraged to learn a model for classifying the remaining users. To gain first insights into how researchers use Twitter, we empirically analyze the identified users and compare their age, popularity, influence, and social network.
Graph-Based Representation and Reasoning.
Lecture Notes in Computer Science. volume 8577. Springer, 2014.
Nathalie Hernandez, Robert Jäschke and Madalina Croitoru.
[doi] [abstract] [BibTeX]
This book constitutes the proceedings of the 21st International Conference on Conceptual Structures, ICCS 2014, held in Iaşi, Romania, in July 2014. The 17 regular papers and 6 short papers presented in this volume were carefully reviewed and selected from 40 and 10 submissions, respectively. The topics covered are: conceptual structures, knowledge representation, reasoning, conceptual graphs, formal concept analysis, semantic Web, information integration, machine learning, data mining and information retrieval.
2013
An analysis of tag-recommender evaluation procedures.
In: Proceedings of the 7th ACM conference on Recommender systems, series RecSys '13, pages 343-346. ACM, New York, NY, USA, 2013.
Stephan Doerfel and Robert Jäschke.
[doi] [abstract] [BibTeX]
Since the rise of collaborative tagging systems on the web, the tag recommendation task -- suggesting suitable tags to users of such systems while they add resources to their collection -- has been tackled. However, the (offline) evaluation of tag recommendation algorithms usually suffers from difficulties like the sparseness of the data or the cold start problem for new resources or users. Previous studies therefore often used so-called post-cores (specific subsets of the original datasets) for their experiments. In this paper, we conduct a large-scale experiment in which we analyze different tag recommendation algorithms on different cores of three real-world datasets. We show that a recommender's performance depends on the particular core and explore correlations between performances on different cores.
Attribute Exploration on the Web.
In: P. Cellier, F. Distel and B. Ganter, editors, Contributions to the 11th International Conference on Formal Concept Analysis, pages 19-34. 2013.
Robert Jäschke and Sebastian Rudolph.
[doi] [abstract] [BibTeX]
We propose an approach for supporting attribute exploration by web information retrieval, in particular by posing appropriate queries to search engines, crowd sourcing systems, and the linked open data cloud. We discuss underlying general assumptions for this to work and the degree to which these can be taken for granted.
Deeper Into the Folksonomy Graph: FolkRank Adaptations and Extensions for Improved Tag Recommendations.
cs.IR, 1310.1498, 2013.
Nikolas Landia, Stephan Doerfel, Robert Jäschke, Sarabjot Singh Anand, Andreas Hotho and Nathan Griffiths.
[doi] [abstract] [BibTeX]
The information contained in social tagging systems is often modelled as a graph of connections between users, items and tags. Recommendation algorithms such as FolkRank have the potential to leverage complex relationships in the data, corresponding to multiple hops in the graph. We present an in-depth analysis and evaluation of graph models for social tagging data and propose novel adaptations and extensions of FolkRank to improve tag recommendations. We highlight implicit assumptions made by the widely used folksonomy model, and propose an alternative and more accurate graph representation of the data. Our extensions of FolkRank address the new item problem by incorporating content data into the algorithm, and significantly improve prediction results on unpruned datasets. Our adaptations address issues in the iterative weight spreading calculation that potentially hinder FolkRank's ability to leverage the deep graph as an information source. Moreover, we evaluate the benefit of considering each deeper level of the graph, and present important insights regarding the characteristics of social tagging data in general. Our results suggest that the base assumption made by conventional weight propagation methods, that closeness in the graph always implies a positive relationship, does not hold for the social tagging domain.
2012
Recommender Systems for Social Tagging Systems.
2012.
L. Balby Marinho, A. Hotho, R. Jäschke, A. Nanopoulos, S. Rendle, L. Schmidt-Thieme, G. Stumme and P. Symeonidis.
[doi] [abstract] [BibTeX]
Social Tagging Systems are web applications in which users upload resources (e.g., bookmarks, videos, photos) and annotate them with a list of freely chosen keywords called tags. This is a grassroots approach to organizing a site and helping users find the resources they are interested in. Social tagging systems are open and inherently social, features that have been proven to encourage participation. However, with the large popularity of these systems and the increasing amount of user-contributed content, information overload rapidly becomes an issue. Recommender Systems are well-known applications for increasing the level of relevant content over the “noise” that continuously grows as more and more content becomes available online. In social tagging systems, however, we face new challenges. While in classic recommender systems the mode of recommendation is basically the resource, in social tagging systems there are three possible modes of recommendation: users, resources, or tags. Therefore, suitable methods that properly exploit the different dimensions of social tagging systems data are needed. In this book, we survey the most recent and state-of-the-art work about a whole new generation of recommender systems built to serve social tagging systems. The book is divided into self-contained chapters, ranging from the background material on social tagging systems and recommender systems to more advanced techniques such as those based on tensor factorization and graph-based models.
Leveraging Publication Metadata and Social Data into FolkRank for Scientific Publication Recommendation.
In: Proceedings of the 4th ACM RecSys workshop on Recommender systems and the social web, pages 9-16. ACM, New York, NY, USA, 2012.
Stephan Doerfel, Robert Jäschke, Andreas Hotho and Gerd Stumme.
[doi] [abstract] [BibTeX]
The ever-growing flood of new scientific articles requires novel retrieval mechanisms. One means of mitigating this instance of the information overload phenomenon is collaborative tagging systems, which allow users to select, share and annotate references to publications. These systems employ recommendation algorithms to present to their users personalized lists of interesting and relevant publications. In this paper we analyze different ways to incorporate social data and metadata from collaborative tagging systems into the graph-based ranking algorithm FolkRank to utilize it for recommending scientific articles to users of the social bookmarking system BibSonomy. We compare the results to those of Collaborative Filtering, which has previously been applied for resource recommendation.
Publication Analysis of the Formal Concept Analysis Community.
In: F. Domenach, D. Ignatov and J. Poelmans, editors, Formal Concept Analysis, volume 7278, series Lecture Notes in Artificial Intelligence, pages 77-95. Springer, Berlin/Heidelberg, 2012.
Stephan Doerfel, Robert Jäschke and Gerd Stumme.
[doi] [abstract] [BibTeX]
We present an analysis of the publication and citation networks of all previous editions of the three conferences most relevant to the FCA community: ICFCA, ICCS and CLA. Using data mining methods from FCA and graph analysis, we investigate patterns and communities among authors, we identify and visualize influential publications and authors, and we give a statistical summary of the conferences’ history.
Challenges in Tag Recommendations for Collaborative Tagging Systems.
In: J. J. Pazos Arias, A. Fernández Vilas and R. P. Díaz Redondo, editors, Recommender Systems for the Social Web, pages 65-87. Springer, Berlin/Heidelberg, 2012.
Robert Jäschke, Andreas Hotho, Folke Mitzlaff and Gerd Stumme.
[doi] [abstract] [BibTeX]
Originally introduced by social bookmarking systems, collaborative tagging, or social tagging, has been widely adopted by many web-based systems like wikis, e-commerce platforms, or social networks. Collaborative tagging systems allow users to annotate resources using freely chosen keywords, so-called tags. Those tags help users in finding and retrieving resources, discovering new resources, and navigating through the system. The process of tagging resources is laborious. Therefore, most systems support their users by tag recommender components that recommend tags in a personalized way. The Discovery Challenges 2008 and 2009 of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) tackled the problem of tag recommendations in collaborative tagging systems. Researchers were invited to test their methods in a competition on datasets from the social bookmark and publication sharing system BibSonomy. Moreover, the 2009 challenge included an online task where the recommender systems were integrated into BibSonomy and provided recommendations in real time. In this chapter we review, evaluate and summarize the submissions to the two Discovery Challenges and thus lay the groundwork for continuing research in this area.
Extending FolkRank with Content Data.
In: Proceedings of the 4th ACM RecSys workshop on Recommender systems and the social web, pages 1-8. ACM, New York, NY, USA, 2012.
Nikolas Landia, Sarabjot Singh Anand, Andreas Hotho, Robert Jäschke, Stephan Doerfel and Folke Mitzlaff.
[doi] [abstract] [BibTeX]
Real-world tagging datasets have a large proportion of new/untagged documents. Few approaches for recommending tags to a user for a document address this new item problem, concentrating instead on artificially created post-core datasets where it is guaranteed that the user as well as the document of each test post is known to the system and already has some tags assigned to it. In order to recommend tags for new documents, approaches are required which model documents not only based on the tags assigned to them in the past (if any), but also on their content. In this paper we present a novel adaptation of the widely recognised FolkRank tag recommendation algorithm that includes content data. We adapt the FolkRank graph to use word nodes instead of document nodes, enabling it to recommend tags for new documents based on their textual content. Our adaptations make FolkRank applicable to post-core 1, i.e., the full real-world tagging datasets, and address the new item problem in tag recommendation. For comparison, we also apply and evaluate the same methodology of including content on a simpler tag recommendation algorithm. This results in a less expensive recommender which suggests a combination of user-related and document-content-related tags.
Including content data into FolkRank shows an improvement over plain FolkRank on full tagging datasets. However, we also observe that our simpler content-aware tag recommender outperforms FolkRank with content data. Our results suggest that an optimisation of the weighting method of FolkRank is required to achieve better results.
2011
Enhancing Social Interactions at Conferences.
Information Technology, 53(3):101-107, 2011.
Martin Atzmueller, Dominik Benz, Stephan Doerfel, Andreas Hotho, Robert Jäschke, Bjoern Elmar Macek, Folke Mitzlaff, Christoph Scholz and Gerd Stumme.
[doi] [abstract] [BibTeX]
Conferator is a novel social conference system that provides the management of social interactions and context information in ubiquitous and social environments. Using RFID and social networking technology, Conferator provides the means for effective management of personal contacts and the corresponding conference information before, during and after a conference. We describe the system in detail before we analyze and discuss results of a typical application of the Conferator system.
Social Tagging Recommender Systems.
In: F. Ricci, L. Rokach, B. Shapira and P. B. Kantor, editors, Recommender Systems Handbook, pages 615-644. Springer, New York, 2011.
Leandro Balby Marinho, Alexandros Nanopoulos, Lars Schmidt-Thieme, Robert Jäschke, Andreas Hotho, Gerd Stumme and Panagiotis Symeonidis.
[doi] [abstract] [BibTeX]
The new generation of Web applications known as Social Tagging Systems (STS) is successfully established and poised for continued growth. STS are open and inherently social, features that have been proven to encourage participation. But while STS bring new opportunities, they revive old problems, such as information overload. Recommender Systems are well-known applications for increasing the level of relevant content over the noise that continuously grows as more and more content becomes available online. In STS, however, we face new challenges. Users are interested in finding not only content, but also tags and even other users. Moreover, while traditional recommender systems usually operate over 2-way data arrays, STS data is represented as a third-order tensor or a hypergraph with hyperedges denoting (user, resource, tag) triples. In this chapter, we survey the most recent and state-of-the-art work about a whole new generation of recommender systems built to serve STS. We describe (a) novel facets of recommenders for STS, such as user, resource, and tag recommenders, (b) new approaches and algorithms for dealing with the ternary nature of STS data, and (c) recommender systems deployed in real-world STS. Moreover, a concise comparison between existing works is presented, through which we identify and point out new research directions.
Recommendation in the Social Web.
AI Magazine, 32(3):46-56, 2011.
Robin Burke, Jonathan Gemmell, Andreas Hotho and Robert Jäschke.
[doi] [abstract] [BibTeX]
Recommender systems are a means of personalizing the presentation of information to ensure that users see the items most relevant to them. The social web has added new dimensions to the way people interact on the Internet, placing the emphasis on user-generated content. Users in social networks create photos, videos and other artifacts, collaborate with other users, socialize with their friends and share their opinions online. This outpouring of material has brought increased attention to recommender systems, as a means of managing this vast universe of content. At the same time, the diversity and complexity of the data has meant new challenges for researchers in recommendation. This article describes the nature of recommendation research in social web applications and provides some illustrative examples of current research directions and techniques. It is difficult to overstate the impact of the social web. This new breed of social applications is reshaping nearly every human activity from the way people watch movies to how they overthrow governments. Facebook allows its members to maintain friendships whether they live next door or on another continent. With Twitter, users from celebrities to ordinary folks can launch their 140-character messages out to a diverse horde of “followers.” Flickr and YouTube users upload their personal media to share with the world, while Wikipedia editors collaborate on the world’s largest encyclopedia.
A Comparison of Content-Based Tag Recommendations in Folksonomy Systems.
In: K. E. Wolff, D. E. Palchunov, N. G. Zagoruiko and U. Andelfinger, editors, Knowledge Processing and Data Analysis, volume 6581, series Lecture Notes in Computer Science, pages 136-149. Springer, Berlin/Heidelberg, 2011.
Jens Illig, Andreas Hotho, Robert Jäschke and Gerd Stumme.
[doi] [abstract] [BibTeX]
Recommendation algorithms and multi-class classifiers can support users of social bookmarking systems in assigning tags to their bookmarks. Content-based recommenders are the usual approach for facing the cold start problem, i.e., when a bookmark is uploaded for the first time and no information from other users can be exploited. In this paper, we evaluate several recommendation algorithms in a cold-start scenario on a large real-world dataset.
Formal Concept Analysis and Tag Recommendations in Collaborative Tagging Systems.
2011.
Robert Jäschke.
[doi] [abstract] [BibTeX]
One of the most noticeable innovations that emerged with the advent of the Web 2.0 and the focal point of this thesis are collaborative tagging systems. They allow users to annotate arbitrary resources with freely chosen keywords, so-called tags. The tags are used for navigation, finding resources, and serendipitous browsing and thus provide an immediate benefit for the user. By now, several systems for tagging photos, web links, publication references, videos, etc. have attracted millions of users, who in turn have annotated countless resources. Tagging gained so much popularity that it spread into other applications like web browsers, software package managers, and even file systems. Therefore, the relevance of the methods presented in this thesis goes beyond the Web 2.0.
The conceptual structure underlying collaborative tagging systems is called folksonomy. It can be represented as a tripartite hypergraph with user, tag, and resource nodes. Each edge of the graph expresses the fact that a user annotated a resource with a tag. This social network constitutes a lightweight conceptual structure that is not formalized, but rather implicit and thus needs to be extracted with knowledge discovery methods. In this thesis a new data mining task – the mining of all frequent tri-concepts – is presented, together with an efficient algorithm for discovering such implicit shared conceptualizations. Our approach extends the data mining task of discovering all closed itemsets to three-dimensional data structures to allow for mining folksonomies. Extending the theory of triadic Formal Concept Analysis, we provide a formal definition of the problem, and present an efficient algorithm for its solution. We show the applicability of our approach on three large real-world examples and thereby perform a conceptual clustering of two collaborative tagging systems. Finally, we introduce neighborhoods of triadic concepts as basis for a lightweight visualization of tri-lattices.
The social bookmark and publication sharing system BibSonomy, which is currently among the three most popular systems of its kind, has been developed by our research group. Besides being a useful tool for many scientists, it provides interested researchers a basis for the evaluation and integration of their knowledge discovery methods. This thesis introduces BibSonomy as an exemplary collaborative tagging system and gives an overview of its architecture and some of its features. Furthermore, BibSonomy is used as foundation for evaluating and integrating some of the discussed approaches.
Collaborative tagging systems usually include tag recommendation mechanisms that ease the process of finding good tags for a resource and also consolidate the tag vocabulary across users. In this thesis we evaluate and compare several recommendation algorithms on large-scale real-world datasets: an adaptation of user-based Collaborative Filtering, a graph-based recommender built on top of the FolkRank algorithm, and simple methods based on counting tag co-occurrences. We show that both FolkRank and Collaborative Filtering provide better results than non-personalized baseline methods. Moreover, since methods based on counting tag co-occurrences are computationally cheap, and thus usually preferable for real-time scenarios, we discuss simple approaches for improving the performance of such methods. We demonstrate how a simple recommender based on counting tags from users and resources can perform almost as well as the best recommender. Furthermore, we show how to integrate recommendation methods into a real tagging system and how to record and evaluate their performance, by describing the tag recommendation framework we developed for BibSonomy. With the intention of developing, testing, and evaluating recommendation algorithms and supporting cooperation with researchers, we designed the framework to be easily extensible, open for a variety of methods, and usable independently of BibSonomy. We also present an evaluation of the framework which demonstrates its power.
The folksonomy graph shows specific structural properties that explain its growth and the possibility of serendipitous exploration. Clicklogs of web search engines can be represented as a folksonomy in which queries are descriptions of clicked URLs. The resulting network structure, which we will term logsonomy, is very similar to that of folksonomies. In order to find out about its properties, we analyze the topological characteristics of the tripartite hypergraph of queries, users and bookmarks on a large folksonomy snapshot and on query logs of two large search engines. We find that all three datasets exhibit similar structural properties and thus conclude that the clicking behaviour of search engine users based on the displayed search results and the tagging behaviour of collaborative tagging users are driven by similar dynamics.
In this thesis we further transfer the folksonomy paradigm to the Social Semantic Desktop – a new model of computer desktop that uses Semantic Web technologies to better link information items. There we apply community support methods to the folksonomy found in the network of social semantic desktops. Thus, we connect knowledge discovery for folksonomies with semantic technologies.
Altogether, the research in this thesis is centered around collaborative tagging systems and their underlying data structure – folksonomies – and thereby paves the way for the further dissemination of this successful knowledge management paradigm.
Tagging data as implicit feedback for learning-to-rank.
In: Proceedings of the ACM WebSci Conference, pages 1-4. New York, NY, USA, 2011.
Beate Navarro Bullock, Robert Jäschke and Andreas Hotho.
[doi] [abstract] [BibTeX]
Learning-to-rank methods automatically generate ranking functions which can be used for ordering unknown resources according to their relevance for a specific search query. The training data to construct such a model consists of features describing a document-query pair as well as relevance scores indicating how important the document is for the query. In general, these relevance scores are derived by asking experts to manually assess search results or by exploiting user search behaviour such as click data. The human evaluation of ranking results gives explicit relevance scores, but it is expensive to obtain. Click data can be logged from the user interaction with a search engine, but the feedback is noisy. In this paper, we explore a novel source of implicit feedback for web search: tagging data. Creating relevance feedback from tagging data leads to a further source of implicit relevance feedback which helps improve the reliability of automatically generated relevance scores and therefore the quality of learning-to-rank models.
Formal Concept Analysis.
Lecture Notes in Artificial Intelligence. volume 6628. Springer, Berlin/Heidelberg, 2011.
Petko Valtchev and Robert Jäschke.
[doi] [abstract] [BibTeX]
The present volume features a selection of the papers presented at the 9th International Conference on Formal Concept Analysis (ICFCA 2011). Over the years, the ICFCA conference series has grown into the premier forum for dissemination of research on topics from formal concept analysis (FCA) theory and applications, as well as from the related fields of lattices and partially ordered structures.
FCA is a multi-disciplinary field with strong roots in the mathematical theory of partial orders and lattices, with tools originating in computer science and artificial intelligence. FCA emerged in the early 1980s from efforts to restructure lattice theory to promote better communication between lattice theorists and potential users of lattice-based methods for data management. Initially, the central theme was the mathematical formalization of concept and conceptual hierarchy. Since then, the field has developed into a constantly growing research area in its own right with a thriving theoretical community and an increasing number of applications in data and knowledge processing including disciplines such as data visualization, information retrieval, machine learning, software engineering, data analysis, data mining, social networks analysis, etc.
ICFCA 2011 was held from May 2 to May 6, 2011, in Nicosia, Cyprus. The program committee received 49 high-quality submissions that were subjected to a highly competitive selection process. Each paper was reviewed by three referees (exceptionally two or four). After a first round, some papers were definitively accepted, while others were accepted conditionally on improvements to their content. The latter went through a second round of reviewing. The overall outcome was the acceptance of 16 papers as regular ones for presentation at the conference and publication in this volume. Another seven papers were still assessed as valuable for discussion at the conference and were therefore collected in the supplementary proceedings. The regular papers presented hereafter cover advances on a wide range of subjects from FCA and related fields.
A first group of papers tackled mathematical problems within the FCA field. A subset thereof focused on factor identification within the incidence relation or its lattice representation (papers by Glodeanu and by Krupka). The remainder of the group proposed characterizations of particular classes of ordered structures (papers by Doerfel and by Meschke et al.). A second group of papers addressed algorithmic problems from FCA and related fields. Two papers approached their problems from an algorithmic complexity viewpoint (papers by Distel and by Babin and Kuznetsov) while the final paper in this group addressed algorithmic problems for general lattices, i.e., not represented as formal contexts, with an FCA-based approach (work by Balcázar and Tîrnăucă). A third group studied alternative approaches for extending the expressive power of the core FCA, e.g., by generalizing the standard one-valued attributes to attributes valued in algebraic rings (work by González Calabozo et al.), by introducing pointer-like attributes, a.k.a. links (paper by Kötters), or by substituting set-shaped concept intents with modal logic expressions (paper by Soldano and Ventos). A fourth group focused on data mining-oriented aspects of FCA: agreement lattices in structured data mining (paper by Nedjar et al.), triadic association rule mining (work by Missaoui and Kwuida) and bi-clustering of numerical data (Kaytoue et al.). An additional paper shed some initial light on a key aspect of FCA-based data analysis and mining, i.e., the filtering of interesting concepts (paper by Belohlavek and Macko). Finally, a set of exciting applications of both basic and enhanced FCA frameworks to practical problems have been described: in analysis of gene expression data (the already mentioned work by González Calabozo et al.), in web services composition (paper by Azmeh et al.) and in browsing and retrieval of structured data (work by Wray and Eklund).
This volume also contains three keynote papers submitted by the invited speakers of the conference.
All these contributions constitute a volume of high quality which is the result of the hard work done by the authors, the invited speakers and the reviewers. We therefore wish to thank the members of the Program Committee and of the Editorial Board whose steady involvement and professionalism helped a lot. We would also like to acknowledge the participation of all the external reviewers who sent many valuable comments. Kudos also go to EasyChair for having made the reviewing/editing process a real pleasure. Special thanks go to the Cyprus Tourism Organisation for sponsoring the conference and to the University of Nicosia for hosting it. Finally we wish to thank the Conference Chair Florent Domenach and his colleagues from the Organization Committee for the mountains of energy they put behind the conference organization process right from the beginning in order to make it a total success. We would also like to express our gratitude towards Dr. Peristianis, President of the University of Nicosia, for his personal support.
2010
Academic Publication Management with PUMA - collect, organize and share publications.
In: M. Lalmas, J. Jose, A. Rauber, F. Sebastiani and I. Frommholz, editors, Proceedings of the European Conference on Research and Advanced Technology for Digital Libraries (ECDL) 2010, volume 6273, series Lecture Notes in Computer Science, pages 417-420. Springer, Berlin/Heidelberg, 2010.
Dominik Benz, Andreas Hotho, Robert Jäschke, Gerd Stumme, Axel Halle, Angela Gerlach Sanches Lima, Helge Steenweg and Sven Stefani.
[doi] [abstract] [BibTeX]
The PUMA project fosters the Open Access movement and aims at better support for the researcher’s publication work. PUMA stands for an integrated solution, where the upload of a publication results automatically in an update of both the personal and institutional homepage, the creation of an entry in a social bookmarking system like BibSonomy, an entry in the academic reporting system of the university, and its publication in the institutional repository. In this poster, we present the main features of our solution.
Query Logs as Folksonomies.
Datenbank-Spektrum, 10(1):15-24, 2010.
Dominik Benz, Andreas Hotho, Robert Jäschke, Beate Krause and Gerd Stumme.
[doi] [abstract] [BibTeX]
Query logs provide a valuable resource for preference information in search. A user clicking on a specific resource after submitting a query indicates that the resource has some relevance with respect to the query. To leverage the information of query logs, one can relate submitted queries from specific users to their clicked resources and build a tripartite graph of users, resources and queries. This graph resembles the folksonomy structure of social bookmarking systems, where users add tags to resources. In this article, we summarize our work on building folksonomies from query log files. The focus is on three comparative studies of the system’s content, structure and semantics. Our results show that query logs incorporate typical folksonomy properties and that approaches to leverage the inherent semantics of folksonomies can be applied to query logs as well.
The Social Bookmark and Publication Management System BibSonomy.
The VLDB Journal, 19(6):849-875, 2010.
Dominik Benz, Andreas Hotho, Robert Jäschke, Beate Krause, Folke Mitzlaff, Christoph Schmitz and Gerd Stumme.
[doi] [abstract] [BibTeX]
Social resource sharing systems are central elements of the Web 2.0 and use the same kind of lightweight knowledge representation, called folksonomy. Their large user communities and ever-growing networks of user-generated content have made them an attractive object of investigation for researchers from different disciplines like Social Network Analysis, Data Mining, Information Retrieval or Knowledge Discovery. In this paper, we summarize and extend our work on different aspects of this branch of Web 2.0 research, demonstrated and evaluated within our own social bookmark and publication sharing system BibSonomy, which is currently among the three most popular systems of its kind. We structure this presentation along the different interaction phases of a user with our system, coupling the relevant research questions of each phase with the corresponding implementation issues. This approach reveals in a systematic fashion important aspects and results of the broad bandwidth of folksonomy research like capturing of emergent semantics, spam detection, ranking algorithms, analogies to search engine log data, personalized tag recommendations and information extraction techniques. We conclude that when integrating a real-life application like BibSonomy into research, certain constraints have to be considered; but in general, the tight interplay between our scientific work and the running system has made BibSonomy a valuable platform for demonstrating and evaluating Web 2.0 research.
Publikationsmanagement mit BibSonomy - ein Social-Bookmarking-System für Wissenschaftler.
HMD - Praxis der Wirtschaftsinformatik, 271:47-58, 2010.
Andreas Hotho, Dominik Benz, Folke Eisterlehner, Robert Jäschke, Beate Krause, Christoph Schmitz and Gerd Stumme.
[doi] [abstract] [BibTeX]
Collaborative tagging or social bookmarking systems such as Delicious, Mister Wong, or our own system BibSonomy are becoming increasingly popular and form a central part of today's Web 2.0. In such systems, users create lightweight conceptual structures, so-called folksonomies, that structure the user data. Ease of use, ubiquity, constant availability, and also the possibility of spontaneously discovering like-minded people in such systems or of simply using them as a source of information are reasons for their current success. The article introduces the notion of social bookmarking and discusses central elements such as browsing and search, using BibSonomy as an example and following the typical workflows of a researcher. We describe the architecture of BibSonomy as well as ways of integrating and linking BibSonomy with content management systems and websites. The article closes with cross-references to current research questions in the area of social bookmarking.
2009
Managing publications and bookmarks with BibSonomy.
In: C. Cattuto, G. Ruffo and F. Menczer, editors, HT '09: Proceedings of the 20th ACM Conference on Hypertext and Hypermedia, pages 323-324. ACM, New York, NY, USA, 2009.
Dominik Benz, Folke Eisterlehner, Andreas Hotho, Robert Jäschke, Beate Krause and Gerd Stumme.
[doi] [abstract] [BibTeX]
In this demo we present BibSonomy, a social bookmark and publication sharing system.
ECML PKDD Discovery Challenge 2009 (DC09).
CEUR-WS.org. volume 497. 2009.
Folke Eisterlehner, Andreas Hotho and Robert Jäschke.
[doi] [BibTeX]
Social Bookmarking am Beispiel BibSonomy.
In: A. Blumauer and T. Pellegrini, editors, Social Semantic Web, chapter 18, pages 363-391. Springer, Berlin, Heidelberg, 2009.
Andreas Hotho, Robert Jäschke, Dominik Benz, Miranda Grahl, Beate Krause, Christoph Schmitz and Gerd Stumme.
[doi] [abstract] [BibTeX]
BibSonomy is a collaborative tagging system (social bookmarking system) operated by the Knowledge and Data Engineering group (Fachgebiet Wissensverarbeitung) of the University of Kassel. It allows users to store and organize web bookmarks and metadata for scientific publications. In this chapter we describe the functionality provided by BibSonomy, the architecture behind it, and the underlying data model. We also present application examples and discuss methods for analyzing the data contained in BibSonomy and similar systems.
Testing and Evaluating Tag Recommenders in a Live System.
In: RecSys '09: Proceedings of the third ACM Conference on Recommender Systems, pages 369-372. ACM, New York, NY, USA, 2009.
Robert Jäschke, Folke Eisterlehner, Andreas Hotho and Gerd Stumme.
[doi] [abstract] [BibTeX]
The challenge of providing tag recommendations for collaborative tagging systems has attracted considerable attention from researchers lately. However, most research has focused on the evaluation and development of appropriate methods rather than on the practical challenges of how to integrate recommendation methods into real tagging systems and how to record and evaluate their performance.
In this paper we describe the tag recommendation framework we developed for our social bookmark and publication sharing system BibSonomy. With the intention to develop, test, and evaluate recommendation algorithms and to support cooperation with researchers, we designed the framework to be easily extensible, open for a variety of methods, and usable independently of BibSonomy. Furthermore, this paper presents a first evaluation of two exemplarily deployed recommendation methods.
Testing and Evaluating Tag Recommenders in a Live System.
In: D. Benz and F. Janssen, editors, Workshop on Knowledge Discovery, Data Mining, and Machine Learning, pages 44-51. 2009.
Robert Jäschke, Folke Eisterlehner, Andreas Hotho and Gerd Stumme.
[doi] [abstract] [BibTeX]
The challenge of providing tag recommendations for collaborative tagging systems has attracted considerable attention from researchers lately. However, most research has focused on the evaluation and development of appropriate methods rather than on the practical challenges of how to integrate recommendation methods into real tagging systems and how to record and evaluate their performance.
In this paper we describe the tag recommendation framework we developed for our social bookmark and publication sharing system BibSonomy. With the intention to develop, test, and evaluate recommendation algorithms and to support cooperation with researchers, we designed the framework to be easily extensible, open for a variety of methods, and usable independently of BibSonomy. Furthermore, this paper presents an evaluation of two exemplarily deployed recommendation methods, demonstrating the power of the framework.
Mapping Bibliographic Records with Bibliographic Hash Keys.
In: R. Kuhlen, editor, Information: Droge, Ware oder Commons?, series Proceedings of the ISI. Verlag Werner Hülsbusch, 2009.
Jakob Voss, Andreas Hotho and Robert Jäschke.
[doi] [abstract] [BibTeX]
This poster presents a set of hash keys for bibliographic records called bibkeys. Unlike other methods of duplicate detection, bibkeys can directly be calculated from a set of basic metadata fields (title, authors/editors, year). It is shown how bibkeys are used to map similar bibliographic records in BibSonomy and among distributed library catalogs and other distributed databases.
2008
Analyzing Tag Semantics Across Collaborative Tagging Systems.
In: H. Alani, S. Staab and G. Stumme, editors, Social Web Communities, series Dagstuhl Seminar Proceedings. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 2008.
Dominik Benz, Marko Grobelnik, Andreas Hotho, Robert Jäschke, Dunja Mladenic, Vito D. P. Servedio, Sergej Sizov and Martin Szomszor.
[doi] [abstract] [BibTeX]
The objective of our group was to exploit state-of-the-art Information Retrieval methods for finding associations and dependencies between tags, capturing and representing differences in tagging behavior and vocabulary of various folksonomies, with the overall aim to better understand the semantics of tags and the tagging process. To this end, we analyze the semantic content of tags in the Flickr and Delicious folksonomies. We find that: tag context similarity leads to meaningful results in Flickr, despite its narrow folksonomy character; the comparison of tags across Flickr and Delicious shows little semantic overlap, with tags in Flickr being associated more with visual aspects than with technological ones, as seems to be the case in Delicious; there are regions in the tag-tag space, equipped with the cosine similarity metric, that are characterized by high density; the order of tags inside a post has semantic relevance.
Discovering Shared Conceptualizations in Folksonomies.
Journal of Web Semantics, 6(1):38-53, 2008.
Robert Jäschke, Andreas Hotho, Christoph Schmitz, Bernhard Ganter and Gerd Stumme.
[doi] [abstract] [BibTeX]
Social bookmarking tools are rapidly emerging on the Web. In such systems users are setting up lightweight conceptual structures called folksonomies. Unlike ontologies, shared conceptualizations are not formalized, but rather implicit. We present a new data mining task, the mining of all frequent tri-concepts, together with an efficient algorithm, for discovering these implicit shared conceptualizations. Our approach extends the data mining task of discovering all closed itemsets to three-dimensional data structures to allow for mining folksonomies. We provide a formal definition of the problem, and present an efficient algorithm for its solution. Finally, we show the applicability of our approach on three large real-world examples.
Logsonomy - A Search Engine Folksonomy.
In: Proceedings of the Second International Conference on Weblogs and Social Media (ICWSM 2008), pages 192-193. AAAI Press, Menlo Park, CA, USA, 2008.
Robert Jäschke, Beate Krause, Andreas Hotho and Gerd Stumme.
[doi] [abstract] [BibTeX]
In social bookmarking systems users describe bookmarks by keywords called tags. The structure behind these social systems, called folksonomies, can be viewed as a tripartite hypergraph of user, tag and resource nodes. This underlying network shows specific structural properties that explain its growth and the possibility of serendipitous exploration.
Search engines filter the vast information of the web. Queries describe a user’s information need. In response to the displayed results of the search engine, users click on the links of the result page as they expect the answer to be of relevance. The clickdata can be represented as a folksonomy in which queries are descriptions of clicked URLs. This poster analyzes the topological characteristics of the resulting tripartite hypergraph of queries, users and bookmarks of two query logs and compares it to a snapshot of the folksonomy del.icio.us.
Tag Recommendations in Social Bookmarking Systems.
AI Communications, 21(4):231-247, 2008.
Robert Jäschke, Leandro Marinho, Andreas Hotho, Lars Schmidt-Thieme and Gerd Stumme.
[doi] [abstract] [BibTeX]
Collaborative tagging systems allow users to assign keywords, so-called "tags", to resources. Tags are used for navigation, finding resources and serendipitous browsing and thus provide an immediate benefit for users. These systems usually include tag recommendation mechanisms easing the process of finding good tags for a resource, but also consolidating the tag vocabulary across users. In practice, however, only very basic recommendation strategies are applied.
In this paper we evaluate and compare several recommendation algorithms on large-scale real-life datasets: an adaptation of user-based collaborative filtering, a graph-based recommender built on top of the FolkRank algorithm, and simple methods based on counting tag occurrences. We show that both FolkRank and Collaborative Filtering provide better results than non-personalized baseline methods. Moreover, since methods based on counting tag occurrences are computationally cheap, and thus usually preferable for real-time scenarios, we discuss simple approaches for improving the performance of such methods. We show how a simple recommender based on counting tags from users and resources can perform almost as well as the best recommender.
Logsonomy - Social Information Retrieval with Logdata.
In: HT '08: Proceedings of the Nineteenth ACM Conference on Hypertext and Hypermedia, pages 157-166. ACM, New York, NY, USA, 2008.
Beate Krause, Robert Jäschke, Andreas Hotho and Gerd Stumme.
[doi] [abstract] [BibTeX]
Social bookmarking systems constitute an established part of the Web 2.0. In such systems users describe bookmarks by keywords called tags. The structure behind these social systems, called folksonomies, can be viewed as a tripartite hypergraph of user, tag and resource nodes. This underlying network shows specific structural properties that explain its growth and the possibility of serendipitous exploration.
Today’s search engines represent the gateway to retrieve information from the World Wide Web. Short queries typically consisting of two to three words describe a user’s information need. In response to the displayed results of the search engine, users click on the links of the result page as they expect the answer to be of relevance.
This clickdata can be represented as a folksonomy in which queries are descriptions of clicked URLs. The resulting network structure, which we will term logsonomy, is very similar to that of folksonomies. In order to find out about its properties, we analyze the topological characteristics of the tripartite hypergraph of queries, users and bookmarks on a large snapshot of del.icio.us and on query logs of two large search engines. All three datasets show small world properties. The tagging behavior of users, which is explained by preferential attachment of the tags in social bookmark systems, is reflected in the distribution of single query words in search engines. We can conclude that the clicking behaviour of search engine users based on the displayed search results and the tagging behaviour of social bookmarking users are driven by similar dynamics.
2007
Analysis of the Publication Sharing Behaviour in BibSonomy.
In: U. Priss, S. Polovina and R. Hill, editors, Proceedings of the 15th International Conference on Conceptual Structures (ICCS 2007), volume 4604, series Lecture Notes in Artificial Intelligence, pages 283-295. Springer-Verlag, Berlin/Heidelberg, 2007.
Robert Jäschke, Andreas Hotho, Christoph Schmitz and Gerd Stumme.
[doi] [abstract] [BibTeX]
BibSonomy is a web-based social resource sharing system which allows users to organise and share bookmarks and publications in a collaborative manner. In this paper we present the system, followed by a description of the insights into the structure of its bibliographic data that we gained by applying techniques we developed in the area of Formal Concept Analysis.
Organizing Publications and Bookmarks in BibSonomy.
In: H. Alani, N. Noy, G. Stumme, P. Mika, Y. Sure and D. Vrandecic, editors, Workshop on Social and Collaborative Construction of Structured Knowledge (CKC 2007) at WWW 2007. Banff, Canada, 2007.
Robert Jäschke, Miranda Grahl, Andreas Hotho, Beate Krause, Christoph Schmitz and Gerd Stumme.
[doi] [BibTeX]
Tag Recommendations in Folksonomies.
In: A. Hinneburg, editor, Workshop Proceedings of Lernen - Wissensentdeckung - Adaptivität (LWA 2007), pages 13-20. Martin-Luther-Universität Halle-Wittenberg, 2007.
Robert Jäschke, Leandro Marinho, Andreas Hotho, Lars Schmidt-Thieme and Gerd Stumme.
[doi] [abstract] [BibTeX]
Collaborative tagging systems allow users to assign keywords, so-called “tags”, to resources. Tags are used for navigation, finding resources and serendipitous browsing and thus provide an immediate benefit for users. These systems usually include tag recommendation mechanisms easing the process of finding good tags for a resource, but also consolidating the tag vocabulary across users. In practice, however, only very basic recommendation strategies are applied.
In this paper we present two tag recommendation algorithms: an adaptation of user-based collaborative filtering and a graph-based recommender built on top of FolkRank, an adaptation of the well-known PageRank algorithm that can cope with undirected triadic hyperedges. We evaluate and compare both algorithms on large-scale real life datasets and show that both provide better results than non-personalized baseline methods. Especially the graph-based recommender outperforms existing methods considerably.
Tag Recommendations in Folksonomies.
In: J. N. Kok, J. Koronacki, R. L. de Mántaras, S. Matwin, D. Mladenic and A. Skowron, editors, Knowledge Discovery in Databases: PKDD 2007, 11th European Conference on Principles and Practice of Knowledge Discovery in Databases, volume 4702, series Lecture Notes in Computer Science, pages 506-514. Springer, Berlin, Heidelberg, 2007.
Robert Jäschke, Leandro Balby Marinho, Andreas Hotho, Lars Schmidt-Thieme and Gerd Stumme.
[doi] [abstract] [BibTeX]
Collaborative tagging systems allow users to assign keywords, so-called “tags”, to resources. Tags are used for navigation, finding resources and serendipitous browsing and thus provide an immediate benefit for users. These systems usually include tag recommendation mechanisms easing the process of finding good tags for a resource, but also consolidating the tag vocabulary across users. In practice, however, only very basic recommendation strategies are applied.
In this paper we evaluate and compare two recommendation algorithms on large-scale real-life datasets: an adaptation of user-based collaborative filtering and a graph-based recommender built on top of FolkRank. We show that both provide better results than non-personalized baseline methods. Especially the graph-based recommender outperforms existing methods considerably.
2006
Semantic Network Analysis of Ontologies.
In: Y. Sure and J. Domingue, editors, The Semantic Web: Research and Applications, volume 4011, series Lecture Notes in Computer Science, pages 514-529. Springer, Berlin/Heidelberg, 2006. 10.1007/11762256_38
Bettina Hoser, Andreas Hotho, Robert Jäschke, Christoph Schmitz and Gerd Stumme.
[doi] [abstract] [BibTeX]
A key argument for modeling knowledge in ontologies is the easy reuse and re-engineering of the knowledge. However, current ontology engineering tools provide only basic functionalities for analyzing ontologies. Since ontologies can be considered as graphs, graph analysis techniques are a suitable answer for this need. Graph analysis has been performed by sociologists for over 60 years, and resulted in the vivid research area of Social Network Analysis (SNA). While social network structures currently receive high attention in the Semantic Web community, there are only very few SNA applications, and virtually none for analyzing the structure of ontologies.
We illustrate the benefits of applying SNA to ontologies and the Semantic Web, and discuss which research topics arise on the edge between the two areas. In particular, we discuss how different notions of centrality describe the core content and structure of an ontology. From the rather simple notion of degree centrality over betweenness centrality to the more complex eigenvector centrality, we illustrate the insights these measures provide on two ontologies, which are different in purpose, scope, and size.
BibSonomy: A Social Bookmark and Publication Sharing System.
In: A. de Moor, S. Polovina and H. Delugach, editors, Proceedings of the Conceptual Structures Tool Interoperability Workshop at the 14th International Conference on Conceptual Structures. Aalborg University Press, Aalborg, Denmark, 2006.
Andreas Hotho, Robert Jäschke, Christoph Schmitz and Gerd Stumme.
[doi] [BibTeX]
Emergent Semantics in BibSonomy.
In: C. Hochberger and R. Liskowsky, editors, Informatik 2006 - Informatik für Menschen, volume 94, series Lecture Notes in Informatics, pages 305-312. Gesellschaft für Informatik, Bonn, 2006.
Andreas Hotho, Robert Jäschke, Christoph Schmitz and Gerd Stumme.
[doi] [abstract] [BibTeX]
Social bookmark tools are rapidly emerging on the Web. In such systems users are setting up lightweight conceptual structures called folksonomies. The reason for their immediate success is the fact that no specific skills are needed for participating. In this paper we specify a formal model for folksonomies, briefly describe our own system BibSonomy, which allows for sharing both bookmarks and publication references, and discuss first steps towards emergent semantics.
Information Retrieval in Folksonomies: Search and Ranking.
In: Y. Sure and J. Domingue, editors, The Semantic Web: Research and Applications, volume 4011, series Lecture Notes in Computer Science, pages 411-426. Springer, Berlin/Heidelberg, 2006.
Andreas Hotho, Robert Jäschke, Christoph Schmitz and Gerd Stumme.
[doi] [abstract] [BibTeX]
Social bookmark tools are rapidly emerging on the Web. In such systems users are setting up lightweight conceptual structures called folksonomies. The reason for their immediate success is the fact that no specific skills are needed for participating. At the moment, however, the information retrieval support is limited. We present a formal model and a new search algorithm for folksonomies, called FolkRank, that exploits the structure of the folksonomy. The proposed algorithm is also applied to find communities within the folksonomy and is used to structure search results. All findings are demonstrated on a large-scale dataset.
Trend Detection in Folksonomies.
In: Y. S. Avrithis, Y. Kompatsiaris, S. Staab and N. E. O'Connor, editors, Proc. First International Conference on Semantics And Digital Media Technology (SAMT), volume 4306, series Lecture Notes in Computer Science, pages 56-70. Springer, Heidelberg, 2006.
Andreas Hotho, Robert Jäschke, Christoph Schmitz and Gerd Stumme.
[doi] [abstract] [BibTeX]
As the number of resources on the web exceeds by far the number of documents one can track, it becomes increasingly difficult to remain up to date on one's own areas of interest. The problem becomes more severe with the increasing fraction of multimedia data, from which it is difficult to extract some conceptual description of their contents.
One way to overcome this problem is the use of social bookmark tools, which are rapidly emerging on the web. In such systems, users are setting up lightweight conceptual structures called folksonomies, and thus overcome the knowledge acquisition bottleneck. As more and more people participate in the effort, the use of a common vocabulary becomes more and more stable. We present an approach for discovering topic-specific trends within folksonomies. It is based on a differential adaptation of the PageRank algorithm to the triadic hypergraph structure of a folksonomy. The approach allows for any kind of data, as it does not rely on the internal structure of the documents. In particular, this makes it possible to consider different data types in the same analysis step. We run experiments on a large-scale real-world snapshot of a social bookmarking system.
TRIAS - An Algorithm for Mining Iceberg Tri-Lattices.
In: ICDM '06: Proceedings of the Sixth International Conference on Data Mining, pages 907-911. IEEE Computer Society, Washington, DC, USA, 2006.
Robert Jäschke, Andreas Hotho, Christoph Schmitz, Bernhard Ganter and Gerd Stumme.
[doi] [abstract] [BibTeX]
In this paper, we present the foundations for mining frequent tri-concepts, which extend the notion of closed itemsets to three-dimensional data to allow for mining folksonomies. We provide a formal definition of the problem, and present an efficient algorithm for its solution as well as experimental results on a large real-world example.
Wege zur Entdeckung von Communities in Folksonomies.
In: S. Braß and A. Hinneburg, editors, Proc. 18. Workshop Grundlagen von Datenbanken, pages 80-84. Martin-Luther-Universität, Halle-Wittenberg, 2006.
Robert Jäschke, Andreas Hotho, Christoph Schmitz and Gerd Stumme.
[doi] [abstract] [BibTeX]
Folksonomies are an important building block of the newly discovered World Wide Web, the "Web 2.0". In these systems, users can jointly manage resources and annotate them with tags. The resulting conceptual structures are an interesting field of research. This article examines approaches and ways to discover and structure user groups ("communities") in folksonomies.
Content Aggregation on Knowledge Bases using Graph Clustering.
In: Y. Sure and J. Domingue, editors, The Semantic Web: Research and Applications, volume 4011, series Lecture Notes in Computer Science, pages 530-544. Springer, Berlin/Heidelberg, 2006.
Christoph Schmitz, Andreas Hotho, Robert Jäschke and Gerd Stumme.
[doi] [abstract] [BibTeX]
Recently, research projects such as PADLR and SWAP have developed tools like Edutella or Bibster, which are targeted at establishing peer-to-peer knowledge management (P2PKM) systems. In such a system, it is necessary to obtain brief semantic descriptions of peers, so that routing algorithms or matchmaking processes can make decisions about which communities peers should belong to, or to which peers a given query should be forwarded.
This paper provides a graph clustering technique on knowledge bases for that purpose. Using this clustering, we can show that our strategy requires up to 58% fewer queries than the baselines to yield full recall in a bibliographic P2PKM scenario.
Kollaboratives Wissensmanagement.
2006.
Christoph Schmitz, Andreas Hotho, Robert Jäschke and Gerd Stumme.
[doi] [abstract] [BibTeX]
Knowledge management in centralized knowledge bases requires considerable effort for creation and maintenance, and it does not always meet users' requirements. In this chapter we give an overview of two current approaches that can solve these problems through collaborative knowledge management. In peer-to-peer knowledge management, users maintain decentralized knowledge bases, which can then be interlinked so that other users can make use of their content. Folksonomies promise to make knowledge acquisition as simple as possible and thus to involve many users in building and maintaining a shared knowledge base.
Mining Association Rules in Folksonomies.
In: V. Batagelj, H.-H. Bock, A. Ferligoj and A. Žiberna, editors, Data Science and Classification, series Studies in Classification, Data Analysis, and Knowledge Organization, pages 261-270. Springer, Berlin/Heidelberg, 2006.
Christoph Schmitz, Andreas Hotho, Robert Jäschke and Gerd Stumme.
[doi] [abstract] [BibTeX]
Social bookmark tools are rapidly emerging on the Web. In such systems users are setting up lightweight conceptual structures called folksonomies. These systems currently provide relatively little structure. We discuss in this paper how association rule mining can be adopted to analyze and structure folksonomies, and how the results can be used for ontology learning and supporting emergent semantics. We demonstrate our approach on a large-scale dataset stemming from an online system.