Michael Röder
Leipzig University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Michael Röder.
international world wide web conferences | 2015
Michael Röder; Axel-Cyrille Ngonga Ngomo; Ciro Baron; Andreas Both; Martin Brümmer; Diego Ceccarelli; Marco Cornolti; Didier Cherix; Bernd Eickmann; Paolo Ferragina; Christiane Lemke; Andrea Moro; Roberto Navigli; Francesco Piccinno; Giuseppe Rizzo; Harald Sack; René Speck; Raphaël Troncy; Jörg Waitelonis; Lars Wesemann
We present GERBIL, an evaluation framework for semantic entity annotation. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools on multiple datasets. By these means, we aim to ensure that both tool developers and end users can derive meaningful insights pertaining to the extension, integration and use of annotation applications. In particular, GERBIL provides comparable results to tool developers so as to allow them to easily discover the strengths and weaknesses of their implementations with respect to the state of the art. With the permanent experiment URIs provided by our framework, we ensure the reproducibility and archiving of evaluation results. Moreover, the framework generates data in machine-processable format, allowing for the efficient querying and post-processing of evaluation results. Finally, the tool diagnostics provided by GERBIL allows deriving insights pertaining to the areas in which tools should be further refined, thus allowing developers to create an informed agenda for extensions and end users to detect the right tools for their purposes. GERBIL aims to become a focal point for the state of the art, driving the research agenda of the community by presenting comparable objective evaluation results.
international semantic web conference | 2014
Axel-Cyrille Ngonga Ngomo; Michael Röder; Daniel Gerber; Sandro Athaide Coelho; Sören Auer; Andreas Both
Over the last decades, several billion Web pages have been made available on the Web. The ongoing transition from the current Web of unstructured data to the Web of Data yet requires scalable and accurate approaches for the extraction of structured data in RDF (Resource Description Framework) from these websites. One of the key steps towards extracting RDF from text is the disambiguation of named entities. While several approaches aim to tackle this problem, they still achieve poor accuracy. We address this drawback by presenting AGDISTIS, a novel knowledge-base-agnostic approach for named entity disambiguation. Our approach combines the Hypertext-Induced Topic Search (HITS) algorithm with label expansion strategies and string similarity measures. Based on this combination, AGDISTIS can efficiently detect the correct URIs for a given set of named entities within an input text. We evaluate our approach on eight different datasets against state-of-the-art named entity disambiguation frameworks. Our results indicate that we outperform the state-of-the-art approach by up to 29% F-measure.
web search and data mining | 2015
Michael Röder; Andreas Both; Alexander Hinneburg
Quantifying the coherence of a set of statements is a long standing problem with many potential applications that has attracted researchers from different sciences. The special case of measuring coherence of topics has been recently studied to remedy the problem that topic models give no guaranty on the interpretablity of their output. Several benchmark datasets were produced that record human judgements of the interpretability of topics. We are the first to propose a framework that allows to construct existing word based coherence measures as well as new ones by combining elementary components. We conduct a systematic search of the space of coherence measures using all publicly available topic relevance data for the evaluation. Our results show that new combinations of components outperform existing measures with respect to correlation to human ratings. nFinally, we outline how our results can be transferred to further applications in the context of text mining, information retrieval and the world wide web.
european conference on artificial intelligence | 2014
Axel-Cyrille Ngonga Ngomo; Michael Röder; Daniel Gerber; Sandro Athaide Coelho; Sören Auer; Andreas Both
Over the last decades, several billion Web pages have been made available on the Web. The ongoing transition from the current Web of unstructured data to the Data Web yet requires scalable and accurate approaches for the extraction of structured data in RDF (Resource Description Framework) from these websites. One of the key steps towards extracting RDF from text is the disambiguation of named entities. We address this issue by presenting AGDISTIS, a novel knowledge-base-agnostic approach for named entity disambiguation. Our approach combines the Hypertext-Induced Topic Search (HITS) algorithm with label expansion strategies and string similarity measures. Based on this combination, AGDISTIS can efficiently detect the correct URIs for a given set of named entities within an input text.
Semantic Web Evaluation Challenges | 2015
Michael Röder; René Speck; Axel-Cyrille Ngonga Ngomo
The concurrent growth of the Document Web and the Data Web demands accurate information extraction tools to bridge the gap between the two. In particular, the extraction of knowledge on real-world entities is indispensable to populate knowledge bases on the Web of Data. Here, we focus on the recognition of types for entities to populate knowledge bases and enable subsequent knowledge extraction steps. We present CETUS, a baseline approach to entity type extraction. CETUS is based on a three-step pipeline comprising (i) offline, knowledge-driven type pattern extraction from natural-language corpora based on grammar-rules, (ii) an analysis of input text to extract types and (iii) the mapping of the extracted type evidence to a subset of the DOLCE+DnS Ultra Lite ontology classes. We implement and compare two approaches for the third step using the YAGO ontology as well as the FOX entity recognition tool.
european semantic web conference | 2017
Kunal Jha; Michael Röder; Axel-Cyrille Ngonga Ngomo
The evaluation of Named Entity Recognition as well as Entity Linking systems is mostly based on manually created gold standards. However, the current gold standards have three main drawbacks. First, they do not share a common set of rules pertaining to what is to be marked and linked as an entity. Moreover, most of the gold standards have not been checked by other researchers after they were published. Hence, they commonly contain mistakes. Finally, many gold standards lack actuality as in most cases the reference knowledge bases used to link entities are refined over time while the gold standards are typically not updated to the newest version of the reference knowledge base. In this work, we analyze existing gold standards and derive a set of rules for annotating documents for named entity recognition and entity linking. We derive Eaglet, a tool that supports the semi-automatic checking of a gold standard based on these rules. A manual evaluation of Eaglet’s results shows that it achieves an accuracy of up to 88% when detecting errors. We apply Eaglet to 13 English gold standards and detect 38,453 errors. An evaluation of 10 tools on a subset of these datasets shows a performance difference of up to 10% micro F-measure on average.
international semantic web conference | 2016
Michael Röder; Axel-Cyrille Ngonga Ngomo; Ivan Ermilov; Andreas Both
The Web of data is growing continuously with respect to both the size and number of the datasets published. Porting a dataset to five-star Linked Data however requires the publisher of this dataset to link it with the already available linked datasets. Given the size and growth of the Linked Data Cloud, the current mostly manual approach used for detecting relevant datasets for linking is obsolete. We study the use of topic modelling for dataset search experimentally and present Tapioca, a linked dataset search engine that provides data publishers with similar existing datasets automatically. Our search engine uses a novel approach for determining the topical similarity of datasets. This approach relies on probabilistic topic modelling to determine related datasets by relying solely on the metadata of datasets. We evaluate our approach on a manually created gold standard and with a user study. Our evaluation shows that our algorithm outperforms a set of comparable baseline algorithms including standard search engines significantly by 6i?ź% F1-score. Moreover, we show that it can be used on a large real world dataset with a comparable performance.
Sprachwissenschaft | 2017
Michael Röder; Ricardo Usbeck; Axel-Cyrille Ngonga Ngomo
The ability to compare frameworks from the same domain is of central importance for their introduction into complex applications. In the domains of named entity recognition and entity linking, the large number of systems and their orthogonal evaluation w.r.t. measures and datasets has led to an unclear landscape pertaining to the abilities and weaknesses of the different frameworks. We present GERBIL—an improved platform for repeatable, storable and citable semantic annotation experiments— and how we extended it since its release. With GERBIL, we narrowed this evaluation gap by generating concise, archivable, humanand machine-readable experiments, analytics and diagnostics. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools on multiple datasets. By these means, we aim to ensure that both tool developers and end users can derive meaningful insights pertaining to the extension, integration and use of annotation applications. In particular, GERBIL provides comparable results to tool developers so as to allow them to easily discover the strengths and weaknesses of their implementations with respect to the state of the art. With the permanent experiment URIs provided by our framework, we ensure the reproducibility and archiving of evaluation results. Moreover, the framework generates data in machine-processable format, allowing for the efficient querying and post-processing of evaluation results. Additionally, the tool diagnostics provided by GERBIL allows deriving insights pertaining to the areas in which tools should be further refined, thus allowing developers to create an informed agenda for extensions and end users to detect the right tools for their purposes. Finally, we implemented additional types of experiments including entity typing. GERBIL aims to become a focal point for the state of the art, driving the research agenda of the community by presenting comparable objective evaluation results. Furthermore, we tackle the central problem of the evaluation of entity linking, i.e., we answer the question how an evaluation algorithm can compare two URIs to each other without being bound to a specific knowledge base. Our approach to this problem opens a way to address the deprecation of URIs of existing gold standards for named entity recognition and entity linking, a feature which is currently not supported by the state of the art. We derived the importance of this feature from usage and dataset requirements collected from the GERBIL user community, which has already carried out more than 24.000 single evaluations using our framework. Through the resulting updates, GERBIL now supports 8 tasks, 46 datasets and 20 systems.
International Conference on Knowledge Engineering and the Semantic Web | 2016
Michael Röder; Peter Haase; Artem Kozlov; Muhammad Saleem; Axel-Cyrille Ngonga Ngomo
Since the introduction of computing machines into companies and industries, searching large enterprise data is an open challenge including diverse and distributed datasets, missing alignment of vocabularies within divisions as well as data isolated in format silos. In this article, we report the requirements of commercial enterprises to the next generation of semantic search engine for large, distributed data. We describe our elicitation process to gather end user requirements, the challenges arising for real-world use cases as well as how such an implementation of this paradigm can be benchmarked. In the end, we present the design of the DIESEL search engine, which aims to implement the requirements of commercial enterprise to semantic search.
european semantic web conference | 2015
Michael Röder; Axel-Cyrille Ngomo Ngonga
The need to bridge between the unstructured data on the Document Web and the structured data on the Web of Data has led to the development of a considerable number of annotation tools. However, these tools are hard to compare due to the diversity of data sets and measures used for evaluation. We will demonstrate GERBIL, an evaluation framework for semantic entity annotation that provides developers, end users and researchers with easy-to-use interfaces for the agile, fine-grained and uniform evaluation of annotation tools on 11 different data sets within 6 different experimental settings on 6 different measures.