Daniel Gerber
Leipzig University
Publications
Featured research published by Daniel Gerber.
International World Wide Web Conference (WWW) | 2012
Christina Unger; Lorenz Bühmann; Jens Lehmann; Axel-Cyrille Ngonga Ngomo; Daniel Gerber; Philipp Cimiano
As an increasing amount of RDF data is published as Linked Data, intuitive ways of accessing this data become more and more important. Question answering approaches have been proposed as a good compromise between intuitiveness and expressivity. Most question answering systems translate questions into triples which are matched against the RDF data to retrieve an answer, typically relying on some similarity metric. However, in many cases, triples do not faithfully capture the semantic structure of the natural language question, with the result that more expressive queries cannot be answered. To circumvent this problem, we present a novel approach that relies on a parse of the question to produce a SPARQL template that directly mirrors the internal structure of the question. This template is then instantiated using statistical entity identification and predicate detection. We show that this approach is competitive and discuss cases of questions that can be answered with our approach but not with competing approaches.
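To make the template idea concrete, here is a minimal Python sketch; the template shape, the slot names, and the example URIs are illustrative assumptions, not the system's actual output.

```python
# Illustrative sketch of template instantiation (not the authors' code).
# A parse of "Who wrote The Neverending Story?" yields a SPARQL template
# mirroring the question; its slots are then filled by the statistical
# entity and predicate detection step.
TEMPLATE = """SELECT ?x WHERE {{
  <{entity}> <{predicate}> ?x .
}}"""

def instantiate(entity_uri: str, predicate_uri: str) -> str:
    """Fill the template slots with the top-ranked detected URIs."""
    return TEMPLATE.format(entity=entity_uri, predicate=predicate_uri)

# Hypothetical detection results for the example question:
print(instantiate("http://dbpedia.org/resource/The_Neverending_Story",
                  "http://dbpedia.org/ontology/author"))
```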
International Semantic Web Conference (ISWC) | 2014
Axel-Cyrille Ngonga Ngomo; Michael Röder; Daniel Gerber; Sandro Athaide Coelho; Sören Auer; Andreas Both
Over the last decades, several billion Web pages have been made available on the Web. The ongoing transition from the current Web of unstructured data to the Web of Data requires scalable and accurate approaches for the extraction of structured data in RDF (Resource Description Framework) from these websites. One of the key steps towards extracting RDF from text is the disambiguation of named entities. While several approaches aim to tackle this problem, they still achieve poor accuracy. We address this drawback by presenting AGDISTIS, a novel knowledge-base-agnostic approach for named entity disambiguation. Our approach combines the Hypertext-Induced Topic Search (HITS) algorithm with label expansion strategies and string similarity measures. Based on this combination, AGDISTIS can efficiently detect the correct URIs for a given set of named entities within an input text. We evaluate our approach on eight different datasets against state-of-the-art named entity disambiguation frameworks. Our results indicate that we outperform the state-of-the-art approach by up to 29% F-measure.
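The HITS step can be sketched in a few lines with networkx; the candidate generation and the knowledge-base link set are stubbed out here, and all names are illustrative rather than taken from the AGDISTIS codebase.

```python
# Sketch of HITS-based candidate selection. Nodes are candidate URIs
# for the mentions found in the text; edges are links between those
# resources in the knowledge base.
import networkx as nx

def disambiguate(candidates: dict[str, list[str]],
                 kb_links: set[tuple[str, str]]) -> dict[str, str]:
    graph = nx.DiGraph()
    for uris in candidates.values():
        graph.add_nodes_from(uris)
    graph.add_edges_from((s, o) for s, o in kb_links
                         if s in graph and o in graph)
    _, authority = nx.hits(graph, max_iter=50)
    # For every surface form, keep the candidate with the highest
    # authority score.
    return {mention: max(uris, key=lambda u: authority.get(u, 0.0))
            for mention, uris in candidates.items()}
```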
International World Wide Web Conference (WWW) | 2013
Axel-Cyrille Ngonga Ngomo; Lorenz Bühmann; Christina Unger; Jens Lehmann; Daniel Gerber
Over the past years, Semantic Web and Linked Data technologies have reached the backend of a considerable number of applications. Consequently, large amounts of RDF data are constantly being made available across the planet. While experts can easily gather information from this wealth of data by using the W3C standard query language SPARQL, most lay users lack the expertise necessary to proficiently interact with these applications. Consequently, non-expert users usually have to rely on forms, query builders, question answering or keyword search tools to access RDF data. However, these tools have so far been unable to explicate the queries they generate to lay users, making it difficult for these users to i) assess the correctness of the query generated from their input, ii) adapt their queries, or iii) choose in an informed manner between possible interpretations of their input. This paper addresses this drawback by presenting SPARQL2NL, a generic approach for verbalizing SPARQL queries, i.e., converting them into natural language. Our framework can be integrated into applications where lay users are required to understand SPARQL or to generate SPARQL queries in a direct (forms, query builders) or an indirect (keyword search, question answering) manner. We evaluate our approach on the DBpedia question set provided by QALD-2 within a survey setting with both SPARQL experts and lay users. The results of the 115 completed surveys show that SPARQL2NL can generate complete and easily understandable natural language descriptions. In addition, our results suggest that even SPARQL experts can process the natural language representation of SPARQL queries computed by our approach more efficiently than the corresponding SPARQL queries. Moreover, non-experts are enabled to reliably understand the content of SPARQL queries.
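A toy verbalizer conveys the rule-based flavor of the approach; the label table and the single rule below are hypothetical simplifications, since SPARQL2NL covers far more of SPARQL than one triple pattern.

```python
# Toy verbalization of the query SELECT ?x WHERE { s p ?x }.
LABELS = {  # hypothetical label lookup, normally resolved from the KB
    "dbo:author": "the author of",
    "dbr:The_Neverending_Story": "The Neverending Story",
}

def verbalize(s: str, p: str, o: str) -> str:
    """Render one basic graph pattern as an English sentence."""
    return f"This query retrieves {LABELS.get(p, p)} {LABELS.get(s, s)}."

print(verbalize("dbr:The_Neverending_Story", "dbo:author", "?x"))
# -> This query retrieves the author of The Neverending Story.
```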
Knowledge Acquisition, Modeling and Management | 2012
Daniel Gerber; Axel-Cyrille Ngonga Ngomo
Most knowledge sources on the Data Web were extracted from structured or semi-structured data. Thus, they encompass only a small fraction of the information available on the document-oriented Web. In this paper, we present BOA, a bootstrapping strategy for extracting RDF from text. The idea behind BOA is to use background knowledge from the Data Web to extract, from unstructured data, natural-language patterns that represent predicates found on the Data Web. These patterns are then used to extract instance knowledge from natural-language text. This knowledge is finally fed back into the Data Web, thereby closing the loop. The approach followed by BOA is largely independent of the language in which the corpus is written. We demonstrate our approach by applying it to four different corpora and two different languages. We evaluate BOA on these data sets using DBpedia as background knowledge. Our results show that we can extract several thousand new facts in one iteration with very high accuracy.
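The bootstrapping loop can be caricatured in a few lines; the regular expressions and the seed data below are illustrative, and BOA's actual pattern model is considerably richer.

```python
# Minimal sketch of one bootstrapping iteration: known (subject, object)
# pairs for a predicate yield surface patterns, which in turn extract
# new pairs from fresh text.
import re

def learn_patterns(pairs, sentences):
    """Collect the strings occurring between known subject/object labels."""
    patterns = set()
    for subj, obj in pairs:
        for sentence in sentences:
            m = re.search(re.escape(subj) + r"(.{1,50}?)" + re.escape(obj),
                          sentence)
            if m:
                patterns.add(m.group(1))
    return patterns

def apply_patterns(patterns, sentences):
    """Use learned patterns to extract new (subject, object) pairs."""
    pairs = set()
    for pattern in patterns:
        for sentence in sentences:
            m = re.search(r"([A-Z][\w ]+?)" + re.escape(pattern) +
                          r"([A-Z][\w ]+)", sentence)
            if m:
                pairs.add((m.group(1).strip(), m.group(2).strip()))
    return pairs

seeds = {("Berlin", "Germany")}
corpus = ["Berlin is the capital of Germany.",
          "Paris is the capital of France."]
patterns = learn_patterns(seeds, corpus)  # {' is the capital of '}
print(apply_patterns(patterns, corpus))   # includes ('Paris', 'France')
```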
International Semantic Web Conference (ISWC) | 2012
Jens Lehmann; Daniel Gerber; Mohamed Morsey; Axel-Cyrille Ngonga Ngomo
One of the main tasks when creating and maintaining knowledge bases is to validate facts and provide sources for them in order to ensure correctness and traceability of the provided knowledge. So far, this task is often addressed by human curators in a three-step process: issuing appropriate keyword queries for the statement to check using standard search engines, retrieving potentially relevant documents and screening those documents for relevant content. The drawbacks of this process are manifold. Most importantly, it is very time-consuming as the experts have to carry out several search processes and must often read several documents. In this article, we present DeFacto (Deep Fact Validation), an algorithm for validating facts by finding trustworthy sources for them on the Web. DeFacto aims to provide an effective way of validating facts by supplying the user with relevant excerpts of webpages as well as useful additional information, including a score for the confidence DeFacto has in the correctness of the input fact.
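The final confidence combination can be pictured as follows; the per-page feature values and the aggregation rule are purely hypothetical, since the actual DeFacto score comes from a trained classifier over many features.

```python
# Hypothetical aggregation of per-page evidence into one confidence
# score: each piece of evidence is (textual_support, trustworthiness),
# both in [0, 1].
def confidence(evidence: list[tuple[float, float]]) -> float:
    if not evidence:
        return 0.0
    # Trusted pages that clearly confirm the fact dominate the score.
    return max(support * trust for support, trust in evidence)

pages = [(0.9, 0.8), (0.4, 0.95)]  # two made-up web pages
print(confidence(pages))           # 0.72
```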
International Semantic Web Conference (ISWC) | 2013
Daniel Gerber; Sebastian Hellmann; Lorenz Bühmann; Tommaso Soru; Axel-Cyrille Ngonga Ngomo
The vision behind the Web of Data is to extend the current document-oriented Web with machine-readable facts and structured data, thus creating a representation of general knowledge. However, most of the Web of Data is limited to being a large compendium of encyclopedic knowledge describing entities. A huge challenge, the timely and massive extraction of RDF facts from unstructured data, has remained open so far. The availability of such knowledge on the Web of Data would provide significant benefits to manifold applications including news retrieval, sentiment analysis and business intelligence. In this paper, we address the problem of keeping the Web of Data up to date by presenting an approach for extracting RDF triples from unstructured data streams. We employ statistical methods in combination with deduplication, disambiguation and unsupervised as well as supervised machine learning techniques to create a knowledge base that reflects the content of the input streams. We evaluate a sample of the RDF we generate against a large corpus of news streams and show that we achieve a precision of more than 85%.
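The deduplication step of such a pipeline might look as follows; the extraction callback is left pluggable, and everything here is a simplified illustration of the stream processing described above.

```python
# Skeleton of a streaming extraction loop with deduplication; the
# paper's pipeline adds disambiguation, clustering and (un)supervised
# machine learning on top of this.
import hashlib

def stream_to_triples(sentences, extract):
    """Yield each extracted triple once, even if the same statement
    re-appears across news streams."""
    seen = set()
    for sentence in sentences:
        for triple in extract(sentence):
            key = hashlib.sha1("|".join(triple).encode()).hexdigest()
            if key not in seen:
                seen.add(key)
                yield triple
```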
Web Intelligence | 2011
Saeedeh Shekarpour; Sören Auer; Axel-Cyrille Ngonga Ngomo; Daniel Gerber; Sebastian Hellmann; Claus Stadler
The search for information on the Web of Data is becoming increasingly difficult due to its dramatic growth. Especially novice users need to acquire both knowledge about the underlying ontology structure and proficiency in formulating formal queries (e.g., SPARQL queries) to retrieve information from Linked Data sources. To simplify and automate the querying and retrieval of information from such sources, we present in this paper a novel approach for constructing SPARQL queries based on user-supplied keywords. Our approach utilizes a set of predefined basic graph pattern templates for generating adequate interpretations of user queries. This is achieved by obtaining ranked lists of candidate resource identifiers for the supplied keywords and then injecting these identifiers into suitable positions in the graph pattern templates. The main advantages of our approach are that it is completely agnostic of the underlying knowledge base and ontology schema, that it scales to large knowledge bases, and that it is simple to use. We evaluate 17 possible valid graph pattern templates by measuring their precision and recall on 53 queries against DBpedia. Our results show that 8 of these basic graph pattern templates return results with a precision above 70%. Our approach is implemented as a Web search interface and performs sufficiently fast to return instant answers to the user even with large knowledge bases.
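The injection step can be sketched as follows; the two templates and the candidate lists are hypothetical stand-ins for the 17 templates and the ranked resource identifiers mentioned above.

```python
# Illustrative injection of ranked candidates into basic graph pattern
# templates (two of many; the paper evaluates 17).
TEMPLATES = [
    "SELECT ?x WHERE {{ {r0} {p0} ?x . }}",
    "SELECT ?x WHERE {{ ?x {p0} {r0} . }}",
]

def build_queries(ranked: dict[str, list[str]]) -> list[str]:
    """Instantiate every template with the top-ranked identifiers
    obtained for the user-supplied keywords."""
    p0, r0 = ranked["property"][0], ranked["resource"][0]
    return [t.format(p0=p0, r0=r0) for t in TEMPLATES]

candidates = {"property": ["dbo:capital"], "resource": ["dbr:Germany"]}
for query in build_queries(candidates):
    print(query)
```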
European Conference on Artificial Intelligence (ECAI) | 2014
Axel-Cyrille Ngonga Ngomo; Michael Röder; Daniel Gerber; Sandro Athaide Coelho; Sören Auer; Andreas Both
Over the last decades, several billion Web pages have been made available on the Web. The ongoing transition from the current Web of unstructured data to the Data Web requires scalable and accurate approaches for the extraction of structured data in RDF (Resource Description Framework) from these websites. One of the key steps towards extracting RDF from text is the disambiguation of named entities. We address this issue by presenting AGDISTIS, a novel knowledge-base-agnostic approach for named entity disambiguation. Our approach combines the Hypertext-Induced Topic Search (HITS) algorithm with label expansion strategies and string similarity measures. Based on this combination, AGDISTIS can efficiently detect the correct URIs for a given set of named entities within an input text.
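Complementing the HITS sketch given for the ISWC 2014 entry above, the candidate-generation side (label expansion plus string similarity) might look like this; the expansion rule and the similarity threshold are illustrative assumptions.

```python
# Sketch of candidate generation via label expansion and string
# similarity (difflib's ratio as a stand-in for the paper's measures).
from difflib import SequenceMatcher

def expand(surface: str) -> set[str]:
    """Produce label variants for lookup; real expansion rules are richer."""
    variants = {surface}
    if surface.lower().startswith("the "):
        variants.add(surface[4:])
    return variants

def candidates(surface: str, labels: dict[str, str],
               threshold: float = 0.82) -> list[str]:
    """Return every URI whose label is similar enough to a variant."""
    return [uri for uri, label in labels.items()
            if any(SequenceMatcher(None, v.lower(), label.lower()).ratio()
                   >= threshold for v in expand(surface))]

index = {"http://dbpedia.org/resource/Barack_Obama": "Barack Obama"}
print(candidates("Barack Obama", index))
```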
Journal of Web Semantics | 2015
Daniel Gerber; Diego Esteves; Jens Lehmann; Lorenz Bühmann; Axel-Cyrille Ngonga Ngomo; René Speck
One of the main tasks when creating and maintaining knowledge bases is to validate facts and provide sources for them in order to ensure correctness and traceability of the provided knowledge. So far, this task is often addressed by human curators in a three-step process: issuing appropriate keyword queries for the statement to check using standard search engines, retrieving potentially relevant documents and screening those documents for relevant content. The drawbacks of this process are manifold. Most importantly, it is very time-consuming as the experts have to carry out several search processes and must often read several documents. In this article, we present DeFacto (Deep Fact Validation), an algorithm able to validate facts by finding trustworthy sources for them on the Web. DeFacto aims to provide an effective way of validating facts by supplying the user with relevant excerpts of web pages as well as useful additional information, including a score for the confidence DeFacto has in the correctness of the input fact. To achieve this goal, DeFacto collects and combines evidence from web pages written in several languages. In addition, DeFacto provides support for facts with a temporal scope, i.e., it can estimate in which time frame a fact was valid. Given that the automatic evaluation of facts has received little attention so far, generic benchmarks for evaluating these frameworks were not previously available. We thus also present a generic evaluation framework for fact checking and make it publicly available.
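The temporal-scope estimate can be caricatured as follows; reducing the evidence to four-digit years is a deliberate oversimplification of what the article actually does.

```python
# Toy estimate of a fact's validity interval from evidence snippets:
# take the span of the years co-mentioned with the fact.
import re

def temporal_scope(snippets: list[str]):
    years = [int(y) for s in snippets
             for y in re.findall(r"\b(?:19|20)\d{2}\b", s)]
    return (min(years), max(years)) if years else None

proofs = ["He served as chancellor from 1998 until 2005.",
          "... was elected in 1998 ..."]
print(temporal_scope(proofs))  # (1998, 2005)
```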
European Semantic Web Conference (ESWC) | 2014
Anisa Rula; Matteo Palmonari; Axel-Cyrille Ngonga Ngomo; Daniel Gerber; Jens Lehmann; Lorenz Bühmann
Information on the temporal interval of validity for facts described by RDF triples plays an important role in a large number of applications. Yet, most of the knowledge bases available on the Web of Data do not provide such information in an explicit manner. In this paper, we present a generic approach which addresses this drawback by inserting temporal information into knowledge bases. Our approach combines two types of information to associate RDF triples with time intervals. First, it relies on temporal information gathered from the document Web by an extension of the fact validation framework DeFacto. Second, it harnesses the time information contained in knowledge bases. The two types of information are combined within a three-step approach comprising matching, selection, and merging. We evaluate our approach against a corpus of facts gathered from Yago2 by using DBpedia and Freebase as input and different parameter settings for the underlying algorithms. Our results suggest that we can detect temporal information for facts from DBpedia with an F-measure of up to 70%.
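The merging step lends itself to a compact sketch; the input intervals below are invented, and matching and selection are assumed to have already produced them.

```python
# Fuse overlapping candidate intervals (e.g. one from DeFacto's web
# evidence, one from the knowledge base) into maximal intervals.
def merge(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:   # overlaps the previous one
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

print(merge([(2004, 2008), (2006, 2010), (2015, 2016)]))
# -> [(2004, 2010), (2015, 2016)]
```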