Publication


Featured research published by Sebastian Hellmann.


Semantic Web | 2015

DBpedia – A large-scale, multilingual knowledge base extracted from Wikipedia

Jens Lehmann; Robert Isele; Max Jakob; Anja Jentzsch; Dimitris Kontokostas; Pablo N. Mendes; Sebastian Hellmann; Mohamed Morsey; Patrick van Kleef; Sören Auer

The DBpedia community project extracts structured, multilingual knowledge from Wikipedia and makes it freely available on the Web using Semantic Web and Linked Data technologies. The project extracts knowledge from 111 different language editions of Wikipedia. The largest DBpedia knowledge base, which is extracted from the English edition of Wikipedia, consists of over 400 million facts that describe 3.7 million things. The DBpedia knowledge bases that are extracted from the other 110 Wikipedia editions together consist of 1.46 billion facts and describe 10 million additional things. The DBpedia project maps Wikipedia infoboxes from 27 different language editions to a single shared ontology consisting of 320 classes and 1,650 properties. The mappings are created via a world-wide crowd-sourcing effort and enable knowledge from the different Wikipedia editions to be combined. The project publishes releases of all DBpedia knowledge bases for download and provides SPARQL query access to 14 out of the 111 language editions via a global network of local DBpedia chapters. In addition to the regular releases, the project maintains a live knowledge base which is updated whenever a page in Wikipedia changes. DBpedia sets 27 million RDF links pointing into over 30 external data sources and thus enables data from these sources to be used together with DBpedia data. Several hundred data sets on the Web themselves publish RDF links pointing to DBpedia, making DBpedia one of the central interlinking hubs in the Linked Open Data (LOD) cloud. In this system report, we give an overview of the DBpedia community project, including its architecture, technical implementation, maintenance, internationalisation, usage statistics and applications.
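
The knowledge bases are accessible via public SPARQL endpoints. A minimal sketch of such a query in Python, using the SPARQLWrapper library; the chosen resource (Leipzig) and property (dbo:abstract) are arbitrary examples, and endpoint availability is not guaranteed:

```python
# Minimal sketch: querying the public DBpedia SPARQL endpoint from Python
# with the SPARQLWrapper library.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?abstract WHERE {
        <http://dbpedia.org/resource/Leipzig>
            <http://dbpedia.org/ontology/abstract> ?abstract .
        FILTER (lang(?abstract) = "en")
    }
""")
sparql.setReturnFormat(JSON)

# Print the English abstract of the example resource.
for binding in sparql.query().convert()["results"]["bindings"]:
    print(binding["abstract"]["value"])
```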


International World Wide Web Conference | 2009

Triplify: light-weight linked data publication from relational databases

Sören Auer; Sebastian Dietzold; Jens Lehmann; Sebastian Hellmann; David Aumueller

In this paper we present Triplify, a simplistic but effective approach to publishing Linked Data from relational databases. Triplify is based on mapping HTTP-URI requests onto relational database queries. Triplify transforms the resulting relations into RDF statements and publishes the data on the Web in various RDF serializations, in particular as Linked Data. The rationale for developing Triplify is that the largest part of information on the Web is already stored in structured form, often as data contained in relational databases, but usually published by Web applications only as HTML mixing structure, layout and content. In order to reveal the pure structured information behind the current Web, we have implemented Triplify as a light-weight software component, which can be easily integrated into and deployed by the numerous, widely installed Web applications. Our approach includes a method for publishing update logs to enable incremental crawling of linked data sources. Triplify is complemented by a library of configurations for common relational schemata and a REST-enabled data source registry. Triplify configurations containing mappings are provided for many popular Web applications, including osCommerce, WordPress, Drupal, Gallery, and phpBB. We show that despite its light-weight architecture Triplify can be used to publish very large datasets, such as 160GB of geo data from the OpenStreetMap project.
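
The core mapping idea can be illustrated with a small sketch (not Triplify's actual PHP configuration format): an HTTP request path selects a SQL query, and the result columns are serialized as RDF predicates. The table, column, and URI names below are hypothetical:

```python
# Illustrative sketch of the relational-to-RDF mapping idea behind Triplify.
import sqlite3

BASE = "http://example.org/"          # assumed base URI for minted resources
FOAF = "http://xmlns.com/foaf/0.1/"   # standard FOAF vocabulary

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Alice', 'alice@example.org')")

# One mapping: a request path is answered by a SQL query whose columns
# are mapped to RDF predicates.
SQL = "SELECT id, name, email FROM users"
PREDICATES = {"name": FOAF + "name", "email": FOAF + "mbox"}

def triples(path="users"):
    """Serialize the query result as N-Triples, one subject per row."""
    for row_id, name, email in conn.execute(SQL):
        subject = f"<{BASE}{path}/{row_id}>"
        yield f'{subject} <{PREDICATES["name"]}> "{name}" .'
        yield f'{subject} <{PREDICATES["email"]}> "{email}" .'

for t in triples():
    print(t)
```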


International Semantic Web Conference | 2009

LinkedGeoData: Adding a Spatial Dimension to the Web of Data

Sören Auer; Jens Lehmann; Sebastian Hellmann

In order to employ the Web as a medium for data and information integration, comprehensive datasets and vocabularies are required as they enable the disambiguation and alignment of other data and information. Many real-life information integration and aggregation tasks are impossible without comprehensive background knowledge related to spatial features of the ways, structures and landscapes surrounding us. In this paper we contribute to the generation of a spatial dimension for the Data Web by elaborating on how the collaboratively collected OpenStreetMap data can be transformed and represented adhering to the RDF data model. We describe how this data can be interlinked with other spatial data sets, how it can be made accessible for machines according to the linked data paradigm and for humans by means of a faceted geo-data browser.
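
A minimal sketch of the transformation idea, using Python's rdflib and the W3C WGS84 vocabulary; the node data and the linkedgeodata.org URI scheme shown here are illustrative:

```python
# Sketch: representing an OpenStreetMap node as RDF. The node values are
# toy data; the URI scheme is an assumption for illustration.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDFS, XSD

GEO = Namespace("http://www.w3.org/2003/01/geo/wgs84_pos#")

node = {"id": 1234, "lat": 51.3397, "lon": 12.3731,
        "tags": {"name": "Leipzig"}}

g = Graph()
s = URIRef(f"http://linkedgeodata.org/triplify/node{node['id']}")
g.add((s, GEO.lat, Literal(node["lat"], datatype=XSD.double)))
g.add((s, GEO.long, Literal(node["lon"], datatype=XSD.double)))
g.add((s, RDFS.label, Literal(node["tags"]["name"])))

print(g.serialize(format="nt"))
```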


International World Wide Web Conference | 2014

Test-driven evaluation of linked data quality

Dimitris Kontokostas; Patrick Westphal; Sören Auer; Sebastian Hellmann; Jens Lehmann; Roland Cornelissen; Amrapali Zaveri

Linked Open Data (LOD) comprises an unprecedented volume of structured data on the Web. However, these datasets are of varying quality ranging from extensively curated datasets to crowdsourced or extracted data of often relatively low quality. We present a methodology for test-driven quality assessment of Linked Data, which is inspired by test-driven software development. We argue that vocabularies, ontologies and knowledge bases should be accompanied by a number of test cases, which help to ensure a basic level of quality. We present a methodology for assessing the quality of linked data resources, based on a formalization of bad smells and data quality problems. Our formalization employs SPARQL query templates, which are instantiated into concrete quality test case queries. Based on an extensive survey, we compile a comprehensive library of data quality test case patterns. We perform automatic test case instantiation based on schema constraints or semi-automatically enriched schemata and allow the user to generate specific test case instantiations that are applicable to a schema or dataset. We provide an extensive evaluation of five LOD datasets, manual test case instantiation for five schemas and automatic test case instantiations for all available schemata registered with Linked Open Vocabularies (LOV). One of the main advantages of our approach is that domain specific semantics can be encoded in the data quality test cases, thus being able to discover data quality problems beyond conventional quality heuristics.
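
The template mechanism can be sketched as follows; the placeholder syntax and the example constraint (birth dates must be typed as xsd:date) are illustrative assumptions, not the paper's exact pattern language:

```python
# Illustrative sketch of test-driven quality assessment: a SPARQL query
# template is instantiated into a concrete test case that selects
# violating resources.
TEMPLATE = """
SELECT ?s WHERE {{
    ?s <{property}> ?value .
    FILTER (datatype(?value) != <{datatype}>)
}}
"""

def instantiate(prop: str, datatype: str) -> str:
    """Turn the pattern into a concrete data-quality test query."""
    return TEMPLATE.format(property=prop, datatype=datatype)

# Example instantiation: dbo:birthDate values must be typed as xsd:date.
print(instantiate("http://dbpedia.org/ontology/birthDate",
                  "http://www.w3.org/2001/XMLSchema#date"))
```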


Semantics and Digital Media Technologies | 2009

RelFinder: Revealing Relationships in RDF Knowledge Bases

Philipp Heim; Sebastian Hellmann; Jens Lehmann; Steffen Lohmann; Timo Stegemann

The Semantic Web has recently seen a rise of large knowledge bases (such as DBpedia) that are freely accessible via SPARQL endpoints. The structured representation of the contained information opens up new possibilities in the way it can be accessed and queried. In this paper, we present an approach that extracts a graph covering relationships between two objects of interest. We show an interactive visualization of this graph that supports the systematic analysis of the found relationships by providing highlighting, previewing, and filtering features.
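
A sketch of the kind of SPARQL query underlying such relationship extraction: selecting length-two property paths connecting two resources. RelFinder issues a series of such queries for increasing path lengths; the two resources here are arbitrary examples, and the query can be executed as in the DBpedia example above:

```python
# Illustrative path query: all ways to reach Saxony from Leipzig via one
# intermediate node, together with the connecting properties.
PATH_QUERY = """
SELECT ?p1 ?mid ?p2 WHERE {
    <http://dbpedia.org/resource/Leipzig> ?p1 ?mid .
    ?mid ?p2 <http://dbpedia.org/resource/Saxony> .
}
LIMIT 10
"""
print(PATH_QUERY)
```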


International Journal on Semantic Web and Information Systems | 2009

Learning of OWL Class Descriptions on Very Large Knowledge Bases

Sebastian Hellmann; Jens Lehmann; Sören Auer

The vision of the Semantic Web is to make use of semantic representations on the largest possible scale - the Web. Large knowledge bases such as DBpedia, OpenCyc, GovTrack, and others are emerging and are freely available as Linked Data and SPARQL endpoints. Exploring and analysing such knowledge bases is a significant hurdle for Semantic Web research and practice. As one possible direction for tackling this problem, the authors present an approach for obtaining complex class descriptions from objects in knowledge bases by using Machine Learning techniques. They describe in detail how they leverage existing techniques to achieve scalability on large knowledge bases available as SPARQL endpoints or Linked Data. Their algorithms are made available in the open source DL-Learner project and the authors present several real-life scenarios in which they can be used by Semantic Web applications.
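
The learning problem can be illustrated with a toy sketch (not DL-Learner's actual API): given positive and negative example individuals, candidate class expressions are scored by how well they separate the two sets. The class names and extensions are hard-coded for illustration; in practice, instance retrieval would run against a SPARQL endpoint:

```python
# Toy sketch of learning a class description from examples.
positives = {"Leipzig", "Dresden"}   # individuals the class should cover
negatives = {"Elbe"}                 # individuals it should exclude

# Candidate class expressions and the instances each one covers.
candidates = {
    "dbo:Place": {"Leipzig", "Dresden", "Elbe"},
    "dbo:City": {"Leipzig", "Dresden"},
}

def score(extension):
    # Simple accuracy: covered positives plus excluded negatives.
    return len(extension & positives) + len(negatives - extension)

best = max(candidates, key=lambda c: score(candidates[c]))
print(best)  # -> dbo:City, which separates the examples perfectly
```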


Program: Electronic Library and Information Systems | 2012

DBpedia and the live extraction of structured data from Wikipedia

Mohamed Morsey; Jens Lehmann; Sören Auer; Claus Stadler; Sebastian Hellmann

Purpose – DBpedia extracts structured information from Wikipedia, interlinks it with other knowledge bases and freely publishes the results on the web using Linked Data and SPARQL. However, the DBpedia release process is heavyweight and releases are sometimes based on data that is several months old. DBpedia-Live solves this problem by providing a live synchronization method based on the update stream of Wikipedia. This paper seeks to address these issues.

Design/methodology/approach – Wikipedia provides DBpedia with a continuous stream of updates, i.e. a stream of recently updated articles. DBpedia-Live processes that stream on the fly to obtain RDF data and stores the extracted data back to DBpedia. DBpedia-Live publishes the newly added/deleted triples in files in order to enable synchronization between the DBpedia endpoint and other DBpedia mirrors.

Findings – During the realization of DBpedia-Live the authors learned that it is crucial to process Wikipedia updates in a priority queue. Recently-upd...
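
A minimal sketch of the mirror-synchronization mechanism described above, assuming N-Triples changeset files and using Python's rdflib; the triples here are toy data:

```python
# Sketch: applying an added/removed changeset pair to a local mirror graph.
from rdflib import Graph

# A mirror that already holds one (stale) triple.
mirror = Graph()
mirror.parse(data='<http://example.org/s> <http://example.org/p> "old" .\n',
             format="nt")

ADDED   = '<http://example.org/s> <http://example.org/p> "new" .\n'
REMOVED = '<http://example.org/s> <http://example.org/p> "old" .\n'

def apply_changeset(added_nt, removed_nt):
    # Retract first, then assert, mirroring an added/removed file pair.
    for triple in Graph().parse(data=removed_nt, format="nt"):
        mirror.remove(triple)
    for triple in Graph().parse(data=added_nt, format="nt"):
        mirror.add(triple)

apply_changeset(ADDED, REMOVED)
print(mirror.serialize(format="nt"))  # only the "new" triple remains
```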


Journal of Web Semantics | 2012

Internationalization of Linked Data: The case of the Greek DBpedia edition

Dimitris Kontokostas; Charalampos Bratsas; Sören Auer; Sebastian Hellmann; Ioannis Antoniou; George Metakides

This paper describes the deployment of the Greek DBpedia and the contribution to the DBpedia information extraction framework with regard to internationalization (I18n) and multilingual support. I18n filters are proposed as pluggable components in order to address issues when extracting knowledge from non-English Wikipedia editions. We report on our strategy for supporting the International Resource Identifier (IRI) and introduce two new extractors to complement the I18n filters. Additionally, the paper discusses the definition of Transparent Content Negotiation (TCN) rules for IRIs to address de-referencing and IRI serialization problems. The aim of this research is to establish best practices (complemented by software) to allow the DBpedia community to easily generate, maintain and properly interlink language-specific DBpedia editions. Furthermore, these best practices can be applied for the publication of Linked Data in non-Latin languages in general.
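
The IRI serialization problem can be illustrated in a few lines of Python: a Greek resource name is a valid IRI as-is, but must be percent-encoded when serialized as a URI. The example label is arbitrary:

```python
# Sketch: converting the human-readable IRI form to its percent-encoded
# URI serialization.
from urllib.parse import quote

iri = "http://el.dbpedia.org/resource/Ελλάδα"
prefix, label = iri.rsplit("/", 1)
uri = prefix + "/" + quote(label)

print(iri)  # IRI form, readable for Greek speakers
print(uri)  # URI form: .../resource/%CE%95%CE%BB%CE%BB%CE%AC%CE%B4%CE%B1
```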


On the Move to Meaningful Internet Systems: OTM 2009 (CoopIS, DOA, IS, and ODBASE), Part II | 2009

DBpedia Live Extraction

Sebastian Hellmann; Claus Stadler; Jens Lehmann; Sören Auer

The DBpedia project extracts information from Wikipedia, interlinks it with other knowledge bases, and makes this data available as RDF. So far the DBpedia project has succeeded in creating one of the largest knowledge bases on the Data Web, which is used in many applications and research prototypes. However, the heavy-weight extraction process has been a drawback. It requires manual effort to produce a new release and the extracted information is not up-to-date. We extended DBpedia with a live extraction framework, which is capable of processing tens of thousands of changes per day in order to consume the constant stream of Wikipedia updates. This allows direct modifications of the knowledge base and closer interaction of users with DBpedia. We also show how the Wikipedia community itself is now able to take part in the DBpedia ontology engineering process and that an interactive round-trip engineering between Wikipedia and DBpedia is made possible.
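
A toy sketch of consuming the update stream with prioritized processing; the priority values and the extraction step are placeholders, and the actual framework is considerably more involved:

```python
# Sketch: a priority queue of recently changed Wikipedia pages, processed
# in order, with re-extraction stubbed out.
import heapq

queue = []  # (priority, page_title); lower number = processed earlier

def enqueue(page, priority):
    heapq.heappush(queue, (priority, page))

def process_one():
    priority, page = heapq.heappop(queue)
    # A real implementation would re-run the extraction for this article
    # and update the knowledge base.
    print(f"re-extracting {page} (priority {priority})")

enqueue("Leipzig", 1)   # e.g. just edited, so high priority
enqueue("Saxony", 5)
process_one()           # -> re-extracting Leipzig (priority 1)
```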


International Semantic Web Conference | 2013

Real-Time RDF Extraction from Unstructured Data Streams

Daniel Gerber; Sebastian Hellmann; Lorenz Bühmann; Tommaso Soru; Axel-Cyrille Ngonga Ngomo

The vision behind the Web of Data is to extend the current document-oriented Web with machine-readable facts and structured data, thus creating a representation of general knowledge. However, most of the Web of Data is limited to being a large compendium of encyclopedic knowledge describing entities. A huge challenge, the timely and massive extraction of RDF facts from unstructured data, has remained open so far. The availability of such knowledge on the Web of Data would provide significant benefits to manifold applications including news retrieval, sentiment analysis and business intelligence. In this paper, we address the problem of the timeliness of the Web of Data by presenting an approach that allows extracting RDF triples from unstructured data streams. We employ statistical methods in combination with deduplication, disambiguation and unsupervised as well as supervised machine learning techniques to create a knowledge base that reflects the content of the input streams. We evaluate a sample of the RDF we generate against a large corpus of news streams and show that we achieve a precision of more than 85%.
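
A deliberately naive sketch of the overall goal, not the paper's statistical method: turning a pattern match in a news sentence into an RDF triple. The regex, the URIs, and the single hard-coded relation are purely illustrative:

```python
# Toy sketch: extracting one RDF triple from a text stream element.
import re

SENTENCE = "Google acquires YouTube."
PATTERN = re.compile(r"(\w+) acquires (\w+)")

m = PATTERN.search(SENTENCE)
if m:
    subj, obj = m.groups()
    # Emit the fact as an N-Triples line with hypothetical URIs.
    print(f"<http://example.org/{subj}> "
          f"<http://example.org/ontology/acquired> "
          f"<http://example.org/{obj}> .")
```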

Collaboration


Dive into Sebastian Hellmann's collaborations.

Top Co-Authors

Christian Chiarcos

Goethe University Frankfurt
