Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Johannes Hoffart is active.

Publication


Featured research published by Johannes Hoffart.


Artificial Intelligence | 2013

YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia

Johannes Hoffart; Fabian M. Suchanek; Klaus Berberich; Gerhard Weikum

We present YAGO2, an extension of the YAGO knowledge base, in which entities, facts, and events are anchored in both time and space. YAGO2 is built automatically from Wikipedia, GeoNames, and WordNet. It contains 447 million facts about 9.8 million entities. Human evaluation confirmed an accuracy of 95% of the facts in YAGO2. In this paper, we present the extraction methodology and the integration of the spatio-temporal dimension.
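To make the spatio-temporal anchoring concrete, the following is a minimal Python sketch of a fact record extended with a validity time and a location; the field names and the example fact are purely illustrative and do not reflect YAGO2's actual schema or data.

    from dataclasses import dataclass
    from typing import Optional, Tuple

    @dataclass
    class Fact:
        """One spatio-temporally anchored fact: a subject-predicate-object
        triple extended with time and location (illustrative field names,
        not the exact YAGO2 representation)."""
        subject: str
        predicate: str
        obj: str
        valid_time: Optional[Tuple[str, str]] = None    # (begin, end) as ISO dates
        location: Optional[Tuple[float, float]] = None  # (latitude, longitude)

    # Hypothetical example of a fact anchored in both time and space.
    fall_of_the_wall = Fact(
        subject="Berlin_Wall",
        predicate="wasDestroyedOnDate",
        obj="1989-11-09",
        valid_time=("1989-11-09", "1989-11-09"),
        location=(52.516, 13.377),
    )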


Conference on Information and Knowledge Management | 2012

KORE: keyphrase overlap relatedness for entity disambiguation

Johannes Hoffart; Stephan Seufert; Dat Ba Nguyen; Martin Theobald; Gerhard Weikum

Measuring the semantic relatedness between two entities is the basis for numerous tasks in IR, NLP, and Web-based knowledge extraction. This paper focuses on disambiguating names in a Web or text document by jointly mapping all names onto semantically related entities registered in a knowledge base. To this end, we have developed a novel notion of semantic relatedness between two entities represented as sets of weighted (multi-word) keyphrases, with consideration of partially overlapping phrases. This measure improves the quality of prior link-based models, and also eliminates the need for (usually Wikipedia-centric) explicit interlinkage between entities. Thus, our method is more versatile and can cope with long-tail and newly emerging entities that have few or no links associated with them. For efficiency, we have developed approximation techniques based on min-hash sketches and locality-sensitive hashing. Our experiments on semantic relatedness and on named entity disambiguation demonstrate the superiority of our method compared to state-of-the-art baselines.
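As a rough illustration of the efficiency techniques mentioned above, the Python sketch below estimates the overlap of two entities' keyphrase sets with min-hash signatures. It deliberately ignores keyphrase weights and partial phrase overlap, which the actual relatedness measure accounts for, and the keyphrases are invented for the example.

    import hashlib

    def minhash_signature(phrases, num_hashes=64):
        """Min-hash signature of a keyphrase set: each slot keeps the minimum
        hash value over all phrases, using a salted MD5 as a cheap hash family."""
        return [
            min(int(hashlib.md5(f"{i}:{p}".encode()).hexdigest(), 16) for p in phrases)
            for i in range(num_hashes)
        ]

    def estimated_jaccard(sig_a, sig_b):
        """The fraction of matching slots approximates the Jaccard similarity."""
        return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

    # Toy keyphrase sets for two entities sharing the ambiguous name "Apple".
    kp_apple_inc = {"steve jobs", "iphone", "cupertino", "app store"}
    kp_apple_records = {"the beatles", "abbey road", "record label", "london"}

    sig_a = minhash_signature(kp_apple_inc)
    sig_b = minhash_signature(kp_apple_records)
    print(estimated_jaccard(sig_a, sig_b))  # close to 0 for unrelated entities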


International World Wide Web Conferences | 2014

Discovering emerging entities with ambiguous names

Johannes Hoffart; Yasemin Altun; Gerhard Weikum

Knowledge bases (KBs) contain data about a large number of people, organizations, and other entities. However, this knowledge can never be complete due to the dynamics of the ever-changing world: new companies are formed every day, and new songs are composed every minute and become of interest for addition to a KB. To keep up with the real world's entities, the KB maintenance process needs to continuously discover newly emerging entities in news and other Web streams. In this paper we focus on the most difficult case, where the names of new entities are ambiguous. This raises the technical problem of deciding whether an observed name refers to a known entity or represents a new entity. This paper presents a method to solve this problem with high accuracy. It is based on a new model for measuring the confidence of mapping an ambiguous mention to an existing entity, and a new model for representing a new entity with the same ambiguous name as a set of weighted keyphrases. The method can handle both Wikipedia-derived entities, which typically constitute the bulk of large KBs, and entities that exist only in other Web sources such as online communities about music or movies. Experiments show that our entity discovery method outperforms previous methods for coping with out-of-KB entities (called unlinkable in entity linking).
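A minimal sketch of the decision described above, assuming a toy Jaccard-style overlap as the candidate score and the normalized best score as confidence; the paper's confidence model, thresholding, and weighted keyphrases are considerably more refined.

    def link_or_emerge(mention_keyphrases, candidates, threshold=0.6):
        """Decide whether a mention refers to a known entity or an emerging one.

        mention_keyphrases: keyphrases observed in the mention's context.
        candidates: dict mapping entity id -> set of keyphrases (unweighted here,
        a simplification of the paper's weighted keyphrase model).
        """
        def overlap(a, b):
            return len(a & b) / max(1, len(a | b))

        scores = {e: overlap(mention_keyphrases, kps) for e, kps in candidates.items()}
        total = sum(scores.values())
        if not scores or total == 0:
            return ("EMERGING", mention_keyphrases)
        best = max(scores, key=scores.get)
        if scores[best] / total < threshold:
            # Low confidence: register a new out-of-KB entity, represented by
            # the keyphrases observed in the mention's context.
            return ("EMERGING", mention_keyphrases)
        return ("KNOWN", best)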


International Semantic Web Conference | 2016

YAGO: A Multilingual Knowledge Base from Wikipedia, Wordnet, and Geonames

Thomas Rebele; Fabian M. Suchanek; Johannes Hoffart; Joanna Biega; Erdal Kuzey; Gerhard Weikum

YAGO is a large knowledge base that is built automatically from Wikipedia, WordNet and GeoNames. The project combines information from Wikipedias in 10 different languages into a coherent whole, thus giving the knowledge a multilingual dimension. It also attaches spatial and temporal information to many facts, and thus allows the user to query the data over space and time. YAGO focuses on extraction quality and achieves a manually evaluated precision of 95%. In this paper, we explain how YAGO is built from its sources, how its quality is evaluated, how a user can access it, and how other projects utilize it.


International Symposium on Wikis and Open Collaboration | 2009

An architecture to support intelligent user interfaces for Wikis by means of Natural Language Processing

Johannes Hoffart; Torsten Zesch; Iryna Gurevych

We present an architecture for integrating a set of Natural Language Processing (NLP) techniques with a wiki platform. This entails support for adding, organizing, and finding content in the wiki. We perform a comprehensive analysis of how NLP techniques can support the user interaction with the wiki, using an intelligent interface to provide suggestions. The architecture is designed to be deployed with any existing wiki platform, especially those used in corporate environments. We implemented a prototype integrating the NLP techniques of keyphrase extraction and text segmentation, as well as an improved search engine. The prototype is integrated with two widely used wiki platforms: MediaWiki and TWiki.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2016

Context-Sensitive Auto-Completion for Searching with Entities and Categories

Andreas Schmidt; Johannes Hoffart; Dragan Milchevski; Gerhard Weikum

When searching a document collection by keywords, good auto-completion suggestions can be derived from query logs and corpus statistics. In contrast, when querying documents that have been automatically linked to entities and semantic categories, auto-completion has not been investigated much. We have developed a semantic auto-completion system in which suggestions for entities and categories are computed in real time from the context of already entered entities or categories and from entity-level co-occurrence statistics for the underlying corpus. Given the huge size of the knowledge bases that underlie this setting, a challenge is to compute the best suggestions fast enough for an interactive user experience. Our demonstration shows the effectiveness of our method and its interactive usability.
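The core idea of ranking suggestions by entity-level co-occurrence with the entities already entered can be sketched as follows; the counts and entity names are invented for illustration, and the real system additionally needs suitable indexing and category handling to stay interactive at knowledge-base scale.

    from collections import defaultdict

    # Toy entity-level co-occurrence counts from an entity-linked corpus (invented).
    cooccurrence = {
        ("Angela_Merkel", "Germany"): 1200,
        ("Angela_Merkel", "European_Union"): 640,
        ("Germany", "European_Union"): 900,
        ("Germany", "Gerhard_Schroeder"): 300,
    }

    def suggest(prefix, context_entities, top_k=5):
        """Rank entities whose name starts with `prefix` by how often they
        co-occur with the entities already present in the query context."""
        scores = defaultdict(int)
        for (a, b), count in cooccurrence.items():
            for cand, ctx in ((a, b), (b, a)):
                if ctx in context_entities and cand.lower().startswith(prefix.lower()):
                    scores[cand] += count
        return sorted(scores, key=scores.get, reverse=True)[:top_k]

    print(suggest("ger", {"Angela_Merkel"}))  # ['Germany'] in this toy corpus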


Conference on Information and Knowledge Management | 2014

AESTHETICS: Analytics with Strings, Things, and Cats

Johannes Hoffart; Dragan Milchevski; Gerhard Weikum

This paper describes an advanced news analytics and exploration system that allows users to visualize trends of entities like politicians, countries, and organizations in continuously updated news articles. Our system improves state-of-the-art text analytics by linking ambiguous names in news articles to entities in knowledge bases like Freebase, DBpedia, or YAGO. This step enables indexing entities and interpreting the contents in terms of entities. This way, the analysis of trends and co-occurrences of entities gains in accuracy and, by leveraging the taxonomic type hierarchy of knowledge bases, also in expressiveness and usability. In particular, we can analyze not only individual entities, but also categories of entities and their combinations, including co-occurrences with informative text phrases. Our Web-based system demonstrates the power of this approach through insightful anecdotal analysis of recent events in the news.
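A toy sketch of how a taxonomic type hierarchy lets the analysis move from individual entities to whole categories: a category is expanded to its member entities, and mentions of any member are aggregated over time. All names and counts below are illustrative.

    from collections import Counter

    # Toy category memberships and entity-annotated articles (invented).
    category_members = {
        "politician": {"Angela_Merkel", "Barack_Obama"},
        "country": {"Germany", "France"},
    }
    articles = [
        {"month": "2014-06", "entities": {"Angela_Merkel", "Germany"}},
        {"month": "2014-06", "entities": {"Barack_Obama"}},
        {"month": "2014-07", "entities": {"Angela_Merkel", "France"}},
    ]

    def category_trend(category):
        """Count, per month, the articles mentioning any entity of the category."""
        members = category_members[category]
        trend = Counter()
        for article in articles:
            if article["entities"] & members:
                trend[article["month"]] += 1
        return dict(trend)

    print(category_trend("politician"))  # {'2014-06': 2, '2014-07': 1}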


International Conference on Management of Data | 2013

Discovering and disambiguating named entities in text

Johannes Hoffart

Disambiguating named entities in natural language texts maps ambiguous names to canonical entities registered in a knowledge base such as DBpedia, Freebase, or YAGO. Knowing the specific entity is an important asset for several other tasks, e.g. entity-based information retrieval or higher-level information extraction. Our approach to named entity disambiguation makes use of several ingredients: the prior probability of an entity being mentioned, the similarity between the context of the mention in the text and an entity, as well as the coherence among the entities. Extending this method, we present a novel and highly efficient measure to compute the semantic coherence between entities. This measure is especially powerful for long-tail entities or entities that are not yet present in the knowledge base. Reliably identifying names in the input text that are not part of the knowledge base is the current focus of our work.
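The combination of signals described above can be written as a simple additive objective over a joint assignment of mentions to entities. The weights and the additive form below are assumptions for illustration only; the actual system tunes the combination and searches for a good assignment rather than just scoring a given one.

    import itertools

    def joint_score(assignment, prior, context_sim, coherence,
                    alpha=0.3, beta=0.4, gamma=0.3):
        """Score a mention->entity assignment by combining the mention-entity
        prior, the mention-context/entity similarity, and the pairwise
        coherence among all chosen entities (illustrative weights)."""
        score = 0.0
        for mention, entity in assignment.items():
            score += alpha * prior[(mention, entity)]
            score += beta * context_sim[(mention, entity)]
        for e1, e2 in itertools.combinations(assignment.values(), 2):
            score += gamma * coherence.get(frozenset((e1, e2)), 0.0)
        return score

    # Hypothetical example with two mentions from a music article.
    assignment = {"Page": "Jimmy_Page", "Kashmir": "Kashmir_(song)"}
    prior = {("Page", "Jimmy_Page"): 0.4, ("Kashmir", "Kashmir_(song)"): 0.2}
    context_sim = {("Page", "Jimmy_Page"): 0.7, ("Kashmir", "Kashmir_(song)"): 0.6}
    coherence = {frozenset(("Jimmy_Page", "Kashmir_(song)")): 0.9}
    print(joint_score(assignment, prior, context_sim, coherence))  # approx. 0.97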


Meeting of the Association for Computational Linguistics | 2016

DeepLife: An Entity-aware Search, Analytics and Exploration Platform for Health and Life Sciences

Patrick Ernst; Amy Siu; Dragan Milchevski; Johannes Hoffart; Gerhard Weikum

Despite the abundance of biomedical literature and health discussions in online communities, it is often tedious to retrieve informative contents for health-centric information needs. Users can query scholarly work in PubMed by keywords and MeSH terms, and resort to Google for everything else. This demo paper presents the DeepLife system, which overcomes the limitations of existing search engines for life-science and health topics. DeepLife integrates large knowledge bases and harnesses entity linking methods to support search and exploration of scientific literature, newspaper feeds, and social media in terms of keywords and phrases, biomedical entities, and taxonomic categories. It also provides functionality for entity-aware text analytics over health-centric contents.


International World Wide Web Conferences | 2016

The Knowledge Awakens: Keeping Knowledge Bases Fresh with Emerging Entities

Johannes Hoffart; Dragan Milchevski; Gerhard Weikum; Avishek Anand; Jaspreet Singh

Entity search over news, social media, and the Web allows users to precisely retrieve concise information about specific people, organizations, movies and their characters, and other kinds of entities. This expressive search mode builds on two major assets: 1) a knowledge base (KB) that contains the entities of interest and 2) entity markup in the documents of interest, derived by automatic disambiguation of entity names (NED) and linking of names to the KB. These prerequisites are not easily available, though, in the important case when a user is interested in a newly emerging entity (EE) such as a new movie or a new song. Automatic methods for detecting and canonicalizing EEs are not nearly at the same level as the NED methods for prominent entities that have rich descriptions in the KB. To overcome this major limitation, we have developed an approach and prototype system that allows searching for EEs in a user-friendly manner. The approach leverages the human in the loop by prompting for user feedback on candidate entities and on characteristic keyphrases for EEs. For convenience and a low burden on users, this process is supported by the automatic harvesting of tentative keyphrases. Our demo system shows this interactive process and its high usability.

Collaboration


Dive into Johannes Hoffart's collaborations.
