Klaus Berberich | Researchain

Publication

Featured research published by Klaus Berberich.

Artificial Intelligence | 2013

YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia

Johannes Hoffart; Fabian M. Suchanek; Klaus Berberich; Gerhard Weikum

We present YAGO2, an extension of the YAGO knowledge base, in which entities, facts, and events are anchored in both time and space. YAGO2 is built automatically from Wikipedia, GeoNames, and WordNet. It contains 447 million facts about 9.8 million entities. Human evaluation confirmed an accuracy of 95% of the facts in YAGO2. In this paper, we present the extraction methodology and the integration of the spatio-temporal dimension.

European Conference on Information Retrieval | 2010

A language modeling approach for temporal information needs

Klaus Berberich; Srikanta J. Bedathur; Omar Alonso; Gerhard Weikum

This work addresses information needs that have a temporal dimension conveyed by a temporal expression in the user’s query. Temporal expressions such as “in the 1990s” are frequent, easily extractable, but not leveraged by existing retrieval models. One challenge when dealing with them is their inherent uncertainty. It is often unclear which exact time interval a temporal expression refers to. We integrate temporal expressions into a language modeling approach, thus making them first-class citizens of the retrieval model and considering their inherent uncertainty. Experiments on the New York Times Annotated Corpus using Amazon Mechanical Turk to collect queries and obtain relevance assessments demonstrate that our approach yields substantial improvements in retrieval effectiveness.
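To make the uncertainty of an expression such as “in the 1990s” concrete, the following minimal sketch enumerates, at year granularity, every interval the expression could refer to and compares two expressions by the intervals they share. The function names `candidate_intervals` and `match_score`, the year granularity, and the scoring formula are assumptions chosen for illustration; they are not taken from the abstract and should not be read as the paper's retrieval model.

```python
from typing import Set, Tuple

Interval = Tuple[int, int]  # (begin_year, end_year), inclusive


def candidate_intervals(begin_lo: int, begin_hi: int,
                        end_lo: int, end_hi: int) -> Set[Interval]:
    """All intervals whose begin and end fall within the given year bounds."""
    return {
        (b, e)
        for b in range(begin_lo, begin_hi + 1)
        for e in range(end_lo, end_hi + 1)
        if b <= e
    }


def match_score(query: Set[Interval], doc: Set[Interval]) -> float:
    """Illustrative score (an assumption): shared intervals, normalized by both sizes."""
    if not query or not doc:
        return 0.0
    return len(query & doc) / (len(query) * len(doc))


# "in the 1990s" may refer to any interval lying inside 1990..1999.
nineties = candidate_intervals(1990, 1999, 1990, 1999)
# "in 1995" at year granularity refers to exactly one interval.
year_1995 = candidate_intervals(1995, 1995, 1995, 1995)
# "in the 1980s" shares no interval with the 1990s.
eighties = candidate_intervals(1980, 1989, 1980, 1989)

score_1995 = match_score(nineties, year_1995)
score_1980s = match_score(nineties, eighties)
```

Under these assumptions, a document mentioning 1995 receives a small positive score for the query “in the 1990s”, while a document mentioning only the 1980s receives zero.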

Conference on Information and Knowledge Management | 2013

Robust question answering over the web of linked data

Mohamed Yahya; Klaus Berberich; Shady Elbassuoni; Gerhard Weikum

Knowledge bases and the Web of Linked Data have become important assets for search, recommendation, and analytics. Natural-language questions are a user-friendly mode of tapping this wealth of knowledge and data. However, question answering technology does not work robustly in this setting as questions have to be translated into structured queries and users have to be careful in phrasing their questions. This paper advocates a new approach that allows questions to be partially translated into relaxed queries, covering the essential but not necessarily all aspects of the user's input. To compensate for the omissions, we exploit textual sources associated with entities and relational facts. Our system translates user questions into an extended form of structured SPARQL queries, with text predicates attached to triple patterns. Our solution is based on a novel optimization model, cast into an integer linear program, for joint decomposition and disambiguation of the user question. We demonstrate the quality of our methods through experiments with the QALD benchmark.

Internet Mathematics | 2005

Time-Aware Authority Ranking

Klaus Berberich; Michalis Vazirgiannis; Gerhard Weikum

The link structure of the web is analyzed to measure the authority of pages, which can be taken into account for ranking query results. Due to the enormous dynamics of the web, with millions of pages created, updated, deleted, and linked to every day, temporal aspects of web pages and links are crucial factors for their evaluation. Users are interested in important pages (i.e., pages with high authority score) but are equally interested in the recency of information. Time—and thus the freshness of web content and link structure—emanates as a factor that should be taken into account in link analysis when computing the importance of a page. So far only minor effort has been spent on the integration of temporal aspects into link-analysis techniques. In this paper we introduce T-Rank Light and T-Rank, two link-analysis approaches that take into account the temporal aspects freshness (i.e., timestamps of most recent updates) and activity (i.e., update rates) of pages and links. Experimental results show that T-Rank Light and T-Rank can produce better rankings of web pages.

Workshop on Algorithms and Models for the Web Graph | 2004

T-Rank: Time-Aware Authority Ranking

Klaus Berberich; Michalis Vazirgiannis; Gerhard Weikum

Analyzing the link structure of the web for deriving a page’s authority and implied importance has deeply affected the way information providers create and link content, the ranking in web search engines, and the users’ access behavior. Due to the enormous dynamics of the web, with millions of pages created, updated, deleted, and linked to every day, timeliness of web pages and links is a crucial factor for their evaluation. Users are interested in important pages (i.e., pages with high authority score) but are equally interested in the recency of information. Time – and thus the freshness of web content and link structure – emanates as a factor that should be taken into account in link analysis when computing the importance of a page. So far only minor effort has been spent on the integration of temporal aspects into link analysis techniques. In this paper we introduce T-Rank, a link analysis approach that takes into account the temporal aspects freshness (i.e., timestamps of most recent updates) and activity (i.e., update rates) of pages and links. Preliminary experimental results show that T-Rank can improve the quality of ranking web pages.

International World Wide Web Conferences | 2012

Deep answers for naturally asked questions on the web of data

Mohamed Yahya; Klaus Berberich; Shady Elbassuoni; Maya Ramanath; Volker Tresp; Gerhard Weikum

We present DEANNA, a framework for natural language question answering over structured knowledge bases. Given a natural language question, DEANNA translates it into a structured SPARQL query that can be evaluated over knowledge bases such as YAGO, DBpedia, Freebase, or other Linked Data sources. DEANNA analyzes questions and maps verbal phrases to relations and noun phrases to either individual entities or semantic classes. Importantly, it judiciously generates variables for target entities or classes to express joins between multiple triple patterns. We leverage the semantic type system for entities and use constraints in jointly mapping the constituents of the question to relations, classes, and entities. We demonstrate the capabilities and interface of DEANNA, which allows advanced users to influence the translation process and to see how the different components interact to produce the final result.

International Conference on Management of Data | 2010

Durable top-k search in document archives

Leong Hou U; Nikos Mamoulis; Klaus Berberich; Srikanta J. Bedathur

We propose and study a new ranking problem in versioned databases. Consider a database of versioned objects which have different valid instances along a history (e.g., documents in a web archive). Durable top-k search finds the set of objects that are consistently in the top-k results of a query (e.g., a keyword query) throughout a given time interval (e.g., from June 2008 to May 2009). Existing work on temporal top-k queries mainly focuses on finding the most representative top-k elements within a time interval. Such methods are not readily applicable to durable top-k queries. To address this need, we propose two techniques that compute the durable top-k result. The first is adapted from the classic top-k rank aggregation algorithm NRA. The second technique is based on a shared execution paradigm and is more efficient than the first approach. In addition, we propose a special indexing technique for archived data. The index, coupled with a space partitioning technique, improves performance even further. We use data from Wikipedia and the Internet Archive to demonstrate the efficiency and effectiveness of our solutions.
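The definition of a durable top-k result given above can be illustrated with a naive baseline: compute the top-k set at every time step of the interval and intersect the sets. This is only a sketch of the problem statement, not the NRA-based or shared-execution techniques the abstract mentions. The snapshot representation (one score dictionary per time step), the tie-breaking rule, and the names `top_k` and `durable_top_k` are assumptions made for the example.

```python
from typing import Dict, List, Set


def top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """Top-k object ids by score; ties are broken by id (an assumption)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return {obj for obj, _ in ranked[:k]}


def durable_top_k(snapshots: List[Dict[str, float]], k: int) -> Set[str]:
    """Objects that appear in the top-k of every snapshot in the interval."""
    if not snapshots:
        return set()
    result = top_k(snapshots[0], k)
    for snapshot in snapshots[1:]:
        result &= top_k(snapshot, k)
        if not result:  # nothing can become durable again
            break
    return result


# Hypothetical scores of three versioned documents at three time steps.
snapshots = [
    {"a": 0.90, "b": 0.80, "c": 0.10},
    {"a": 0.70, "b": 0.20, "c": 0.60},
    {"a": 0.95, "b": 0.50, "c": 0.40},
]
durable = durable_top_k(snapshots, k=2)
```

Only document `a` stays in the top 2 at every step. The baseline re-ranks every snapshot from scratch, which is the cost that more efficient techniques and indexes aim to avoid.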

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012

Index maintenance for time-travel text search

Avishek Anand; Srikanta J. Bedathur; Klaus Berberich; Ralf Schenkel

Time-travel text search enriches standard text search by temporal predicates, so that users of web archives can easily retrieve document versions that are considered relevant to a given keyword query and existed during a given time interval. Different index structures have been proposed to efficiently support time-travel text search. None of them, however, can easily be updated as the Web evolves and new document versions are added to the web archive. In this work, we describe a novel index structure that efficiently supports time-travel text search and can be maintained incrementally as new document versions are added to the web archive. Our solution uses a sharded index organization, bounds the number of spuriously read index entries per shard, and can be maintained using small in-memory buffers and append-only operations. We present experiments on two large-scale real-world datasets demonstrating that maintaining our novel index structure is an order of magnitude more efficient than periodically rebuilding one of the existing index structures, while query-processing performance is not adversely affected.

Conference on Information and Knowledge Management | 2011

Location-aware click prediction in mobile local search

Dimitrios Lymberopoulos; Peixiang Zhao; Christian König; Klaus Berberich; Jie Liu

Users increasingly rely on their mobile devices to search, locate and discover places and activities around them while on the go. Their decision process is driven by the information displayed on their devices and their current context (e.g., traffic, driving, or walking). Even though recent research efforts have already examined and demonstrated how different context parameters such as weather, time and personal preferences affect the way mobile users click on local businesses, little has been done to study how the location of the user affects the click behavior. In this paper we follow a data-driven methodology where we analyze approximately 2 million local search queries submitted by users across the US, to visualize and quantify how differently mobile users click across locations. Based on the data analysis, we propose new location-aware features for improving local search click prediction and quantify their performance on real user query traces. Motivated by the results, we implement and evaluate a data-driven technique where local search models at different levels of location granularity (e.g., city, state, and country levels) are combined at run-time to further improve click prediction accuracy. By applying the location-aware features and the multiple models at different levels of location granularity on real user query streams from a major, commercially available search engine, we achieve anywhere from 5% to 47% higher precision than a single click prediction model across the US can achieve.

Very Large Data Bases | 2010

Interesting-phrase mining for ad-hoc text analytics

Srikanta J. Bedathur; Klaus Berberich; Jens Dittrich; Nikos Mamoulis; Gerhard Weikum

Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer a great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize the analysis of interesting phrases. These include named entities, important quotations, market slogans, and other multi-word phrases that are prominent in a dynamically derived ad-hoc subset of the corpus, e.g., being frequent in the subset but relatively infrequent in the overall corpus. We develop preprocessing and indexing methods for phrases, paired with new search techniques for the top-k most interesting phrases in ad-hoc subsets of the corpus. Our framework is evaluated using a large-scale real-world corpus of New York Times news articles.
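The notion of a phrase being "frequent in the subset but relatively infrequent in the overall corpus" can be sketched as a ratio of document frequencies. The code below is a brute-force illustration without any of the preprocessing or indexing the abstract refers to. The scoring formula, the whitespace tokenizer, the `min_support` threshold, and the function names are assumptions introduced for the example.

```python
from collections import Counter
from typing import List, Set, Tuple


def ngrams(text: str, n: int) -> Set[str]:
    """Distinct word n-grams of a document (lowercased, whitespace tokens)."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def interesting_phrases(corpus: List[str], subset_ids: Set[int],
                        n: int = 2, min_support: int = 2) -> List[Tuple[str, float]]:
    """Rank phrases by relative document frequency in the subset vs. the corpus."""
    corpus_df: Counter = Counter()
    subset_df: Counter = Counter()
    for doc_id, text in enumerate(corpus):
        grams = ngrams(text, n)
        corpus_df.update(grams)
        if doc_id in subset_ids:
            subset_df.update(grams)
    scored = [
        (p, (subset_df[p] / len(subset_ids)) / (corpus_df[p] / len(corpus)))
        for p in subset_df
        if subset_df[p] >= min_support  # ignore phrases seen only once in the subset
    ]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical corpus; the ad-hoc subset is documents 0 and 1.
corpus = [
    "the stock market fell sharply",
    "the stock market rallied today",
    "the stock room was cold",
    "the weather turned cold",
]
ranked = interesting_phrases(corpus, subset_ids={0, 1})
```

Here "stock market" outranks "the stock" because the latter also occurs outside the subset, which lowers its ratio.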
