Publications


Featured research published by Gerhard Weikum.


International World Wide Web Conference (WWW) | 2007

YAGO: A Core of Semantic Knowledge

Fabian M. Suchanek; Gjergji Kasneci; Gerhard Weikum

We present YAGO, a light-weight and extensible ontology with high coverage and quality. YAGO builds on entities and relations and currently contains more than 1 million entities and 5 million facts. This includes the Is-A hierarchy as well as non-taxonomic relations between entities (such as HASWONPRIZE). The facts have been automatically extracted from Wikipedia and unified with WordNet, using a carefully designed combination of rule-based and heuristic methods described in this paper. The resulting knowledge base is a major step beyond WordNet: in quality by adding knowledge about individuals like persons, organizations, products, etc. with their semantic relationships - and in quantity by increasing the number of facts by more than an order of magnitude. Our empirical evaluation of fact correctness shows an accuracy of about 95%. YAGO is based on a logically clean model, which is decidable, extensible, and compatible with RDFS. Finally, we show how YAGO can be further extended by state-of-the-art information extraction techniques.
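
The following is a minimal sketch of the triple-based fact model such an ontology rests on; the entity and relation names are illustrative examples, not taken from the actual YAGO distribution:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    """A single subject-predicate-object statement, as in an RDFS-style triple store."""
    subject: str
    predicate: str
    obj: str

# Hypothetical facts in the spirit of YAGO's taxonomic and non-taxonomic relations.
kb = [
    Fact("Albert_Einstein", "type", "physicist"),           # Is-A link into the taxonomy
    Fact("physicist", "subClassOf", "scientist"),            # WordNet-derived class hierarchy
    Fact("Albert_Einstein", "hasWonPrize", "Nobel_Prize"),   # non-taxonomic relation
]

def objects(kb, subject, predicate):
    """Return all objects of facts matching (subject, predicate, *)."""
    return [f.obj for f in kb if f.subject == subject and f.predicate == predicate]

print(objects(kb, "Albert_Einstein", "hasWonPrize"))  # ['Nobel_Prize']
```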


ACM SIGMOD International Conference on Management of Data | 1993

The LRU-K page replacement algorithm for database disk buffering

Elizabeth J. O'Neil; Patrick E. O'Neil; Gerhard Weikum

This paper introduces a new approach to database disk buffering, called the LRU-K method. The basic idea of LRU-K is to keep track of the times of the last K references to popular database pages, using this information to statistically estimate the interarrival times of references on a page-by-page basis. Although the LRU-K approach performs optimal statistical inference under relatively standard assumptions, it is fairly simple and incurs little bookkeeping overhead. As we demonstrate with simulation experiments, the LRU-K algorithm surpasses conventional buffering algorithms in discriminating between frequently and infrequently referenced pages. In fact, LRU-K can approach the behavior of buffering algorithms in which page sets with known access frequencies are manually assigned to different buffer pools of specifically tuned sizes. Unlike such customized buffering algorithms, however, the LRU-K method is self-tuning, and does not rely on external hints about workload characteristics. Furthermore, the LRU-K algorithm adapts in real time to changing patterns of access.
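
Below is a toy sketch of the LRU-K bookkeeping described above, assuming K = 2, a fixed buffer capacity, and eviction of the page whose K-th most recent reference lies farthest in the past; the full method in the paper includes further refinements not shown here:

```python
import math

class LRUKBuffer:
    """Toy LRU-K buffer: keeps the timestamps of the last K references per page."""

    def __init__(self, capacity, k=2):
        self.capacity = capacity
        self.k = k
        self.history = {}   # page_id -> list of reference times, most recent last
        self.clock = 0

    def reference(self, page_id):
        self.clock += 1
        if page_id not in self.history and len(self.history) >= self.capacity:
            self._evict()
        self.history.setdefault(page_id, []).append(self.clock)
        self.history[page_id] = self.history[page_id][-self.k:]  # keep only the last K times

    def _evict(self):
        # Victim = page with the oldest K-th most recent reference (backward K-distance);
        # pages referenced fewer than K times count as maximally old.
        def kth_recent(times):
            return times[-self.k] if len(times) >= self.k else -math.inf
        victim = min(self.history, key=lambda p: kth_recent(self.history[p]))
        del self.history[victim]

buf = LRUKBuffer(capacity=3, k=2)
for page in ["a", "b", "a", "c", "a", "d"]:   # "a" is hot, the others are cold
    buf.reference(page)
print(sorted(buf.history))  # the hot page "a" survives; a cold page was evicted
```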


Artificial Intelligence | 2013

YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia

Johannes Hoffart; Fabian M. Suchanek; Klaus Berberich; Gerhard Weikum

We present YAGO2, an extension of the YAGO knowledge base, in which entities, facts, and events are anchored in both time and space. YAGO2 is built automatically from Wikipedia, GeoNames, and WordNet. It contains 447 million facts about 9.8 million entities. Human evaluation confirmed an accuracy of 95% of the facts in YAGO2. In this paper, we present the extraction methodology and the integration of the spatio-temporal dimension.
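
A minimal sketch of what anchoring facts in time and space can look like; the field names, the example fact, and the coordinate encoding are illustrative assumptions, not YAGO2's actual schema:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class SpatioTemporalFact:
    """A subject-predicate-object fact with an optional time span and location."""
    subject: str
    predicate: str
    obj: str
    valid_from: Optional[int] = None                 # year the fact starts to hold
    valid_until: Optional[int] = None                # year the fact stops holding
    location: Optional[Tuple[float, float]] = None   # (latitude, longitude)

# Hypothetical example: an event anchored in both time and space.
fact = SpatioTemporalFact(
    subject="Berlin_Wall", predicate="wasDestroyedIn", obj="1989",
    valid_from=1989, valid_until=1989,
    location=(52.52, 13.40),   # GeoNames-style coordinates for Berlin
)

def holds_in(fact, year):
    """Check whether a fact's validity interval covers the given year."""
    return ((fact.valid_from is None or fact.valid_from <= year)
            and (fact.valid_until is None or year <= fact.valid_until))

print(holds_in(fact, 1989))  # True
```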


Journal of Web Semantics | 2008

YAGO: A Large Ontology from Wikipedia and WordNet

Fabian M. Suchanek; Gjergji Kasneci; Gerhard Weikum

This article presents YAGO, a large ontology with high coverage and precision. YAGO has been automatically derived from Wikipedia and WordNet. It comprises entities and relations, and currently contains more than 1.7 million entities and 15 million facts. These include the taxonomic Is-A hierarchy as well as semantic relations between entities. The facts for YAGO have been extracted from the category system and the infoboxes of Wikipedia and have been combined with taxonomic relations from WordNet. Type checking techniques help us keep YAGO's precision at 95%, as proven by an extensive evaluation study. YAGO is based on a clean logical model with a decidable consistency. Furthermore, it allows representing n-ary relations in a natural way while maintaining compatibility with RDFS. A powerful query model facilitates access to YAGO's data.
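
The sketch below illustrates one common way to fit an n-ary statement into a triple model: the base fact receives an identifier, and the extra argument becomes a fact about that fact. The identifiers and relation names here are hypothetical:

```python
# Minimal sketch (illustrative identifiers): a statement such as
# "Einstein won the Nobel Prize in 1921" is split into a base triple with a
# fact identifier, plus a triple *about* that identifier carrying the extra
# argument, so everything stays within a triple model.
facts = {
    "#f1": ("Albert_Einstein", "hasWonPrize", "Nobel_Prize"),
    "#f2": ("#f1", "inYear", "1921"),   # a fact about fact #f1
}

def arguments_of(fact_id):
    """Collect the extra arguments attached to a base fact."""
    return {pred: obj for fid, (subj, pred, obj) in facts.items() if subj == fact_id}

print(arguments_of("#f1"))  # {'inYear': '1921'}
```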


Very Large Data Bases (VLDB) | 2010

The RDF-3X engine for scalable management of RDF data

Thomas Neumann; Gerhard Weikum

RDF is a data model for schema-free structured information that is gaining momentum in the context of Semantic-Web data, life sciences, and also Web 2.0 platforms. The “pay-as-you-go” nature of RDF and the flexible pattern-matching capabilities of its query language SPARQL entail efficiency and scalability challenges for complex queries including long join paths. This paper presents the RDF-3X engine, an implementation of SPARQL that achieves excellent performance by pursuing a RISC-style architecture with streamlined indexing and query processing. The physical design is identical for all RDF-3X databases regardless of their workloads, and completely eliminates the need for index tuning by exhaustive indexes for all permutations of subject-property-object triples and their binary and unary projections. These indexes are highly compressed, and the query processor can aggressively leverage fast merge joins with excellent performance of processor caches. The query optimizer is able to choose optimal join orders even for complex queries, with a cost model that includes statistical synopses for entire join paths. Although RDF-3X is optimized for queries, it also provides good support for efficient online updates by means of a staging architecture: direct updates to the main database indexes are deferred, and instead applied to compact differential indexes which are later merged into the main indexes in a batched manner. Experimental studies with several large-scale datasets with more than 50 million RDF triples and benchmark queries that include pattern matching, many-way star-joins, and long path-joins demonstrate that RDF-3X can outperform the previously best alternatives by one or two orders of magnitude.
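
A toy sketch of the exhaustive-indexing idea: one sorted key list per permutation of subject, predicate, and object, with a triple pattern answered by a range scan on the index whose sort order starts with the bound components. Compression, dictionary encoding, join processing, and the differential update indexes are omitted, and the identifiers are illustrative:

```python
from itertools import permutations
from bisect import bisect_left, bisect_right

def build_indexes(triples):
    """Build one sorted key list per permutation of (subject, predicate, object),
    mirroring the exhaustive indexing of all six orderings."""
    pos = {"s": 0, "p": 1, "o": 2}
    return {order: sorted(tuple(t[pos[c]] for c in order) for t in triples)
            for order in permutations(("s", "p", "o"))}

def lookup(indexes, s=None, p=None, o=None):
    """Answer a triple pattern with a range scan on the index whose sort order
    puts the bound components first (a prefix match on the sorted keys)."""
    bound = {"s": s, "p": p, "o": o}
    order = max(indexes, key=lambda ordr: next(
        (i for i, c in enumerate(ordr) if bound[c] is None), 3))
    idx = indexes[order]
    prefix = tuple(bound[c] for c in order if bound[c] is not None)
    lo = bisect_left(idx, prefix)
    hi = bisect_right(idx, prefix + ("\uffff",) * (3 - len(prefix)))
    return idx[lo:hi]

triples = [("yago:Einstein", "hasWonPrize", "yago:Nobel_Prize"),
           ("yago:Einstein", "type", "physicist"),
           ("yago:Bohr", "hasWonPrize", "yago:Nobel_Prize")]
indexes = build_indexes(triples)
print(lookup(indexes, p="hasWonPrize"))   # both hasWonPrize facts, in (p, s, o) key order
```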


ACM Transactions on Database Systems | 1991

Principles and realization strategies of multilevel transaction management

Gerhard Weikum

One of the demands of database system transaction management is to achieve a high degree of concurrency by taking into consideration the semantics of high-level operations. On the other hand, the implementation of such operations must pay attention to conflicts on the storage representation levels below. To meet these requirements in a layered architecture, we propose a multilevel transaction management utilizing layer-specific semantics. Based on the theoretical notion of multilevel serializability, a family of concurrency control strategies is developed. Suitable recovery protocols are investigated for aborting single transactions and for restarting the system after a crash. The choice of levels involved in a multilevel transaction strategy reveals an inherent trade-off between increased concurrency and growing recovery costs. A series of measurements has been performed in order to compare several strategies. Preliminary results indicate considerable performance gains of the multilevel transaction approach.
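
A toy illustration (not the protocols from the paper) of why layer-specific semantics pays off: two increments commute at the semantic level, so no transaction-duration lock is needed between them, while the low-level conflict on shared storage is resolved by a latch held only for the duration of a single operation:

```python
import threading

# The high-level operation "increment(account)" commutes with other increments,
# so concurrent transactions need no high-level lock against each other; the
# low-level read-modify-write on the shared "page" is protected only by a short
# latch held per operation, not for the whole transaction.
page_latch = threading.Lock()
balances = {"acct": 0}

def increment(account, amount):
    """High-level operation; its page-level conflict is covered by a short latch."""
    with page_latch:
        balances[account] += amount

def transaction(n_ops):
    for _ in range(n_ops):
        increment("acct", 1)   # increments commute: no transaction-long lock needed

threads = [threading.Thread(target=transaction, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balances["acct"])  # 4000: a serializable outcome without transaction-duration locks
```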


Journal of Intelligent Information Systems | 1998

From Centralized Workflow Specification to Distributed Workflow Execution

Peter Muth; Dirk Wodtke; Jeanine Weissenfels; Angelika Kotz Dittrich; Gerhard Weikum

Current workflow management systems fall short of supporting large-scale distributed, enterprise-wide applications. We present a scalable, rigorously founded approach to enterprise-wide workflow management, based on the distributed execution of state and activity charts. By exploiting the formal semantics of state and activity charts, we develop an algorithm for transforming a centralized state and activity chart into a provably equivalent partitioned one, suitable for distributed execution. A synchronization scheme is developed that guarantees an execution equivalent to a non-distributed one. This basic solution is further refined in order to reduce communication overhead and exploit parallelism between partitions whenever possible. The developed synchronization schemes are compared in terms of the number and size of synchronization messages.
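
A minimal single-process sketch of the partitioning idea, assuming a simple linear activity chart and hypothetical site names: each state is owned by one partition, and a synchronization message hands over the control state whenever a transition crosses a partition boundary, so the distributed trace matches the centralized one:

```python
# Linear activity chart start -> approve -> archive -> done, split across two sites.
transitions = {"start": "approve", "approve": "archive", "archive": "done"}
owner = {"start": "siteA", "approve": "siteA", "archive": "siteB", "done": "siteB"}

def run_distributed(initial="start"):
    state, site = initial, owner[initial]
    trace = []
    while state != "done":
        nxt = transitions[state]
        trace.append(f"{site} executes {state} -> {nxt}")
        if owner[nxt] != site:
            # Partition boundary crossed: a synchronization message hands over control.
            trace.append(f"sync message {site} -> {owner[nxt]}")
            site = owner[nxt]
        state = nxt
    return trace

for step in run_distributed():
    print(step)
```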


Very Large Data Bases (VLDB) | 2004

Top-k query evaluation with probabilistic guarantees

Martin Theobald; Gerhard Weikum; Ralf Schenkel

Top-k queries based on ranking elements of multidimensional datasets are a fundamental building block for many kinds of information discovery. The best known general-purpose algorithm for evaluating top-k queries is Fagin's threshold algorithm (TA). Since the user's goal behind top-k queries is to identify one or a few relevant and novel data items, it is intriguing to use approximate variants of TA to reduce run-time costs. This paper introduces a family of approximate top-k algorithms based on probabilistic arguments. When scanning index lists of the underlying multidimensional data space in descending order of local scores, various forms of convolution and derived bounds are employed to predict when it is safe, with high probability, to drop candidate items and to prune the index scans. The precision and the efficiency of the developed methods are experimentally evaluated based on a large Web corpus and a structured data collection.
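
For reference, here is a toy version of the exact threshold algorithm that the approximate methods start from, over per-dimension score lists sorted by descending local score, with summation as the aggregation function; the paper's probabilistic variants would stop scanning or drop candidates earlier than the conservative threshold test shown here:

```python
import heapq

def threshold_algorithm(lists, k):
    """Toy version of Fagin's threshold algorithm: sorted access round-robin over
    the index lists, random access to complete each newly seen item's score, and
    a stop once the k-th best aggregate score reaches the threshold (the sum of
    the scores last seen under sorted access)."""
    random_access = [dict(lst) for lst in lists]   # item -> score, per list
    scores = {}                                    # item -> aggregated score
    for depth in range(max(len(lst) for lst in lists)):
        threshold = 0
        for i, lst in enumerate(lists):
            if depth >= len(lst):
                continue
            item, score = lst[depth]
            threshold += score
            if item not in scores:                 # random access to the other lists
                scores[item] = sum(ra.get(item, 0) for ra in random_access)
        top = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top                             # no unseen item can still beat the top-k
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])

lists = [
    [("d1", 9), ("d2", 8), ("d3", 1)],   # local scores for dimension 1
    [("d2", 9), ("d3", 7), ("d1", 2)],   # local scores for dimension 2
]
print(threshold_algorithm(lists, k=1))   # [('d2', 17)]
```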

