Stephan Seufert | Researchain

Publication

Featured research published by Stephan Seufert.

conference on information and knowledge management | 2010

Fast and accurate estimation of shortest paths in large graphs

Andrey Gubichev; Srikanta J. Bedathur; Stephan Seufert; Gerhard Weikum

Computing shortest paths between two given nodes is a fundamental operation over graphs, but is known to be nontrivial over large disk-resident instances of graph data. While a number of techniques exist for answering reachability queries and approximating node distances efficiently, determining actual shortest paths (i.e., the sequence of nodes involved) is often neglected. However, in applications arising in massive online social networks, biological networks, and knowledge graphs, it is often essential to find many, if not all, shortest paths between two given nodes. In this paper, we address this problem and present a scalable sketch-based index structure that not only supports estimation of node distances, but also computes the corresponding shortest paths themselves. Generating the actual path information allows for further improvements to the estimation accuracy of distances (and paths), leading to near-exact shortest-path approximations in real-world graphs. We evaluate our techniques, implemented within a fully functional RDF graph database system, over large real-world social and biological networks of sizes ranging from tens of thousands to millions of nodes and edges. Experiments on several datasets show that we can achieve query response times providing several orders of magnitude speedup over traditional path computations while keeping the estimation errors between 0% and 1% on average.
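
The idea of returning an approximate path, and not only an approximate distance, can be illustrated with a small sketch. The code below is not the index from the paper: it is a simplified landmark variant for unweighted, undirected graphs, and the names `bfs_tree`, `path_to_root`, and `estimate_path` are hypothetical. Each landmark stores a BFS tree; a query joins the two tree paths at their lowest common tree ancestor and keeps the shortest candidate, which is always a valid path and therefore an upper bound on the true distance.

```python
from collections import deque


def bfs_tree(graph, root):
    """Return {node: parent} for a BFS shortest-path tree rooted at `root`."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def path_to_root(parent, node):
    """Follow parent pointers from `node` up to the landmark (tree root)."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def estimate_path(trees, s, t):
    """Return the shortest s-t path found via any landmark tree, or None."""
    best = None
    for parent in trees:
        if s not in parent or t not in parent:
            continue  # this landmark cannot connect s and t
        up = path_to_root(parent, s)    # s ... landmark
        down = path_to_root(parent, t)  # t ... landmark
        # Drop the shared tail so both paths end at their lowest common ancestor.
        while len(up) > 1 and len(down) > 1 and up[-2] == down[-2]:
            up.pop()
            down.pop()
        candidate = up + down[:-1][::-1]
        if best is None or len(candidate) < len(best):
            best = candidate
    return best


# Toy example (assumed data): a cycle 0-1-2-3-4-5-0 with landmarks 0 and 3.
graph = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
trees = [bfs_tree(graph, landmark) for landmark in (0, 3)]
print(estimate_path(trees, 2, 4))
```

With landmark 0 alone the estimate detours around the cycle; adding landmark 3 recovers the exact path, which hints at why using the path information itself can tighten the distance estimate.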

conference on information and knowledge management | 2012

KORE: keyphrase overlap relatedness for entity disambiguation

Johannes Hoffart; Stephan Seufert; Dat Ba Nguyen; Martin Theobald; Gerhard Weikum

Measuring the semantic relatedness between two entities is the basis for numerous tasks in IR, NLP, and Web-based knowledge extraction. This paper focuses on disambiguating names in a Web or text document by jointly mapping all names onto semantically related entities registered in a knowledge base. To this end, we have developed a novel notion of semantic relatedness between two entities represented as sets of weighted (multi-word) keyphrases, with consideration of partially overlapping phrases. This measure improves the quality of prior link-based models, and also eliminates the need for (usually Wikipedia-centric) explicit interlinkage between entities. Thus, our method is more versatile and can cope with long-tail and newly emerging entities that have few or no links associated with them. For efficiency, we have developed approximation techniques based on min-hash sketches and locality-sensitive hashing. Our experiments on semantic relatedness and on named entity disambiguation demonstrate the superiority of our method compared to state-of-the-art baselines.
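
As a rough illustration of the min-hash approximation mentioned above, the following sketch estimates the Jaccard overlap of two keyword sets from fixed-size signatures. It is a simplification: the KORE measure works on weighted, partially overlapping keyphrases, which is not reproduced here. The function names, the use of BLAKE2 as the hash family, and the example keywords are assumptions made for illustration only.

```python
import hashlib


def _hash(token, seed):
    """Deterministic 64-bit hash of `token`, salted with `seed`."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=64):
    """One minimum per salted hash function; `tokens` must be non-empty."""
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which both sets agree."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


# Hypothetical keyword sets for two entities.
entity_a = {"guitarist", "rock", "band", "album", "tour"}
entity_b = {"guitarist", "rock", "band", "single", "label"}
sig_a = minhash_signature(entity_a)
sig_b = minhash_signature(entity_b)
print(exact_jaccard(entity_a, entity_b), estimate_jaccard(sig_a, sig_b))
```

The signatures have a fixed length regardless of how many keywords an entity has, which is what makes them suitable as input to locality-sensitive hashing.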

international conference on data engineering | 2013

FERRARI: Flexible and efficient reachability range assignment for graph indexing

Stephan Seufert; Avishek Anand; Srikanta J. Bedathur; Gerhard Weikum

In this paper, we propose a scalable and highly efficient index structure for the reachability problem over graphs. We build on the well-known node interval labeling scheme where the set of vertices reachable from a particular node is compactly encoded as a collection of node identifier ranges. We impose an explicit bound on the size of the index and flexibly assign approximate reachability ranges to nodes of the graph such that the number of index probes to answer a query is minimized. The resulting tunable index structure generates a better range labeling if the space budget is increased, thus providing direct control over the trade-off between index size and query processing performance. By using a fast recursive querying method in conjunction with our index structure, we show that, in practice, reachability queries can be answered in the order of microseconds on an off-the-shelf computer, even for massive-scale real-world graphs. Our claims are supported by an extensive set of experimental results using a multitude of benchmark and real-world web-scale graph datasets.
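
The node interval labeling scheme that the abstract builds on can be sketched as follows. This shows only the exact base scheme on a small DAG; the size bound and the approximate ranges that FERRARI adds are not implemented. All names (`merge_ranges`, `build_interval_index`, `reachable`) and the example graph are hypothetical.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent (lo, hi) integer ranges."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def build_interval_index(dag):
    """Assign post-order ids and exact reachability ranges to every node."""
    post_id, order, visited = {}, [], set()

    def visit(u):
        visited.add(u)
        for v in dag.get(u, ()):
            if v not in visited:
                visit(v)
        post_id[u] = len(order)
        order.append(u)

    for u in dag:
        if u not in visited:
            visit(u)

    intervals = {}
    for u in order:  # in post-order, all successors of u come before u
        ranges = [(post_id[u], post_id[u])]
        for v in dag.get(u, ()):
            ranges.extend(intervals[v])
        intervals[u] = merge_ranges(ranges)
    return post_id, intervals


def reachable(index, u, v):
    """True if v's id falls into one of the ranges stored for u."""
    post_id, intervals = index
    target = post_id[v]
    return any(lo <= target <= hi for lo, hi in intervals[u])


# Assumed example DAG given as adjacency lists.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "e": ["c"]}
index = build_interval_index(dag)
print(reachable(index, "a", "d"), reachable(index, "e", "b"))
```

In this exact variant a node may accumulate many ranges; bounding that number, at the price of approximate ranges that need extra probes, is the trade-off the abstract describes.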

international world wide web conferences | 2010

Antourage: mining distance-constrained trips from flickr

Saral Jain; Stephan Seufert; Srikanta J. Bedathur

We study how to automatically extract tourist trips from large volumes of geo-tagged photographs. Working with more than 8 million of these photographs that are publicly available via photo-sharing communities such as Flickr and Panoramio, our goal is to satisfy the needs of a tourist who specifies a starting location (typically a hotel) together with a bounded travel distance and demands a tour that visits the popular sites along the way. Our system, named ANTOURAGE, solves this intractable problem using a novel adaptation of the max-min ant system (MMAS) meta-heuristic. Experiments using GPS metadata crawled from Flickr show that ANTOURAGE can generate high-quality tours.
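
The defining step of a max-min ant system is its pheromone update: trails evaporate, only the best tour deposits pheromone, and every value is clamped to a fixed interval. The sketch below shows that generic update rule only; it is not the adaptation used in ANTOURAGE, and the function name, parameter values, and edge labels are assumptions.

```python
def mmas_update(pheromone, best_tour, deposit, rho=0.5, tau_min=0.01, tau_max=1.0):
    """Return updated pheromone values after one MMAS iteration.

    pheromone: {(u, v): tau} for directed edges
    best_tour: list of nodes visited by the best ant
    deposit:   amount added to each edge of the best tour
    rho:       evaporation rate in (0, 1)
    """
    best_edges = set(zip(best_tour, best_tour[1:]))
    updated = {}
    for edge, tau in pheromone.items():
        tau = (1.0 - rho) * tau           # evaporation on every edge
        if edge in best_edges:
            tau += deposit                # reinforcement by the best tour only
        updated[edge] = min(tau_max, max(tau_min, tau))  # max-min clamping
    return updated


# Hypothetical trails between a hotel and two sights.
trails = {("hotel", "museum"): 0.5, ("museum", "park"): 0.02, ("park", "hotel"): 0.8}
print(mmas_update(trails, ["hotel", "museum"], deposit=0.7))
```

The clamping keeps every edge selectable, which is how the max-min variant avoids stagnating on a single tour too early.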

First International Workshop on Graph Data Management Experiences and Systems | 2013

Sparqling kleene: fast property paths in RDF-3X

Andrey Gubichev; Srikanta J. Bedathur; Stephan Seufert

As Semantic Web efforts continue to gather steam, RDF engines are faced with graphs with millions of nodes and billions of edges. While much recent work on the resulting scalability issues in query processing over these datasets has mainly considered SPARQL 1.0, the next-generation query language recommendations have proposed the addition of regular-expression-restricted navigation queries to SPARQL. We address the problem of supporting efficient processing of property paths in RDF-3X, a high-performance RDF engine. In this paper, we limit our attention to a restricted definition of property paths that is not only tractable but also most commonly used: instead of enumerating all paths that satisfy the given query, we focus on regular-expression-based reachability queries. Based on this, we make the following three major technical contributions: first, we present a detailed account of integrating the recently proposed highly compact reachability index called FERRARI into the RDF-3X engine to support property path evaluation; second, we show how property path queries can be efficiently answered using multiple instances of this index, one instance for each distinct label in the graph; and finally, we develop a set of queries over real-world RDF data that can serve as a benchmark set for evaluating the efficiency of property path queries. Our experimental results over Yago2, a large RDF-based knowledge base, show that our proposed approach is highly scalable and flexible.
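
The per-label organization described above can be illustrated with a minimal sketch. Instead of a FERRARI index per label, the code below uses a plain BFS over the subgraph of one predicate, so it shows the partitioning idea and not the indexing technique or the RDF-3X integration. The function names and the example triples are hypothetical.

```python
from collections import defaultdict, deque


def split_by_label(triples):
    """Group (subject, predicate, object) triples into one adjacency map per predicate."""
    by_label = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        by_label[p][s].append(o)
    return by_label


def reaches(by_label, label, source, target):
    """Reachability form of the property path `source label+ target` (one or more edges)."""
    adjacency = by_label.get(label, {})
    seen, queue = set(), deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v == target:
                return True
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


# Assumed toy data.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "acme"),
]
graphs = split_by_label(triples)
print(reaches(graphs, "knows", "alice", "carol"))
```

Because the query only asks whether a matching path exists, the traversal never enumerates paths, which is the restriction that keeps this class of property paths tractable.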

international conference on data mining | 2010

Bonsai: Growing Interesting Small Trees

Stephan Seufert; Srikanta J. Bedathur; Julián Mestre; Gerhard Weikum

Graphs are increasingly used to model a variety of loosely structured data such as biological or social networks and entity-relationships. Given this profusion of large-scale graph data, efficiently discovering interesting substructures buried within is essential. These substructures are typically used in determining subsequent actions, such as conducting visual analytics by humans or designing expensive biomedical experiments. In such settings, it is often desirable to constrain the size of the discovered results in order to directly control the associated costs. In this paper, we address the problem of finding cardinality-constrained connected subtrees in large node-weighted graphs that maximize the sum of weights of selected nodes. We provide an efficient constant-factor approximation algorithm for this strongly NP-hard problem. Our techniques can be applied in a wide variety of application settings, for example in differential analysis of graphs, a problem that frequently arises in bioinformatics but also has applications on the web.
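
To make the problem statement concrete, the sketch below grows a connected subtree of at most k nodes by always adding the heaviest neighboring node. This is a naive greedy heuristic with no approximation guarantee, shown only to illustrate the input and output of the problem; it is not the constant-factor algorithm from the paper. The function name, graph, and weights are assumptions.

```python
def greedy_subtree(graph, weight, k):
    """Grow a connected subtree with at most k nodes, starting at the heaviest node.

    graph:  {node: [neighbors]} for an undirected graph
    weight: {node: non-negative weight}
    Returns (set of chosen nodes, list of tree edges).
    """
    root = max(graph, key=lambda n: weight[n])
    chosen, edges = {root}, []
    while len(chosen) < k:
        # All edges leading from the current tree to a node outside of it.
        frontier = [(u, v) for u in chosen for v in graph[u] if v not in chosen]
        if not frontier:
            break  # the connected component is exhausted
        u, v = max(frontier, key=lambda edge: weight[edge[1]])
        chosen.add(v)
        edges.append((u, v))
    return chosen, edges


# Assumed example: a small tree-shaped graph with distinct node weights.
graph = {
    "a": ["b", "e"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
    "e": ["a"],
}
weight = {"a": 5, "b": 4, "c": 1, "d": 8, "e": 2}
nodes, tree_edges = greedy_subtree(graph, weight, k=3)
print(sorted(nodes), sum(weight[n] for n in nodes))
```

Each added node is attached by exactly one edge to the existing tree, so the result is always a tree with one edge fewer than it has nodes.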

conference on information and knowledge management | 2016

ESPRESSO: Explaining Relationships between Entity Sets

Stephan Seufert; Klaus Berberich; Srikanta J. Bedathur; Sarath Kumar Kondreddi; Patrick Ernst; Gerhard Weikum

Analyzing and explaining relationships between entities in a knowledge graph is a fundamental problem with many applications. Prior work has been limited to extracting the most informative subgraph connecting two entities of interest. This paper extends and generalizes the state of the art by considering the relationships between two sets of entities given at query time. Our method, coined ESPRESSO, explains the connection between these sets in terms of a small number of relatedness cores: dense sub-graphs that have strong relations with both query sets. The intuition for this model is that the cores correspond to key events in which entities from both sets play a major role. For example, to explain the relationships between US politicians and European politicians, our method identifies events like the PRISM scandal and the Syrian Civil War as relatedness cores. Computing cores of bounded size is NP-hard. This paper presents efficient approximation algorithms. Our experiments with real-life knowledge graphs demonstrate the practical viability of our approach and, through user studies, the superior output quality compared to state-of-the-art baselines.

Proceedings of Semantic Web Information Management | 2014

Using Graph Summarization for Join-Ahead Pruning in a Distributed RDF Engine

Sairam Gurajada; Stephan Seufert; Iris Miliaraki; Martin Theobald

Scalable and efficient RDF stores have been in high demand recently. Many efficient systems, both centralized and distributed, have been proposed. Since a row-oriented output is required by SPARQL, most of the current systems rely on relational joins. One of the problems with relational joins, though, is a performance bottleneck imposed by the generation of large intermediate relations, which could be avoided by using more accurate data and pruning statistics. To address this problem, several systems have recently been proposed that employ bisimulation-based graph summaries (adopted from XML indexing) over large RDF graphs in order to facilitate join-ahead pruning. In this paper, we discuss a different, locality-based graph summarization approach for RDF data and highlight its utilization for join-ahead pruning in a distributed SPARQL engine. Based on our recently developed TriAD engine, we present a detailed comparison of processing techniques for these graph summaries over the synthetic LUBM benchmark.

international world wide web conferences | 2016

Instant Espresso: Interactive Analysis of Relationships in Knowledge Graphs

Stephan Seufert; Patrick Ernst; Srikanta J. Bedathur; Sarath Kumar Kondreddi; Klaus Berberich; Gerhard Weikum

We demonstrate InstantEspresso, a system to explain the relationship between two sets of entities in knowledge graphs. InstantEspresso answers questions of the form "Which European politicians are related to politicians in the United States, and how?" or "How can one summarize the relationship between China and countries from the Middle East?" Each question is specified by two sets of query entities. These sets (e.g., European politicians or United States politicians) can be determined by an initial graph query over a knowledge graph capturing relationships between real-world entities. InstantEspresso analyzes the (indirect) relationships that connect entities from both sets and provides a user-friendly explanation of the answer in the form of concise subgraphs. These so-called relatedness cores correspond to important event complexes involving entities from the two sets. Our system provides a user interface for the specification of entity sets and displays an appealing visualization of the extracted subgraph to the user. The demonstrated system can be used to provide background information on the current state of affairs between real-world entities such as politicians, organizations, and the like, e.g., to a journalist preparing an article involving the entities of interest. InstantEspresso is available for an online demonstration at the URL http://espresso.mpi-inf.mpg.de/.

Archive | 2015

Algorithmic Building Blocks for Relationship Analysis over Large Graphs

Stephan Seufert; Srikanta J. Bedathur; Denilson Barbosa; Christoph Weidenbach

Over the last decade, large-scale graph datasets with millions of vertices and edges have emerged in many diverse problem domains. Notable examples include online social networks, the Web graph, and knowledge graphs connecting semantically typed entities. An important problem in this setting lies in the analysis of the relationships between the contained vertices, in order to gain insights into the structure and dynamics of the modeled interactions. In this work, we develop efficient and scalable algorithms for three important problems in relationship analysis and make the following contributions:
• We present the Ferrari index structure to quickly probe a graph for the existence of an (indirect) relationship between two designated query vertices, based on an adaptive compression of the transitive closure of the graph.
• In order to quickly assess the relationship strength for a given pair of vertices and to compute the corresponding paths, we present the PathSketch index structure for the fast approximation of shortest paths in large graphs. Our work extends a previously proposed prototype in several ways, including efficient index construction, compact index size, and faster query processing.
• We present the Espresso algorithm for characterizing the relationship between two sets of entities in a knowledge graph. This algorithm is based on the identification of important events from the interaction history of the entities of interest. These events are subsequently expanded into coherent subgraphs, corresponding to characteristic topics describing the relationship.
We provide extensive experimental evaluations for each of the methods, demonstrating the efficiency of the individual algorithms as well as their usefulness for facilitating effective analysis of relationships in large graphs.
