
Publication


Featured research published by Matthias Bender.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2005

Improving collection selection with overlap awareness in P2P search engines

Matthias Bender; Sebastian Michel; Peter Triantafillou; Gerhard Weikum; Christian Zimmer

Collection selection has been a research issue for years. Typically, in related work, precomputed statistics are employed in order to estimate the expected result quality of each collection, and subsequently the collections are ranked accordingly. Our thesis is that this simple approach is insufficient for several applications in which the collections typically overlap. This is the case, for example, for the collections built by autonomous peers crawling the web. We argue for the extension of existing quality measures using estimators of mutual overlap among collections and present experiments in which this combination outperforms CORI, a popular approach based on quality estimation. We outline our prototype implementation of a P2P web search engine, coined MINERVA, that allows handling large amounts of data in a distributed and self-organizing manner. We conduct experiments which show that taking overlap into account during collection selection can drastically decrease the number of collections that have to be contacted in order to reach a satisfactory level of recall, which is a great step toward the feasibility of distributed web search.
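
The combination of a quality estimate with an overlap-aware novelty estimate can be sketched as follows. This is a toy illustration that assumes min-hash signatures as the per-collection synopsis and a simple weighted sum; the function names and the weight alpha are invented here and do not reproduce the estimators or the CORI extension evaluated in the paper.

```python
# Toy overlap-aware collection selection: greedily pick collections, trading
# precomputed quality against estimated overlap with collections already
# chosen. Min-hash signatures stand in for the paper's synopses.
from hashlib import sha1

def minhash_signature(doc_ids, num_perm=64):
    """Fixed-size min-hash signature of a collection's document ids."""
    sig = [float("inf")] * num_perm
    for doc in doc_ids:
        for i in range(num_perm):
            h = int(sha1(f"{i}:{doc}".encode()).hexdigest(), 16)
            sig[i] = min(sig[i], h)
    return sig

def estimated_overlap(sig_a, sig_b):
    """Estimate the Jaccard resemblance of two collections from signatures."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def select_collections(quality, signatures, k, alpha=0.5):
    """Pick k collections; quality and signatures are dicts keyed by collection."""
    selected, candidates = [], set(quality)
    while candidates and len(selected) < k:
        def combined(c):
            overlap = max((estimated_overlap(signatures[c], signatures[s])
                           for s in selected), default=0.0)
            return alpha * quality[c] + (1 - alpha) * (1.0 - overlap)
        best = max(candidates, key=combined)
        selected.append(best)
        candidates.remove(best)
    return selected
```

Because the remaining candidates are re-scored after every pick, a collection that largely mirrors an already selected one is demoted even if its standalone quality estimate is high.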


International Conference on Data Engineering | 2008

Exploiting social relations for query expansion and result ranking

Matthias Bender; Tom Crecelius; Mouna Kacimi; Sebastian Michel; Thomas Neumann; Josiane Xavier Parreira; Ralf Schenkel; Gerhard Weikum

Online communities have recently become a popular tool for publishing and searching content, as well as for finding and connecting to other users who share common interests. The content is typically user-generated and includes, for example, personal blogs, bookmarks, and digital photos. A particularly intriguing type of content is user-generated annotations (tags) for content items, as these concise string descriptions allow for reasoning about the interests of the user who created the content, as well as about the user who generated the annotations. This paper presents a framework to cast the different entities of such networks into a unified graph model representing the mutual relationships of users, content, and tags. It derives scoring functions for each of the entities and relations. We have performed an experimental evaluation on two real-world datasets (crawled from del.icio.us and Flickr) where manual user assessments of the query result quality show that our unified graph framework delivers high-quality results on social networks.
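
As a rough illustration of the unified graph idea, the toy class below keeps users, tags, and documents in one structure and scores documents for a tag query with hand-picked weights that favor the querying user's friends. The class, weights, and method names are hypothetical stand-ins for the scoring functions actually derived in the paper.

```python
# Minimal user/tag/document graph with a naive friendship-weighted tag score.
from collections import defaultdict

class SocialGraph:
    def __init__(self):
        self.friends = defaultdict(set)    # user -> set of users
        self.tagged = defaultdict(set)     # (user, tag) -> set of documents

    def add_friend(self, u, v):
        self.friends[u].add(v)
        self.friends[v].add(u)

    def add_tag(self, user, tag, doc):
        self.tagged[(user, tag)].add(doc)

    def score(self, querying_user, tag):
        """Tag assignments from the querying user's friends count more."""
        scores = defaultdict(float)
        for (user, t), docs in self.tagged.items():
            if t != tag:
                continue
            weight = 1.0 if user == querying_user else \
                     0.5 if user in self.friends[querying_user] else 0.1
            for doc in docs:
                scores[doc] += weight
        return sorted(scores.items(), key=lambda kv: -kv[1])

g = SocialGraph()
g.add_friend("alice", "bob")
g.add_tag("bob", "p2p", "doc1")
g.add_tag("carol", "p2p", "doc2")
print(g.score("alice", "p2p"))  # doc1 (tagged by a friend) ranks above doc2
```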


Conference on Information and Knowledge Management | 2006

Discovering and exploiting keyword and attribute-value co-occurrences to improve P2P routing indices

Sebastian Michel; Matthias Bender; Nikos Ntarmos; Peter Triantafillou; Gerhard Weikum; Christian Zimmer

Peer-to-Peer (P2P) search requires intelligent decisions for query routing: selecting the best peers to which a given query, initiated at some peer, should be forwarded for retrieving additional search results. These decisions are based on statistical summaries for each peer, which are usually organized on a per-keyword basis and managed in a distributed directory of routing indices. Such architectures disregard the possible correlations among keywords. Together with the coarse granularity of per-peer summaries, which are mandated for scalability, this limitation may lead to poor search result quality. This paper develops and evaluates two solutions to this problem, sk-STAT based on single-key statistics only, and mk-STAT based on additional multi-key statistics. For both cases, hash sketch synopses are used to compactly represent a peer's data items and are efficiently disseminated in the P2P network to form a decentralized directory. Experimental studies with Gnutella and Web data demonstrate the viability and the trade-offs of the approaches.
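
Flajolet-Martin-style hash sketches of the kind alluded to above can be illustrated as follows; the snippet builds one bitmap per keyword and one for a keyword pair, mimicking the idea behind single-key versus multi-key statistics. The constants, the hashing, and the single-bitmap estimator are simplifications for illustration, not the synopses actually disseminated in the paper.

```python
# FM-style hash sketches: tiny bitmaps that support distinct-count estimates
# and can be merged with bitwise OR, making them cheap to ship to a directory.
from hashlib import md5

PHI = 0.77351  # standard FM correction factor

def fm_sketch(items, width=32):
    """Build one FM bitmap over the given item ids."""
    bitmap = 0
    for item in items:
        h = int(md5(str(item).encode()).hexdigest(), 16)
        if h == 0:
            continue
        r = (h & -h).bit_length() - 1        # index of the lowest set bit
        bitmap |= 1 << min(r, width - 1)
    return bitmap

def fm_estimate(bitmap):
    """Estimate the number of distinct items from the lowest unset bit."""
    r = 0
    while bitmap & (1 << r):
        r += 1
    return (2 ** r) / PHI

# One peer's per-keyword and keyword-pair summaries (toy data):
docs = {"d1": {"p2p", "search"}, "d2": {"p2p"}, "d3": {"search", "ranking"}}
per_key = {}
for doc, terms in docs.items():
    for t in terms:
        per_key.setdefault(t, set()).add(doc)

single = {t: fm_sketch(ds) for t, ds in per_key.items()}
pair = fm_sketch(per_key["p2p"] & per_key["search"])
print(fm_estimate(single["p2p"]))  # rough estimate of #docs containing "p2p"
print(fm_estimate(pair))           # rough estimate of #docs with both terms
```

Because FM bitmaps are merged with a bitwise OR, sketches for the same key from different peers can be combined into network-wide estimates without ever shipping document identifiers.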


International Conference on Extending Database Technology | 2006

IQN routing: integrating quality and novelty in P2P querying and ranking

Sebastian Michel; Matthias Bender; Peter Triantafillou; Gerhard Weikum

We consider a collaboration of peers autonomously crawling the Web. A pivotal issue when designing a peer-to-peer (P2P) Web search engine in this environment is query routing: selecting a small subset of (a potentially very large number of relevant) peers to contact to satisfy a keyword query. Existing approaches for query routing work well on disjoint data sets. However, the peers' data collections naturally overlap to a high degree, as popular documents are crawled by many peers. Techniques for estimating the cardinality of the overlap between such collections, designed for and incorporated into information retrieval engines, are very much lacking. In this paper we present a comprehensive evaluation of appropriate overlap estimators, showing how they can be incorporated into an efficient, iterative approach to query routing, coined Integrated Quality Novelty (IQN). We propose to further enhance our approach using histograms, combining overlap estimation with the available score/ranking information. Finally, we conduct a performance evaluation in MINERVA, our prototype P2P Web search engine.
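
The iterative flavor of such routing can be sketched as below; plain document-id sets stand in for the compact synopses (e.g., Bloom filters or min-wise permutations) whose estimation accuracy the paper actually studies, and the multiplicative quality-times-novelty score is a simplification introduced only for this example.

```python
# IQN-flavored iterative peer selection: after each pick, re-estimate how much
# *new* content each remaining peer would contribute and combine that with
# its quality score. Plain sets are stand-ins for real overlap synopses.
def iqn_route(peers, quality, k):
    """peers: dict peer -> set of document ids; quality: dict peer -> score."""
    covered, chosen = set(), []
    remaining = set(peers)
    for _ in range(min(k, len(remaining))):
        def combined(p):
            docs = peers[p]
            novelty = len(docs - covered) / len(docs) if docs else 0.0
            return quality[p] * novelty
        best = max(remaining, key=combined)
        chosen.append(best)
        covered |= peers[best]
        remaining.remove(best)
    return chosen
```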


Very Large Data Bases | 2004

COMPASS: a concept-based web search engine for HTML, XML, and deep web data

Jens Graupmann; Michael Biwer; Christian Zimmer; Patrick Zimmer; Matthias Bender; Martin Theobald; Gerhard Weikum

This chapter introduces a concept-based Web search engine for HTML, XML, and deep Web data—Context-Oriented Multi-Format Portal-Aware Search System (COMPASS). It also presents the features and architectures of COMPASS. The internal query language of COMPASS resembles a highly simplified version of mainstream languages such as SQL, XPath, or XQuery. Search conditions refer to concepts and values that correspond to element names and contents in an XML setting, and attribute names and values in a SQL setting. COMPASS uses a centralized data index for efficient search evaluation. All data and also the relationships among documents are represented in a relational database. All data formats are transformed into XML by using heuristics as well as external annotation tools such as GATE.
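
To make the flavor of such concept-value conditions concrete, here is a purely hypothetical sketch; the condition representation, the substring matching, and all field names are invented for illustration and do not reflect COMPASS's actual grammar or evaluation.

```python
# Hypothetical concept-value conditions evaluated against flat records.
from dataclasses import dataclass

@dataclass
class Condition:
    concept: str   # element name in an XML setting, attribute name in SQL
    value: str     # required content or attribute value

def matches(record: dict, conditions: list) -> bool:
    """A record satisfies the query if every concept is present and its
    content contains the requested value (case-insensitive substring)."""
    return all(c.concept in record and
               c.value.lower() in str(record[c.concept]).lower()
               for c in conditions)

query = [Condition("author", "Weikum"), Condition("year", "2004")]
doc = {"author": "Gerhard Weikum", "title": "COMPASS demo", "year": 2004}
print(matches(doc, query))  # True
```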


Distributed and Parallel Databases | 2009

Distributed top-k aggregation queries at large

Thomas Neumann; Matthias Bender; Sebastian Michel; Ralf Schenkel; Peter Triantafillou; Gerhard Weikum

Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
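
For readers unfamiliar with the TPUT baseline that these optimizations build on, below is a simplified single-process simulation of its three phases for sum aggregation; it deliberately omits TPUT's upper-bound pruning and all of the cost-model-driven optimizations introduced in this paper.

```python
# Simplified TPUT-style three-phase top-k aggregation over per-node score lists.
from collections import defaultdict

def tput_topk(node_lists, k):
    """node_lists: one dict per node, mapping item -> local score."""
    m = len(node_lists)
    # Phase 1: each node ships its local top-k; compute partial sums.
    partial = defaultdict(float)
    for scores in node_lists:
        for item, s in sorted(scores.items(), key=lambda kv: -kv[1])[:k]:
            partial[item] += s
    tau = sorted(partial.values(), reverse=True)[k - 1] if len(partial) >= k else 0.0
    # Phase 2: fetch every item whose local score reaches the uniform threshold.
    threshold = tau / m
    candidates = set(partial)
    for scores in node_lists:
        candidates |= {item for item, s in scores.items() if s >= threshold}
    # Phase 3: compute exact aggregate scores for the surviving candidates.
    totals = {item: sum(scores.get(item, 0.0) for scores in node_lists)
              for item in candidates}
    return sorted(totals.items(), key=lambda kv: -kv[1])[:k]

nodes = [{"a": 5, "b": 3, "c": 1}, {"a": 2, "c": 4, "d": 6}, {"b": 5, "d": 3}]
print(tput_topk(nodes, 2))  # [('d', 9.0), ('b', 8.0)]
```

The uniform threshold tau/m in phase 2 guarantees that every item whose true aggregate could reach the k-th best score is fetched, so ranking the candidate set by exact sums yields the correct top-k.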



International Conference on the Move to Meaningful Internet Systems | 2005

On the usage of global document occurrences in peer-to-peer information systems

Odysseas Papapetrou; Sebastian Michel; Matthias Bender; Gerhard Weikum

There exist a number of approaches for query processing in Peer-to-Peer information systems that efficiently retrieve relevant information from distributed peers. However, very few of them take into consideration the overlap between peers: as the most popular resources (e.g., documents or files) are often present at most of the peers, a large fraction of the documents eventually received by the query initiator are duplicates. We develop a technique based on the notion of global document occurrences (GDO) that, when processing a query, penalizes frequent documents increasingly as more and more peers contribute their local results. We argue that the additional effort to create and maintain the GDO information is reasonably low, as the necessary information can be piggybacked onto the existing communication. Early experiments indicate that our approach significantly decreases the number of peers that have to be involved in a query to reach a certain level of recall and, thus, decreases user-perceived latency and the wastage of network resources.
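
A GDO-style score adjustment might look like the following; the concrete penalty formula and parameter names are made up for this illustration, whereas the paper defines its own weighting of global occurrence against merging progress.

```python
# Hypothetical GDO-based re-scoring: documents that are globally frequent are
# damped more strongly the further the result merging has progressed.
def rescore(local_score, gdo, peers_contributed, total_peers):
    """gdo: estimated fraction of peers holding this document (0..1)."""
    progress = peers_contributed / total_peers   # grows as more peers answer
    penalty = 1.0 - progress * gdo               # ubiquitous docs fade out
    return local_score * max(penalty, 0.0)

print(rescore(0.8, gdo=0.9, peers_contributed=8, total_peers=10))  # ~0.224
```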


Databases, Information Systems, and Peer-to-Peer Computing | 2005

A comparative study of pub/sub methods in structured P2P networks

Matthias Bender; Sebastian Michel; Sebastian Parkitny; Gerhard Weikum

Methods for publish/subscribe applications over P2P networks have been a research issue for a long time. Many approaches have been developed and evaluated, but each is typically based on different assumptions, which makes a direct comparison difficult, if not impossible. We identify two design patterns that can be used to implement publish/subscribe applications over structured P2P networks and provide an analysis of their complexity. Based on a characterization of different real-world usage scenarios, we present evidence as to which approach is preferable for certain application classes. Finally, we present simulation results that support our analysis.
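
As a generic illustration of rendezvous-based pub/sub over a DHT, the sketch below keeps either subscriptions (push-style notification at publish time) or publications (pull-style polling by subscribers) at the node responsible for a key. The two patterns, their names, and the single-node simulation are generic stand-ins, not the exact design patterns or cost analysis of the paper.

```python
# One DHT rendezvous node per key, supporting a push and a pull pattern.
from collections import defaultdict

class RendezvousNode:
    def __init__(self):
        self.subscribers = set()    # push pattern: subscriptions stored here
        self.publications = []      # pull pattern: publications stored here

    # push: match publications against stored subscriptions immediately
    def subscribe(self, subscriber):
        self.subscribers.add(subscriber)

    def publish_push(self, event, notify):
        for s in self.subscribers:
            notify(s, event)        # one message per matching subscriber

    # pull: store publications; subscribers poll at their own pace
    def publish_pull(self, event):
        self.publications.append(event)

    def poll(self, since_index):
        return self.publications[since_index:]

dht = defaultdict(RendezvousNode)   # key -> responsible node (hashing elided)
dht["p2p"].subscribe("alice")
dht["p2p"].publish_push("new paper", notify=lambda s, e: print(s, "<-", e))
dht["p2p"].publish_pull("new paper")
print(dht["p2p"].poll(0))           # a subscriber fetches everything since index 0
```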


International Conference on Data Engineering | 2006

P2P Directories for Distributed Web Search: From Each According to His Ability, to Each According to His Needs

Matthias Bender; Sebastian Michel; Gerhard Weikum

A compelling application of peer-to-peer (P2P) system technology would be distributed Web search, where each peer autonomously runs a search engine on a personalized local corpus (e.g., built from a thematically focused Web crawl) and peers collaborate by routing queries to remote peers that can contribute many or particularly good results for these specific queries. Such systems typically rely on a decentralized directory, e.g., built on top of a distributed hash table (DHT), that holds compact, aggregated statistical metadata about the peers, which is used to identify promising peers for a particular query. To support an a-priori unlimited number of peers, it is crucial to keep the load on the distributed directory low. Moreover, each peer should ideally tailor its postings to the directory to reflect its particular strengths, such as rich information about specialized topics that few or no other peers cover. This paper addresses this problem by proposing strategies with which peers identify suitable subsets of their most beneficial statistical metadata. We argue that posting a carefully selected subset of metadata can achieve almost the same result quality as a complete metadata directory, since only the most relevant peers are eventually involved in the execution of a given query. Additionally, asking only relevant peers results in higher precision, as the noise introduced by poor peers is reduced. We have implemented these strategies in our fully operational P2P Web search prototype Minerva, and we present experimental results on real-world Web data that show the viability of the strategies and their gains in terms of high search result quality at low networking cost.
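
A posting-selection step of this kind could be prototyped as below; the benefit heuristic (local document frequency divided by the term's frequency at a sample of other peers) and every name in the snippet are assumptions made for illustration, not Minerva's actual selection strategies.

```python
# Post only the terms for which this peer holds a comparatively large share
# of the (sampled) global document count, up to a fixed posting budget.
def select_postings(local_df, sampled_global_df, budget):
    """local_df: term -> #local docs; sampled_global_df: term -> #docs seen
    across a sample of remote peers; returns the `budget` terms to post."""
    def benefit(term):
        return local_df[term] / (1 + sampled_global_df.get(term, 0))
    return sorted(local_df, key=benefit, reverse=True)[:budget]

local = {"minerva": 120, "p2p": 400, "the": 5000}
global_sample = {"minerva": 300, "p2p": 50000, "the": 2000000}
print(select_postings(local, global_sample, budget=2))  # ['minerva', 'p2p']
```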

Collaboration


Dive into Matthias Bender's collaborations.

Top Co-Authors

Yannis E. Ioannidis

National and Kapodistrian University of Athens
