Wolf Siberski
Leibniz University of Hanover
Publication
Featured research published by Wolf Siberski.
International World Wide Web Conferences | 2003
Wolfgang Nejdl; Martin Wolpers; Wolf Siberski; Christoph Schmitz; Mario T. Schlosser; Ingo Brunkhorst; Alexander Löser
RDF-based P2P networks have a number of advantages compared with simpler P2P networks such as Napster, Gnutella or with approaches based on distributed indices such as CAN and CHORD. RDF-based P2P networks allow complex and extendable descriptions of resources instead of fixed and limited ones, and they provide complex query facilities against these metadata instead of simple keyword-based searches. In previous papers, we have described the Edutella infrastructure and different kinds of Edutella peers implementing such an RDF-based P2P network. In this paper we will discuss these RDF-based P2P networks as a specific example of a new type of P2P networks, schema-based P2P networks, and describe the use of super-peer based topologies for these networks. Super-peer based networks can provide better scalability than broadcast based networks, and do provide perfect support for inhomogeneous schema-based networks, which support different metadata schemas and ontologies (crucial for the Semantic Web). Furthermore, as we will show in this paper, they are able to support sophisticated routing and clustering strategies based on the metadata schemas, attributes and ontologies used. Especially helpful in this context is the RDF functionality to uniquely identify schemas, attributes and ontologies. The resulting routing indices can be built using dynamic frequency counting algorithms and support local mediation and transformation rules, and we will sketch some first ideas for implementing these advanced functionalities as well.
Databases, Information Systems, and Peer-to-Peer Computing | 2003
Alexander Löser; Felix Naumann; Wolf Siberski; Wolfgang Nejdl; Uwe Thaden
When joining information provider peers to a peer-to-peer network, an arbitrary distribution is sub-optimal. In fact, clustering peers by their characteristics enhances search and integration significantly. Currently, super-peer networks, such as the Edutella network, provide no sophisticated means for such a "semantic clustering" of peers. We introduce the concept of semantic overlay clusters (SOC) for super-peer networks, enabling a controlled distribution of peers to clusters. In contrast to the recently announced semantic overlay network approach designed for flat, pure peer-to-peer topologies and for limited metadata sets, such as simple filenames, we allow a clustering of complex heterogeneous schemes known from relational databases and use advantages of super-peer networks, such as efficient search and broadcast of messages. Our approach is based on predefined policies defined by human experts. Based on such policies, a fully decentralized broadcast and matching approach distributes the peers automatically to super-peers. Thus we are able to automate the integration of information sources in super-peer networks and reduce flooding of the network with messages.
Journal of Web Semantics | 2009
Gideon Zenz; Xuan Zhou; Enrico Minack; Wolf Siberski; Wolfgang Nejdl
Constructing semantic queries is a demanding task for human users, as it requires mastering a query language as well as the schema which has been used for storing the data. In this paper, we describe QUICK, a novel system for helping users to construct semantic queries in a given domain. QUICK combines the convenience of keyword search with the expressivity of semantic queries. Users start with a keyword query and then are guided through a process of incremental refinement steps to specify the query intention. We describe the overall design of QUICK, present the core algorithms to enable efficient query construction, and finally demonstrate the effectiveness of our system through an experimental study.
Extending Database Technology | 2009
Sergej Zerr; Daniel Olmedilla; Wolfgang Nejdl; Wolf Siberski
Privacy-preserving document exchange among collaboration groups in an enterprise as well as across enterprises requires techniques for sharing and searching access-controlled information through largely untrusted servers. In these settings, search systems need to provide confidentiality guarantees for shared information while offering IR properties comparable to ordinary search engines. Top-k is a standard IR technique which enables fast query execution on very large indexes and makes systems highly scalable. However, indexing access-controlled information for top-k retrieval is a challenging task due to the sensitivity of the term statistics used for ranking. In this paper we present Zerber+R, a ranking model which allows for privacy-preserving top-k retrieval from an outsourced inverted index. We propose a relevance score transformation function which makes relevance scores of different terms indistinguishable, such that even if stored on an untrusted server they do not reveal information about the indexed data. Experiments on two real-world data sets show that Zerber+R makes economical usage of bandwidth and offers retrieval properties comparable with an ordinary inverted index.
International Conference on Management of Data | 2003
Wolfgang Nejdl; Wolf Siberski; Michael Sintek
Databases have employed a schema-based approach to store and retrieve structured data for decades. For peer-to-peer (P2P) networks, similar approaches are just beginning to emerge. While quite a few database techniques can be re-used in this new context, a P2P data management infrastructure poses additional challenges which have to be solved before schema-based P2P networks become as common as schema-based databases. We will describe some of these challenges and discuss approaches to solve them. Our discussion will be based on the design decisions we have employed in our Edutella infrastructure, a schema-based P2P network based on RDF and RDF schemas, and will also point out additional work addressing the issues discussed.
Conference on Advanced Information Systems Engineering | 2003
Alexander Löser; Wolf Siberski; Martin Wolpers; Wolfgang Nejdl
Peer-to-peer (P2P) networks have become an important infrastructure in recent years. Using P2P networks for distributed information systems allows us to shift the focus from centrally organized to distributed information systems where all peers can provide and have access to information. In previous papers, we have described an RDF-based P2P infrastructure called Edutella which is a specific example of a more advanced approach to P2P networks called schema-based peer-to-peer networks. Schema-based P2P networks have a number of advantages compared with simpler P2P networks such as Napster or Gnutella. Instead of prescribing one global schema to describe content, they support arbitrary metadata schemas and ontologies (crucial for the Semantic Web). Thereby they allow complex and extendable descriptions of resources, thus introducing dynamic behavior to the formerly fixed and limited descriptions, and can provide complex query facilities against these metadata instead of simple keyword-based searches. In this paper we will elaborate topologies, indices and query routing strategies for efficient query distribution in such networks. Our work is based on the concept of super-peer networks which provide better scalability compared to traditional P2P networks. By adapting existing concepts of mediator-based information systems to super-peer based networks, as we will show in this paper, they are able to support sophisticated routing, clustering and mediation strategies based on the metadata schemas and attributes. The resulting routing indices can be built using local clustering policies and support local mediation and transformation rules between heterogeneous schemas, and we sketch some first ideas for implementing these advanced functionalities as well.
Journal of Web Semantics | 2004
Wolfgang Nejdl; Martin Wolpers; Wolf Siberski; Christoph Schmitz; Mario T. Schlosser; Ingo Brunkhorst; Alexander Löser
RDF-based P2P networks have a number of advantages compared to simpler P2P networks such as Napster, Gnutella or to approaches based on distributed indices on binary keys such as CAN and CHORD. RDF-based P2P networks allow complex and extendable descriptions of resources instead of fixed and limited ones, and they provide complex query facilities against these metadata instead of simple keyword-based searches. In this paper we will discuss RDF-based P2P networks like Edutella as a specific example of a new type of P2P networks - schema-based P2P networks - and describe the use of super-peer based topologies for these networks. Super-peer based networks can provide better scalability than broadcast based networks, and provide support for inhomogeneous schema-based networks, with different metadata schemas and ontologies (crucial for the Semantic Web). Based on (dynamic) metadata routing indices, stated in RDF, the super-peer network supports sophisticated routing and distribution strategies, as well as preparing the ground for advanced mediation and clustering functionalities.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2011
Enrico Minack; Wolf Siberski; Wolfgang Nejdl
Result diversification is an effective method to reduce the risk that none of the returned results satisfies a user's query intention. It has been shown to decrease query abandonment substantially. On the other hand, computing an optimally diverse set is NP-hard for the usual objectives. Existing greedy diversification algorithms require random access to the input set, rendering them impractical in the context of large result sets or continuous data. To solve this issue, we present a novel diversification approach which treats the input as a stream and processes each element in an incremental fashion, maintaining a near-optimal diverse set at any point in the stream. Our approach exhibits a linear computation and constant memory complexity with respect to input size, without significant loss of diversification quality. In an extensive evaluation on several real-world data sets, we show the applicability and efficiency of our algorithm for large result sets as well as for continuous query scenarios such as news stream subscriptions.
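The idea of diversifying over a stream can be illustrated with a minimal sketch. It is a generic swap-based heuristic written for this listing, not the algorithm from the paper: the max-sum objective, the function names, and the `dist` callback are all assumptions. It keeps at most k items in memory and, for each incoming element, performs the single swap that most increases the summed pairwise distance, so the work per element depends only on k and not on the length of the stream.

```python
from itertools import combinations


def diversity(items, dist):
    """Sum of pairwise distances (an assumed max-sum diversity objective)."""
    return sum(dist(a, b) for a, b in combinations(items, 2))


def stream_diversify(stream, k, dist):
    """Maintain a diverse set of at most k items while reading the stream once."""
    selected = []
    for item in stream:
        if len(selected) < k:
            # Fill phase: accept the first k elements unconditionally.
            selected.append(item)
            continue
        best_score = diversity(selected, dist)
        best_idx = None
        for i in range(k):
            # Score the set obtained by replacing element i with the new item.
            candidate = selected[:i] + selected[i + 1:] + [item]
            score = diversity(candidate, dist)
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx is not None:
            # Apply the best strictly improving swap, if there is one.
            selected[best_idx] = item
    return selected


if __name__ == "__main__":
    picked = stream_diversify([0, 1, 2, 10, 5, 20], k=2, dist=lambda a, b: abs(a - b))
    print(sorted(picked))
```

Because only the current set is stored, the same function can be fed an unbounded generator, which is the setting the abstract describes for continuous queries.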
Advanced Information Networking and Applications | 2007
L. Michael; W. Nejdl; Odysseas Papapetrou; Wolf Siberski
Bloom filter based algorithms have proven successful as a very efficient technique to reduce communication costs of database joins in a distributed setting. However, the full potential of Bloom filters has not yet been exploited. Especially in the case of multi-joins, where the data is distributed among several sites, additional optimization opportunities arise, which require new Bloom filter operations and computations. In this paper, we present these extensions and point out how they improve the performance of such distributed joins. While the paper focuses on efficient join computation, the described extensions are applicable to a wide range of uses where Bloom filters are employed for compressed set representation.
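As background for the abstract above, the basic two-site Bloom join it builds on can be sketched as follows. This is a minimal illustration of the classic technique only, not the multi-join extensions of the paper; the function names, the double-hashing scheme over SHA-256, and the default parameters m and k are assumptions made for the example.

```python
import hashlib


def bloom_positions(key, m, k):
    """Derive k bit positions for a key by double hashing one SHA-256 digest."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
    return [(h1 + i * h2) % m for i in range(k)]


def build_filter(keys, m, k):
    """Build a Bloom filter, stored as a Python int used as a bit set."""
    bits = 0
    for key in keys:
        for p in bloom_positions(key, m, k):
            bits |= 1 << p
    return bits


def might_contain(bits, key, m, k):
    """True if the key may be in the set; false positives are possible."""
    return all((bits >> p) & 1 for p in bloom_positions(key, m, k))


def bloom_join(r_rows, s_rows, m=1024, k=3):
    """Join R (at site A) with S (at site B) on the first tuple field."""
    # Site A summarises its join keys and sends only the filter to site B.
    filt = build_filter((key for key, _ in r_rows), m, k)
    # Site B ships only the S tuples whose key passes the filter.
    shipped = [(key, val) for key, val in s_rows if might_contain(filt, key, m, k)]
    # Site A computes the exact join, which discards any false positives.
    r_index = {}
    for key, val in r_rows:
        r_index.setdefault(key, []).append(val)
    joined = [(key, rv, sv) for key, sv in shipped for rv in r_index.get(key, [])]
    return joined, len(shipped)


if __name__ == "__main__":
    r = [("a", 1), ("b", 2)]
    s = [("b", "x"), ("c", "y"), ("b", "z")]
    rows, shipped_count = bloom_join(r, s)
    print(rows, shipped_count)
```

The saving comes from the second step: tuples of S that cannot match never cross the network, at the price of occasionally shipping a false positive.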
Distributed and Parallel Databases | 2010
Odysseas Papapetrou; Wolf Siberski; Wolfgang Nejdl
Bloom filters are extensively used in distributed applications, especially in distributed databases and distributed information systems, to reduce network requirements and to increase performance. In this work, we propose two novel Bloom filter features that are important for distributed databases and information systems. First, we present a new approach to encode a Bloom filter such that its length can be adapted to the cardinality of the set it represents, with negligible overhead with respect to computation and false positive probability. The proposed encoding allows for significant network savings in distributed databases, as it enables the participating nodes to optimize the length of each Bloom filter before sending it over the network, for example, when executing Bloom joins. Second, we show how to estimate the number of distinct elements in a Bloom filter, for situations where the represented set is not materialized. These situations frequently arise in distributed databases, where estimating the cardinality of the represented sets is necessary for constructing an efficient query plan. The estimation is highly accurate and comes with tight probabilistic bounds. For both features we provide a thorough probabilistic analysis and extensive experimental evaluation which confirm the effectiveness of our approaches.
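The second feature, estimating how many distinct elements a Bloom filter holds without access to the underlying set, can be illustrated with the standard fill-ratio estimate n ≈ -(m/k) · ln(1 - X/m), where m is the filter length, k the number of hash functions, and X the number of set bits. The sketch below uses this well-known formula as an assumption; the abstract does not state which estimator or bounds the paper derives, and the class and method names are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash functions (illustrative only)."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [False] * m

    def _positions(self, item):
        # Derive k positions by double hashing a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = True

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

    def estimate_cardinality(self):
        """Estimate the number of distinct inserted items from the fill ratio."""
        x = sum(self.bits)  # number of set bits
        if x == self.m:
            return float("inf")  # saturated filter: the estimate is unbounded
        return -(self.m / self.k) * math.log(1 - x / self.m)


if __name__ == "__main__":
    bf = BloomFilter(m=16384, k=4)
    for i in range(1000):
        bf.add(f"item-{i}")
    print(round(bf.estimate_cardinality()))
```

Such an estimate only needs the bit vector itself, which matches the situation in the abstract where a node receives a filter but not the set it represents.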