Armin Roth
Humboldt University of Berlin
Publication
Featured research published by Armin Roth.
BTW | 2005
Ralf Heese; Sven Herschel; Felix Naumann; Armin Roth
Peer data management systems (PDMS) are the natural extension of integrated information systems. Conventionally, a single integrating system manages an integrated schema, distributes queries to appropriate sources, and integrates incoming data to a common result. In contrast, a PDMS consists of a set of peers, each of which can play the role of an integrating component. A peer knows about its neighboring peers by mappings, which help to translate queries and transform data. Queries submitted to one peer are answered by data residing at that peer and by data that is reached along paths of mappings through the network of peers. The only restriction for PDMS to cover unbounded data is the need to formulate at least one mapping from some known peer to a new data source. We propose a Semantic Web based method that overcomes this restriction, albeit at a price. As sources are dynamically and automatically included in a PDMS, three factors diminish quality: The new source itself might store data of poor quality, the mapping to the PDMS might be incorrect, and the mapping to the PDMS might be incomplete. To compensate, we propose a quality model to measure this effect, a cost model to restrict query planning to the best paths through the PDMS, and techniques to answer queries in such Web-scale PDMS efficiently.
1 An Ever-growing PDMS
The step from centralized database systems (DBMS) to distributed and then to federated database systems (FDBMS) removed the assumption that data must be located at the same site as the query. A federated database provides a global schema that represents the data it can access locally and remotely. The global schema is related to the local schemata via schema mappings, which specify how the schema of a local database maps to the global schema. The federated database accepts a query against its global schema and distributes it according to the schema mappings to the different sites where the data resides. Those sites execute the partial queries and send results back to the requesting peer. Again, the schema mappings specify how data is to be translated to conform to the global schema. The results are further processed and combined to be finally fused into a single response to the user.
A natural extension to this paradigm is to remove the assumption that queries are only asked against a single integrating site. Peer data management systems (PDMS) are built of multiple peers, each of which provides a schema and accepts queries against the schema. Again, the peers are connected by mappings among their schemata. However, instead of forming a tree with a single root, each peer can be connected to any number of other peers. Queries against a schema of one peer can be answered using the data of the entire PDMS, as long as appropriate mappings have been formed (see Fig. 1). In general, a query
Information Systems | 2008
Katja Hose; Armin Roth; André Zeitz; Kai-Uwe Sattler; Felix Naumann
Peer Data Management Systems (PDMS) are a novel, useful, but challenging paradigm for distributed data management and query processing. Conventional integrated information systems have a hierarchical structure with an integration component that manages a global schema and distributes queries against this schema to the underlying data sources. PDMS are a natural extension to this architecture by allowing each participating system (peer) to act both as a data source and as an integrator. Peers are interconnected by schema mappings, which guide the rewriting of queries between the heterogeneous schemas, and thus form a P2P (peer-to-peer)-like network. Despite several years of research, the development of efficient PDMS still holds many challenges. In this article we first survey the state of the art on peer data management: We classify PDMS by characteristics concerning their system model, their semantics, their query planning schemes, and their maintenance. Then we systematically examine open research directions in each of those areas. In particular, we observe that research results from both the domain of P2P systems and of conventional distributed data management can have an impact on the development of PDMS.
Databases, Information Systems, and Peer-to-Peer Computing | 2005
Armin Roth; Felix Naumann
Peer data management systems (PDMS) are a natural extension to integrated information systems. They consist of a dynamic set of autonomous peers, each of which can mediate between heterogeneous schemas of other peers. A new data source joins a PDMS by defining a semantic mapping to one or more other peers, thus forming a network of peers. Queries submitted to a peer are answered with data residing at that peer and by data that is reached along paths of mappings through the network of peers. However, without optimization methods, query reformulation in PDMS is very inefficient due to redundancy in mapping paths. We present a decentralized strategy that guides peers in deciding along which further mappings the query should be sent. The strategy uses statistics of the peer's own data and statistics of mappings to neighboring peers to predict whether it is worthwhile to send the query to that neighbor or whether the query plan should be pruned at this point. These decisions are guided by a benefit and cost model, trading off the amount of data a neighbor will pass back against the execution cost of that step. Thus, we allow a high scale-up of PDMS in the number of participating peers.
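The local pruning decision described in this abstract can be pictured with a small sketch. It is not the authors' algorithm: the `Mapping` class, its fields, the `select_mappings` function, and the single `min_ratio` threshold are all hypothetical names chosen for illustration, and the estimates are assumed to come from whatever statistics a peer keeps about its neighbors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Hypothetical summary of one outgoing mapping of a peer."""
    neighbor: str
    expected_tuples: float  # assumed benefit estimate: data the neighbor would pass back
    cost: float             # assumed execution cost of following this mapping


def select_mappings(mappings, min_ratio):
    """Keep mappings whose benefit/cost ratio reaches min_ratio, best first.

    Mappings below the threshold are pruned, i.e. the query is not
    forwarded along them. Mappings with non-positive cost are skipped
    to avoid division by zero (an assumption of this sketch).
    """
    kept = [
        m for m in mappings
        if m.cost > 0 and m.expected_tuples / m.cost >= min_ratio
    ]
    return sorted(kept, key=lambda m: m.expected_tuples / m.cost, reverse=True)


candidates = [
    Mapping("B", expected_tuples=100, cost=10),  # ratio 10
    Mapping("C", expected_tuples=5, cost=10),    # ratio 0.5, pruned
    Mapping("D", expected_tuples=40, cost=2),    # ratio 20
]
forward_to = [m.neighbor for m in select_mappings(candidates, min_ratio=1.0)]
```

Because each peer only looks at its own outgoing mappings, a decision like this needs no global coordinator, which matches the decentralized setting the abstract describes.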
Data Exchange, Information, and Streams | 2013
Armin Roth; Sebastian Skritek
Peer Data Management (PDM) deals with the management of structured data in unstructured peer-to-peer (P2P) networks. Each peer can store data locally and define relationships between its data and the data provided by other peers. Queries posed to any of the peers are then answered by also considering the information implied by those mappings. The overall goal of PDM is to provide semantically well-founded integration and exchange of heterogeneous and distributed data sources. Unlike traditional data integration systems, peer data management systems (PDMSs) thereby allow for full autonomy of each member and need no central coordinator. The promise of such systems is to provide flexible data integration and exchange at low setup and maintenance costs. However, building such systems raises many challenges. Besides the obvious scalability problem, choosing an appropriate semantics that can deal with arbitrary, even cyclic topologies, data inconsistencies, or updates while at the same time allowing for tractable reasoning has been an area of active research in the last decade. In this survey we provide an overview of the different approaches suggested in the literature to tackle these problems, focusing on appropriate semantics for query answering and data exchange rather than on implementation-specific problems.
Archive | 2012
Armin Roth
Peer data management systems (PDMS) consist of a highly dynamic set of autonomous, heterogeneous peers connected with schema mappings. Queries submitted at a peer are answered with data residing at that peer and by passing the queries to neighboring peers. PDMS are the most general architecture for distributed integrated information systems. With no need for central coordination, PDMS are highly flexible. However, due to the typical massive redundancy in mapping paths, PDMS tend to be very inefficient in computing the complete query result as the number of peers increases. Additionally, information loss accumulates along mapping paths due to selections and projections in the mappings. Users usually accept concessions on the completeness of query answers in large-scale data sharing settings. Our approach turns completeness into an optimization goal and thus trades off benefit and cost of query answering. To this end, we propose several strategies that guide peers in deciding to which neighbors rewritten queries should be sent. In effect, the peers prune mappings that are expected to contribute little data. We propose a query optimization strategy that limits resource consumption and show that it can drastically increase efficiency while still yielding satisfactory completeness of the query result. To estimate the potential data contribution of mappings, we adopted self-tuning histograms for cardinality estimation. We developed techniques that ensure sufficient query feedback to adapt these statistics to massive changes in a PDMS. Additionally, histograms can serve to maintain statistics on data overlap between alternative mapping paths. Building on them, redundant query processing is reduced by avoiding overlapping areas of the multi-dimensional data space.
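Two ideas from this abstract lend themselves to a short illustration: information loss accumulating along a mapping path, and forwarding queries only while a resource budget lasts. The sketch below is a simplification under stated assumptions, not the thesis's method. It assumes each mapping retains an independent fraction of the data, so path completeness is modeled as a product, and it uses a plain greedy choice by estimated cardinality per unit cost. The names `path_completeness` and `choose_within_budget` are hypothetical, and the cardinality estimates stand in for what self-tuning histograms would supply.

```python
from math import prod


def path_completeness(retention_fractions):
    """Estimated fraction of data surviving a whole mapping path.

    Assumes each mapping keeps an independent fraction in [0, 1] of the
    data, so losses from selections and projections multiply.
    """
    return prod(retention_fractions)


def choose_within_budget(candidates, budget):
    """Greedily pick neighbors by estimated cardinality per unit cost.

    candidates: list of (neighbor, estimated_cardinality, cost) tuples,
    with cost > 0 (an assumption of this sketch). A neighbor is chosen
    only if its cost still fits into the remaining budget.
    """
    chosen, spent = [], 0.0
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    for neighbor, _cardinality, cost in ranked:
        if spent + cost <= budget:
            chosen.append(neighbor)
            spent += cost
    return chosen


two_hop = path_completeness([0.9, 0.5])
picked = choose_within_budget(
    [("B", 100, 4), ("C", 90, 3), ("D", 10, 5)],
    budget=8,
)
```

In this toy run, the two-hop path keeps under half of the data, and neighbor D is skipped because its cost no longer fits the budget after C and B have been chosen.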
GI Jahrestagung (1) | 2004
Armin Roth; Felix Naumann
Integrated information systems are usually based on a global schema, whose creation and maintenance are costly. Practitioners, however, prefer direct data exchange between established systems. Peer data management systems (PDMS) address these requirements in a dynamic and scalable way. Instead of a global schema and schema mappings between the global and local schemas, peers are connected to one another by schema mappings, through which queries and data are transformed and forwarded. Such mapping paths, however, usually lead to information loss and reduce the quality of the query results. Moreover, the naive use of all available mapping paths is inefficient. For PDMS, we propose taking information quality into account with respect to data sources, schema mappings, and query results, and we use concessions on completeness to reduce response time. The goal is an optimum between query runtime and result quality. We illustrate this with a concrete application example.
1 Peer Data Management Systems (PDMS)
The exchange and integration of semantically relevant information is a pressing problem in today's highly dynamic and complex world. The main motivation for information integration is a view of the world that is as comprehensive, that is, as complete, as possible. This requires including as many relevant, but often heterogeneous, data sources as possible. Centralized data integration systems (e.g., data warehouses) pursue the idea of meeting these requirements with a global, integrated schema. However, the high effort of creating and maintaining this schema is a major obstacle to scalability in the number of source systems. In practice, a decentralized approach to data exchange tends to be preferred. Queries should be posed in the familiar context of one's own schema and processed via relationships to similar neighboring systems. Peer data management systems (PDMS) address these requirements. In a PDMS, a peer can both provide data and take on the role of a mediator and accept queries. Queries are translated and forwarded according to semantic relationships between peers, so-called mappings (Figure 1). In contrast, peer-to-peer (P2P) data exchange systems such as Napster have
WIIW | 2006
Armin Roth; Felix Naumann; Martin Schweigert
BTW | 2007
Armin Roth; Felix Naumann
Archive | 2006
Armin Roth; Felix Naumann; Dasi
NETB'07: Proceedings of the 3rd USENIX International Workshop on Networking Meets Databases | 2007
Alexander Albrecht; Felix Naumann; Armin Roth