Anastasios Kementsietsidis

Publication

Featured research published by Anastasios Kementsietsidis.

international conference on management of data | 2003

Mapping data in peer-to-peer systems: semantics and algorithmic issues

Anastasios Kementsietsidis; Marcelo Arenas; Renée J. Miller

We consider the problem of mapping data in peer-to-peer data-sharing systems. Such systems often rely on the use of mapping tables listing pairs of corresponding values to search for data residing in different peers. In this paper, we address semantic and algorithmic issues related to the use of mapping tables. We begin by arguing why mapping tables are appropriate for data mapping in a peer-to-peer environment. We discuss alternative semantics for these tables and we present a language that allows the user to specify mapping tables under different semantics. Then, we show that by treating mapping tables as constraints (called mapping constraints) on the exchange of information between peers it is possible to reason about them. We motivate why reasoning capabilities are needed to manage mapping tables and show the importance of inferring new mapping tables from existing ones. We study the complexity of this problem and we propose an efficient algorithm for its solution. Finally, we present an implementation along with experimental results that show that mapping tables may be managed efficiently in practice.
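As a rough illustration of inferring a new mapping table from existing ones, the sketch below composes two tables of corresponding values. It assumes a deliberately simplified semantics in which a mapping table is just a set of value pairs; the language and alternative semantics described in the abstract are richer than this. The function name `compose` and the sample identifiers are hypothetical.

```python
def compose(table_ab, table_bc):
    """Infer an A-to-C table by joining two mapping tables on their shared B values."""
    return {(a, c) for (a, b) in table_ab for (b2, c) in table_bc if b == b2}


# Hypothetical identifiers used by three different peers.
gene_to_protein = {("G1", "P1"), ("G2", "P2"), ("G2", "P3")}
protein_to_structure = {("P1", "S1"), ("P3", "S9")}

# Pairs that follow from the two existing tables taken together.
gene_to_structure = compose(gene_to_protein, protein_to_structure)
```

Under this simplification, a value with no partner in the second table (here `P2`) contributes nothing to the inferred table.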

international conference on data engineering | 2007

Conditional Functional Dependencies for Data Cleaning

Philip Bohannon; Wenfei Fan; Floris Geerts; Xibei Jia; Anastasios Kementsietsidis

We propose a class of constraints, referred to as conditional functional dependencies (CFDs), and study their applications in data cleaning. In contrast to traditional functional dependencies (FDs) that were developed mainly for schema design, CFDs aim at capturing the consistency of data by incorporating bindings of semantically related values. For CFDs we provide an inference system analogous to Armstrong's axioms for FDs, as well as consistency analysis. Since CFDs allow data bindings, a large number of individual constraints may hold on a table, complicating detection of constraint violations. We develop techniques for detecting CFD violations in SQL as well as novel techniques for checking multiple constraints in a single query. We experimentally evaluate the performance of our CFD-based methods for inconsistency detection. This not only yields a constraint theory for CFDs but is also a step toward a practical constraint-based method for improving data quality.
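To make the idea of a CFD concrete, here is a minimal sketch of violation detection in plain Python. It is not the SQL-based technique the abstract refers to. It assumes a CFD is given as a left-hand side, a right-hand side, and a single pattern tuple whose entries are either constants or the wildcard `_`; all names and the sample rows are hypothetical.

```python
from itertools import combinations

WILDCARD = "_"


def matches(value, pattern):
    """A pattern entry matches any value if it is the wildcard, else only itself."""
    return pattern == WILDCARD or value == pattern


def cfd_violations(rows, lhs, rhs, pattern):
    """Return (single, pairs): row indexes violating the CFD (lhs -> rhs, pattern)."""
    # Only rows whose LHS values match the pattern are constrained by the CFD.
    scope = [i for i, r in enumerate(rows)
             if all(matches(r[a], pattern[a]) for a in lhs)]
    # Single-row violations: an RHS value disagrees with a constant in the pattern.
    single = [i for i in scope
              if not all(matches(rows[i][a], pattern[a]) for a in rhs)]
    # Pair violations: two in-scope rows agree on the LHS but differ on the RHS.
    pairs = [(i, j) for i, j in combinations(scope, 2)
             if all(rows[i][a] == rows[j][a] for a in lhs)
             and any(rows[i][a] != rows[j][a] for a in rhs)]
    return single, pairs


rows = [
    {"cc": "44", "zip": "EH4", "street": "Mayfield"},
    {"cc": "44", "zip": "EH4", "street": "Crichton"},
    {"cc": "01", "zip": "07974", "street": "Tree Ave"},
    {"cc": "01", "zip": "07974", "street": "Main St"},
]

# "When cc is 44, zip determines street"; rows with another cc are not constrained.
single, pairs = cfd_violations(
    rows, ["cc", "zip"], ["street"], {"cc": "44", "zip": "_", "street": "_"}
)
```

The binding `cc = 44` is what separates this from an ordinary FD: the two rows with `cc = 01` share a zip and differ on street, yet they fall outside the scope of the constraint.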

international conference on management of data | 2003

The hyperion project: from data integration to data coordination

Marcelo Arenas; Vasiliki Kantere; Anastasios Kementsietsidis; Iluju Kiringa; Renée J. Miller; John Mylopoulos

We present an architecture and a set of challenges for peer database management systems. These systems team up to build a network of nodes (peers) that coordinate at run time most of the typical DBMS tasks such as the querying, updating, and sharing of data. Such a network works in a way similar to conventional multidatabases. Conventional multidatabase systems are founded on key concepts such as those of a global schema, central administrative authority, data integration, global access to multiple databases, permanent participation of databases, etc. Instead, our proposal assumes total absence of any central authority or control, no global schema, transient participation of peer databases, and constantly evolving coordination rules among databases. In this work, we describe the status of the Hyperion project, present our current solutions, and outline remaining research issues.

international conference on data engineering | 2006

MONDRIAN: Annotating and Querying Databases through Colors and Blocks

Floris Geerts; Anastasios Kementsietsidis; Diego Milano

Annotations play a central role in the curation of scientific databases. Despite their importance, data formats and schemas are not designed to manage the increasing variety of annotations. Moreover, DBMSs often lack support for storing and querying annotations. Furthermore, annotations and data are only loosely coupled. This paper introduces an annotation-oriented data model for the manipulation and querying of both data and annotations. In particular, the model allows for the specification of annotations on sets of values and for effectively querying the information on their association. We use the concept of block to represent an annotated set of values. Different colors applied to the blocks represent different annotations. We introduce a color query language for our model and prove it to be both complete (it can express all possible queries over the class of annotated databases), and minimal (all the algebra operators are primitive). We present MONDRIAN, a prototype implementation of our annotation mechanism, and we conduct experiments that investigate the set of parameters which influence the evaluation cost for color queries.

international conference on data engineering | 2007

Rewriting Regular XPath Queries on XML Views

Wenfei Fan; Floris Geerts; Xibei Jia; Anastasios Kementsietsidis

We study the problem of answering queries posed on virtual views of XML documents, a problem commonly encountered when enforcing XML access control and integrating data. We approach the problem by rewriting queries on views into equivalent queries on the underlying document, and thus avoid the overhead of view materialization and maintenance. We consider possibly recursively defined XML views and study the rewriting of both XPath and regular XPath queries. We show that while rewriting is not always possible for XPath over recursive views, it is for regular XPath; however, the rewritten query may be of exponential size. To avoid this prohibitive cost we propose a rewriting algorithm that characterizes rewritten queries as a new form of automata, and an efficient algorithm to evaluate the automaton-represented queries. These allow us to answer queries on views in linear time. We have fully implemented a prototype system, SMOQE, which yields the first regular XPath engine and a practical solution for answering queries over possibly recursively defined XML views.

international conference on management of data | 2007

Distributed query evaluation with performance guarantees

Gao Cong; Wenfei Fan; Anastasios Kementsietsidis

Partial evaluation has recently proven an effective technique for evaluating Boolean XPath queries over a fragmented tree that is distributed over a number of sites. What was left open is whether or not the technique is applicable to generic data-selecting XPath queries. In contrast to Boolean queries that return a single truth value, a generic XPath query returns a set of elements, and its evaluation makes it difficult to avoid excessive data shipping. This paper settles this question in the positive by providing evaluation algorithms and optimizations for generic XPath queries in the same distributed and fragmented setting. These algorithms explore parallelism and retain the performance guarantees of their counterpart for Boolean queries, regardless of how the tree is fragmented and distributed. First, each site is visited at most three times, and down to at most twice when optimizations are in place. Second, the network traffic is determined by the final answer of the query, rather than the size of the tree, without incurring unnecessary data shipping. Third, the total computation is comparable to that of centralized algorithms on the tree stored in a single site. We show both analytically and experimentally that our algorithms and optimizations are scalable and efficient on large trees and complex XPath queries.

very large data bases | 2004

Data sharing through query translation in autonomous sources

Anastasios Kementsietsidis; Marcelo Arenas

We consider the problem of data sharing between autonomous data sources in an environment where constraints cannot be placed on the shared contents of sources. Our solutions rely on the use of mapping tables which define how data from different sources are associated. In this setting, the answer to a local query, that is, a query posed against the schema of a single source, is augmented by retrieving related data from associated sources. This retrieval of data is achieved by translating, through mapping tables, the local query into a set of queries that are executed against the associated sources. We consider both sound translations (which only retrieve correct answers) and complete translations (which retrieve all correct answers, and no incorrect answers) and we present algorithms to compute such translations. Our solutions are implemented and tested experimentally and we describe here our key findings.
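The translation step can be pictured with a small sketch, assuming the local query is a simple selection on one identifier attribute and the mapping table is a set of (local value, remote value) pairs. This is an illustration only: it does not implement the paper's algorithms and says nothing about when a translation is sound or complete. All names and data are hypothetical.

```python
def translate_selection(values, mapping_table):
    """Map the values selected by a local query to the corresponding remote values."""
    return {remote for (local, remote) in mapping_table if local in values}


def answer_with_peers(local_rows, remote_rows, values, mapping_table):
    """Answer a local selection and augment it with related rows from another source."""
    local = [r for r in local_rows if r["id"] in values]
    remote_values = translate_selection(values, mapping_table)
    remote = [r for r in remote_rows if r["id"] in remote_values]
    return local + remote


# Hypothetical sources that identify the same entities with different values.
local_rows = [{"id": "G1", "name": "geneA"}, {"id": "G2", "name": "geneB"}]
remote_rows = [{"id": "X7", "name": "alpha"}, {"id": "X8", "name": "beta"}]
mapping_table = {("G1", "X7"), ("G2", "X8")}

result = answer_with_peers(local_rows, remote_rows, {"G1"}, mapping_table)
```

Here the local query for `G1` is rewritten into a remote query for `X7`, so the answer contains one row from each source.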

databases information systems and peer to peer computing | 2003

Coordinating Peer Databases Using ECA Rules

Vasiliki Kantere; Iluju Kiringa; John Mylopoulos; Anastasios Kementsietsidis; Marcelo Arenas

Peer databases are stand-alone, independently developed databases that are linked to each other through acquaintances. They each contain local data, a set of mapping tables and expressions, and a set of ECA rules that are used to exchange data among them. The set of acquaintances and peers constitutes a dynamic peer-to-peer network in which acquaintances are continuously established and abolished. We present techniques for specifying data exchange policies on-the-fly based on constraints imposed on the way in which peers exchange and share data. We realize the on-the-fly specification of data exchange policies by building coordination ECA rules at acquaintance time. Finally, we describe mechanisms related to establishing and abolishing acquaintances by means of examples. Specifically, we consider syntactical constructs and executional semantics of establishing and abolishing acquaintances.

international conference on data engineering | 2003

Managing data mappings in the Hyperion project

Anastasios Kementsietsidis; Marcelo Arenas; Renée J. Miller

We consider the problem of mapping data in peer-to-peer systems. Such systems rely on simple value searches to locate data of interest. However, different peers may use different values to identify or describe the same data. To accommodate this, peer-to-peer systems often rely on mapping tables that list pairs of corresponding values for search domains that are used in different peers. We illustrate how such tables are used in the genomics community by expert curators. We then argue why mapping tables are appropriate for data mapping in a peer-to-peer environment and motivate the problem of managing these tables. The work presented is part of the Hyperion project.

extending database technology | 2006

iMONDRIAN: a visual tool to annotate and query scientific databases

Floris Geerts; Anastasios Kementsietsidis; Diego Milano

We demonstrate iMONDRIAN, a component of the MONDRIAN annotation management system. Distinguishing features of MONDRIAN are (i) the ability to annotate sets of values and (ii) the annotation-aware query algebra. On top of that, iMONDRIAN offers an intuitive visual interface to annotate and query scientific databases. In this demonstration, we consider Gene Ontology (GO), a publicly available biological database. Using this database we show (i) the creation of annotations through the visual interface, (ii) the ability to visually build complex, annotation-aware queries, and (iii) the basic functionality for tracking annotation provenance. Our demonstration also provides a cheat window which shows the system internals and how visual queries are translated to annotation-aware algebra queries.
