Theoni Pitoura | Researchain

Publication

Featured research published by Theoni Pitoura.

Extending Database Technology | 2006

Replication, load balancing and efficient range query processing in DHTs

Theoni Pitoura; Nikos Ntarmos; Peter Triantafillou

We consider the conflicting problems of ensuring data-access load balancing and efficiently processing range queries on peer-to-peer data networks maintained over Distributed Hash Tables (DHTs). Placing consecutive data values in neighboring peers is frequently used in DHTs since it accelerates range query processing. However, such a placement is highly susceptible to load imbalances, which are preferably handled by replicating data (since replication also introduces fault tolerance benefits). In this paper, we present HotRoD, a DHT-based architecture that deals effectively with this combined problem through the use of a novel locality-preserving hash function, and a tunable data replication mechanism which allows trading off replication costs for fair load distribution. Our detailed experimentation study shows strong gains in both range query processing efficiency and data-access load balancing, with low replication overhead. To our knowledge, this is the first work that concurrently addresses the two conflicting problems using data replication.
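The idea of placing consecutive values on neighboring peers can be illustrated with a minimal sketch. This is not HotRoD's actual hash function, which the abstract does not specify; the linear mapping, the 16-bit identifier space, the sorted list of peer identifiers, and the names `order_preserving_hash`, `successor`, and `peers_for_range` are all assumptions made for illustration.

```python
from bisect import bisect_left

M = 16            # identifier bits (assumed for this sketch)
RING = 2 ** M     # size of the identifier ring


def order_preserving_hash(value, lo, hi):
    """Map an integer value in [lo, hi] linearly onto [0, RING - 1].

    Larger values never get smaller identifiers, so consecutive values
    land on the same or on neighboring peers.
    """
    if not lo <= value <= hi:
        raise ValueError("value outside the attribute domain")
    return (value - lo) * (RING - 1) // (hi - lo)


def successor(peer_ids, key):
    """Return the first peer clockwise from key (peer_ids must be sorted)."""
    return peer_ids[bisect_left(peer_ids, key) % len(peer_ids)]


def peers_for_range(peer_ids, a, b, lo, hi):
    """Peers that together store every value in the range [a, b]."""
    n = len(peer_ids)
    first = bisect_left(peer_ids, order_preserving_hash(a, lo, hi))
    last = bisect_left(peer_ids, order_preserving_hash(b, lo, hi))
    result = []
    for k in range(first, last + 1):
        peer = peer_ids[k % n]          # wrap around the ring if needed
        if peer not in result:
            result.append(peer)
    return result


peers = [1000, 20000, 40000, 60000]     # hypothetical peer identifiers
print(peers_for_range(peers, 10, 50, 0, 100))  # → [20000, 40000]
```

A range query touches only a contiguous arc of the ring, which is what makes it cheap. The same property concentrates popular ranges on a few peers, which is the load imbalance that the replication mechanism is meant to correct.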

Databases, Information Systems, and Peer-to-Peer Computing | 2003

Towards a Unifying Framework for Complex Query Processing over Structured Peer-to-Peer Data Networks

Peter Triantafillou; Theoni Pitoura

In this work we study how to process complex queries in DHT-based Peer-to-Peer (P2P) data networks. Queries are made over tuples and relations and are expressed in a query language, such as SQL. We describe existing research approaches for query processing in P2P systems, we suggest improvements and enhancements, and propose a unifying framework that consists of a modified DHT architecture, data placement and search algorithms, and provides efficient support for processing a variety of query types, including queries with one or more attributes, queries with selection operators (involving equality and range queries), and queries with join operators. To our knowledge, this is the first work that puts forth a framework providing support for all these query types.

International Conference on Data Engineering | 2007

Load Distribution Fairness in P2P Data Management Systems

Theoni Pitoura; Peter Triantafillou

We address the issue of measuring storage or query load distribution fairness in peer-to-peer data management systems. Existing metrics may look promising from the point of view of specific peers, while in reality being far from optimal from a global perspective. Thus, we first define the requirements and study the appropriateness of various statistical metrics for measuring load distribution fairness against these requirements. The metric proposed as most appropriate is the Gini coefficient (G). Second, we develop novel distributed sampling algorithms to compute G online, with high precision, efficiently, and scalably. Third, we show how G can readily be utilized online by higher-level algorithms which can now know when to best intervene to correct load imbalances. Our analysis and experiments attest to the efficiency and accuracy of these algorithms, permitting the online use of a rich and reliable metric, conveying a global perspective of the distribution.
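For readers unfamiliar with the metric, the following minimal sketch computes the Gini coefficient of a list of per-peer loads using the standard sorted-rank formula. It is a centralized calculation over the full load vector, not the distributed sampling algorithms developed in the paper; the function name `gini` and the example loads are assumptions.

```python
def gini(loads):
    """Gini coefficient of non-negative loads: 0 means perfectly even,
    values approaching 1 mean the load sits on very few peers."""
    xs = sorted(loads)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0                      # no load at all: treat as fair
    # Standard rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([5, 5, 5, 5]))    # → 0.0   (every peer carries the same load)
print(gini([0, 0, 0, 10]))   # → 0.75  (one peer carries everything)
```

With n peers the maximum value is (n - 1) / n, so the second example is the most unfair distribution possible for four peers.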

IEEE Transactions on Knowledge and Data Engineering | 2012

Saturn: Range Queries, Load Balancing and Fault Tolerance in DHT Data Systems

Theoni Pitoura; Nikos Ntarmos; Peter Triantafillou

In this paper, we present Saturn, an overlay architecture for large-scale data networks maintained over Distributed Hash Tables (DHTs) that efficiently processes range queries and ensures access load balancing and fault-tolerance. Placing consecutive data values in neighboring peers is desirable in DHTs since it accelerates range query processing; however, such a placement is highly susceptible to load imbalances. At the same time, DHTs may be susceptible to node departures/failures and high data availability and fault tolerance are significant issues. Saturn deals effectively with these problems through the introduction of a novel multiple ring, order-preserving architecture. The use of a novel order-preserving hash function ensures fast range query processing. Replication across and within data rings (termed vertical and horizontal replication) forms the foundation over which our mechanisms are developed, ensuring query load balancing and fault tolerance, respectively. Our detailed experimentation study shows strong gains in range query processing efficiency, access load balancing, and fault tolerance, with low replication overheads. The significance of Saturn is not only that it effectively tackles all three issues together - i.e., supporting range queries, ensuring load balancing, and providing fault tolerance over DHTs - but also that it can be applied on top of any order-preserving DHT enabling it to dynamically handle replication and, thus, to trade off replication costs for fair load distribution and fault tolerance.

International Conference on Data Engineering | 2008

Self-Join Size Estimation in Large-scale Distributed Data Systems

Theoni Pitoura; Peter Triantafillou

In this work we tackle the open problem of self-join size (SJS) estimation in a large-scale distributed data system, where tuples of a relation are distributed over data nodes which comprise an overlay network. Our contributions include adaptations of five well-known centralized SJS estimation techniques (coined sequential, cross-sampling, adaptive, bifocal, and sample-count) to the network environment and a novel technique which is based on the use of the Gini coefficient. We develop analyses showing how Gini estimations can lead to estimations of the underlying Zipfian or power-law value distributions. We further contribute distributed sampling algorithms that can estimate the Gini coefficient accurately and efficiently. Finally, we provide detailed experimental evidence attesting to the claimed increased accuracy, precision, and efficiency of the proposed SJS estimation method, compared to the other methods. The proposed approach is the only one to ensure high efficiency, precision, and accuracy regardless of the skew of the underlying data.
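As background, the self-join size of a relation on an attribute is the sum of the squared frequencies of that attribute's values. The sketch below computes it exactly and also shows a generic pair-counting estimator from a uniform sample drawn without replacement. The estimator is a textbook-style illustration and is not claimed to be any of the six techniques named in the abstract; the function names and data are assumptions.

```python
from collections import Counter


def self_join_size(values):
    """Exact SJS: sum over distinct values of (frequency squared)."""
    return sum(f * f for f in Counter(values).values())


def estimate_self_join_size(sample, relation_size):
    """Estimate SJS from a uniform sample drawn without replacement.

    Counts ordered pairs of distinct sampled tuples that share a value,
    scales that count up to the full relation, and adds the N pairs
    that each tuple forms with itself.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two sampled tuples")
    matching_pairs = sum(f * (f - 1) for f in Counter(sample).values())
    scale = relation_size * (relation_size - 1) / (n * (n - 1))
    return relation_size + scale * matching_pairs


relation = ["a", "a", "a", "b", "b", "c"]   # hypothetical attribute values
print(self_join_size(relation))             # → 14  (9 + 4 + 1)
```

Because the result is dominated by the most frequent values, small samples can miss or overcount them on skewed data, which is why skew matters for the accuracy of sampling-based estimates.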

Databases, Information Systems, and Peer-to-Peer Computing | 2005

Range query optimization leveraging peer heterogeneity in DHT data networks

Nikos Ntarmos; Theoni Pitoura; Peter Triantafillou

In this work we address the issue of efficient processing of range queries in DHT-based P2P data networks. The novelty of the proposed approach lies in architectures, algorithms, and mechanisms for identifying and appropriately exploiting powerful nodes in such networks. The existence of such nodes has been well documented in the literature and plays a key role in the architecture of most successful real-world P2P applications. However, until now, this heterogeneity has not been taken into account when architecting solutions for complex query processing, especially in DHT networks. With this work we attempt to fill this gap for optimizing the processing of range queries. Significant performance improvements are achieved due to (i) ensuring a much smaller hop count for range queries, and (ii) avoiding the dangers and inefficiencies of relying for range query processing on weak nodes, with respect to processing, storage, and communication capacities, and with intermittent connectivity. We present detailed experimental results validating our performance claims.

ACM Transactions on Internet Technology | 2009

Distribution fairness in Internet-scale networks

Theoni Pitoura; Peter Triantafillou

We address the issue of measuring distribution fairness in Internet-scale networks. This problem has several interesting instances encountered in different applications, ranging from assessing the distribution of load between network nodes for load balancing purposes, to measuring node utilization for optimal resource exploitation, and to guiding autonomous decisions of nodes in networks built with market-based economic principles. Although some metrics have been proposed, particularly for assessing load balancing algorithms, they fall short. We first study the appropriateness of various known and previously proposed statistical metrics for measuring distribution fairness. We put forward a number of required characteristics for appropriate metrics. We propose and comparatively study the appropriateness of the Gini coefficient (G) for this task. Our study reveals as most appropriate the metrics of G, the fairness index (FI), and the coefficient of variation (CV) in this order. Second, we develop six distributed sampling algorithms to estimate metrics online efficiently, accurately, and scalably. One of these algorithms (2-PRWS) is based on two effective optimizations of a basic algorithm, and the other two (the sequential sampling algorithm, LBS-HL, and the clustered sampling one, EBSS) are novel, developed especially to estimate G. Third, we show how these metrics, and especially G, can be readily utilized online by higher-level algorithms, which can now know when to best intervene to correct unfair distributions (in particular, load imbalances). We conclude with a comprehensive experimentation which comparatively evaluates both the various proposed estimation algorithms and the three most appropriate metrics (G, CV, and FI). Specifically, the evaluation quantifies the efficiency (in terms of number of messages and a latency indicator), precision, and accuracy achieved by the proposed algorithms when estimating the competing fairness metrics. The central conclusion is that the proposed metric, G, can be estimated with a small number of messages and low latency, regardless of the skew of the underlying distribution.
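The two competing metrics can be sketched as follows, assuming that FI refers to Jain's fairness index and that CV is the population standard deviation divided by the mean. The abstract does not give the formulas, so both definitions, the function names, and the example loads are assumptions.

```python
from statistics import mean, pstdev


def fairness_index(loads):
    """Jain's fairness index (assumed meaning of FI): 1 is perfectly fair,
    1/n means a single node carries the whole load."""
    if not loads:
        raise ValueError("loads must be non-empty")
    squares = sum(x * x for x in loads)
    if squares == 0:
        return 1.0                      # all-zero load: treat as fair
    total = sum(loads)
    return total * total / (len(loads) * squares)


def coefficient_of_variation(loads):
    """CV as population standard deviation over mean: 0 is perfectly fair."""
    if not loads:
        raise ValueError("loads must be non-empty")
    mu = mean(loads)
    if mu == 0:
        return 0.0
    return pstdev(loads) / mu


even = [5, 5, 5, 5]      # hypothetical per-node loads
skewed = [0, 0, 0, 10]
print(fairness_index(even), fairness_index(skewed))   # → 1.0 0.25
print(coefficient_of_variation(even))                 # → 0.0
```

Note that FI grows with fairness while CV shrinks with it, and CV has no fixed upper bound, so the raw values of the two metrics are not directly comparable.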

Archive | 2005

Range query optimization leveraging peer heterogeneity

Nikos Ntarmos; Theoni Pitoura; Peter Triantafillou

Archive | 2004

The RangeGuard: Range query optimization in peer-to-peer data networks

Nikos Ntarmos; Theoni Pitoura; Peter Triantafillou

Archive | 2004

HotRoD: Load Balancing and Efficient Range Query Processing in Peer-to-Peer Data Networks

Theoni Pitoura; Nikos Ntarmos; Peter Triantafillou

Collaboration

Dive into Theoni Pitoura's collaborations.

Top Co-Authors

Peter Triantafillou

University of Glasgow

Nikos Ntarmos

University of Patras
