Network


Latest external collaborations at the country level.

Hotspot


Research topics in which Georgia Koloniari is active.

Publication


Featured research published by Georgia Koloniari.


Extending Database Technology | 2004

Content-Based Routing of Path Queries in Peer-to-Peer Systems

Georgia Koloniari; Evaggelia Pitoura

Peer-to-peer (P2P) systems are gaining increasing popularity as a scalable means to share data among a large number of autonomous nodes. In this paper, we consider the case in which the nodes in a P2P system store XML documents. We propose a fully decentralized approach to the problem of routing path queries among the nodes of a P2P system based on maintaining specialized data structures, called filters, that efficiently summarize the content, i.e., the documents, of one or more nodes. Our proposed filters, called multi-level Bloom filters, extend Bloom filters so that they maintain information about the structure of the documents. In addition, we advocate building a hierarchical organization of nodes by clustering together nodes with similar content. Similarity between nodes is related to the similarity between the corresponding filters. We also present an efficient method for update propagation. Our experimental results show that multi-level Bloom filters outperform classical Bloom filters in routing path queries. Furthermore, the content-based hierarchical grouping of nodes increases recall, that is, the number of documents that are retrieved.
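
A minimal sketch of the multi-level Bloom filter idea described above, not the paper's exact construction: one Bloom filter per document level, so a root-to-node path query can be pre-filtered level by level before forwarding. All class and method names here are illustrative.

```python
# Illustrative sketch: a multi-level Bloom filter keeps one Bloom filter per
# XML tree level, so a path query like /catalog/book/title can be checked
# level by level. A "no" is definite; a "yes" may be a false positive.
import hashlib


class BloomFilter:
    def __init__(self, size=1024, num_hashes=3):
        self.size, self.num_hashes = size, num_hashes
        self.bits = [False] * size

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


class MultiLevelBloomFilter:
    """One Bloom filter per level: filter i summarizes the element names
    appearing at depth i in the node's local XML documents."""

    def __init__(self, levels, size=1024):
        self.levels = [BloomFilter(size) for _ in range(levels)]

    def add_path(self, path):
        # path: element names from the root, e.g. ["catalog", "book", "title"]
        for depth, name in enumerate(path):
            self.levels[depth].add(name)

    def might_match(self, query_path):
        if len(query_path) > len(self.levels):
            return False
        return all(self.levels[d].might_contain(name)
                   for d, name in enumerate(query_path))


# Summarize a node's documents, then pre-filter path queries before routing.
f = MultiLevelBloomFilter(levels=3)
f.add_path(["catalog", "book", "title"])
f.add_path(["catalog", "book", "author"])
print(f.might_match(["catalog", "book", "title"]))  # True: forward the query
print(f.might_match(["catalog", "dvd", "title"]))   # almost certainly False: prune
```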


International Conference on Management of Data | 2005

Peer-to-peer management of XML data: issues and research challenges

Georgia Koloniari; Evaggelia Pitoura

Peer-to-peer (p2p) systems are attracting increasing attention as an efficient means of sharing data among large, diverse and dynamic sets of users. The widespread use of XML as a standard for representing and exchanging data in the Internet suggests using XML for describing data shared in a p2p system. However, sharing XML data imposes new challenges in p2p systems related to supporting advanced querying beyond simple keyword-based retrieval. In this paper, we focus on data management issues for processing XML data in a p2p setting, namely indexing, replication, clustering and query routing and processing. For each of these topics, we present the issues that arise, survey related research and highlight open research problems.


Databases, Information Systems, and Peer-to-Peer Computing | 2004

On using histograms as routing indexes in peer-to-peer systems

Yannis Petrakis; Georgia Koloniari; Evaggelia Pitoura

Peer-to-peer systems offer an efficient means for sharing data among autonomous nodes. A central issue is locating the nodes with data matching a user query. A decentralized solution to this problem is based on using routing indexes, which are data structures that describe the content of neighboring nodes. Each node uses its routing index to route a query towards those of its neighbors that provide the largest number of results. We consider using histograms as routing indexes. We describe a decentralized procedure for clustering similar nodes based on histograms. Similarity between nodes is defined based on the set of queries they match and is related to the distance between their histograms. Our experimental results show that using histograms to cluster similar nodes and to route queries increases the number of results returned for a given number of nodes visited.
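
A small sketch of the routing-index idea, under assumed details (equi-width buckets, range queries): each node keeps a histogram per neighbor and forwards a query to the neighbor whose histogram predicts the most matches. Names and the cost model are illustrative, not the paper's exact method.

```python
# Illustrative sketch: per-neighbor histograms as routing indexes.
class EquiWidthHistogram:
    def __init__(self, lo, hi, num_buckets):
        self.lo, self.width = lo, (hi - lo) / num_buckets
        self.counts = [0] * num_buckets

    def add(self, value):
        idx = min(int((value - self.lo) / self.width), len(self.counts) - 1)
        self.counts[idx] += 1

    def estimate_range(self, q_lo, q_hi):
        # Estimated number of stored values in [q_lo, q_hi], assuming values
        # are uniformly spread within each bucket.
        total = 0.0
        for i, count in enumerate(self.counts):
            b_lo = self.lo + i * self.width
            overlap = max(0.0, min(q_hi, b_lo + self.width) - max(q_lo, b_lo))
            total += count * (overlap / self.width)
        return total


def route_query(neighbor_histograms, q_lo, q_hi):
    """Forward to the neighbor expected to return the most results."""
    return max(neighbor_histograms,
               key=lambda n: neighbor_histograms[n].estimate_range(q_lo, q_hi))


# Two neighbors with different data distributions: the query [55, 85]
# is routed towards the neighbor whose data overlaps that range.
h1, h2 = EquiWidthHistogram(0, 100, 10), EquiWidthHistogram(0, 100, 10)
for v in (5, 12, 18, 22):
    h1.add(v)
for v in (60, 65, 70, 72, 80):
    h2.add(v)
print(route_query({"neighbor_A": h1, "neighbor_B": h2}, 55, 85))  # neighbor_B
```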


Databases, Information Systems, and Peer-to-Peer Computing | 2003

Content-Based Overlay Networks for XML Peers Based on Multi-level Bloom Filters

Georgia Koloniari; Yannis Petrakis; Evaggelia Pitoura

Peer-to-peer systems are gaining popularity as a means to effectively share huge, massively distributed data collections. In this paper, we consider XML peers, that is, peers that store XML documents. We show how an extension of traditional Bloom filters, called multi-level Bloom filters, can be used to route path queries in such a system. In addition, we propose building content-based overlay networks by linking together peers with similar content. The similarity of the content (i.e., the local documents) of two peers is defined based on the similarity of their filters. Our experimental results show that overlay networks built based on filter similarity are very effective in retrieving a large number of relevant documents, since peers with similar content tend to be clustered together.
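
One simple way to make "similarity of filters" concrete, offered purely as an assumption-laden sketch (the paper's actual measure may differ): compare the set-bit positions of two peers' Bloom filter summaries with a Jaccard-style score, and let a joining peer attach to the most similar cluster.

```python
# Illustrative sketch: filter similarity as Jaccard similarity over set bits,
# used by a joining peer to pick a content-based cluster. Threshold and the
# choose_cluster helper are hypothetical.
def filter_similarity(bits_a, bits_b):
    set_a = {i for i, b in enumerate(bits_a) if b}
    set_b = {i for i, b in enumerate(bits_b) if b}
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def choose_cluster(peer_filter, cluster_filters, threshold=0.5):
    """Join the cluster whose aggregate filter is most similar to ours,
    or start a new cluster if nothing is similar enough."""
    if not cluster_filters:
        return "new-cluster"
    best_id, best_bits = max(cluster_filters.items(),
                             key=lambda kv: filter_similarity(peer_filter, kv[1]))
    return best_id if filter_similarity(peer_filter, best_bits) >= threshold else "new-cluster"


peer = [1, 0, 1, 1, 0, 0]
clusters = {"xml-cluster": [1, 0, 1, 0, 0, 0], "media-cluster": [0, 1, 0, 0, 1, 1]}
print(choose_cluster(peer, clusters))  # xml-cluster
```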


The Computer Journal | 2004

Filters for XML-based Service Discovery in Pervasive Computing

Georgia Koloniari; Evaggelia Pitoura

Pervasive computing refers to an emerging trend towards numerous casually accessible devices connected to an increasingly ubiquitous network infrastructure. An important challenge in this context is discovering the appropriate data and services. In this paper, we assume that services and data are described using hierarchically structured metadata. There is no centralized index for the services; instead, distributed filters are used to route queries to the appropriate nodes. We propose two new types of filters that extend Bloom filters to hierarchical documents. We consider two alternative ways of building overlay networks of nodes: one based on network proximity and one based on content similarity, where content similarity is derived from the similarity among filters. Our experimental results show that networks based on content similarity outperform those based on network proximity in finding all matching documents.


IEEE Transactions on Parallel and Distributed Systems | 2012

A Game-Theoretic Approach to the Formation of Clustered Overlay Networks

Georgia Koloniari; Evaggelia Pitoura

In many large-scale content sharing applications, participants or nodes are connected with each other based on their content or interests, thus forming clusters. In this paper, we model the formation of such clustered overlays as a strategic game, where nodes determine their cluster membership with the goal of improving the recall of their queries. We study the evolution of such overlays both theoretically and experimentally in terms of stability, optimality, load balance, and the required overhead. We show that, in general, decisions made independently by each node using only local information lead to overall cost-effective cluster configurations that are also dynamically adaptable to system updates such as churn and query or content changes.
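
A compact sketch of the best-response flavor of such a game, with an assumed payoff (recall benefit minus a per-peer membership overhead) that is only illustrative of the modeling style, not the paper's exact cost function.

```python
# Illustrative sketch: a node picks the cluster maximizing a payoff of the
# form "expected query recall minus membership overhead". The payoff model
# and the per-peer cost constant are assumptions.
def payoff(node_interests, cluster_contents, cluster_size,
           membership_cost_per_peer=0.01):
    covered = len(node_interests & cluster_contents)
    recall = covered / len(node_interests) if node_interests else 0.0
    return recall - membership_cost_per_peer * cluster_size


def best_response(node_interests, clusters):
    """clusters: {cluster_id: (set_of_topics, size)}. Returns the cluster the
    node should join, given everyone else's current choices (its best response)."""
    return max(clusters,
               key=lambda c: payoff(node_interests, clusters[c][0], clusters[c][1]))


# A node interested in {"xml", "p2p"} weighs recall gain against cluster size.
clusters = {
    "db-cluster": ({"xml", "sql", "p2p"}, 40),
    "ml-cluster": ({"vision", "nlp"}, 10),
}
print(best_response({"xml", "p2p"}, clusters))  # db-cluster
```

Repeating such local decisions as nodes, queries, and content change is what drives the overlay toward the stable, cost-effective configurations studied in the paper.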


First International Workshop on Graph Data Management Experiences and Systems | 2013

Partial view selection for evolving social graphs

Georgia Koloniari; Evaggelia Pitoura

In this paper, we deal with the problem of historical query evaluation over evolving social graphs. Historical queries are queries about the state of the social graph in the past. The straightforward way of executing such a query is to first reconstruct the whole social graph at the given time instance or interval, and then evaluate the query on the reconstructed graph. Since social graphs are large, the cost of a complete graph snapshot reconstruction would dominate the cost of historical query execution. Given that many queries are user-centric, i.e., node-centric queries that require access only to a targeted subgraph, we propose deploying partial views instead of full snapshot construction and define conditions that determine when a partial view can be used to evaluate a query. We also propose using a cache of partial views to further reduce the query evaluation cost, and show how partial views can be extended to new views with reduced cost. Finally, we present a greedy solution for the static view selection problem and study its performance experimentally.
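
A rough sketch of why partial views are cheaper than full snapshots, under an assumed data model (a timestamped edge log) that is not necessarily the paper's: for a user-centric query at time t, only the target node's k-hop neighborhood is replayed from the log.

```python
# Illustrative sketch: reconstruct only the k-hop neighborhood of a target
# node at time t from a log of edge additions/deletions, instead of the
# full snapshot. The log format (ts, op, u, v) is an assumption.
from collections import defaultdict


def neighbors_at(edge_log, node, t):
    """Neighbors of `node` as they existed at time t."""
    nbrs = set()
    for ts, op, u, v in sorted(edge_log):
        if ts > t:
            break
        if node not in (u, v):
            continue
        other = v if u == node else u
        nbrs.add(other) if op == "add" else nbrs.discard(other)
    return nbrs


def partial_view(edge_log, center, t, hops=2):
    """Breadth-first expansion around `center`, materializing only the part
    of the historical graph a node-centric query will actually touch."""
    view, frontier, seen = defaultdict(set), {center}, {center}
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            for other in neighbors_at(edge_log, node, t):
                view[node].add(other)
                view[other].add(node)
                if other not in seen:
                    seen.add(other)
                    next_frontier.add(other)
        frontier = next_frontier
    return dict(view)


log = [(1, "add", "a", "b"), (2, "add", "b", "c"),
       (3, "del", "a", "b"), (4, "add", "a", "d")]
print(partial_view(log, "a", t=2))  # contains a-b and b-c, but not a-d
```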


International Conference on Data Engineering | 2008

Recall-based cluster reformulation by selfish peers

Georgia Koloniari; Evaggelia Pitoura

Recently, clustered overlays in which peers are grouped based on the similarity of their content or interests have been proposed to improve performance in peer-to-peer systems. Since such systems are highly dynamic, the overlay network needs to be updated frequently to cope with changes. In this paper, we introduce an approach for updating a clustered overlay based on local decisions made by individual peers. We model the cluster-reformulation problem as a game where peers determine their cluster membership based on potential gains in the recall of their queries. We also define global criteria for the overall quality of the system and propose strategies for peer relocation that consider different behavioral patterns for the peers. Our preliminary experimental evaluation shows that our strategies cope well with changes in the overlay network.


Conference on Information and Knowledge Management | 2011

One is enough: distributed filtering for duplicate elimination

Georgia Koloniari; Nikos Ntarmos; Evaggelia Pitoura; Dimitris Souravlias

The growth of online services has created the need for duplicate elimination in high-volume streams of events. The sheer volume of data in applications such as pay-per-click clickstream processing, RSS feed syndication, and notification services in social sites such as Twitter and Facebook makes traditional centralized solutions hard to scale. In this paper, we propose an approach based on distributed filtering. To this end, we introduce a suite of distributed Bloom filters that exploit different ways of partitioning the event space. To address the continuous nature of event delivery, the filters are extended to support sliding-window semantics. Moreover, we examine locality-related tradeoffs and propose a tree-based architecture to allow for duplicate elimination across geographic locations. We map out the design space and present experimental results that demonstrate the pros and cons of our various solutions in different settings.
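
A small sketch of the two ingredients mentioned above, with assumed mechanics (hash partitioning of the event space across filter nodes, and a sliding window approximated by a ring of sub-filters); the paper's actual partitioning schemes and window semantics may differ.

```python
# Illustrative sketch: hash-partitioned duplicate filtering with
# sliding-window semantics approximated by a ring of per-slice filters.
import hashlib


class SlidingWindowBloom:
    def __init__(self, slices=4, size=4096, num_hashes=3):
        self.size, self.num_hashes = size, num_hashes
        self.slices = [set() for _ in range(slices)]  # sets stand in for bit arrays

    def _positions(self, item):
        return [int(hashlib.sha256(f"{i}:{item}".encode()).hexdigest(), 16) % self.size
                for i in range(self.num_hashes)]

    def slide(self):
        # Expire the oldest time slice; open a fresh one for new events.
        self.slices.pop(0)
        self.slices.append(set())

    def seen_then_add(self, event_id):
        positions = self._positions(event_id)
        duplicate = any(all(p in s for p in positions) for s in self.slices)
        self.slices[-1].update(positions)
        return duplicate


class DistributedDuplicateFilter:
    """Each event is routed to exactly one filter node by hashing its id,
    so the nodes cover disjoint partitions of the event space."""

    def __init__(self, num_nodes=4):
        self.nodes = [SlidingWindowBloom() for _ in range(num_nodes)]

    def is_duplicate(self, event_id):
        node = int(hashlib.md5(event_id.encode()).hexdigest(), 16) % len(self.nodes)
        return self.nodes[node].seen_then_add(event_id)


f = DistributedDuplicateFilter()
print(f.is_duplicate("item-42"))  # False: first occurrence is delivered
print(f.is_duplicate("item-42"))  # True: suppressed as a duplicate
```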


Knowledge Discovery and Data Mining | 2015

Scalable Blocking for Privacy Preserving Record Linkage

Alexandros Karakasidis; Georgia Koloniari; Vassilios S. Verykios

When dealing with sensitive and personal user data, the process of record linkage raises privacy issues. Thus, privacy preserving record linkage has emerged with the goal of identifying matching records across multiple data sources while preserving the privacy of the individuals they describe. The task is very resource-demanding, considering the abundance of available data, which, in addition, are often dirty. Blocking techniques are deployed prior to matching to prune out unlikely-to-match candidate records so as to reduce processing time. However, when scaling to large datasets, such methods often suffer a loss in quality. To this end, we propose Multi-Sampling Transitive Closure for Encrypted Fields (MS-TCEF), a novel privacy preserving blocking technique based on the use of reference sets. Our new method effectively prunes records based on redundant assignments to blocks, providing better fault tolerance and maintaining result quality while scaling linearly with respect to the dataset size. We provide a theoretical analysis of the method's complexity and show that it outperforms state-of-the-art privacy preserving blocking techniques with respect to both recall and processing cost.
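
To give a feel for reference-set blocking in general (not MS-TCEF itself, whose encrypted-field handling is not reproduced here), the sketch below redundantly assigns each record value to the blocks of its k nearest reference strings, so that similar or dirty values are likely to share at least one block. In the privacy-preserving setting the comparisons would operate on encoded fields rather than plaintext; everything here is an illustrative assumption.

```python
# Illustrative sketch of generic reference-set blocking: redundant (k > 1)
# assignment to the nearest reference strings improves fault tolerance to
# typos and dirty data, at the cost of a few extra candidate comparisons.
from difflib import SequenceMatcher


def similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()


def assign_blocks(value, reference_set, k=2):
    """Return the k reference strings closest to `value`; each acts as a block id."""
    ranked = sorted(reference_set, key=lambda ref: similarity(value, ref), reverse=True)
    return ranked[:k]


references = ["anderson", "brown", "garcia", "johnson", "smith"]
print(assign_blocks("smth", references))    # the misspelling still lands in 'smith'
print(assign_blocks("smithe", references))  # shares the 'smith' block with 'smth'
```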

Collaboration


Georgia Koloniari's top co-authors and collaborations.
