Katerina Doka

National Technical University of Athens

Publication

Featured research published by Katerina Doka.

computer and communications security | 2015

k-Anonymization by Freeform Generalization

Katerina Doka; Mingqiang Xue; Dimitrios Tsoumakos; Panagiotis Karras

Syntactic data anonymization strives to (i) ensure that an adversary cannot identify an individual's record from published attributes with high probability, and (ii) provide high data utility. These mutually conflicting goals can be expressed as an optimization problem with privacy as the constraint and utility as the objective function. Conventional research using the k-anonymity model has resorted to publishing data in homogeneous generalized groups. A recently proposed alternative does not create such cliques; instead, it recasts data values in a heterogeneous manner, aiming for higher utility. Nevertheless, such works never defined the problem in the most general terms; thus, the utility gains they achieve are limited. In this paper, we propose a methodology that achieves the full potential of heterogeneity and gains higher utility while providing the same privacy guarantee. We formulate the problem of maximal-utility k-anonymization by freeform generalization as a network flow problem. We develop an optimal solution therefor using Mixed Integer Programming. Given the non-scalability of this solution, we develop an O(kn^2) Greedy algorithm that has no time-complexity disadvantage vis-à-vis previous approaches, an O(kn^2 log n) enhanced version thereof, and an O(kn^3) adaptation of the Hungarian algorithm; these algorithms build a set of k perfect matchings from original to anonymized data, a novel approach to the problem. Moreover, our techniques can resist adversaries who may know the employed algorithms. Our experiments with real-world data verify that our schemes achieve near-optimal utility (with gains of up to 41%), while they can exploit parallelism and data partitioning, gaining an efficiency advantage over simpler methods.

Journal of Parallel and Distributed Computing | 2011

Brown Dwarf: A fully-distributed, fault-tolerant data warehousing system

Katerina Doka; Dimitrios Tsoumakos; Nectarios Koziris

In this paper we present the Brown Dwarf, a distributed data analytics system designed to efficiently store, query and update multidimensional data over commodity network nodes, without the use of any proprietary tool. Brown Dwarf distributes a centralized indexing structure among peers on-the-fly, reducing cube creation and querying times by enforcing parallelization. Analytical queries are naturally performed on-line through cooperating nodes that form an unstructured Peer-to-Peer overlay. Updates are also performed on-line, eliminating the usually costly over-night process. Moreover, the system employs an adaptive replication scheme that adjusts to the workload skew as well as the network churn by expanding or shrinking the units of the distributed data structure. Our system has been thoroughly evaluated on an actual testbed: it manages to accelerate cube creation and querying by up to several tens of times compared to the centralized solution by exploiting the capabilities of the available network nodes working in parallel. It also manages to quickly adapt even after sudden bursts in load and remains unaffected with a considerable fraction of frequent node failures. These advantages are even more apparent for dense and skewed data cubes and workloads.

Journal of Parallel and Distributed Computing | 2011

Online querying of d-dimensional hierarchies

Katerina Doka; Dimitrios Tsoumakos; Nectarios Koziris

In this paper we describe a distributed system designed to efficiently store, query and update multidimensional data organized into concept hierarchies and dispersed over a network. Our system employs an adaptive scheme that automatically adjusts the level of indexing according to the granularity of the incoming queries, without assuming any prior knowledge of the workload. Efficient roll-up and drill-down operations take place in order to maximize the performance by minimizing query flooding. Updates are performed on-line, with minimal communication overhead, depending on the level of consistency needed. Extensive experimental evaluation shows that, on top of the advantages that a distributed storage offers, our method answers the vast majority of incoming queries, both point and aggregate ones, without flooding the network and without causing significant storage or load imbalance. Our scheme proves to be especially efficient in cases of skewed workloads, even when these change dynamically with time. At the same time, it manages to preserve the hierarchical nature of data. To the best of our knowledge, this is the first attempt towards the support of concept hierarchies in DHTs.
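As a minimal illustration of the roll-up operation mentioned in the abstract, the sketch below aggregates values one level up a concept hierarchy. The hierarchy, the `PARENT` map, the `roll_up` helper and the sample figures are all hypothetical and centralized; the sketch says nothing about how the published system distributes or indexes the data.

```python
from collections import defaultdict

# Hypothetical concept hierarchy: city -> country -> continent.
PARENT = {
    "Athens": "Greece",
    "Patras": "Greece",
    "Rome": "Italy",
    "Greece": "Europe",
    "Italy": "Europe",
}


def roll_up(facts, parent):
    """Aggregate {member: value} to the next coarser hierarchy level."""
    totals = defaultdict(int)
    for member, value in facts.items():
        totals[parent[member]] += value
    return dict(totals)


# Illustrative data only.
city_sales = {"Athens": 5, "Patras": 2, "Rome": 4}
country_sales = roll_up(city_sales, PARENT)
continent_sales = roll_up(country_sales, PARENT)
```

Drill-down is the inverse navigation: a query at the country level is answered from the finer city-level values when only those are stored.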

Future Generation Computer Systems | 2009

A grid middleware for data management exploiting peer-to-peer techniques

Athanasia Asiki; Katerina Doka; Ioannis Konstantinou; Antonis Zissimos; Dimitrios Tsoumakos; Nectarios Koziris; Panayiotis Tsanakas

In this paper, we describe a service-oriented middleware architecture for Grid environments which enables efficient data management. Our design introduces concepts from Peer-to-Peer computing in order to provide a scalable and reliable infrastructure for storage, search and retrieval of annotated content. To ensure fast file lookups in the distributed repositories, our system incorporates a multidimensional indexing scheme which serves the need for supporting both exact match and range queries over a group of metadata attributes. Finally, file transfers are conducted using GridTorrent, a grid-enabled, Peer-to-Peer mechanism that performs efficient data transfers by enabling cooperation among participating nodes and balances the cost of file transfer among them. The proposed architecture is the middleware component used by the GREDIA project, in which both media and banking partners plan to share large loads of annotated content.

international conference on big data | 2016

Mix ‘n’ match multi-engine analytics

Katerina Doka; Nikolaos Papailiou; Victor Giannakouris; Dimitrios Tsoumakos; Nectarios Koziris

Current platforms fail to efficiently cope with the data and task heterogeneity of modern analytics workflows due to their adhesion to a single data and/or compute model. As a remedy, we present IReS, the Intelligent Resource Scheduler for complex analytics workflows executed over multi-engine environments. IReS is able to optimize a workflow with respect to a user-defined policy relying on cost and performance models of the required tasks over the available platforms. This optimization consists in allocating distinct workflow parts to the most advantageous execution and/or storage engine among the available ones and deciding on the exact amount of resources provisioned. Our current prototype supports 5 compute and 3 data engines, yet new ones can effortlessly be added to IReS by virtue of its engine-agnostic mechanisms. Our extensive experimental evaluation confirms that IReS speeds up diverse and realistic workflows by up to 30% compared to their optimal single-engine plan by automatically scattering parts of them to different execution engines and datastores. Its optimizer incurs only marginal overhead to the workflow execution performance, managing to discover the optimal execution plan within a few seconds, even for large-scale workflow instances.

Future Internet | 2014

The ARCOMEM Architecture for Social- and Semantic-Driven Web Archiving

Thomas Risse; Elena Demidova; Stefan Dietze; Wim Peters; Nikolaos Papailiou; Katerina Doka; Yannis Stavrakas; Vassilis Plachouras; Pierre Senellart; Florent Carpentier; Amin Mantrach; Bogdan Cautis; Patrick Siehndel; Dimitris Spiliotopoulos

The constantly growing amount of Web content and the success of the Social Web lead to increasing needs for Web archiving. These needs go beyond the pure preservation of Web pages. Web archives are turning into “community memories” that aim at building a better understanding of the public view on, e.g., celebrities, court decisions and other events. Due to the size of the Web, the traditional “collect-all” strategy is in many cases not the best method to build Web archives. In this paper, we present the ARCOMEM (From Collect-All Archives to Community Memories) architecture and implementation that uses semantic information, such as entities, topics and events, complemented with information from the Social Web to guide a novel Web crawler. The resulting archives are automatically enriched with semantic meta-information to ease the access and allow retrieval based on conditions that involve high-level concepts.

Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud | 2010

Efficient updates for a shared nothing analytics platform

Katerina Doka; Dimitrios Tsoumakos; Nectarios Koziris

In this paper we describe a cloud-based data-warehouse-like system especially targeted to time series data. Apart from the benefits that a distributed storage built on top of a shared-nothing architecture offers, our system is designed to efficiently cope with continuous, on-line updates of temporally ordered data without compromising the query throughput. Through a totally customizable process performing asynchronous aggregation of past records, we achieve significant gains in storage and update times compared to traditional methods, maintaining a high accuracy in query responses for our target application. Experiments using our prototype implementation over an actual testbed prove that our scheme considerably accelerates (by a factor above 3) the update procedure and reduces required storage by at least 30%. We also show how these gains are related to the level and rate of aggregation performed.
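The idea of aggregating past records to save storage can be sketched as follows. This is a simplified, synchronous illustration under assumed names (`compact`, `cutoff`, `bucket`) and an assumed sum aggregate; the abstract does not specify the actual aggregation functions or how the asynchronous process is scheduled.

```python
from collections import defaultdict


def compact(records, cutoff, bucket):
    """Merge records older than `cutoff` into buckets of width `bucket`.

    records: iterable of (timestamp, value) pairs.
    Returns a time-ordered list in which old records are replaced by
    one (bucket_start, sum) pair per bucket and recent records are kept raw.
    """
    sums = defaultdict(float)
    recent = []
    for ts, value in records:
        if ts < cutoff:
            sums[ts - ts % bucket] += value  # align to bucket start
        else:
            recent.append((ts, value))
    return sorted(sums.items()) + sorted(recent)


# Illustrative data: timestamps in seconds, one-minute buckets.
raw = [(0, 1.0), (10, 2.0), (65, 3.0), (130, 4.0)]
compacted = compact(raw, cutoff=120, bucket=60)
```

Here four raw records shrink to three while the total is preserved; queries over the old range lose sub-minute resolution, which is the storage-versus-accuracy trade-off the abstract refers to.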

web information and data management | 2008

HiPPIS: an online P2P system for efficient lookups on d-dimensional hierarchies

Katerina Doka; Dimitrios Tsoumakos; Nectarios Koziris

In this paper we describe HiPPIS, a system that enables efficient storage and on-line querying of multidimensional data organized into concept hierarchies and dispersed over a network. Our scheme utilizes an adaptive algorithm that automatically adjusts the level of indexing according to the granularity of the incoming queries, without assuming any prior knowledge of the workload. Efficient roll-up and drill-down operations take place in order to maximize the performance by minimizing query flooding. Extensive experimental evaluations show that, on top of the advantages that a distributed storage offers, our method answers the large majority of incoming queries, both point and aggregate ones, without flooding the network. At the same time, it manages to preserve the hierarchical nature of data. These characteristics are maintained even after sudden shifts in the workload.

international conference on tools with artificial intelligence | 2015

An Equitable Solution to the Stable Marriage Problem

Ioannis Giannakopoulos; Panagiotis Karras; Dimitrios Tsoumakos; Katerina Doka; Nectarios Koziris

A stable marriage problem (SMP) of size n involves n men and n women, each of whom has ordered members of the opposite gender by descending preferability. A solution is a perfect matching among men and women, such that there exists no pair who prefer each other to their current spouses. The problem was formulated in 1962 by Gale and Shapley, who showed that any instance can be solved in polynomial time, and has attracted interest due to its application to any two-sided market. Still, the solution obtained by the Gale-Shapley algorithm is favorable to one side. Gusfield and Irving introduced the equitable stable marriage problem (ESMP), which calls for finding a stable matching that minimizes the distance between men's and women's sum-of-rankings of their spouses. Unfortunately, ESMP is strongly NP-hard, approximation algorithms therefor are impractical, while even proposed heuristics may run for an unpredictable number of iterations. We propose a novel, deterministic approach that treats both genders equally, while eschewing an exhaustive exploration of the space of all stable matchings. Our thorough experimental study shows that, in contrast to previous proposals, our method not only achieves high-quality solutions, but also terminates efficiently and repeatably on all tested large problem instances.
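For reference, the classic Gale-Shapley procedure and the ESMP objective described above can be sketched as below. This is the textbook proposer-optimal algorithm, not the equitable method proposed in the paper; the function names and the two-person instance are illustrative assumptions.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return the proposer-optimal stable matching as {proposer: receiver}."""
    # rank[r][p]: position of proposer p in receiver r's list (lower is better).
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}
    engaged = {}  # receiver -> proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(r)
        if current is None:
            engaged[r] = p
        elif rank[r][p] < rank[r][current]:
            engaged[r] = p          # r trades up; the old partner is free again
            free.append(current)
        else:
            free.append(p)          # r rejects p; p will try the next choice
    return {p: r for r, p in engaged.items()}


def is_stable(matching, men_prefs, women_prefs):
    """Check that no man and woman prefer each other to their spouses."""
    husband = {w: m for m, w in matching.items()}
    for m, w in matching.items():
        for better in men_prefs[m][:men_prefs[m].index(w)]:
            rival = husband[better]
            if women_prefs[better].index(m) < women_prefs[better].index(rival):
                return False
    return True


def equity_cost(matching, men_prefs, women_prefs):
    """Distance between men's and women's sums of spouse rankings (1-based)."""
    men_sum = sum(men_prefs[m].index(w) + 1 for m, w in matching.items())
    women_sum = sum(women_prefs[w].index(m) + 1 for m, w in matching.items())
    return abs(men_sum - women_sum)


# Illustrative instance in which each side's first choices conflict.
men = {"a": ["x", "y"], "b": ["y", "x"]}
women = {"x": ["b", "a"], "y": ["a", "b"]}
man_optimal = gale_shapley(men, women)
woman_optimal = gale_shapley(women, men)  # keyed by woman
```

On this instance the men-proposing run gives every man his first choice and every woman her last, and swapping roles reverses the outcome, which is the one-sidedness that motivates ESMP.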

international conference on big data | 2014

MoDisSENSE: A distributed platform for social networking services over mobile devices

Ioannis Mytilinis; Ioannis Giannakopoulos; Ioannis Konstantinou; Katerina Doka; Nectarios Koziris

In this work we present MoDisSENSE, a distributed analytics platform for social networking services over mobile devices. MoDisSENSE collects and stores various types of data from heterogeneous sources, such as GPS traces from cell phones, user profile information and comments from social networks connected to the platform. These are combined through spatio-temporal and textual analysis, performed in a distributed fashion, in order to extract knowledge, make smart suggestions and leverage user experience. The datastore follows a hybrid approach to handle both raw and processed data, simultaneously covering the need for scalability and fast query processing. Thus, the platform is able to resolve complex, multi-parameter, socially charged queries over Points of Interest in the order of milliseconds even under heavy load.
