Dimitrios Tsoumakos | Researchain

Publication

Featured research published by Dimitrios Tsoumakos.

International Conference on Peer-to-Peer Computing | 2003

Adaptive probabilistic search for peer-to-peer networks

Dimitrios Tsoumakos; Nick Roussopoulos

Peer-to-peer networks are gaining increasing attention from both the scientific and the large Internet user community. Popular applications utilizing this new technology offer many attractive features to a growing number of users. At the heart of such networks lies the search algorithm. Proposed methods either depend on the network-disastrous flooding and its variations or utilize various indices too expensive to maintain. We describe an adaptive, bandwidth-efficient algorithm for search in unstructured peer-to-peer networks, the adaptive probabilistic search method (APS). Our scheme utilizes feedback from previous searches to probabilistically guide future ones. It performs efficient object discovery while inducing zero overhead over dynamic network operations. Extensive simulation results show that APS achieves high success rates, increased number of discovered objects, very low bandwidth consumption and adaptation to changing topologies.
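The feedback idea in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `PeerIndex` class, the initial weight, and the fixed `step` reward and penalty are hypothetical. The sketch only shows the general pattern of choosing a neighbor with probability proportional to a stored weight and adjusting that weight after each search.

```python
import random


class PeerIndex:
    """Per-peer table of neighbor weights for one requested object (hypothetical structure)."""

    def __init__(self, neighbors, initial=10.0):
        # Every neighbor starts with the same weight, so the first choice is uniform.
        self.weights = {n: initial for n in neighbors}

    def pick(self, rng):
        # Choose one neighbor with probability proportional to its weight.
        names = list(self.weights)
        return rng.choices(names, weights=[self.weights[n] for n in names], k=1)[0]

    def feedback(self, neighbor, hit, step=5.0, floor=1.0):
        # Reward a neighbor that led to the object, penalize one that did not.
        # The floor keeps every neighbor reachable with non-zero probability.
        delta = step if hit else -step
        self.weights[neighbor] = max(floor, self.weights[neighbor] + delta)

    def probabilities(self):
        total = sum(self.weights.values())
        return {n: w / total for n, w in self.weights.items()}
```

For example, after one successful search through neighbor `a` and one failed search through neighbor `b`, later searches for the same object are biased toward `a`, with no extra messages exchanged beyond the search itself.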

International World Wide Web Conferences | 2012

H2RDF: adaptive query processing on RDF data in the cloud.

Nikolaos Papailiou; Ioannis Konstantinou; Dimitrios Tsoumakos; Nectarios Koziris

In this work we present H2RDF, a fully distributed RDF store that combines the MapReduce processing framework with a NoSQL distributed data store. Our system features two unique characteristics that enable efficient processing of both simple and multi-join SPARQL queries on a virtually unlimited number of triples: join algorithms that execute joins according to query selectivity to reduce processing; and adaptive choice among centralized and distributed (MapReduce-based) join execution for fast query responses. Our system efficiently answers both simple joins and complex multivariate queries and easily scales to 3 billion triples using a small cluster of 9 worker nodes. H2RDF outperforms state-of-the-art distributed solutions in multi-join and nonselective queries while achieving comparable performance to centralized solutions in selective queries. In this demonstration we showcase the system's functionality through an interactive GUI. Users will be able to execute predefined or custom-made SPARQL queries on datasets of different sizes, using different join algorithms. Moreover, they can repeat all queries utilizing a different number of cluster resources. Using real-time cluster monitoring and detailed statistics, participants will be able to understand the advantages of different execution schemes versus the input data as well as the scalability properties of H2RDF over both the data size and the available worker resources.
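The two characteristics named in this abstract, selectivity-driven join ordering and an adaptive choice between centralized and MapReduce execution, can be sketched roughly as follows. This is not H2RDF's cost model: the `estimated_rows` field, the function names, and the `centralized_limit` threshold are hypothetical placeholders chosen for illustration.

```python
def order_by_selectivity(patterns):
    """Put the most selective triple patterns (fewest estimated matches) first."""
    return sorted(patterns, key=lambda p: p["estimated_rows"])


def choose_execution(patterns, centralized_limit=1_000_000):
    """Run small joins on one machine and large ones as a distributed job.

    The threshold is an assumed tuning knob, not a value from the paper.
    """
    total = sum(p["estimated_rows"] for p in patterns)
    return "centralized" if total <= centralized_limit else "mapreduce"
```

With one pattern estimated at 200 rows and another at 5,000,000 rows, the sketch joins the small pattern first and selects distributed execution for the pair, while the small pattern alone would run centrally.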

Scalable Information Systems | 2006

Analysis and comparison of P2P search methods

Dimitrios Tsoumakos; Nick Roussopoulos

The popularity attributed to current Peer-to-Peer applications makes the operation of these distributed systems very important for the Internet community. Efficient object discovery is the first step towards the realization of distributed resource-sharing. In this work, we present a detailed overview of existing search methods for unstructured Peer-to-Peer networks. We analyze the performance of the algorithms relative to various metrics, with emphasis on success rate, bandwidth efficiency and adaptation to dynamic network conditions. Simulation results are used to empirically evaluate the behavior of nine representative schemes under a variety of different environments.

Conference on Information and Knowledge Management | 2011

On the elasticity of NoSQL databases over cloud management platforms

Ioannis Konstantinou; Evangelos Angelou; Christina Boumpouka; Dimitrios Tsoumakos; Nectarios Koziris

NoSQL databases focus on analytical processing of large scale datasets, offering increased scalability over commodity hardware. One of their strongest features is elasticity, which allows for fairly portioned premiums and high-quality performance and directly applies to the philosophy of a cloud-based platform. Yet, the process of adaptive expansion and contraction of resources usually involves a lot of manual effort during cluster configuration. To date, there exists no comparative study to quantify this cost and measure the efficacy of NoSQL engines that offer this feature over a cloud provider. In this work, we present a cloud-enabled framework for adaptive monitoring of NoSQL systems. We perform a study of the elasticity feature on some of the most popular NoSQL databases over an open-source cloud platform. Based on these measurements, we finally present a prototype implementation of a decision making system that enables automatic elastic operations of any NoSQL engine based on administrator or application-specified constraints.

IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2013

Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA

Dimitrios Tsoumakos; Ioannis Konstantinou; Christina Boumpouka; Spyros Sioutas; Nectarios Koziris

This work presents TIRAMOLA, a cloud-enabled, open-source framework to perform automatic resizing of NoSQL clusters according to user-defined policies. Decisions on adding or removing worker VMs from a cluster are modeled as a Markov Decision Process and taken in real-time. The system automatically decides on the most advantageous cluster size according to user-defined policies; it then proceeds to request or release VM resources from the provider and orchestrate them inside a NoSQL cluster. TIRAMOLA's modular architecture and standard API support allow interaction with most current IaaS platforms and increased customization. An extensive experimental evaluation on an HBase cluster confirms our assertions: the system resizes clusters in real-time and adapts its performance through different optimization strategies, different permissible actions, and different input and training loads. Besides the automation of the process, it exhibits a learning feature which allows it to make very close to optimal decisions even with input loads 130% larger or alternating 10 times faster compared to the accumulated information.
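To make the Markov Decision Process framing concrete, here is a small sketch in which states are cluster sizes and actions add, remove, or keep one VM. It is an illustration under stated assumptions, not TIRAMOLA's implementation: plain value iteration is swapped in as the solver, transitions are deterministic, and the `reward` function passed in (for instance throughput minus VM cost) is hypothetical.

```python
def best_resize_policy(min_size, max_size, reward, gamma=0.9, sweeps=200):
    """Return the preferred action for every cluster size via value iteration.

    reward(n) is a user-supplied score for running with n worker VMs.
    """
    actions = {"remove": -1, "keep": 0, "add": 1}
    sizes = range(min_size, max_size + 1)
    value = {s: 0.0 for s in sizes}

    def legal(s):
        # Only actions that keep the cluster within the allowed size range.
        return [a for a, d in actions.items() if min_size <= s + d <= max_size]

    def q(s, a):
        # Deterministic transition: the action always lands on size s + delta.
        nxt = s + actions[a]
        return reward(nxt) + gamma * value[nxt]

    for _ in range(sweeps):
        # Synchronous update: q() reads the old table until the new one is built.
        value = {s: max(q(s, a) for a in legal(s)) for s in sizes}

    return {s: max(legal(s), key=lambda a: q(s, a)) for s in sizes}
```

With an assumed reward of `min(100 * n, 500) - 30 * n`, which peaks at five VMs, the resulting policy adds VMs below five, keeps the cluster at five, and removes VMs above five.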

International Conference on Big Data | 2013

H2RDF+: High-performance distributed joins over large-scale RDF graphs

Nikolaos Papailiou; Ioannis Konstantinou; Dimitrios Tsoumakos; Panagiotis Karras; Nectarios Koziris

The proliferation of data in RDF format calls for efficient and scalable solutions for their management. While scalability in the era of big data is a hard requirement, modern systems fail to adapt based on the complexity of the query. Current approaches do not scale well when faced with substantially complex, non-selective joins, resulting in exponential growth of execution times. In this work we present H2RDF+, an RDF store that efficiently performs distributed Merge and Sort-Merge joins over a multiple index scheme. H2RDF+ is highly scalable, utilizing distributed MapReduce processing and HBase indexes. Utilizing aggressive byte-level compression and result grouping over fast scans, it can process both complex and selective join queries in a highly efficient manner. Furthermore, it adaptively chooses either single- or multi-machine execution based on join complexity estimated through index statistics. Our extensive evaluation demonstrates that H2RDF+ efficiently answers non-selective joins an order of magnitude faster than both current state-of-the-art distributed and centralized stores, while being only tenths of a second slower in simple queries, scaling linearly with the amount of available resources.

Information Systems | 2009

GrouPeer: Dynamic clustering of P2P databases

Verena Kantere; Dimitrios Tsoumakos; Timos K. Sellis; Nick Roussopoulos

Sharing structured data in a P2P network is a challenging problem, especially in the absence of a mediated schema. The standard practice of answering a consecutively rewritten query along the propagation path often results in significant loss of information. Conversely, the use of mediated schemas requires human interaction and global agreement, both during creation and maintenance. In this paper we present GrouPeer, an adaptive, automated approach to both issues in the context of unstructured P2P database overlays. By allowing peers to individually choose which rewritten version of a query to answer and evaluate the received answers, information-rich sources left hidden otherwise are discovered. Gradually, the overlay is restructured as semantically similar peers are clustered together. Experimental results show that our technique produces very accurate answers and builds clusters that are very close to the optimal ones by contacting a very small number of nodes in the overlay.

IEEE Transactions on Parallel and Distributed Systems | 2011

Fast and Cost-Effective Online Load-Balancing in Distributed Range-Queriable Systems

Ioannis Konstantinou; Dimitrios Tsoumakos; Nectarios Koziris

Distributed systems such as Peer-to-Peer overlays have been shown to efficiently support the processing of range queries over large numbers of participating hosts. In such systems, uneven load allocation has to be effectively tackled in order to minimize overloaded peers and optimize their performance. In this work, we identify the two basic methodologies used to achieve load-balancing, iterative key redistribution between neighbors and node migration, and describe their relative advantages and disadvantages. Based on this analysis, we propose NIXMIG, a hybrid method that adaptively utilizes these two extremes to achieve both fast and cost-effective load-balancing in distributed systems that support range queries. We theoretically prove its convergence and, as a case study, we offer an implementation on top of a Skip Graph, where we thoroughly validate our findings in a variety of static, dynamic and realistic workloads. We compare NIXMIG with an existing load-balancing algorithm proposed by Karger and Ruhl [1] and our experimental analysis shows that NIXMIG can be as much as three times faster, requiring only one sixth and one third of message and item exchanges, respectively, to bring the system to a balanced state.
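The two basic mechanisms named in this abstract can be sketched on a toy range-partitioned store. This is not NIXMIG itself: how the hybrid method decides between the two operations is not shown, and both the use of key count as the load measure and the function names are assumptions made for illustration. Each node is modeled as a sorted list of keys, and nodes are ordered by key range.

```python
def neighbor_exchange(left, right):
    """Shift boundary keys between two adjacent nodes until they hold equal shares.

    Assumes every key in `left` is smaller than every key in `right`.
    """
    merged = left + right  # still sorted because the ranges are adjacent
    cut = len(merged) // 2
    return merged[:cut], merged[cut:]


def migrate(nodes, light, heavy):
    """Move the node at index `light` next to the node at index `heavy`.

    Assumes at least two nodes and light != heavy. Returns a new node list.
    """
    nodes = [list(keys) for keys in nodes]
    # The lightly loaded node hands its keys to an adjacent node before leaving.
    receiver = light - 1 if light > 0 else light + 1
    if receiver < light:
        nodes[receiver] = nodes[receiver] + nodes[light]
    else:
        nodes[receiver] = nodes[light] + nodes[receiver]
    del nodes[light]
    if heavy > light:
        heavy -= 1  # indexes after the removed node shift down by one
    # It rejoins beside the heavy node and takes over half of that node's range.
    keys = nodes[heavy]
    cut = len(keys) // 2
    nodes[heavy:heavy + 1] = [keys[:cut], keys[cut:]]
    return nodes
```

Neighbor exchange only moves load one hop at a time, while migration relieves a distant hot spot in a single step at the cost of moving two sets of keys, which is the trade-off a hybrid scheme has to weigh.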

International Conference on Management of Data | 2012

TIRAMOLA: elastic nosql provisioning through a cloud management platform

Ioannis Konstantinou; Evangelos Angelou; Dimitrios Tsoumakos; Christina Boumpouka; Nectarios Koziris; Spyros Sioutas

NoSQL databases focus on analytical processing of large scale datasets, offering increased scalability over commodity hardware. One of their strongest features is elasticity, which allows for fairly portioned premiums and high-quality performance. Yet, the process of adaptive expansion and contraction of resources usually involves a lot of manual effort, often requiring the definition of the conditions for scaling up or down to be provided by the users. To date, there exists no open-source system for automatic resizing of NoSQL clusters. In this demonstration, we present TIRAMOLA, a modular, cloud-enabled framework for monitoring and adaptively resizing NoSQL clusters. Our system incorporates a decision-making module which allows for optimal cluster resize actions in order to maximize any quantifiable reward function provided together with life-long adaptation to workload or infrastructural changes. The audience will be able to initiate HBase clusters of various sizes and apply varying workloads through multiple YCSB clients. The attendees will be able to watch, in real-time, the system perform automatic VM additions and removals as well as how cluster performance metrics change relative to the optimization parameters of their choice.

Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud | 2010

Distributed indexing of web scale datasets for the cloud

Ioannis Konstantinou; Evangelos Angelou; Dimitrios Tsoumakos; Nectarios Koziris

In this paper, we present a distributed architecture for indexing and serving large and diverse datasets. It incorporates and extends the functionality of Hadoop, the open source MapReduce framework, and of HBase, a distributed, sparse, NoSQL database, to create a fully parallel indexing system. Experiments with structured, semi-structured and unstructured data of various sizes demonstrate the flexibility, speed and robustness of our implementation and contrast it with similarly oriented projects. Our 11-node cluster prototype kept the full-text indexing time of 150GB of raw content under 3 hours, whereas the system's response time under a sustained query load of more than 1000 queries/sec was kept on the order of milliseconds.
