Christos Doulkeridis | Researchain

Publication

Featured research published by Christos Doulkeridis.

very large data bases | 2014

A survey of large-scale analytical query processing in MapReduce

Christos Doulkeridis; Kjetil Nørvåg

Enterprises today acquire vast volumes of data from different sources and leverage this information by means of data analysis to support effective decision-making and provide new functionality and services. The key requirement of data analytics is scalability, simply due to the immense volume of data that need to be extracted, processed, and analyzed in a timely fashion. Arguably the most popular framework for contemporary large-scale data analytics is MapReduce, mainly due to its salient features that include scalability, fault-tolerance, ease of programming, and flexibility. However, despite its merits, MapReduce has evident performance limitations in miscellaneous analytical tasks, and this has given rise to a significant body of research that aims at improving its efficiency, while maintaining its desirable properties. This survey aims to review the state of the art in improving the performance of parallel query processing using MapReduce. A set of the most significant weaknesses and limitations of MapReduce is discussed at a high level, along with techniques for addressing them. A taxonomy is presented for categorizing existing research on MapReduce improvements according to the specific problem they target. Based on the proposed taxonomy, a classification of existing research is provided focusing on the optimization objective. In conclusion, we outline interesting directions for future parallel data processing systems.
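For readers unfamiliar with the programming model the survey refers to, the following is a minimal single-process sketch of the map, shuffle, and reduce phases applied to a grouped aggregation. It is an illustration only: the `map_reduce` helper, the `sales` records, and the field layout are hypothetical, and nothing here uses the Hadoop API or reflects any optimization discussed in the survey.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def map_reduce(
    records: Iterable[tuple],
    mapper: Callable[[tuple], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Emulate map -> shuffle (group by key) -> reduce in one process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        # Map phase: each record emits zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: values are grouped by their key.
            groups[key].append(value)
    # Reduce phase: each key's values are combined independently.
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical input: (region, amount) sales records.
sales = [("EU", 10), ("US", 5), ("EU", 20)]

# Equivalent in spirit to: SELECT region, SUM(amount) ... GROUP BY region
totals = map_reduce(
    sales,
    mapper=lambda rec: [(rec[0], rec[1])],
    reducer=lambda key, values: sum(values),
)
print(totals)
```

In a real deployment the map and reduce calls run on different machines and the shuffle moves data over the network, which is where many of the performance limitations mentioned in the abstract arise.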

international conference on data engineering | 2007

SKYPEER: Efficient Subspace Skyline Computation over Distributed Data

Akrivi Vlachou; Christos Doulkeridis; Yannis Kotidis; Michalis Vazirgiannis

Skyline query processing has received considerable attention in the recent past. Mainly, the skyline query is used to find a set of non-dominated data points in a multidimensional dataset. While most previous work has assumed a centralized setting, in this paper we address the efficient computation of subspace skyline queries in large-scale peer-to-peer (P2P) networks, where the dataset is horizontally distributed across the peers. Relying on a super-peer architecture, we propose a threshold-based algorithm, called SKYPEER, which forwards the skyline query requests among peers in such a way that the amount of transferred data is significantly reduced. For efficient subspace skyline processing, we extend the notion of domination by defining the extended skyline set, which contains all data elements that are necessary to answer a skyline query in any arbitrary subspace. We prove that our algorithm provides the exact answers and we present optimization techniques to reduce communication cost and execution time. Finally, we provide an extensive experimental evaluation showing that SKYPEER performs efficiently and provides a viable solution when a large degree of distribution is required.
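The notions of dominance, subspace skyline, and extended skyline can be illustrated with a small centralized sketch. It assumes that smaller values are better in every dimension, and it assumes that the extended skyline is obtained by requiring strict dominance in all dimensions; the abstract does not spell out that definition, so treat it as an assumption. The function names are hypothetical, and the quadratic scan is not the SKYPEER algorithm, which is distributed and threshold-based.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point, dims: Sequence[int]) -> bool:
    """p dominates q on dims: no worse everywhere, strictly better somewhere."""
    return all(p[d] <= q[d] for d in dims) and any(p[d] < q[d] for d in dims)


def skyline(points: List[Point], dims: Sequence[int]) -> List[Point]:
    """Naive O(n^2) skyline restricted to the subspace given by dims."""
    return [p for p in points if not any(dominates(q, p, dims) for q in points)]


def strictly_dominates(p: Point, q: Point) -> bool:
    """Assumed extended dominance: p is strictly better in every dimension."""
    return all(a < b for a, b in zip(p, q))


def extended_skyline(points: List[Point]) -> List[Point]:
    """Points that no other point strictly dominates in all dimensions."""
    return [p for p in points if not any(strictly_dominates(q, p) for q in points)]


points = [(1, 4), (1, 5), (2, 2), (4, 1), (3, 3)]

full = skyline(points, dims=(0, 1))   # full-space skyline
sub0 = skyline(points, dims=(0,))     # subspace skyline on dimension 0 only
ext = extended_skyline(points)

print(full)
print(sub0)
print(ext)
```

In this data, `(1, 5)` is not in the full-space skyline because `(1, 4)` dominates it, yet it ties for the best value on dimension 0 and therefore belongs to that subspace skyline. The extended skyline retains it, which is why a set of this kind is enough to answer queries on any subspace.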

international conference on data engineering | 2010

Reverse top-k queries

Akrivi Vlachou; Christos Doulkeridis; Yannis Kotidis; Kjetil Nørvåg

Rank-aware query processing has become essential for many applications that return to the user only the top-k objects based on the individual user's preferences. Top-k queries have been mainly studied from the perspective of the user, focusing primarily on efficient query processing. In this work, for the first time, we study top-k queries from the perspective of the product manufacturer. Given a potential product, which are the user preferences for which this product is in the top-k query result set? We identify a novel query type, namely reverse top-k query, that is essential for manufacturers to assess the potential market and impact of their products based on the competition. We formally define reverse top-k queries and introduce two versions of the query, namely monochromatic and bichromatic. We first provide a geometric interpretation of the monochromatic reverse top-k query in the solution space that helps to understand the reverse top-k query conceptually. Then, we study in more detail the case of bichromatic reverse top-k query, which is more interesting for practical applications. Such a query, if computed in a straightforward manner, requires evaluating a top-k query for each user preference in the database, which is prohibitively expensive even for moderate datasets. In this paper, we present an efficient threshold-based algorithm that eliminates candidate user preferences, without processing the respective top-k queries. Furthermore, we introduce an indexing structure based on materialized reverse top-k views in order to speed up the computation of reverse top-k queries. Materialized reverse top-k views trade preprocessing cost for query speedup in a controllable manner. Our experimental evaluation demonstrates the efficiency of our techniques, which reduce the required number of top-k computations by 1 to 3 orders of magnitude.
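The straightforward evaluation that the abstract calls prohibitively expensive can be sketched as follows. The sketch assumes linear scoring functions defined by weight vectors and that a lower score is better; both are assumptions made for illustration, as are the function names and the sample data. It is the naive baseline, not the threshold-based algorithm or the materialized views proposed in the paper.

```python
from typing import List, Tuple

Vector = Tuple[float, ...]


def score(weights: Vector, point: Vector) -> float:
    """Linear scoring function; a lower score is assumed to be better."""
    return sum(w * x for w, x in zip(weights, point))


def in_top_k(query: Vector, products: List[Vector], weights: Vector, k: int) -> bool:
    """True if fewer than k existing products score strictly better than query."""
    query_score = score(weights, query)
    better = sum(1 for p in products if score(weights, p) < query_score)
    return better < k


def reverse_top_k(
    query: Vector, products: List[Vector], preferences: List[Vector], k: int
) -> List[Vector]:
    """Naive bichromatic reverse top-k: one top-k test per user preference."""
    return [w for w in preferences if in_top_k(query, products, w, k)]


# Hypothetical data: existing products, a candidate product, and user weights.
products = [(1, 5), (2, 2), (5, 1), (4, 4)]
candidate = (3, 3)
preferences = [(0.5, 0.5), (0.9, 0.1), (0.1, 0.9)]

interested = reverse_top_k(candidate, products, preferences, k=2)
print(interested)
```

Each call to `in_top_k` scans all products, so the total cost grows with the number of products times the number of stored preferences. That product is the cost the paper's pruning techniques aim to avoid.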

international conference on management of data | 2008

Angle-based space partitioning for efficient parallel skyline computation

Akrivi Vlachou; Christos Doulkeridis; Yannis Kotidis

Recently, skyline queries have attracted much attention in the database research community. Space partitioning techniques, such as recursive division of the data space, have been used for skyline query processing in centralized, parallel and distributed settings. Unfortunately, such grid-based partitioning is not suitable in the case of a parallel skyline query, where all partitions are examined at the same time, since many data partitions do not contribute to the overall skyline set, resulting in a lot of redundant processing. In this paper we propose a novel angle-based space partitioning scheme using the hyperspherical coordinates of the data points. We demonstrate both formally and through an exhaustive set of experiments that this new scheme is very suitable for skyline query processing in a parallel shared-nothing architecture. The intuition of our partitioning technique is that the skyline points are equally spread to all partitions. We also show that partitioning the data according to the hyperspherical coordinates manages to increase the average pruning power of points within a partition. Our novel partitioning scheme alleviates most of the problems of traditional grid partitioning techniques, thus managing to reduce the response time and share the computational workload more fairly. As demonstrated by our experimental study, our technique outperforms grid partitioning in all cases, thus becoming an efficient and scalable solution for skyline query processing in parallel environments.
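A two-dimensional sketch may help convey the idea of partitioning by angle. It assumes non-negative coordinates, smaller values being better, and equal-width angular slices of the first quadrant; the paper works with general hyperspherical coordinates, and its exact partition boundaries are not given in the abstract. All function names are hypothetical, and the partitions are processed sequentially here rather than on separate workers.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse in both dimensions, strictly better in one."""
    return p[0] <= q[0] and p[1] <= q[1] and p != q


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def angle_partition(point: Point, num_partitions: int) -> int:
    """Map a point to a partition by its polar angle in [0, pi/2]."""
    x, y = point
    phi = math.atan2(y, x)
    index = int(phi / (math.pi / 2) * num_partitions)
    return min(index, num_partitions - 1)  # phi == pi/2 falls in the last slice


def partitioned_skyline(points: List[Point], num_partitions: int) -> List[Point]:
    """Local skyline per angular partition, then a merge over the local results."""
    partitions: Dict[int, List[Point]] = defaultdict(list)
    for p in points:
        partitions[angle_partition(p, num_partitions)].append(p)
    local_results = [s for part in partitions.values() for s in skyline(part)]
    return skyline(local_results)


points = [(1, 4), (2, 2), (4, 1), (3, 3), (2, 5), (5, 2)]
result = partitioned_skyline(points, num_partitions=3)
print(sorted(result))
```

The merge step is correct because every global skyline point is also a skyline point of its own partition, so no answer is lost when only local skylines are forwarded.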

international conference on management of data | 2008

On efficient top-k query processing in highly distributed environments

Akrivi Vlachou; Christos Doulkeridis; Kjetil Nørvåg; Michalis Vazirgiannis

Lately the advances in centralized database management systems show a trend towards supporting rank-aware query operators, like top-k, that enable users to retrieve only the most interesting data objects. A challenging problem is to support rank-aware queries in highly distributed environments. In this paper, we present a novel approach, called SPEERTO, for top-k query processing in large-scale peer-to-peer networks, where the dataset is horizontally distributed over the peers. Towards this goal, we explore the applicability of the skyline operator for efficiently routing top-k queries in a large super-peer network. Relying on a thresholding scheme, SPEERTO returns the exact results progressively to the user, while the number of queried super-peers and transferred data is minimized. Finally, we propose different variations of SPEERTO that allow balancing between transferred data volume and response time. Through simulations we demonstrate the feasibility of our approach.

Electronic Notes in Theoretical Computer Science | 2006

A System Architecture for Context-Aware Service Discovery

Christos Doulkeridis; Nikos Loutas; Michalis Vazirgiannis

Recent technological advances have enabled both the consumption and provision of mobile services (m-services) by small, portable, handheld devices. However, mobile devices still have restricted capabilities with respect to processing, storage space, energy consumption, stable connectivity, and bandwidth availability. In order to address these shortcomings, a potential solution is context-awareness (by context we refer to the implicit information related both to the requesting user and service provider that can affect the usefulness of the returned results). Context plays the role of a filtering mechanism, allowing only transmission of relevant data and services back to the device, thus saving bandwidth and reducing processing costs. In this paper, we present an architecture for context-aware service discovery. We describe in detail the system implementation and we present the system evaluation as a tradeoff between a) the increase of the quality of service discovery when context-awareness is taken into account and b) the extra cost/burden imposed by context management.

IEEE Journal on Selected Areas in Communications | 2007

DESENT: decentralized and distributed semantic overlay generation in P2P networks

Christos Doulkeridis; Kjetil Nørvåg; Michalis Vazirgiannis

The current approach in web searching, i.e., using centralized search engines, raises issues that question their future applicability: 1) coverage and scalability, 2) freshness, and 3) information monopoly. Performing web search using a P2P architecture that consists of the actual web servers has the potential to tackle those issues. In order to achieve the desired performance and scalability, and to enhance search quality relative to centralized search engines, semantic overlay networks (SONs) connecting peers storing semantically related information can be employed. The lack of global content/topology knowledge in a P2P system is the key challenge in forming SONs, and this paper describes an unsupervised approach for decentralized and distributed generation of SONs (DESENT). Through simulations and analytical cost models we verify our claims regarding performance, scalability, and quality.

very large data bases | 2010

Identifying the most influential data objects with reverse top-k queries

Akrivi Vlachou; Christos Doulkeridis; Kjetil Nørvåg; Yannis Kotidis

Top-k queries are widely applied for retrieving a ranked set of the k most interesting objects based on the individual user preferences. As an example, in online marketplaces, customers (users) typically seek a ranked set of products (objects) that satisfy their needs. Reversing top-k queries leads to a query type that instead returns the set of customers that find a product appealing (it belongs to the top-k result set of their preferences). In this paper, we address the challenging problem of processing queries that identify the top-m most influential products to customers, where influence is defined as the cardinality of the reverse top-k result set. This definition of influence is useful for market analysis, since it is directly related to the number of customers that value a particular product and, consequently, to its visibility and impact in the market. Existing techniques require processing a reverse top-k query for each object in the database, which is prohibitively expensive even for databases of moderate size. In contrast, we propose two algorithms, SB and BB, for identifying the most influential objects: SB restricts the candidate set of objects that need to be examined, while BB is a branch-and-bound algorithm that retrieves the result incrementally. Furthermore, we propose meaningful variations of the query for most influential objects that are supported by our algorithms. Our experiments demonstrate the efficiency of our algorithms both for synthetic and real-life datasets.
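The influence definition above, the cardinality of a product's reverse top-k result set, can be sketched with the exhaustive baseline that the abstract describes as too expensive. The sketch assumes linear scoring with weight vectors and that lower scores are better; the function names and data are hypothetical, and neither SB nor BB is implemented here.

```python
from typing import List, Tuple

Vector = Tuple[float, ...]


def score(weights: Vector, point: Vector) -> float:
    """Linear scoring function; a lower score is assumed to be better."""
    return sum(w * x for w, x in zip(weights, point))


def influence(product: Vector, products: List[Vector],
              preferences: List[Vector], k: int) -> int:
    """Number of customers whose top-k result contains the product."""
    count = 0
    for w in preferences:
        own = score(w, product)
        better = sum(1 for p in products if score(w, p) < own)
        if better < k:
            count += 1
    return count


def most_influential(products: List[Vector], preferences: List[Vector],
                     k: int, m: int) -> List[Vector]:
    """Top-m products by influence; ties are broken by coordinate order."""
    ranked = sorted(
        products,
        key=lambda p: (-influence(p, products, preferences, k), p),
    )
    return ranked[:m]


products = [(1, 5), (2, 2), (5, 1), (4, 4)]
preferences = [(0.5, 0.5), (0.9, 0.1), (0.1, 0.9)]

best = most_influential(products, preferences, k=2, m=1)
print(best)
```

This baseline evaluates one reverse top-k query per product, which is the cost that restricting the candidate set or retrieving results incrementally is meant to reduce.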

international conference on management of data | 2013

Branch-and-bound algorithm for reverse top-k queries

Akrivi Vlachou; Christos Doulkeridis; Kjetil Nørvåg; Yannis Kotidis

Top-k queries return to the user only the k best objects based on the individual user preferences and comprise an essential tool for rank-aware query processing. Assuming a stored data set of user preferences, reverse top-k queries have been introduced for retrieving the users that deem a given database object as one of their top-k results. Reverse top-k queries have already attracted significant interest in research, due to numerous real-life applications such as market analysis and product placement. Currently, the most efficient algorithm for computing the reverse top-k set is RTA. RTA has two main drawbacks when processing a reverse top-k query: (i) it needs to access all stored user preferences, and (ii) it cannot avoid executing a top-k query for each user preference that belongs to the result set. To address these limitations, in this paper, we identify useful properties for processing reverse top-k queries without accessing each user's individual preferences or executing the top-k query. We propose an intuitive branch-and-bound algorithm for processing reverse top-k queries efficiently and discuss novel optimizations to boost its performance. Our experimental evaluation demonstrates the efficiency of the proposed algorithm, which outperforms RTA by a large margin.

very large data bases | 2010

Efficient processing of top-k spatial preference queries

João B. Rocha-Junior; Akrivi Vlachou; Christos Doulkeridis; Kjetil Nørvåg

Top-k spatial preference queries return a ranked set of the k best data objects based on the scores of feature objects in their spatial neighborhood. Despite the wide range of location-based applications that rely on spatial preference queries, existing algorithms incur non-negligible processing cost resulting in high response time. The reason is that computing the score of a data object requires examining its spatial neighborhood to find the feature object with the highest score. In this paper, we propose a novel technique to speed up top-k spatial preference queries. To this end, we propose a mapping of pairs of data and feature objects to a distance-score space, which in turn allows us to identify and materialize the minimal subset of pairs that is sufficient to answer any spatial preference query. Furthermore, we present a novel algorithm that improves query processing performance by avoiding examining the spatial neighborhood of the data objects during query execution. In addition, we propose an efficient algorithm for materialization and we describe useful properties that reduce the cost of maintenance. We show through extensive experiments that our approach significantly reduces the number of I/Os and execution time compared to the state-of-the-art algorithms for different setups.
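The scoring step described above, finding the best feature object in a data object's neighborhood, can be sketched with a brute-force scan. The sketch assumes a single feature set, a circular neighborhood of fixed radius, higher feature scores being better, and a score of 0 for objects with no feature in range; these choices, the function names, and the sample hotels and restaurants are all illustrative assumptions. It shows the costly baseline, not the distance-score mapping or materialization proposed in the paper.

```python
import math
from typing import List, Tuple

Location = Tuple[float, float]
Feature = Tuple[float, float, float]  # (x, y, score)


def object_score(obj: Location, features: List[Feature], radius: float) -> float:
    """Highest feature score within radius of obj, or 0.0 if none is in range."""
    in_range = [
        s for (fx, fy, s) in features
        if math.hypot(obj[0] - fx, obj[1] - fy) <= radius
    ]
    return max(in_range, default=0.0)


def top_k_spatial_preference(
    objects: List[Location], features: List[Feature], radius: float, k: int
) -> List[Location]:
    """Rank data objects by the best feature score in their neighborhood."""
    ranked = sorted(objects, key=lambda o: -object_score(o, features, radius))
    return ranked[:k]


# Hypothetical data: hotels as data objects, rated restaurants as feature objects.
hotels = [(0, 0), (5, 5), (10, 0)]
restaurants = [(1, 0, 0.6), (5, 6, 0.9), (9, 0, 0.3), (20, 20, 1.0)]

best_hotels = top_k_spatial_preference(hotels, restaurants, radius=2.0, k=2)
print(best_hotels)
```

Every query rescans the neighborhood of every data object, which is the work the paper's materialized pairs are designed to avoid at query time.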
