Lefteris Sidirourgos
Centrum Wiskunde & Informatica
Publication
Featured research published by Lefteris Sidirourgos.
Extending Database Technology | 2012
Petros Tsialiamanis; Lefteris Sidirourgos; Irini Fundulaki; Vassilis Christophides; Peter A. Boncz
Query optimization in RDF stores is a challenging problem: SPARQL queries typically contain many more joins than equivalent relational plans, and hence lead to a large join-order search space. In such cases, cost-based query optimization is often not possible. One practical reason is that statistics are typically missing in web-scale settings such as the Linked Open Data (LOD) cloud. The more profound reason is that, due to the absence of schematic structure in RDF, join-hit ratio estimation requires complicated forms of correlated join statistics, and currently there are no methods to identify the relevant correlations beforehand. For this reason, good heuristics are essential in SPARQL query optimization, even when they are partially combined with cost-based statistics (i.e., hybrid query optimization). In this paper we describe a set of useful heuristics for SPARQL query optimizers. We present them in the context of a new Heuristic SPARQL Planner (HSP) that exploits the syntactic and structural variations of the triple patterns in a SPARQL query to choose an execution plan without the need for a cost model. To this end, we define the variable graph and show a reduction of the SPARQL query optimization problem to the maximum-weight independent set problem. We implemented our planner on top of the MonetDB open-source column store, evaluated its effectiveness against the state-of-the-art RDF-3X engine, and compared its plan quality with a relational (SQL) equivalent of the benchmarks.
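To make the variable-graph idea concrete, here is a minimal sketch, not HSP's actual algorithm: triple patterns become weighted nodes, a shared variable adds an edge, and a greedy heuristic approximates a maximum-weight independent set. The weight function (counting constants as a selectivity proxy) and the greedy strategy are illustrative assumptions.

```python
# Hypothetical sketch of a variable graph over SPARQL triple patterns,
# with a greedy maximum-weight independent set as the plan seed.
from itertools import combinations

def variable_graph(patterns, weight):
    """patterns: list of (s, p, o) strings; variables start with '?'."""
    nodes = {i: weight(t) for i, t in enumerate(patterns)}
    edges = set()
    for i, j in combinations(range(len(patterns)), 2):
        vars_i = {x for x in patterns[i] if x.startswith('?')}
        vars_j = {x for x in patterns[j] if x.startswith('?')}
        if vars_i & vars_j:                     # shared variable => edge
            edges.add((i, j))
    return nodes, edges

def greedy_mwis(nodes, edges):
    """Greedy approximation of a maximum-weight independent set."""
    chosen, excluded = [], set()
    for n in sorted(nodes, key=nodes.get, reverse=True):
        if n not in excluded:
            chosen.append(n)
            excluded |= {j for (i, j) in edges if i == n}
            excluded |= {i for (i, j) in edges if j == n}
    return chosen

# Illustrative weight: more constants in a pattern => more selective.
patterns = [("?s", "rdf:type", "ex:Person"),
            ("?s", "ex:name", "?n"),
            ("?n", "ex:lang", '"en"')]
w = lambda t: sum(1 for x in t if not x.startswith('?'))
print(greedy_mwis(*variable_graph(patterns, w)))   # -> [0, 2]
```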
International Conference on Management of Data | 2010
Sándor Héman; Marcin Zukowski; Niels Nes; Lefteris Sidirourgos; Peter A. Boncz
In this paper we investigate techniques that allow on-line updates to columnar databases while leaving their high read-only performance intact. Rather than keeping differential structures organized by the table key values, the core proposition of this paper is that this is better done by keeping track of the tuple positions of the modifications. Not only does this minimize the computational overhead of merging differences into read-only queries, it also makes the differential structure oblivious to the values of the order keys, allowing it to avoid disk I/O for retrieving the order keys in read-only queries that otherwise do not need them - a crucial advantage for a column store. We describe a new data structure for maintaining such positional updates, called the Positional Delta Tree (PDT), and give detailed algorithms for PDT/column merging, updating PDTs, and using PDTs in transaction management. In experiments with a columnar DBMS, we perform microbenchmarks on PDTs, and show in a TPC-H workload that PDTs allow quick on-line updates, yet significantly reduce their performance impact on read-only queries compared with classical value-based differential methods.
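The following toy sketch illustrates the positional (rather than value-based) delta idea only: deltas are tracked by stable tuple position, so a read-only scan can merge them without ever touching the sort-key columns. The real PDT is a tree with layered deltas and transaction support; this flat structure and its names are assumptions for illustration.

```python
# Hypothetical flat stand-in for positional deltas in the spirit of the PDT.
import bisect

class PositionalDeltas:
    def __init__(self):
        self.inserts = []    # sorted list of (stable position, new row)
        self.deletes = set() # stable positions of deleted rows

    def insert(self, pos, row):
        bisect.insort(self.inserts, (pos, row))

    def delete(self, pos):
        self.deletes.add(pos)

    def merged_scan(self, column):
        """Yield the column as seen after applying the positional deltas."""
        ins = iter(self.inserts)
        nxt = next(ins, None)
        for pos, value in enumerate(column):
            while nxt is not None and nxt[0] == pos:   # emit inserts first
                yield nxt[1]
                nxt = next(ins, None)
            if pos not in self.deletes:
                yield value
        while nxt is not None:                          # trailing inserts
            yield nxt[1]
            nxt = next(ins, None)

pdt = PositionalDeltas()
pdt.insert(1, "x")   # new row before stable position 1
pdt.delete(2)
print(list(pdt.merged_scan(["a", "b", "c", "d"])))  # ['a', 'x', 'b', 'd']
```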
Distributed and Parallel Databases | 2008
Lefteris Sidirourgos; Giorgos Kokkinidis; Theodore Dalamagas; Vassilis Christophides; Timos K. Sellis
P2P computing has gained increasing attention lately, since it provides the means for realizing computing systems that scale to very large numbers of participating peers, while ensuring high autonomy and fault tolerance. Peer Data Management Systems (PDMS) have been proposed to support sophisticated facilities for exchanging, querying, and integrating (semi-)structured data hosted by peers. In this paper, we are interested in routing graph queries in a very large PDMS, where peers advertise their local bases using fragments of community RDF/S schemas (i.e., views). We introduce an original encoding of these fragments in order to efficiently check whether a peer view is subsumed by a query. We rely on this encoding to design an RDF/S view lookup service featuring a stateful and a stateless execution over a DHT-based P2P infrastructure. Finally, we evaluate our system experimentally to demonstrate its scalability for very large P2P networks and arbitrary RDF/S schema fragments, and to estimate the number of routing hops required by the two versions of our lookup service.
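As a rough illustration only (the paper's actual fragment encoding is more elaborate), a peer could advertise a schema fragment as a set of class/property labels, with view-in-query subsumption reduced to set containment, and a DHT key derived per label so peers can be located by what they advertise. All names here are assumptions.

```python
# Illustrative sketch, not the paper's encoding: schema fragments as label
# sets, subsumption as containment, DHT keys by hashing advertised labels.
import hashlib

def dht_key(label, ring_size=2**16):
    """Map a schema label onto a DHT identifier ring."""
    return int(hashlib.sha1(label.encode()).hexdigest(), 16) % ring_size

def subsumes(query_fragment, peer_view):
    """The view is subsumed if everything it advertises occurs in the query."""
    return peer_view <= query_fragment

peer_view = {"ex:Artist", "ex:paints"}
query = {"ex:Artist", "ex:paints", "ex:Museum", "ex:exhibited"}
print(subsumes(query, peer_view), dht_key("ex:paints"))
```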
Conference on Information and Knowledge Management | 2009
Nan Tang; Lefteris Sidirourgos; Peter A. Boncz
Exact substring matching queries on large data collections can be answered using q-gram indices, which store for each occurring q-byte pattern an (ordered) posting list with the positions of all its occurrences. Such gram indices are known to provide fast query response times and to allow quick index creation, even on huge disk-based datasets. Their main drawback is relatively large storage space, a constant multiple (typically >2) of the original data size even when compression is used. In this work, we study methods that preserve the scalable creation time and efficient exact substring query properties of gram indices while reducing storage space. To this end, we first propose a partial gram index based on a reduction from the problem of omitting indexed q-grams to the set cover problem. While this method succeeds in reducing the size of the index, it generates false positives at query time, reducing efficiency. We then increase the accuracy of partial grams by splitting the posting lists of frequent grams into a frequency-tuned set of signatures that take the bytes surrounding the grams into account. The resulting qs-gram scheme is tested on huge collections (up to 426GB) and is shown to achieve an almost 1:1 data-to-index size ratio, with query performance even faster than normal gram methods, thanks to the reduced size and access cost.
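For background, here is a minimal sketch of the baseline (full) q-gram index the paper starts from: every q-byte window is indexed with its positions, and a query is answered by intersecting position-shifted posting lists of its grams. The paper's partial-gram and qs-gram refinements (set-cover pruning, surrounding-byte signatures) are omitted.

```python
# Baseline full q-gram index: exact matching via shift-and-intersect.
from collections import defaultdict

Q = 3

def build_index(text):
    index = defaultdict(list)              # gram -> sorted positions
    for i in range(len(text) - Q + 1):
        index[text[i:i+Q]].append(i)
    return index

def find(index, pattern):
    """Positions where pattern occurs (pattern length >= Q assumed)."""
    candidates = set(index.get(pattern[:Q], []))
    for off in range(1, len(pattern) - Q + 1):
        posts = {p - off for p in index.get(pattern[off:off+Q], [])}
        candidates &= posts                # shift posting list, intersect
    return sorted(candidates)

idx = build_index("abracadabra")
print(find(idx, "abra"))                   # [0, 7]
```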
International Conference on Big Data | 2013
Lefteris Sidirourgos; Martin L. Kersten; Peter A. Boncz
Scientific discovery has shifted from being an exercise of theory and computation to becoming the exploration of an ocean of observational data. Scientists explore data originating from modern scientific instruments in order to discover its interesting aspects and formulate their hypotheses. Such workloads press for new database functionality. We aim at sampling scientific databases to create many different impressions of the data, on which scientists can quickly evaluate exploratory queries. However, scientific databases introduce different challenges for sample construction compared to classical business-analytics applications. We propose adaptive weighted sampling as an alternative to uniform sampling. With weighted sampling, only the most informative data is sampled, so more data relevant to the scientific discovery is available for examining a hypothesis. Relevant data is considered to be the focal point of the scientific search, and can be defined either a priori, with the use of functions, or by monitoring the query workload. We study such query workloads, and we detail different families of weight functions. Finally, we give a quantitative and qualitative evaluation of weighted sampling.
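A small sketch of the weighted-sampling idea follows; the specific focus function and parameters are assumptions, not the paper's weight families. Each tuple gets a weight from a function centered on a focal point (which the paper says could also come from the query workload), and the sample is drawn with probability proportional to that weight, biasing it toward the focal region.

```python
# Illustrative weighted sampling biased toward an assumed focal point x0.
import random

def weighted_sample(rows, weight, k, seed=0):
    """Draw k rows with probability proportional to weight(row)."""
    rng = random.Random(seed)
    return rng.choices(rows, weights=[weight(r) for r in rows], k=k)

# Hypothetical focus function: observations near x0 are more informative.
x0 = 42.0
focus = lambda x: 1.0 / (1.0 + abs(x - x0))

rng = random.Random(1)
data = [rng.uniform(0.0, 100.0) for _ in range(10_000)]
sample = weighted_sample(data, focus, k=100)
print(sum(sample) / len(sample))   # sample mean is pulled toward x0
```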
EDBT/ICDT Workshops | 2009
Lefteris Sidirourgos; Peter A. Boncz
We describe a collection of indices for XML text, element, and attribute node values that (i) consume little storage, (ii) have low maintenance overhead, (iii) permit fast equi-lookups on string values, and (iv) support range lookups on any XML typed value (e.g., double, dateTime). The equi-lookup string value index depends on an elaborate hash function and on an associative combination function that facilitates updates on both mixed-content and element nodes. We also present techniques for creating range-lookup indices that support any ordered XML typed value. These indices rely on a finite state machine that accepts the type-specific language, and on a state combination table for combining states to speed up updates. We evaluate the stability of the hash function, the storage overhead, and the index creation and maintenance times in the context of the open-source XML database system MonetDB/XQuery.
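To illustrate what an "associative combination function" buys: with a polynomial rolling hash, the hash of a concatenation can be computed from the hashes (and lengths) of its parts, so updating one text node of a mixed-content element does not require rehashing the whole content. The constants and structure below are assumptions, not MonetDB/XQuery's actual hash.

```python
# Hypothetical associative hash combination via a polynomial rolling hash.
M, B = (1 << 61) - 1, 257   # illustrative modulus and base

def h(s):
    """(hash, byte length) of a string under a polynomial rolling hash."""
    v = 0
    for c in s.encode():
        v = (v * B + c) % M
    return v, len(s.encode())

def combine(a, b):
    """Associative: combine(h(x), h(y)) == h(x + y)."""
    (va, la), (vb, lb) = a, b
    return (va * pow(B, lb, M) + vb) % M, la + lb

assert combine(h("foo"), h("bar")) == h("foobar")
assert combine(combine(h("a"), h("b")), h("c")) == \
       combine(h("a"), combine(h("b"), h("c")))
```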
Data Management on New Hardware | 2017
Lefteris Sidirourgos; Hannes Mühleisen
Column Imprints are a pre-filtering secondary index for answering range queries. The main feature of imprints is that they are lightweight and based on compressed bit-vectors, one per cacheline, that quickly determine whether the values in that cacheline satisfy the predicates of a query. The main overhead of the imprints implementation is the many sequential value comparisons against the boundaries of a virtual equi-height histogram. Similarly, during query scans, many sequential value comparisons are performed to identify false positives. In this paper, we speed up the creation and querying of imprints by using advanced vectorization techniques. We also experimentally explore the benefits of stretching imprints to larger bit-vector sizes and blocks of data, using 256-bit SIMD registers. Our findings are very promising, both for imprints and for future index design research that would employ advanced vectorization techniques and larger (up to 512-bit) and more numerous (from 16 today to 32) SIMD registers.
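A simplified scalar sketch of the imprint mechanism: one bit-vector per block marks which histogram bins occur in that block, and a range query probes only blocks whose imprint intersects the query's bin mask, then filters false positives. Bin boundaries and block size are illustrative; the real index is compressed per cacheline and, in this paper, vectorized with SIMD.

```python
# Toy column imprints: per-block bin masks as a pre-filter for range scans.
import bisect

BLOCK = 8                                   # values per "cacheline"

def build_imprints(column, bounds):
    """bounds: sorted bin boundaries of an equi-height histogram."""
    imprints = []
    for b in range(0, len(column), BLOCK):
        mask = 0
        for v in column[b:b+BLOCK]:
            mask |= 1 << bisect.bisect_right(bounds, v)
        imprints.append(mask)
    return imprints

def range_scan(column, imprints, bounds, lo, hi):
    qmask = 0
    for bit in range(bisect.bisect_right(bounds, lo),
                     bisect.bisect_right(bounds, hi) + 1):
        qmask |= 1 << bit                   # bins the query may touch
    hits = []
    for i, mask in enumerate(imprints):
        if mask & qmask:                    # imprint says: maybe relevant
            for v in column[i*BLOCK:(i+1)*BLOCK]:
                if lo <= v <= hi:           # weed out false positives
                    hits.append(v)
    return hits

col = list(range(100, 0, -1))
bounds = [25, 50, 75]                       # four bins
imps = build_imprints(col, bounds)
print(range_scan(col, imps, bounds, 10, 20))
```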
Very Large Data Bases | 2008
Lefteris Sidirourgos; Romulo Goncalves; Martin L. Kersten; Niels Nes; Stefan Manegold
International Conference on Management of Data | 2013
Lefteris Sidirourgos; Martin L. Kersten
Conference on Innovative Data Systems Research | 2017
Martin L. Kersten; Lefteris Sidirourgos