Peter Chovanec | Researchain

Publication

Featured research published by Peter Chovanec.

International Conference on Digital Information Management | 2011

Index-based n-gram extraction from large document collections

Michal Krátký; Radim Bača; David Bednář; Jiří Walder; Jiří Dvorský; Peter Chovanec

N-grams are used in some applications that search text documents, especially when one must work with phrases, e.g. in plagiarism detection. An n-gram is a sequence of n terms (or, more generally, tokens) from a document. We get a set of n-grams by moving a sliding window from the beginning to the end of the document. During the extraction we must remove duplicate n-grams and store additional values for each n-gram type, e.g. the n-gram type frequency for each document; which values are stored depends on the query model used. Previous works utilize a sorting algorithm to compute the n-gram frequency. These approaches must handle a high number of identical n-grams, resulting in high time and space overhead. Moreover, these techniques are often main-memory only, which means they can be used only for small or medium-sized collections. In this paper, we show an index-based method for n-gram extraction from large collections. This method utilizes common data structures like the B+-tree and the hash table. We show the scalability of our method by presenting experiments with a gigabyte-scale collection.
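The sliding-window extraction and per-document frequency counting described in this abstract can be illustrated with a minimal in-memory sketch. It is not the paper's index-based method: a Python dictionary stands in for the hash table, no B+-tree or disk storage is involved, and the names `extract_ngrams` and `ngram_frequencies`, the whitespace tokenizer, and the sample documents are all assumptions made for illustration.

```python
from collections import Counter


def extract_ngrams(tokens, n):
    """Slide a window of n tokens over the list and yield each n-gram as a tuple."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def ngram_frequencies(documents, n):
    """Map each n-gram type to a {doc_id: frequency} dictionary.

    Duplicates are merged as they are counted, so no sorting step is needed.
    """
    index = {}
    for doc_id, text in documents.items():
        # Hypothetical tokenizer: lowercase and split on whitespace.
        counts = Counter(extract_ngrams(text.lower().split(), n))
        for gram, freq in counts.items():
            index.setdefault(gram, {})[doc_id] = freq
    return index


# Illustrative documents, not taken from the paper.
docs = {
    "d1": "the quick brown fox jumps over the quick brown dog",
    "d2": "the quick brown cat",
}
index = ngram_frequencies(docs, 3)
print(index[("the", "quick", "brown")])
```

Because this sketch keeps everything in main memory, it has exactly the limitation the abstract attributes to earlier approaches; it only shows what is being computed, not how to scale it.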

Database and Expert Systems Applications | 2010

Optimization of disk accesses for multidimensional range queries

Peter Chovanec; Michal Krátký; Radim Bača

Multidimensional data structures have become very popular in recent years. Their importance lies in the efficient indexing of data that have naturally multidimensional characteristics, such as navigation data, drawing specifications, etc. The R-tree is a well-known structure based on bounding spatially near points by rectangles. Although efficient query processing of multidimensional data is required, the R-tree has been shown to be inefficient in many cases. From the disk access cost point of view, the main issue of range query processing is the high cost of random accesses during the tree traversal. In the case of queries with low selectivity, a sequential scan of all tuples may be more efficient than range query processing. We focus on the disk access cost and present an optimization of this cost during range query processing. Our method focuses on leaf node retrieval and can be easily adopted by any tree. We present our tests using the R-tree since it is the most common multidimensional data structure.

International Conference on Informatics Engineering and Information Science | 2011

Processing of Multidimensional Range Query Using SIMD Instructions

Peter Chovanec; Michal Krátký

Current mainstream CPUs provide SIMD (Single Instruction Multiple Data) computational capabilities. Although producers of current hardware provide other computational capabilities like multi-core CPUs, GPUs or APUs, an important feature of SIMD is that it provides parallel operations within a single CPU core. In previous works, authors introduced the utilization of SIMD instructions in some indexing data structures like the B-tree. Since multidimensional data structures manage n-dimensional tuples or rectangles, the utilization of these instructions seems to be straightforward in operations manipulating these n-dimensional objects. In this article, we show the utilization of SIMD in the R-tree data structure. Since the range query is one of the most important operations of multidimensional data structures, we consider the utilization of SIMD in range query processing. Moreover, we show properties and scalability of this solution. We show that the SIMD range query algorithm is up to 2× faster than the conventional algorithm.

International Database Engineering and Applications Symposium | 2013

On the efficiency of multiple range query processing in multidimensional data structures

Peter Chovanec; Michal Krátký

Multidimensional data are commonly utilized in many application areas like electronic shopping, cartography and many others. Multidimensional data structures support various types of queries, e.g. point or range queries. The range query retrieves all tuples of a multidimensional space matched by a query rectangle. Processing range queries in a multidimensional data structure has some performance issues, especially in the case of a higher space dimension or a lower query selectivity. As a result, these data are often stored in an array or a one-dimensional index like the B-tree, and range queries are processed with a sequential scan. Many real-world queries can be transformed into a multiple range query: a query including more than one query rectangle. In this article, we focus on processing this type of range query. First, we show an algorithm processing a sequence of range queries. Second, we introduce a special type of the multiple range query, the Cartesian range query. We show the optimality of these algorithms from the IO and CPU cost point of view and we compare their performance with current methods. Although we introduce these algorithms for the R-tree, we show that these algorithms are appropriate for all multidimensional data structures with nested regions.
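To make the terms in this abstract concrete, the following minimal sketch evaluates a range query and a multiple range query with a plain sequential scan over an in-memory list of tuples. It does not reproduce the paper's R-tree algorithms or the Cartesian range query; the function names, the closed-interval rectangle representation `(low, high)`, and the sample data are assumptions for illustration only.

```python
def in_rectangle(point, low, high):
    """Return True if the point lies inside the closed rectangle [low, high]."""
    return all(lo <= x <= hi for x, lo, hi in zip(point, low, high))


def range_query(points, low, high):
    """Return all tuples matched by a single query rectangle."""
    return [p for p in points if in_rectangle(p, low, high)]


def multiple_range_query(points, rectangles):
    """Evaluate several query rectangles in one pass over the data.

    Each tuple is read once and tested against every rectangle, instead of
    scanning the data once per rectangle.
    """
    results = [[] for _ in rectangles]
    for p in points:
        for i, (low, high) in enumerate(rectangles):
            if in_rectangle(p, low, high):
                results[i].append(p)
    return results


# Illustrative two-dimensional data and query rectangles.
points = [(1, 1), (2, 5), (4, 4), (6, 2), (7, 7)]
rects = [((0, 0), (4, 4)), ((3, 3), (8, 8))]
results = multiple_range_query(points, rects)
print(results)
```

The single pass in `multiple_range_query` reflects the general motivation for handling several rectangles together: the data are read once regardless of how many rectangles the query contains.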

ADBIS Workshops | 2013

Processing of Range Query Using SIMD and GPU

Pavel Bednář; Petr Gajdoš; Michal Krátký; Peter Chovanec

The one-dimensional or multidimensional range query is one of the most important queries in the physical implementation of a DBMS. The number of compared items (of a data structure) can be enormous, especially for lower selectivity of the range query. The number of compare operations increases for more complex items (or tuples) of greater length, e.g. words stored in a B-tree. Due to the possibly high number of compare operations executed during range query processing, we can take into account hardware devices providing parallel task computation, like the CPU’s SIMD or the GPU. In this paper, we show the performance and scalability of sequential, index, CPU’s SIMD, and GPU variants of the range query algorithm. These results make possible a future integration of these computation devices into a DBMS kernel.

Web Age Information Management | 2011

Multidimensional implementation of stream ADT

Filip Křižka; Michal Krátký; Radim Bača; Peter Chovanec

Holistic approaches are considered the most robust solution for processing twig pattern queries, since they require no complicated query optimization. Holistic approaches use an abstract data type called a stream, which is an ordered set of XML nodes with the same schema node. A straightforward implementation of a stream is a paged array. In this article, we introduce a multidimensional implementation of the stream for path labeling schemes. We also show that this implementation can be extended in such a way that it supports fast searching of nodes with content. Although many multidimensional data structures have been introduced in recent years, we show that it is necessary to combine two variants of the R-tree (Ordered R-tree and Signature R-tree) for an efficient implementation of the stream ADT.

DATESO | 2015