Radim Bača

Technical University of Ostrava

Publication

Featured research published by Radim Bača.

Information Sciences | 2012

Fast decoding algorithms for variable-lengths codes

Jiří Walder; Michal Krátký; Radim Bača; Jan Platos; Václav Snášel

Data compression has been widely applied in many data processing areas. Compression methods use variable-length codes with the shorter codes assigned to symbols or groups of symbols that appear in the data frequently. There exist many coding algorithms, e.g. Elias-delta codes, Fibonacci codes and other variable-length codes which are often applied to encoding of numbers. Although we often do not consider time consumption of decompression as well as compression algorithms, there are cases where the decompression time is a critical issue. For example, a real-time compression of data structures, applied in the case of the physical implementation of database management systems, follows this issue. In this case, pages of a data structure are decompressed during every reading from a secondary storage into the main memory or items of a page are decompressed during every access to the page. Obviously, efficiency of a decompression algorithm is extremely important. Since fast decoding algorithms were not known until recently, variable-length codes have not been used in the data processing area. In this article, we introduce fast decoding algorithms for Elias-delta, Fibonacci of order 2 as well as Fibonacci of order 3 codes. We provide a theoretical background making these fast algorithms possible. Moreover, we introduce a new code, called the Elias-Fibonacci code, with a lower compression ratio than the Fibonacci of order 3 code for lower numbers; however, this new code provides a faster decoding time than other tested codes. Codes of Elias-Fibonacci are shorter than other compared codes for numbers longer than 26 bits. All these algorithms are suitable in the case of data processing tasks with special emphasis on the decompression time.
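To make the idea of a variable-length code concrete, here is a minimal sketch of the Fibonacci code of order 2 mentioned in the abstract. It is a plain bit-by-bit encoder and decoder written for illustration, not the fast decoding algorithm the article introduces; the function names `fib_encode` and `fib_decode` are hypothetical, and bit strings stand in for a real bit stream.

```python
from typing import List


def fib_encode(n: int) -> str:
    """Return the Fibonacci (order 2) codeword of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("Fibonacci codes are defined for positive integers only")
    # Fibonacci weights 1, 2, 3, 5, 8, ... that do not exceed n.
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs = [f for f in fibs if f <= n]
    # Greedy Zeckendorf representation: never uses two adjacent weights.
    bits = ["0"] * len(fibs)
    remainder = n
    for i in range(len(fibs) - 1, -1, -1):
        if fibs[i] <= remainder:
            bits[i] = "1"
            remainder -= fibs[i]
    # The appended 1 creates the only "11" pair, which terminates the codeword.
    return "".join(bits) + "1"


def fib_decode(bits: str) -> List[int]:
    """Decode a concatenation of Fibonacci (order 2) codewords, one bit at a time."""
    values = []
    total, prev_bit = 0, "0"
    weight, next_weight = 1, 2
    for bit in bits:
        if bit == "1" and prev_bit == "1":
            # Second consecutive 1: end of the current codeword.
            values.append(total)
            total, prev_bit = 0, "0"
            weight, next_weight = 1, 2
            continue
        if bit == "1":
            total += weight
        weight, next_weight = next_weight, weight + next_weight
        prev_bit = bit
    return values


stream = "".join(fib_encode(n) for n in [1, 2, 4])
print(stream, fib_decode(stream))
```

Small numbers get short codewords (1 is `11`, 2 is `011`, 4 is `1011`), and the decoder needs no length prefix because `11` can only occur at the end of a codeword. Inspecting every bit individually, as this sketch does, is the kind of cost a fast decoder would aim to avoid.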

international database engineering and applications symposium | 2008

On the efficient search of an XML twig query in large DataGuide trees

Radim Bača; Michal Krátký; Václav Snášel

XML (Extensible Mark-up Language) has been embraced as a new approach to data modeling. Nowadays, more and more information is formatted as semi-structured data, e.g., articles in a digital library, documents on the web, and so on. Implementation of an efficient system enabling storage and querying of XML documents requires development of new techniques. Many different techniques of XML indexing have been proposed in recent years. In the case of XML data, we can distinguish the following trees: an XML tree, a tree of elements and attributes, and a DataGuide, a tree of element tags and attribute names. Obviously, the XML tree of an XML document is much larger than the DataGuide of a given document. Authors often consider DataGuide as a small tree. Therefore, they consider the DataGuide search as a small problem. However, we show that DataGuide trees are often massive in the case of real XML documents. Consequently, a trivial DataGuide search may be time and memory consuming. In this article, we introduce efficient methods for searching an XML twig pattern in large, complex DataGuide trees.
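The distinction between an XML tree and its DataGuide can be illustrated with a short sketch. It assumes the DataGuide is represented as nested dictionaries keyed by element tag (attribute names are prefixed with `@`); the helper names `build_dataguide` and `count_nodes` are hypothetical, and the code shows only the construction, not the search methods the article proposes.

```python
import xml.etree.ElementTree as ET


def build_dataguide(xml_text):
    """Collapse an XML tree into a tree of distinct tag and attribute-name paths."""
    root = ET.fromstring(xml_text)
    guide = {}

    def visit(elem, node):
        # Elements with the same tag under the same parent path share one node.
        child = node.setdefault(elem.tag, {})
        for name in elem.attrib:
            child.setdefault("@" + name, {})
        for sub in elem:
            visit(sub, child)

    visit(root, guide)
    return guide


def count_nodes(guide):
    """Count the nodes of a DataGuide built by build_dataguide."""
    return sum(1 + count_nodes(sub) for sub in guide.values())


doc = '<lib><book id="1"><title/><author/></book><book id="2"><title/></book></lib>'
guide = build_dataguide(doc)
print(guide, count_nodes(guide))
```

The sample document has eight element and attribute nodes, while its DataGuide has five. On regular data the reduction is large; on documents with many distinct paths the DataGuide itself can grow big, which is the situation the abstract is concerned with.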

very large data bases | 2013

Optimal and efficient generalized twig pattern processing: a combination of preorder and postorder filterings

Radim Bača; Michal Krátký; Tok Wang Ling; Jiaheng Lu

Searching for occurrences of a twig pattern query (TPQ) in an XML document is a core task of all XML database query languages. The generalized twig pattern (GTP) extends the TPQ model to include semantics related to output nodes, optional nodes, and boolean expressions which are part of the XQuery language. Preorder filtering holistic algorithms such as TwigStack represent a significant class of TPQ processing approaches with a linear worst-case I/O complexity with respect to the sum of the input and output sizes for some query classes. Another important class of holistic approaches is represented by postorder filtering holistic algorithms such as Twig²Stack, which introduced a linear output enumeration time with respect to the result size. In this article, we introduce a holistic algorithm called GTPStack which is the first approach capable of processing a GTP with a linear worst-case I/O complexity with respect to the GTP result size. This is achieved by using a combination of the preorder and postorder filterings before storing nodes in an intermediate storage. Additionally, another contribution of this article is an introduction of a new perspective of holistic algorithm optimality. We show that the optimality depends not only on a query class but also on XML document characteristics. This new view on the optimality extends the general knowledge about the type of queries for which the holistic algorithms are optimal. Moreover, it allows us to determine that GTPStack is optimal for any GTP when a specific XML document is considered. We present a comprehensive experimental study of the state-of-the-art holistic algorithms showing under which conditions GTPStack outperforms the other holistic approaches.

international conference on digital information management | 2011

Index-based n-gram extraction from large document collections

Michal Kratky; Radim Bača; David Bednar; Jiri Walder; Jiri Dvorsky; Peter Chovanec

database systems for advanced applications | 2010

Benchmarking the compression of XML node streams

Radim Bača; Jiří Walder; Martin Pawlas; Michal Krátký

international xml database symposium | 2009

On the Efficiency of a Prefix Path Holistic Algorithm

Radim Bača; Michal Krátký

database systems for advanced applications | 2009

TJDewey --- On the Efficient Path Labeling Scheme Holistic Approach

Radim Bača; Michal Krátký

database and expert systems applications | 2010

Optimization of disk accesses for multidimensional range queries

Peter Chovanec; Michal Krátký; Radim Bača

N-grams are used in applications that search text documents, especially when phrases matter, e.g. in plagiarism detection. An n-gram is a sequence of n terms (or, more generally, tokens) from a document. We obtain the set of n-grams by moving a sliding window from the beginning to the end of the document. During extraction we must remove duplicate n-grams and store additional values for each n-gram type, e.g. the frequency of the n-gram type in each document; which values are stored depends on the query model used. Previous works use a sorting algorithm to compute the n-gram frequency. These approaches must handle a high number of identical n-grams, resulting in high time and space overhead. Moreover, these techniques are often main-memory only, which means they can be applied only to small or medium-sized collections. In this paper, we present an index-based method for n-gram extraction from large collections. The method uses common data structures such as the B+-tree and the hash table. We show the scalability of our method through experiments on a collection several gigabytes in size.
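The sliding-window extraction described in this abstract can be sketched in a few lines. The version below keeps all counts in an in-memory hash table (`collections.Counter`), so it is exactly the kind of main-memory approach that works only for small collections; the disk-based index of the paper is not reproduced here, and the name `extract_ngrams` is hypothetical.

```python
from collections import Counter


def extract_ngrams(tokens, n):
    """Count every n-gram type seen by a window of n tokens sliding over the document."""
    counts = Counter()
    for start in range(len(tokens) - n + 1):
        # The hash table merges duplicates as they arrive, so no sorting pass is needed.
        counts[tuple(tokens[start:start + n])] += 1
    return counts


tokens = "to be or not to be".split()
bigrams = extract_ngrams(tokens, 2)
print(bigrams)
```

Here the six tokens yield five bigram occurrences but only four bigram types, because `("to", "be")` appears twice. A document shorter than n tokens produces an empty result.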

Proceedings of the 2008 EDBT workshop on Database technologies for handling XML information on the web | 2008

A cost-based join selection for XML twig content-based queries

Radim Bača; Michal Krátký

In recent years, many approaches to XML twig pattern query processing have been developed. Holistic approaches are particularly significant in that they provide a theoretical model for optimal processing of some query classes and have very low main memory complexity. Holistic algorithms are supported by a stream abstract data type. This data type is usually implemented using inverted lists or special purpose data structures. In this article, we focus on an efficient implementation of a stream ADT. We utilize previously proposed fast decoding algorithms for some prefix variable-length codes, like Elias-delta, Fibonacci of order 2 and 3 as well as Elias-Fibonacci codes. We compare the efficiency of the access to a stream using various decompression algorithms. These results are compared with the result of data structures where no compression is used. We show that the compression improves the efficiency of XML query processing.

Information Systems | 2015

Cost-based holistic twig joins

Radim Bača; Petr Lukas; Michal Krátký

In recent years, many approaches to XML twig pattern searching have been developed. Holistic approaches such as TwigStack are particularly significant in that they provide a powerful theoretical model for optimal processing of some query types. Holistic algorithms use various partitionings of an XML document called streaming schemes and they prove algorithm optimality depending on query characteristics. In this article, we introduce a variant of the TwigStack algorithm which can work with various streaming schemes. Its efficiency does not deteriorate when the number of streams per query node is increased, as it does in the case of the iTwigJoin algorithm. Since the indices utilized by the iTwigJoin and our algorithm are exactly the same, we can use heuristics to select the appropriate algorithm. The aim of this paper is to show that the prefix path streaming scheme algorithms can be efficient even for documents with many labeled paths.
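As an illustration of what a streaming scheme is, the sketch below partitions the elements of a small document in two ways: one stream per tag, and one stream per labeled path (the prefix path scheme). It assumes each node is identified simply by its preorder position, and the function name `build_streams` is hypothetical; the twig join algorithms themselves are not shown.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_streams(xml_text):
    """Partition element nodes into tag streams and prefix path streams."""
    root = ET.fromstring(xml_text)
    tag_streams = defaultdict(list)
    path_streams = defaultdict(list)
    counter = 0

    def visit(elem, prefix):
        nonlocal counter
        position = counter  # preorder position used as a stand-in node label
        counter += 1
        path = prefix + "/" + elem.tag
        tag_streams[elem.tag].append(position)
        path_streams[path].append(position)
        for child in elem:
            visit(child, path)

    visit(root, "")
    return dict(tag_streams), dict(path_streams)


tag_streams, path_streams = build_streams("<a><b><c/></b><c/><b><c/></b></a>")
print(tag_streams)
print(path_streams)
```

With tag streaming, a query node `c` reads a single stream containing all three `c` elements. With prefix path streaming the same elements are split between `/a/b/c` and `/a/c`, so a query can skip irrelevant paths, but it may have to read several streams per query node when the document has many labeled paths.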
