Pavel Zezula | Researchain

Publication

Featured research published by Pavel Zezula.

Multimedia Tools and Applications | 2003

D-Index: Distance Searching Index for Metric Data Sets

Vlastislav Dohnal; Claudio Gennaro; Pasquale Savino; Pavel Zezula

To speed up retrieval in large collections of data, index structures partition the data into subsets so that query requests can be evaluated without examining the entire collection. As the complexity of modern data types grows, metric spaces have become a popular paradigm for similarity retrieval. We propose a new index structure, called D-Index, that combines a novel clustering technique and the pivot-based distance searching strategy to speed up execution of similarity range and nearest neighbor queries for large files with objects stored in disk memories. We have qualitatively analyzed D-Index and verified its properties on an actual implementation. We have also compared D-Index with other index structures and demonstrated its superiority on several real-life data sets. Contrary to tree organizations, the D-Index structure is suitable for dynamic environments with a high rate of delete/insert operations.
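The pivot-based strategy mentioned in the abstract can be illustrated with a minimal sketch. It shows only the generic pivot filtering idea (precomputed object-to-pivot distances plus the triangle inequality), not the D-Index clustering technique itself. The class name `PivotTable`, the Euclidean metric, and the in-memory layout are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance; any function satisfying the metric postulates works."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class PivotTable:
    """Hypothetical in-memory pivot table (illustration only, not the D-Index)."""

    def __init__(self, objects: Sequence[Point], pivots: Sequence[Point],
                 dist: Callable[[Point, Point], float] = euclidean) -> None:
        self.objects = list(objects)
        self.pivots = list(pivots)
        self.dist = dist
        # Precompute d(o, p) for every object o and pivot p at build time.
        self.table = [[dist(o, p) for p in self.pivots] for o in self.objects]

    def range_query(self, q: Point, radius: float) -> Tuple[List[Point], int]:
        """Return (objects within radius of q, number of verified candidates)."""
        q_dists = [self.dist(q, p) for p in self.pivots]
        result: List[Point] = []
        verified = 0
        for o, row in zip(self.objects, self.table):
            # Triangle inequality: |d(q,p) - d(o,p)| <= d(q,o) for every pivot p.
            # If the lower bound already exceeds the radius, o cannot qualify.
            if any(abs(qd - od) > radius for qd, od in zip(q_dists, row)):
                continue
            verified += 1
            if self.dist(q, o) <= radius:
                result.append(o)
        return result, verified
```

The second return value counts how many objects survived the filter and needed a real distance computation, which is the cost that pivot filtering tries to reduce.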

Scalable Information Systems | 2006

M-Chord: a scalable distributed similarity search structure

David Novak; Pavel Zezula

The need for retrieval based not on attribute values but on the data content itself has recently led to the rise of metric-based similarity search. The computational complexity of such retrieval and the large volumes of processed data call for distributed processing, which makes scalability achievable. In this paper, we propose M-Chord, a distributed data structure for metric-based similarity search. The structure takes advantage of the idea of the vector index method iDistance to transform the problem of similarity searching into the problem of interval search in one dimension. The proposed peer-to-peer organization, based on the Chord protocol, distributes the storage space and parallelizes the execution of similarity queries. Promising features of the structure are validated by experiments on the prototype implementation and two real-life datasets.
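The transformation of similarity search into one-dimensional interval search can be sketched as follows. This is a single-machine illustration of an iDistance-style mapping, assuming each object is keyed by its closest pivot and its distance to that pivot. The peer-to-peer distribution of key intervals over Chord is not modelled, and the names `OneDimMapping` and `separation` are hypothetical.

```python
import bisect
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class OneDimMapping:
    """Illustrative iDistance-style mapping of metric objects to scalar keys.

    Assumption: `separation` is strictly larger than any object-to-pivot
    distance, so the key ranges of different pivots never overlap.
    """

    def __init__(self, objects: Sequence[Point], pivots: Sequence[Point],
                 separation: float) -> None:
        self.pivots = list(pivots)
        self.c = separation
        entries = sorted((self.key(o), o) for o in objects)
        self.keys = [k for k, _ in entries]
        self.objects = [o for _, o in entries]

    def key(self, o: Point) -> float:
        """key(o) = i * c + d(p_i, o), where p_i is the pivot closest to o."""
        dists = [euclidean(o, p) for p in self.pivots]
        i = min(range(len(dists)), key=dists.__getitem__)
        return i * self.c + dists[i]

    def range_query(self, q: Point, radius: float) -> Tuple[List[Point], int]:
        """Return (objects within radius of q, number of scanned candidates)."""
        result: List[Point] = []
        scanned = 0
        for i, p in enumerate(self.pivots):
            dq = euclidean(q, p)
            # Any qualifying object o in cluster i satisfies
            # d(q,p_i) - radius <= d(o,p_i) <= d(q,p_i) + radius.
            lo = i * self.c + max(0.0, dq - radius)
            hi = i * self.c + dq + radius
            start = bisect.bisect_left(self.keys, lo)
            # Never read past the end of cluster i's key range.
            end = min(bisect.bisect_right(self.keys, hi),
                      bisect.bisect_left(self.keys, (i + 1) * self.c))
            for o in self.objects[start:end]:
                scanned += 1
                if euclidean(q, o) <= radius:
                    result.append(o)
        return result, scanned
```

Each pivot contributes one contiguous key interval per query. In a distributed setting, those intervals are what would be routed to the peers responsible for the corresponding key ranges.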

Symposium on Principles of Database Systems | 1998

A cost model for similarity queries in metric spaces

Paolo Ciaccia; Marco Patella; Pavel Zezula

We consider the problem of estimating CPU (distance computations) and I/O costs for processing range and k-nearest neighbors queries over metric spaces. Unlike the specific case of vector spaces, where information on data distribution has been exploited to derive cost models for predicting the performance of multi-dimensional access methods, in a generic metric space there is no such possibility, which makes the problem quite different and requires a novel approach. We argue that the distance distribution of objects can be profitably used to solve the problem, and consequently develop a concrete cost model for the M-tree access method [10]. Our results rely on the assumption that the indexed dataset comes from a metric space which is “homogeneous” enough (in a probabilistic sense) to allow reliable cost estimations even if the distance distribution with respect to a specific query object is unknown. We experimentally validate the model over both real and synthetic datasets, and show how the model can be used to tune the M-tree in order to minimize a combination of CPU and I/O costs. Finally, we sketch how the same approach can be applied to derive a cost model for the vp-tree index structure [8].
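One way to see how a distance distribution can drive a cost estimate is the sketch below. It assumes that each index node is a ball region with a known covering radius and entry count, and that, under the homogeneity assumption, a range query of radius r_Q reaches a node of radius r_N with probability F(r_N + r_Q), where F is the overall distance distribution. The function names and the node representation are illustrative assumptions, not the paper's notation.

```python
import bisect
import itertools
from typing import Callable, List, Sequence, Tuple


def empirical_distance_distribution(sample: Sequence, dist: Callable) -> Callable[[float], float]:
    """Estimate F(x) = Pr{d(O1, O2) <= x} from all pairs of a data sample."""
    dists = sorted(dist(a, b) for a, b in itertools.combinations(sample, 2))

    def F(x: float) -> float:
        return bisect.bisect_right(dists, x) / len(dists)

    return F


def expected_range_costs(nodes: List[Tuple[float, int]],
                         F: Callable[[float], float],
                         query_radius: float) -> Tuple[float, float]:
    """Return (expected node accesses, expected distance computations).

    `nodes` holds one (covering_radius, number_of_entries) pair per node.
    A node is accessed when the query ball intersects its region, i.e. when
    d(query, node_center) <= covering_radius + query_radius.
    """
    access_prob = [F(r + query_radius) for r, _ in nodes]
    expected_nodes = sum(access_prob)
    # Accessing a node costs one distance computation per entry it stores.
    expected_dists = sum(p * entries for p, (_, entries) in zip(access_prob, nodes))
    return expected_nodes, expected_dists
```

The first value approximates I/O cost and the second approximates CPU cost, so a weighted sum of the two could be minimized over candidate node sizes when tuning an index.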

Very Large Data Bases | 1998

Approximate Similarity Retrieval with M-trees

Pavel Zezula; Pasquale Savino; Giuseppe Amato; Fausto Rabitti

Motivated by the urgent need to improve the efficiency of similarity queries, approximate similarity retrieval is investigated in the environment of a metric tree index called the M-tree. Three different approximation techniques are proposed, which show how to forsake query precision for improved performance. Measures are defined that can quantify the improvements in performance efficiency and the quality of approximations. The proposed approximation techniques are then tested on various synthetic and real-life files. The evidence obtained from the experiments confirms our hypothesis that a high-quality approximated similarity search can be performed at a much lower cost than that needed to obtain the exact results. The proposed approximation techniques are scalable and appear to be independent of the metric used. Extensions of these techniques to the environments of other similarity search indexes are also discussed.

ACM Transactions on Information Systems | 1991

Dynamic partitioning of signature files

Pavel Zezula; Fausto Rabitti; Paolo Tiberio

The signature file access method has proved to be a convenient indexing technique, in particular for text data. Because it can deal with unformatted data, many application domains have shown interest in signature file techniques, e.g., office information systems, statistical and logic databases. We argue that multimedia databases should also take advantage of this method, provided convenient storage structures for organizing signature files are available. Our main concern here is the dynamic organization of signatures based on a partitioning paradigm called Quick Filter. A signature file is partitioned by a hashing function and the partitions are organized by linear hashing. Thorough performance evaluation of the new scheme is provided, and it is compared with single-level and multilevel storage structures. Results show that Quick Filter is economical in space and very convenient for applications dealing with large files of dynamic data, and where user queries result in signatures with high weights. These characteristics are particularly interesting for multimedia databases, where integrated access to attributes, text and images must be provided.
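The partitioning idea can be sketched in a few lines, under simplifying assumptions: signatures are built by superimposed coding, the partition key is taken from a fixed number of low-order signature bits, and the dynamic growth provided by linear hashing is omitted. All constants and names (`SIG_BITS`, `KEY_BITS`, `PartitionedSignatureFile`) are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

SIG_BITS = 64        # signature length in bits (assumed)
BITS_PER_WORD = 3    # bits set per word by superimposed coding (assumed)
KEY_BITS = 4         # low-order bits used as the partition key (assumed)
KEY_MASK = (1 << KEY_BITS) - 1


def word_signature(word: str) -> int:
    """Set a few pseudo-random bit positions derived from a hash of the word."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    sig = 0
    for i in range(BITS_PER_WORD):
        sig |= 1 << (digest[i] % SIG_BITS)
    return sig


def signature(words: Iterable[str]) -> int:
    """Superimpose (bitwise OR) the signatures of all words."""
    sig = 0
    for w in words:
        sig |= word_signature(w)
    return sig


class PartitionedSignatureFile:
    """Signatures grouped by key bits; a static stand-in for a hashed file."""

    def __init__(self) -> None:
        self.partitions: Dict[int, List[Tuple[int, int]]] = defaultdict(list)

    def insert(self, doc_id: int, words: Iterable[str]) -> None:
        sig = signature(words)
        self.partitions[sig & KEY_MASK].append((sig, doc_id))

    def candidates(self, query_words: Iterable[str]) -> Tuple[List[int], int]:
        """Return (candidate doc ids, number of partitions searched)."""
        q = signature(query_words)
        q_key = q & KEY_MASK
        hits: List[int] = []
        searched = 0
        for key, entries in self.partitions.items():
            # A partition can hold matches only if its key covers every
            # 1-bit in the key part of the query signature.
            if key & q_key != q_key:
                continue
            searched += 1
            hits.extend(doc for sig, doc in entries if sig & q == q)
        return hits, searched
```

Candidates may include false drops, so a real system would verify them against the stored data. The more 1-bits the query signature has in its key part, the more partitions are skipped, which is consistent with the abstract's remark about queries with high-weight signatures.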

Information Systems | 2011

Metric Index: An efficient and scalable solution for precise and approximate similarity search

David Novak; Michal Batko; Pavel Zezula

Metric space is a universal and versatile model of similarity that can be applied in various areas of information retrieval. However, a general, efficient, and scalable solution for metric data management remains an open research challenge. We introduce a novel indexing and searching mechanism called Metric Index (M-Index) that employs practically all known principles of metric space partitioning, pruning, and filtering, thus reaching high search performance while having constant building costs per object. The heart of the M-Index is a general mapping mechanism that makes it possible to store the data in established structures such as the B+-tree or even in a distributed storage. We implemented the M-Index with the B+-tree and performed experiments on two datasets: the first is an artificial set of vectors and the other is a real-life dataset composed of a combination of five MPEG-7 visual descriptors extracted from a database of up to several million digital images. The experiments put several M-Index variants under test and compare them with established techniques for both precise and approximate similarity search. The trials show that the M-Index outperforms the others in terms of efficiency of search-space pruning, I/O costs, and response times for precise similarity queries. Further, the M-Index demonstrates an excellent ability to keep similar data close in the index, which makes its approximation algorithm very efficient: it maintains practically constant response times while preserving a very high recall as the dataset grows, and even beats approaches designed purely for approximate search.

ACM International Conference on Digital Libraries | 2007

MESSIF: metric similarity search implementation framework

Michal Batko; David Novak; Pavel Zezula

Similarity search has become a fundamental computational task in many applications. One of the mathematical models of similarity, the metric space, has drawn the attention of many researchers, resulting in several sophisticated metric-indexing techniques. An important part of research in this area is typically a prototype implementation and subsequent experimental evaluation of the proposed data structure. This paper describes an implementation framework called MESSIF that eases the task of building such prototypes. It provides a number of modules ranging from basic storage management, through wide support for distributed processing, to automatic collection of performance statistics. Due to its open and modular design, it is also easy to implement additional modules, if necessary. MESSIF also offers several ready-to-use generic clients that allow users to control and test the index structures.

Scalable Information Systems | 2006

On scalability of the similarity search in the world of peers

Michal Batko; David Novak; Fabrizio Falchi; Pavel Zezula

Due to the increasing complexity of current digital data, similarity search has become a fundamental computational task in many applications. Unfortunately, its costs are still high and the linear scalability of single-server implementations prevents efficient searching in large data volumes. In this paper, we briefly describe four recent scalable distributed similarity search techniques and study their performance in executing queries on three different datasets. Though all the methods employ parallelism to speed up query execution, different advantages for different objectives have been identified by experiments. The reported results can be exploited for choosing the best implementations for specific applications. They can also be used for designing new and better indexing structures in the future.

Extending Database Technology | 1998

Processing Complex Similarity Queries with Distance-Based Access Methods

Paolo Ciaccia; Marco Patella; Pavel Zezula

Efficient evaluation of similarity queries is one of the basic requirements for advanced multimedia applications. In this paper, we consider the relevant case where complex similarity queries are defined through a generic language L and whose predicates refer to a single feature F. Contrary to the language level, which deals only with similarity scores, the proposed evaluation process is based on distances between feature values, since known spatial or metric indexes use distances to evaluate predicates. The proposed solution suggests that the index should process complex queries as a whole, thus evaluating multiple similarity predicates at a time. The flexibility of our approach is demonstrated by considering three different similarity languages, and showing how the M-tree access method has been extended for this purpose. Experimental results clearly show that performance of the extended M-tree is consistently better than that of state-of-the-art search algorithms.

Database and Expert Systems Applications | 2003

Similarity Join in Metric Spaces Using eD-Index

Vlastislav Dohnal; Claudio Gennaro; Pavel Zezula

Similarity join in distance spaces constrained by the metric postulates is the necessary complement of the better-known similarity range and nearest neighbor search primitives. However, the quadratic computational complexity of similarity joins prevents their application to large data collections. We present the eD-Index, an extension of D-Index, and we study an application of the eD-Index to implement two algorithms for similarity self joins, i.e. the range query join and the overloading join. Though these approaches are also not able to eliminate the intrinsic quadratic complexity of similarity joins, significant performance improvements are confirmed by experiments.
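To make the quadratic cost of a similarity self join concrete, the sketch below compares a nested-loop baseline with a simple single-pivot filter. Neither function is the range query join or the overloading join of the eD-Index. The pivot filter is a swapped-in illustration of how metric properties can skip pairs, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def nested_loop_join(objects: Sequence[Point], eps: float,
                     dist: Callable = euclidean) -> List[Tuple[Point, Point]]:
    """Baseline self join: one distance computation per pair, n*(n-1)/2 total."""
    return [(a, b) for a, b in combinations(objects, 2) if dist(a, b) <= eps]


def pivot_window_join(objects: Sequence[Point], eps: float, pivot: Point,
                      dist: Callable = euclidean) -> Tuple[List[Tuple[Point, Point]], int]:
    """Self join that skips pairs whose pivot distances differ by more than eps.

    Returns (qualifying pairs, number of pair distances actually computed).
    """
    keyed = sorted((dist(o, pivot), o) for o in objects)
    pairs: List[Tuple[Point, Point]] = []
    computed = 0
    for i, (da, a) in enumerate(keyed):
        for db, b in keyed[i + 1:]:
            # Triangle inequality: d(a,b) >= d(b,pivot) - d(a,pivot).
            # The list is sorted, so every later b fails this test as well.
            if db - da > eps:
                break
            computed += 1
            if dist(a, b) <= eps:
                pairs.append((a, b))
    return pairs, computed
```

The filter reduces the number of distance computations but the worst case stays quadratic, for example when all objects lie at nearly the same distance from the pivot.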
