
Publication


Featured research published by Vladimir Pestov.


International Symposium on Neural Networks | 2007

Intrinsic dimension of a dataset: what properties does one expect?

Vladimir Pestov

We propose an axiomatic approach to the concept of an intrinsic dimension of a dataset, based on a viewpoint of geometry of high-dimensional structures. Our first axiom postulates that high values of dimension be indicative of the presence of the curse of dimensionality (in a certain precise mathematical sense). The second axiom requires the dimension to depend smoothly on a distance between datasets (so that the dimension of a dataset and that of an approximating principal manifold would be close to each other). The third axiom is a normalization condition: the dimension of the Euclidean n-sphere S^n is Θ(n). We give an example of a dimension function satisfying our axioms, even though it is in general computationally infeasible, and discuss a computationally cheap function satisfying most but not all of our axioms (the intrinsic dimensionality of Chávez et al.).
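As an illustration of the computationally cheap function mentioned in the abstract, here is a minimal sketch of the Chávez et al. intrinsic dimensionality estimate ρ = μ²/(2σ²), where μ and σ² are the mean and variance of pairwise distances; the sampling scheme, function name, and parameters below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def chavez_intrinsic_dimension(X, n_pairs=10000, seed=None):
    """Chavez et al. intrinsic dimensionality: rho = mu^2 / (2 sigma^2),
    with mu, sigma^2 the mean and variance of (sampled) pairwise distances."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(X), n_pairs)   # random sample of point pairs
    j = rng.integers(0, len(X), n_pairs)
    d = np.linalg.norm(X[i] - X[j], axis=1)
    return d.mean() ** 2 / (2.0 * d.var())

# Sanity check: the estimate grows roughly linearly with the dimension
# of uniform data in the cube, in line with the normalization axiom.
rng = np.random.default_rng(0)
for dim in (4, 16, 64):
    print(dim, round(chavez_intrinsic_dimension(rng.random((5000, dim)), seed=0), 1))
```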


Computers & Mathematics with Applications | 2013

Is the k-NN classifier in high dimensions affected by the curse of dimensionality?

Vladimir Pestov

There is an increasing body of evidence suggesting that exact nearest neighbour search in high-dimensional spaces is affected by the curse of dimensionality at a fundamental level. Does it necessarily mean that the same is true for learning algorithms based on k nearest neighbours, such as the k-NN classifier? We analyse this question at a number of levels and show that the answer is different at each of them. As our first main observation, we show the consistency of a k approximate nearest neighbour classifier. However, the performance of the classifier in very high dimensions is provably unstable. As our second main observation, we point out that the existing model for statistical learning is oblivious to the dimension of the domain, and so every learning problem admits a universally consistent deterministic reduction to the one-dimensional case by means of a Borel isomorphism.
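The fundamental obstruction alluded to in the abstract is often demonstrated through distance concentration: the contrast between a query's nearest and farthest neighbour collapses as the dimension grows. A minimal numerical sketch under assumed uniform-cube data (the data model and sample sizes are illustrative choices):

```python
import numpy as np

# Distance concentration: for points uniform in the d-cube, the relative
# contrast between a query's farthest and nearest neighbour shrinks toward 0
# as the dimension d grows.
rng = np.random.default_rng(0)
for d in (2, 20, 200, 2000):
    X = rng.random((1000, d))
    q = rng.random(d)
    dists = np.linalg.norm(X - q, axis=1)
    print(f"d={d:>4}  relative contrast={(dists.max() - dists.min()) / dists.min():.3f}")
```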


Similarity Search and Applications | 2009

Curse of Dimensionality in Pivot Based Indexes

Ilya Volnyansky; Vladimir Pestov

We offer a theoretical validation of the curse of dimensionality in the pivot-based indexing of datasets for similarity search, by proving, in the framework of statistical learning, that in high dimensions no pivot-based indexing scheme can essentially outperform the linear scan. A study of the asymptotic performance of pivot-based indexing schemes is performed on a sequence of datasets modeled as samples picked in i.i.d. fashion from a sequence of metric spaces. We allow the size of the dataset to grow in relation to dimension, so that the dimension is superlogarithmic but subpolynomial in the size of the dataset. The number of pivots is sublinear in the size of the dataset. We pick the least restrictive cost model of similarity search, in which each distance calculation counts as a single computation and everything else is disregarded. We demonstrate that if the intrinsic dimension of the spaces, in the sense of the concentration of measure phenomenon, is linear in dimension, then the performance of pivot-based indexing schemes for similarity search is asymptotically linear in the size of the dataset.
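For context, the standard pruning rule behind pivot-based indexing: by the triangle inequality, |d(q,p) − d(x,p)| ≤ d(q,x) for any pivot p, so x can be excluded from a range query (q, r) whenever some pivot certifies |d(q,p) − d(x,p)| > r. A minimal LAESA-style sketch, with the Euclidean metric and all names being illustrative assumptions:

```python
import numpy as np

def pivot_range_search(X, pivots, q, r, dist=lambda a, b: float(np.linalg.norm(a - b))):
    """Range query (q, r) using pivot-based filtering.

    A point x survives filtering only if |d(q,p) - d(x,p)| <= r for all
    pivots p; survivors are checked with an exact distance computation.
    Each pivot comparison uses one precomputed and one query-time distance,
    which is the cost model counted in the paper.
    """
    # Distances from every data point to every pivot (built offline in practice).
    pivot_table = np.array([[dist(x, p) for p in pivots] for x in X])
    q_to_pivots = np.array([dist(q, p) for p in pivots])
    results = []
    for x, row in zip(X, pivot_table):
        if np.all(np.abs(q_to_pivots - row) <= r):  # triangle-inequality filter
            if dist(q, x) <= r:                     # exact check on survivors
                results.append(x)
    return results
```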


Information Systems | 2007

Indexing schemes for similarity search in datasets of short protein fragments

Aleksandar Stojmirović; Vladimir Pestov

We propose a family of very efficient hierarchical indexing schemes for ungapped, score matrix-based similarity search in large datasets of short (4-12 amino acid) protein fragments. This type of similarity search is important both as a building block for more complex algorithms and for possible direct use in biological investigations, where datasets are of the order of 60 million objects. Our scheme is based on the internal geometry of the amino acid alphabet and performs exceptionally well, for example outputting the 100 nearest neighbours of any possible fragment of length 10 after scanning on average less than 1% of the entire dataset.
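To make "ungapped, score matrix-based similarity" concrete: equal-length fragments are compared position by position under a substitution score matrix, with no insertions or deletions. In the toy sketch below, the three-letter alphabet and score values are invented for illustration and are not a real substitution matrix such as BLOSUM62.

```python
# Toy symmetric substitution scores over a 3-letter alphabet (illustrative
# values only; real searches would use a matrix such as BLOSUM62).
SCORE = {
    ("A", "A"): 4, ("A", "L"): -1, ("A", "S"): 1,
    ("L", "L"): 4, ("L", "S"): -2,
    ("S", "S"): 4,
}

def pair_score(a, b):
    """Look up the symmetric score for a pair of residues."""
    return SCORE.get((a, b), SCORE.get((b, a)))

def fragment_similarity(f, g):
    """Ungapped similarity: sum of position-wise substitution scores."""
    assert len(f) == len(g), "fragments must have equal length"
    return sum(pair_score(a, b) for a, b in zip(f, g))

print(fragment_similarity("ALS", "ASS"))  # 4 + (-2) + 4 = 6
```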


Journal of Discrete Algorithms | 2012

Indexability, concentration, and VC theory

Vladimir Pestov

Degrading performance of indexing schemes for exact similarity search in high dimensions has long been linked to the histograms of distributions of distances and other 1-Lipschitz functions becoming concentrated. We discuss this observation in the framework of the phenomenon of concentration of measure on structures of high dimension and the Vapnik-Chervonenkis theory of statistical learning.
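A small numerical sketch of the concentration effect described above, under the illustrative assumption of uniform data in the d-cube: the distance to a fixed point is 1-Lipschitz, and the relative spread of its histogram shrinks as the dimension grows.

```python
import numpy as np

# The histogram of a 1-Lipschitz function (here: Euclidean distance to the
# cube's centre) concentrates around its mean as the dimension grows,
# which is what degrades the pruning power of exact indexing schemes.
rng = np.random.default_rng(1)
for d in (10, 100, 1000, 4000):
    X = rng.random((2000, d))               # uniform sample in the d-cube
    f = np.linalg.norm(X - 0.5, axis=1)     # 1-Lipschitz in the Euclidean metric
    print(f"d={d:>4}  mean={f.mean():7.3f}  std/mean={f.std() / f.mean():.4f}")
```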


Granular Computing | 2010

Predictive PAC Learnability: A Paradigm for Learning from Exchangeable Input Data

Vladimir Pestov

Exchangeable random variables form an important and well-studied generalization of i.i.d. variables; however, simple examples show that no nontrivial concept or function classes are PAC learnable under general exchangeable data inputs X_1, X_2, …. Inspired by the work of Berti and Rigo on a Glivenko–Cantelli theorem for exchangeable inputs, we propose a new paradigm, adequate for learning from exchangeable data: predictive PAC learnability.
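As a standard worked example of exchangeability without independence (a de Finetti-style construction, not taken from the paper): draw a latent bias once, then a Bernoulli sequence with that bias; every permutation of the sequence has the same law, yet the variables are correlated.

```python
import numpy as np

rng = np.random.default_rng(0)

def exchangeable_bernoulli(n):
    """Sample an exchangeable but non-i.i.d. sequence X_1, ..., X_n.

    The latent bias p is drawn once; conditionally on p the X_i are i.i.d.
    Bernoulli(p), so every permutation of (X_1, ..., X_n) has the same law,
    but unconditionally the X_i are positively correlated, hence not i.i.d.
    """
    p = rng.random()
    return (rng.random(n) < p).astype(int)

# Empirical check: X_1 and X_2 are correlated (theoretical correlation 1/3).
samples = np.array([exchangeable_bernoulli(2) for _ in range(20000)])
print(np.corrcoef(samples[:, 0], samples[:, 1])[0, 1])
```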


Algorithmica | 2013

Lower Bounds on Performance of Metric Tree Indexing Schemes for Exact Similarity Search in High Dimensions

Vladimir Pestov



Algorithmic Learning Theory | 2010

PAC learnability of a concept class under non-atomic measures: a problem by Vidyasagar

Vladimir Pestov



Similarity Search and Applications | 2013

Text Categorization via Similarity Search

Hubert Haoyang Duan; Vladimir Pestov; Varun Singla



Similarity Search and Applications | 2011

Lower bounds on performance of metric tree indexing schemes for exact similarity search in high dimensions

Vladimir Pestov


Collaboration


Dive into Vladimir Pestov's collaborations.

Top Co-Authors

Varun Singla
Indian Institute of Technology Delhi

Peter Andreae
Victoria University of Wellington

Thomas William Jordan
Victoria University of Wellington

Arkady G. Leiderman
Ben-Gurion University of the Negev