Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Santosh Vempala is active.

Publications


Featured research published by Santosh Vempala.


symposium on principles of database systems | 1998

Latent semantic indexing: a probabilistic analysis

Christos H. Papadimitriou; Hisao Tamaki; Prabhakar Raghavan; Santosh Vempala

Latent semantic indexing (LSI) is an information retrieval technique based on the spectral analysis of the term-document matrix, whose empirical success had heretofore been without rigorous prediction and explanation. We prove that, under certain conditions, LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance. We also propose the technique of random projection as a way of speeding up LSI. We complement our theorems with encouraging experimental results. We also argue that our results may be viewed in a more general framework, as a theoretical basis for the use of spectral methods in a wider class of applications such as collaborative filtering.


Journal of Computer and System Sciences | 2005

Efficient algorithms for online decision problems

Adam Tauman Kalai; Santosh Vempala

In an online decision problem, one makes a sequence of decisions without knowledge of the future. Each period, one pays a cost based on the decision and observed state. We give a simple approach for doing nearly as well as the best single decision, where the best is chosen with the benefit of hindsight. A natural idea is to follow the leader, i.e., each period choose the decision that has done best so far. We show that by slightly perturbing the totals and then choosing the best decision, the expected performance is nearly as good as the best decision in hindsight. Our approach, which is very much like Hannan's original game-theoretic approach from the 1950s, yields guarantees competitive with the more modern exponential weighting algorithms like Weighted Majority. More importantly, these follow-the-leader style algorithms extend naturally to a large class of structured online problems for which the exponential algorithms are inefficient.
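A minimal sketch of the perturbed-leader idea over a finite set of decisions (the uniform perturbation range, cost model, and problem sizes here are illustrative assumptions, not the paper's exact parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

def follow_the_perturbed_leader(costs, eps=1.0):
    """Each round, add fresh random perturbations to the cumulative costs
    and follow the decision whose perturbed total is smallest."""
    n_rounds, n_decisions = costs.shape
    cumulative = np.zeros(n_decisions)
    total = 0.0
    for t in range(n_rounds):
        noise = rng.uniform(0.0, 1.0 / eps, size=n_decisions)
        choice = int(np.argmin(cumulative + noise))
        total += costs[t, choice]        # pay before seeing this round's costs
        cumulative += costs[t]
    return total

costs = rng.uniform(size=(500, 4))       # 500 rounds, 4 possible decisions
alg_cost = follow_the_perturbed_leader(costs)
best_in_hindsight = costs.sum(axis=0).min()
```

With a suitable perturbation scale, the expected gap between `alg_cost` and `best_in_hindsight` grows only sublinearly in the number of rounds.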


Machine Learning | 2004

Clustering Large Graphs via the Singular Value Decomposition

Petros Drineas; Alan M. Frieze; Ravi Kannan; Santosh Vempala; V. Vinay

We consider the problem of partitioning a set of m points in the n-dimensional Euclidean space into k clusters (usually m and n are variable, while k is fixed), so as to minimize the sum of squared distances between each point and its cluster center. This formulation is usually the objective of the k-means clustering algorithm (Kanungo et al. (2000)). We prove that this problem is NP-hard even for k = 2, and we consider a continuous relaxation of this discrete problem: find the k-dimensional subspace V that minimizes the sum of squared distances to V of the m points. This relaxation can be solved by computing the Singular Value Decomposition (SVD) of the m × n matrix A that represents the m points; this solution can be used to get a 2-approximation algorithm for the original problem. We then argue that in fact the relaxation provides a generalized clustering which is useful in its own right. Finally, we show that the SVD of a random submatrix—chosen according to a suitable probability distribution—of a given matrix provides an approximation to the SVD of the whole matrix, thus yielding a very fast randomized algorithm. We expect this algorithm to be the main contribution of this paper, since it can be applied to problems of very large size which typically arise in modern applications.
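The continuous relaxation is easy to illustrate with NumPy on synthetic data (the point counts below are arbitrary): the optimal k-dimensional subspace is spanned by the top k right singular vectors, and the residual cost equals the tail of the squared spectrum.

```python
import numpy as np

rng = np.random.default_rng(1)

m, n, k = 200, 10, 3
A = rng.normal(size=(m, n))              # m points in n-dimensional space

# Best rank-k subspace: span of the top k right singular vectors of A.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V_k = Vt[:k]                             # orthonormal rows spanning the subspace
proj = A @ V_k.T @ V_k                   # project every point onto it

# Sum of squared distances to the subspace = sum of discarded sigma_i^2.
dist_sq = np.sum((A - proj) ** 2)
tail_sq = np.sum(s[k:] ** 2)
```

This is the Eckart–Young fact the relaxation rests on: no other k-dimensional subspace achieves a smaller sum of squared distances.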


symposium on principles of database systems | 2000

Latent Semantic Indexing

Christos H. Papadimitriou; Prabhakar Raghavan; Hisao Tamaki; Santosh Vempala

Latent semantic indexing (LSI) is an information retrieval technique based on the spectral analysis of the term-document matrix, whose empirical success had heretofore been without rigorous prediction and explanation. We prove that, under certain conditions, LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance. We propose the technique of random projection as a way of speeding up LSI. We complement our theorems with encouraging experimental results. We also argue that our results may be viewed in a more general framework, as a theoretical basis for the use of spectral methods in a wider class of applications such as collaborative filtering.
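A small sketch of the two ingredients, plain LSI and the random-projection speed-up, on a synthetic term-document matrix (all sizes and the Gaussian projection are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

terms, docs, k = 500, 80, 5
A = rng.poisson(0.2, size=(terms, docs)).astype(float)   # toy term-document counts

# Plain LSI: rank-k SVD; each document becomes a k-dimensional latent vector.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
docs_lsi = (np.diag(s[:k]) @ Vt[:k]).T                   # shape (docs, k)

# Speed-up: first project the term axis down to d << terms with a random
# Gaussian map, then run the SVD on the much smaller d x docs matrix.
d = 50
R = rng.normal(size=(d, terms)) / np.sqrt(d)
U2, s2, Vt2 = np.linalg.svd(R @ A, full_matrices=False)
docs_fast = (np.diag(s2[:k]) @ Vt2[:k]).T
```

The projected SVD works on a 50 × 80 matrix instead of 500 × 80, which is where the speed-up comes from.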


foundations of computer science | 1999

An algorithmic theory of learning: robust concepts and random projection

Rosa I. Arriaga; Santosh Vempala

We study the phenomenon of cognitive learning from an algorithmic standpoint. How does the brain effectively learn concepts from a small number of examples despite the fact that each example contains a huge amount of information? We provide a novel algorithmic analysis via a model of robust concept learning (closely related to “margin classifiers”), and show that a relatively small number of examples are sufficient to learn rich concept classes. The new algorithms have several advantages—they are faster, conceptually simpler, and resistant to low levels of noise. For example, a robust half-space can be learned in linear time using only a constant number of training examples, regardless of the number of attributes. A general (algorithmic) consequence of the model, that “more robust concepts are easier to learn”, is supported by a multitude of psychological studies.
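One way to see the dimension-independence claim is to project high-dimensional labeled data down with a random map and train in the low-dimensional space. The sketch below uses synthetic data and a plain perceptron (the sizes, the Gaussian projection, and the perceptron itself are illustrative choices, not the paper's algorithms):

```python
import numpy as np

rng = np.random.default_rng(3)

n_attr, n_examples = 2000, 100
w = rng.normal(size=n_attr)
w /= np.linalg.norm(w)
X = rng.normal(size=(n_examples, n_attr))
y = np.sign(X @ w)                       # labels from a hidden halfspace

# Random projection: target dimension depends on the margin, not on n_attr.
d = 50
R = rng.normal(size=(n_attr, d)) / np.sqrt(d)
X_low = X @ R

# Train a perceptron in the projected space.
v = np.zeros(d)
for _ in range(20):                      # a few passes over the data
    for xi, yi in zip(X_low, y):
        if yi * (xi @ v) <= 0:
            v += yi * xi

train_acc = float(np.mean(np.sign(X_low @ v) == y))
```

The point is that all learning happens in 50 dimensions, regardless of the 2000 original attributes; robust (large-margin) concepts survive the projection.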


symposium on discrete algorithms | 2006

Matrix approximation and projective clustering via volume sampling

Amit Deshpande; Luis Rademacher; Santosh Vempala; Grant Wang

Frieze et al. [17] proved that a small sample of rows of a given matrix A contains a low-rank approximation D that minimizes ||A - D||_F to within small additive error, and the sampling can be done efficiently using just two passes over the matrix [12]. In this paper, we generalize this result in two ways. First, we prove that the additive error drops exponentially by iterating the sampling in an adaptive manner. Using this result, we give a pass-efficient algorithm for computing low-rank approximation with reduced additive error. Our second result is that using a natural distribution on subsets of rows (called volume sampling), there exists a subset of k rows whose span contains a factor (k + 1) relative approximation and a subset of k + k(k + 1)/ε rows whose span contains a (1 + ε) relative approximation. The existence of such a small certificate for multiplicative low-rank approximation leads to a PTAS for the following projective clustering problem: given a set of points P in R^d and integers k and j, find a set of j subspaces F_1, …, F_j, each of dimension at most k, that minimizes Σ_{p∈P} min_i d(p, F_i)².
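The row-sampling starting point of [17] can be sketched as follows: sample rows with probability proportional to squared length, then project onto the top singular directions of the sample (the sizes and rescaling convention are illustrative, and this shows only the non-adaptive baseline the paper improves on):

```python
import numpy as np

rng = np.random.default_rng(4)

def length_squared_rows(A, s):
    """Sample s rows of A with probability proportional to squared row norm,
    rescaled so the sample's Gram matrix estimates A^T A."""
    p = np.sum(A ** 2, axis=1)
    p = p / p.sum()
    idx = rng.choice(A.shape[0], size=s, replace=True, p=p)
    return A[idx] / np.sqrt(s * p[idx, None])

A = rng.normal(size=(1000, 30)) @ rng.normal(size=(30, 30))  # correlated columns
S = length_squared_rows(A, 200)

# Rank-k approximation of A built from the sample's top right singular vectors.
k = 10
_, _, Vt = np.linalg.svd(S, full_matrices=False)
A_k = A @ Vt[:k].T @ Vt[:k]

err = np.linalg.norm(A - A_k)                                # Frobenius norm
best = np.sqrt(np.sum(np.linalg.svd(A, compute_uv=False)[k:] ** 2))
```

`err` can never beat `best`, the optimal rank-k error; the paper's adaptive iteration and volume sampling shrink the gap between the two from additive to relative.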


symposium on the theory of computing | 2004

Hit-and-run from a corner

László Lovász; Santosh Vempala

We show that the hit-and-run random walk mixes rapidly starting from any interior point of a convex body. This is the first random walk known to have this property. In contrast, the ball walk can take exponentially many steps from some starting points.
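A minimal hit-and-run walk for the simplest convex body, the unit Euclidean ball, where the chord through the current point has a closed form (the dimension, step count, and near-boundary starting point are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)

def hit_and_run_ball(x, steps, radius=1.0):
    """Hit-and-run inside a Euclidean ball: pick a uniformly random direction,
    intersect the line through x with the ball, and move to a uniformly
    random point of that chord."""
    x = np.array(x, dtype=float)
    for _ in range(steps):
        u = rng.normal(size=x.size)
        u /= np.linalg.norm(u)
        # Solve |x + t*u| = radius: t^2 + 2(x.u)t + (|x|^2 - radius^2) = 0.
        b = x @ u
        disc = np.sqrt(b * b - (x @ x - radius ** 2))
        t = rng.uniform(-b - disc, -b + disc)
        x = x + t * u
    return x

x0 = np.zeros(20)
x0[0] = 0.999          # start very close to the boundary
x = hit_and_run_ball(x0, steps=1000)
```

Unlike the ball walk, whose proposals near the boundary are mostly rejected, every hit-and-run step moves, which is the intuition behind rapid mixing from any interior start.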


symposium on the theory of computing | 1997

Locality-preserving hashing in multidimensional spaces

Piotr Indyk; Rajeev Motwani; Prabhakar Raghavan; Santosh Vempala

We consider locality-preserving hashing — in which adjacent points in the domain are mapped to adjacent or nearly adjacent points in the range — when the domain is a d-dimensional cube. This problem has applications to high-dimensional search and multimedia indexing. We show that simple and natural classes of hash functions are provably good for this problem. We complement this with lower bounds suggesting that our results are essentially the best possible.
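The flavor of a locality-preserving hash can be shown with a simple grid bucketing of the unit cube (this toy scheme is illustrative, not one of the paper's constructions):

```python
import numpy as np

def grid_hash(point, cell=0.1):
    """Bucket a point of [0, 1)^d by its grid cell; points within one cell
    width of each other hash to the same or an adjacent bucket in every
    coordinate."""
    return tuple(int(c) for c in np.asarray(point) // cell)

a = (0.42, 0.91, 0.13)
b = (0.43, 0.91, 0.13)      # a nearby point
ha, hb = grid_hash(a), grid_hash(b)
```

Nearby points can still straddle a cell boundary, which is why the guarantee is "adjacent or nearly adjacent" buckets rather than strict equality.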


international workshop on approximation, randomization and combinatorial optimization: algorithms and techniques | 2006

Adaptive sampling and fast low-rank matrix approximation

Amit Deshpande; Santosh Vempala

We prove that any real matrix A contains a subset of at most 4k/ε + 2k log(k+1) rows whose span “contains” a matrix of rank at most k with error only (1 + ε) times the error of the best rank-k approximation of A. We complement it with an almost matching lower bound by constructing matrices where the span of any k/2ε rows does not “contain” a relative (1 + ε)-approximation of rank k. Our existence result leads to an algorithm that finds such a rank-k approximation in time

O(M(k/ε + k² log k) + (m + n)(k²/ε² + k³ log k/ε + k⁴ log² k)).


SIAM Journal on Computing | 2008

The Spectral Method for General Mixture Models

Ravindran Kannan; Hadi Salmasian; Santosh Vempala

Collaboration


Dive into Santosh Vempala's collaboration.

Top Co-Authors


Avrim Blum

Carnegie Mellon University


Grant Wang

Massachusetts Institute of Technology


Ying Xiao

Georgia Institute of Technology
