Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Sanjoy Dasgupta is active.

Publication


Featured research published by Sanjoy Dasgupta.


Random Structures and Algorithms | 2003

An elementary proof of a theorem of Johnson and Lindenstrauss

Sanjoy Dasgupta; Anupam Gupta

A result of Johnson and Lindenstrauss [13] shows that a set of n points in high-dimensional Euclidean space can be mapped into an O(log n / ε²)-dimensional Euclidean space such that the distance between any two points changes by only a factor of (1 ± ε). In this note, we prove this theorem using elementary probabilistic techniques.
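
The mapping in question can be realized by a random linear map. Below is a minimal sketch using the standard Gaussian-matrix instantiation of the lemma; the constant 4 in the target dimension is illustrative, not the paper's exact bound:

```python
import numpy as np

def jl_project(points, eps, rng=None):
    # Map n points in R^d to k = O(log n / eps^2) dimensions using a
    # Gaussian random matrix with entries N(0, 1/k) -- one standard
    # instantiation of the Johnson-Lindenstrauss lemma.
    rng = np.random.default_rng() if rng is None else rng
    n, d = points.shape
    k = int(np.ceil(4 * np.log(n) / eps**2))  # constant is illustrative
    R = rng.normal(size=(d, k)) / np.sqrt(k)
    return points @ R

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10_000))
Y = jl_project(X, eps=0.5, rng=rng)
a = np.linalg.norm(X[3] - X[17])
b = np.linalg.norm(Y[3] - Y[17])
print(b / a)  # with high probability within [1 - eps, 1 + eps]
```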


Foundations of Computer Science | 1999

Learning mixtures of Gaussians

Sanjoy Dasgupta

Mixtures of Gaussians are among the most fundamental and widely used statistical models. Current techniques for learning such mixtures from data are local search heuristics with weak performance guarantees. We present the first provably correct algorithm for learning a mixture of Gaussians. This algorithm is very simple and returns the true centers of the Gaussians to within the precision specified by the user with high probability. It runs in time only linear in the dimension of the data and polynomial in the number of Gaussians.
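
The abstract does not spell out the procedure; a rough caricature of its two main ingredients (dimension reduction by random projection, then grouping well-separated points) might look like the following. The grouping rule and constants here are stand-ins, not the paper's algorithm or analysis:

```python
import numpy as np

def estimate_gaussian_centers(X, k, rng=None):
    # Caricature: project to ~O(log k) dimensions, pick k mutually
    # far-apart seeds there, assign points to the nearest seed, then
    # average each group in the ORIGINAL space to estimate centers.
    # Assumes the mixture components are well separated.
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    m = max(2, int(np.ceil(np.log2(k))) + 1)       # target dimension
    Z = X @ (rng.normal(size=(d, m)) / np.sqrt(m))
    seeds = [Z[rng.integers(n)]]
    while len(seeds) < k:                          # farthest-point seeding
        dist = np.min([np.linalg.norm(Z - s, axis=1) for s in seeds], axis=0)
        seeds.append(Z[np.argmax(dist)])
    seeds = np.asarray(seeds)
    labels = np.argmin(np.linalg.norm(Z[:, None] - seeds[None], axis=2), axis=1)
    return np.array([X[labels == c].mean(axis=0) for c in range(k)])

# Toy usage: two well-separated Gaussians in R^100.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (300, 100)), rng.normal(8, 1, (300, 100))])
print(estimate_gaussian_centers(X, k=2, rng=rng).mean(axis=1))  # roughly [0, 8] in some order
```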


International Conference on Machine Learning | 2008

Hierarchical sampling for active learning

Sanjoy Dasgupta; Daniel J. Hsu

We present an active learning scheme that exploits cluster structure in data.


Symposium on the Theory of Computing | 2008

Random projection trees and low dimensional manifolds

Sanjoy Dasgupta; Yoav Freund

We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data without having to explicitly learn this structure.
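
A bare-bones version of the splitting rule, assuming the simplest variant (random unit direction, median split); the paper also analyzes a jittered split point and a diameter-based rule, which this sketch omits:

```python
import numpy as np

def build_rptree(X, min_size=10, rng=None):
    # Random-projection tree: at each node, draw a random unit direction,
    # project the points onto it, and split at the median projection.
    rng = np.random.default_rng() if rng is None else rng
    if len(X) <= min_size:
        return {"leaf": True, "points": X}
    u = rng.normal(size=X.shape[1])
    u /= np.linalg.norm(u)                 # random unit direction
    proj = X @ u
    t = np.median(proj)                    # split threshold
    left, right = X[proj <= t], X[proj > t]
    if len(left) == 0 or len(right) == 0:  # degenerate split guard
        return {"leaf": True, "points": X}
    return {"leaf": False, "dir": u, "thresh": t,
            "left": build_rptree(left, min_size, rng),
            "right": build_rptree(right, min_size, rng)}
```

Unlike an axis-aligned k-d tree split, the random direction lets the cell diameters shrink at a rate governed by the data's intrinsic dimension rather than the ambient dimension, which is the adaptivity the abstract refers to.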


International Conference on Machine Learning | 2009

Importance weighted active learning

Alina Beygelzimer; Sanjoy Dasgupta; John Langford

We present a practical and statistically consistent scheme for actively learning binary classifiers under general loss functions. Our algorithm uses importance weighting to correct sampling bias, and by controlling the variance, we are able to give rigorous label complexity bounds for the learning process.
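
The core importance-weighting device fits in a few lines. Here query_prob is a stand-in for the algorithm's rejection-threshold rule (which the paper derives from loss-range bounds); only the weight-by-1/p bookkeeping below is the general mechanism:

```python
import numpy as np

def iwal_stream(points, labels, query_prob, rng=None):
    # Each arriving point is queried with probability p; a queried example
    # enters the sample with weight 1/p, which keeps weighted loss
    # estimates unbiased despite the biased (active) sampling.
    rng = np.random.default_rng() if rng is None else rng
    sample = []
    for x, y in zip(points, labels):
        p = query_prob(x)                  # must be bounded away from 0
        if rng.random() < p:
            sample.append((x, y, 1.0 / p))  # importance weight
    return sample
```

Any learner that accepts per-example weights can then be trained on the returned sample; the 1/p weights are what make the procedure statistically consistent.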


Journal of Computer and System Sciences | 2005

Performance guarantees for hierarchical clustering

Sanjoy Dasgupta; Philip M. Long

We show that for any data set in any metric space, it is possible to construct a hierarchical clustering with the guarantee that for every k, the induced k-clustering has cost at most eight times that of the optimal k-clustering. Here the cost of a clustering is taken to be the maximum radius of its clusters. Our algorithm is similar in simplicity and efficiency to popular agglomerative heuristics for hierarchical clustering, and we show that these heuristics have unbounded approximation factors.
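
The construction is built around farthest-first traversal; here is a minimal sketch of that subroutine (the arrangement of the resulting ordering into a hierarchy, which yields the factor-8 guarantee, is omitted):

```python
import numpy as np

def farthest_first_order(X):
    # Repeatedly add the point farthest from the set chosen so far.
    # Truncating the ordering at k gives centers whose maximum radius is
    # within a factor 2 of the optimal k-clustering (Gonzalez's bound).
    n = len(X)
    order = [0]
    dist = np.linalg.norm(X - X[0], axis=1)   # distance to chosen set
    for _ in range(n - 1):
        nxt = int(np.argmax(dist))
        order.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return order

# order[:k] serves as a set of k centers for any k simultaneously.
```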


Conference on Learning Theory | 2009

Analysis of Perceptron-Based Active Learning

Sanjoy Dasgupta; Adam Tauman Kalai; Claire Monteleoni

We start by showing that in an active learning setting, the Perceptron algorithm needs Ω(1/ε²) labels to learn linear separators within generalization error ε. We then present a simple active learning algorithm for this problem, which combines a modification of the Perceptron update with an adaptive filtering rule for deciding which points to query. For data distributed uniformly over the unit sphere, we show that our algorithm reaches generalization error ε after asking for just O(d log(1/ε)) labels. This exponential improvement over the usual sample complexity of supervised learning had previously been demonstrated only for the computationally more complex query-by-committee algorithm.
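
A hedged sketch of the two ingredients named above: a reflection-style Perceptron update on mistakes, plus a query threshold that shrinks when queried points stop producing mistakes. The schedule and constants are illustrative, not the paper's exact rule:

```python
import numpy as np

def active_perceptron(X, label_oracle, threshold=0.5, patience=32):
    # X: stream of unit-norm points; label_oracle returns +1/-1 (costly).
    v = X[0] / np.linalg.norm(X[0])        # initialize from one labeled point
    if label_oracle(X[0]) < 0:
        v = -v
    quiet = 0
    for x in X[1:]:
        if abs(v @ x) > threshold:
            continue                       # far from boundary: skip the label
        y = label_oracle(x)                # query only near the boundary
        if np.sign(v @ x) != y:
            v = v - 2 * (v @ x) * x        # reflection update; keeps ||v|| = 1
            quiet = 0
        else:
            quiet += 1
            if quiet >= patience:          # no recent mistakes:
                threshold *= 0.5           # tighten the query filter
                quiet = 0
    return v
```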


Theoretical Computer Science | 2011

Two faces of active learning

Sanjoy Dasgupta

An active learner has a collection of data points, each with a label that is initially hidden but can be obtained at some cost. Without spending too much, it wishes to find a classifier that will accurately map points to labels. There are two common intuitions about how this learning process should be organized: (i) by choosing query points that shrink the space of candidate classifiers as rapidly as possible; and (ii) by exploiting natural clusters in the (unlabeled) data set. Recent research has yielded learning algorithms for both paradigms that are efficient, work with generic hypothesis classes, and have rigorously characterized labeling requirements. Here we survey these advances by focusing on two representative algorithms and discussing their mathematical properties and empirical performance.


Allerton Conference on Communication, Control, and Computing | 2008

Random projection trees for vector quantization

Sanjoy Dasgupta; Yoav Freund

A simple and computationally efficient scheme for tree-structured vector quantization is presented. Unlike previous methods, its quantization error depends only on the intrinsic dimension of the data distribution, rather than the apparent dimension of the space in which the data happen to lie.


SIAM Journal on Computing | 2005

The Complexity of Approximating the Entropy

Tuğkan Batu; Sanjoy Dasgupta; Ravi Kumar; Ronitt Rubinfeld

We consider the problem of approximating the entropy of a discrete distribution under several different models of oracle access to the distribution. In the evaluation oracle model, the algorithm is given access to the explicit array of probabilities specifying the distribution. In this model, linear time in the size of the domain is both necessary and sufficient for approximating the entropy. In the generation oracle model, the algorithm has access only to independent samples from the distribution.
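
In the evaluation-oracle model the computation itself is a single linear pass over the probability array (the paper's point is that this linear time is also necessary):

```python
import numpy as np

def entropy_bits(p):
    # Evaluation-oracle model: the distribution is given as an explicit
    # probability array, so one linear scan computes the entropy exactly.
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                      # 0 * log 0 is taken as 0
    return float(-(nz * np.log2(nz)).sum())

print(entropy_bits([0.5, 0.25, 0.25]))  # 1.5 bits
```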

Collaboration


Dive into Sanjoy Dasgupta's collaborations.

Top Co-Authors

Nakul Verma (University of California)
Samory Kpotufe (University of California)
Claire Monteleoni (George Washington University)
Yoav Freund (University of California)