Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Ankur Moitra is active.

Publication


Featured research published by Ankur Moitra.


symposium on the theory of computing | 2012

Computing a nonnegative matrix factorization -- provably

Sanjeev Arora; Rong Ge; Ravindran Kannan; Ankur Moitra

The Nonnegative Matrix Factorization (NMF) problem has a rich history spanning quantum mechanics, probability theory, data analysis, polyhedral combinatorics, communication complexity, demography, chemometrics, etc. In the past decade NMF has become enormously popular in machine learning, where the factorization is computed using a variety of local search heuristics. Vavasis recently proved that this problem is NP-hard. We initiate a study of when this problem is solvable in polynomial time. Consider a nonnegative m x n matrix M and a target inner-dimension r. Our results are the following:

- We give a polynomial-time algorithm for exact and approximate NMF for every constant r. Indeed, NMF is most interesting in applications precisely when r is small. We complement this with a hardness result: if exact NMF can be solved in time (nm)^(o(r)), then 3-SAT has a sub-exponential time algorithm, so substantial improvements to the above algorithm are unlikely.

- We give an algorithm that runs in time polynomial in n, m and r under the separability condition identified by Donoho and Stodden in 2003. The algorithm may be practical, since it is simple and noise tolerant (under benign assumptions). Separability is believed to hold in many practical settings.

To the best of our knowledge, this last result is the first polynomial-time algorithm that provably works under a non-trivial condition on the input matrix, and we believe this will be an interesting and important direction for future work.
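
The separability result lends itself to a short illustration. Under separability, every row of the factor W already appears as a row of M, so a factorization can be found by locating those "anchor" rows and expressing every other row as a nonnegative combination of them. The Python sketch below uses successive projection to pick anchors and nonnegative least squares to recover the weights; it is a minimal sketch of this family of separability-based methods under a low-noise assumption, not the paper's exact algorithm, and the function names are ours.

```python
import numpy as np
from scipy.optimize import nnls

def spa_anchors(M, r):
    """Pick r rows of M that approximate its extreme rows (the anchors):
    repeatedly take the row with the largest residual norm, then project
    all rows onto the orthogonal complement of the chosen row."""
    R = np.array(M, dtype=float)
    anchors = []
    for _ in range(r):
        i = int(np.argmax(np.linalg.norm(R, axis=1)))
        anchors.append(i)
        u = R[i] / np.linalg.norm(R[i])
        R = R - np.outer(R @ u, u)  # remove the component along u
    return anchors

def separable_nmf(M, r):
    """Under separability M ~ A @ W where the rows of W are rows of M,
    so recover each row of A by nonnegative least squares."""
    anchors = spa_anchors(M, r)
    W = M[anchors]
    A = np.vstack([nnls(W.T, row)[0] for row in M])
    return A, W
```

In practice one would normalize the rows of M first and account for noise when selecting anchors; the paper's guarantees are stated under its own noise model.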


foundations of computer science | 2012

Learning Topic Models -- Going beyond SVD

Sanjeev Arora; Rong Ge; Ankur Moitra

Topic Modeling is an approach used for automatic comprehension and classification of data in a variety of settings; perhaps the canonical application is uncovering thematic structure in a corpus of documents. A number of foundational works, both in machine learning and in theory, have suggested a probabilistic model for documents, whereby documents arise as a convex combination of (i.e., a distribution on) a small number of topic vectors, each topic vector being a distribution on words (i.e., a vector of word frequencies). Similar models have since been used in a variety of application areas; the Latent Dirichlet Allocation (LDA) model of Blei et al. is especially popular. Theoretical studies of topic modeling focus on learning the model's parameters assuming the data is actually generated from it. Existing approaches for the most part rely on Singular Value Decomposition (SVD), and consequently have one of two limitations: these works need to either assume that each document contains only one topic, or else can only recover the span of the topic vectors instead of the topic vectors themselves. This paper formally justifies Nonnegative Matrix Factorization (NMF), an analog of SVD in which all vectors are nonnegative, as a main tool in this context. Using this tool we give the first polynomial-time algorithm for learning topic models without the above two limitations. The algorithm uses a fairly mild assumption about the underlying topic matrix called separability, which is usually found to hold in real-life data. Perhaps the most attractive feature of our algorithm is that it generalizes to yet more realistic models that incorporate topic-topic correlations, such as the Correlated Topic Model (CTM) and the Pachinko Allocation Model (PAM). We hope that this paper will motivate further theoretical results that use NMF as a replacement for SVD, just as NMF has come to replace SVD in many applications.
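
To make the generative model concrete: each topic is a distribution over words, a document draws mixing weights over topics, and words are then drawn from the resulting convex combination. A minimal Python sketch with made-up toy numbers, using a Dirichlet prior on the weights as in LDA:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_document(topics, alpha, doc_len):
    """Draw topic proportions theta, form the mixed word distribution
    theta @ topics (a convex combination of topic vectors), then draw
    doc_len words i.i.d. from it."""
    theta = rng.dirichlet(alpha)          # distribution over topics
    word_dist = theta @ topics            # distribution over the vocabulary
    return rng.choice(topics.shape[1], size=doc_len, p=word_dist)

# Toy example: 2 topics over a 6-word vocabulary (each row sums to 1).
topics = np.array([[0.40, 0.40, 0.10, 0.05, 0.03, 0.02],
                   [0.02, 0.03, 0.05, 0.10, 0.40, 0.40]])
doc = sample_document(topics, alpha=[0.1, 0.1], doc_len=20)
```

The learning problem the paper solves is the inverse: recover the topic matrix from documents generated this way.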


foundations of computer science | 2010

Settling the Polynomial Learnability of Mixtures of Gaussians

Ankur Moitra; Gregory Valiant

Given data drawn from a mixture of multivariate Gaussians, a basic problem is to accurately estimate the mixture parameters. We give an algorithm for this problem that has running time and data requirements polynomial in the dimension and the inverse of the desired accuracy, with provably minimal assumptions on the Gaussians. As a simple consequence of our learning algorithm, we give the first polynomial-time algorithm for proper density estimation for mixtures of k Gaussians that needs no assumptions on the mixture. It was open whether proper density estimation was even statistically possible (with no assumptions) given only polynomially many samples, let alone whether it could be computationally efficient. The building blocks of our algorithm are based on the work of Kalai et al. (STOC 2010), which gives an efficient algorithm for learning mixtures of two Gaussians by considering a series of projections down to one dimension and applying the method of moments to each univariate projection. A major technical hurdle in the previous work is showing that one can efficiently learn univariate mixtures of two Gaussians. In contrast, because pathological scenarios can arise when considering projections of mixtures of more than two Gaussians, the bulk of the work in this paper concerns how to leverage a weaker algorithm for learning univariate mixtures (of many Gaussians) to learn in high dimensions. Our algorithm employs hierarchical clustering and rescaling, together with methods for backtracking and recovering from the failures that can occur in our univariate algorithm. Finally, while the running time and data requirements of our algorithm depend exponentially on the number of Gaussians in the mixture, we prove that such a dependence is necessary.
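
The dimension reduction mentioned above is easy to state in code: projecting a mixture of n-dimensional Gaussians onto a unit vector yields a univariate mixture with the induced means and variances. A minimal sketch of just that step (the paper's real work, leveraging a weaker univariate learner and recovering from its failures, is not shown):

```python
import numpy as np

rng = np.random.default_rng(1)

def project_to_line(X):
    """Project samples X (num_samples x n) onto a uniformly random unit
    direction; the result is distributed as a univariate mixture of
    Gaussians."""
    u = rng.standard_normal(X.shape[1])
    u /= np.linalg.norm(u)
    return X @ u

# Toy usage: a 2-component spherical mixture in 10 dimensions.
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 10)),
               rng.normal(3.0, 0.5, size=(500, 10))])
x1d = project_to_line(X)  # univariate data for a method-of-moments step
```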


foundations of computer science | 2008

Some Results on Greedy Embeddings in Metric Spaces

Ankur Moitra; Tom Leighton

Geographic routing is a family of routing algorithms that uses geographic point locations as addresses for the purposes of routing. Such routing algorithms have proven to be both simple to implement and heuristically effective when applied to wireless sensor networks. Greedy routing is a natural abstraction of this model in which nodes are assigned virtual coordinates in a metric space, and these coordinates are used to perform point-to-point routing. Here we resolve a conjecture of Papadimitriou and Ratajczak that every 3-connected planar graph admits a greedy embedding into the Euclidean plane. This immediately implies that all 3-connected graphs that exclude K_{3,3} as a minor admit a greedy embedding into the Euclidean plane. Additionally, we provide the first non-trivial examples of graphs that admit no such embedding. These structural results provide efficiently verifiable certificates that a graph admits a greedy embedding or that a graph admits no greedy embedding into the Euclidean plane.
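
The greedy routing rule the abstract refers to is simple to state in code: forward to any neighbor strictly closer to the destination under the virtual coordinates, and fail if no neighbor makes progress. An embedding is greedy exactly when this loop succeeds for every source-destination pair. A minimal sketch (the adjacency dict and coordinate dict are assumed inputs):

```python
import math

def greedy_route(graph, coords, source, target):
    """graph: node -> list of neighbors; coords: node -> (x, y).
    Returns the greedy path, or None if routing gets stuck at a node
    with no neighbor closer to the target (a local minimum)."""
    def d(a, b):
        return math.dist(coords[a], coords[b])
    path, node = [source], source
    while node != target:
        nxt = min(graph[node], key=lambda v: d(v, target))
        if d(nxt, target) >= d(node, target):
            return None  # greedy failure: embedding is not greedy for this pair
        path.append(nxt)
        node = nxt
    return path
```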


symposium on the theory of computing | 2010

Efficiently learning mixtures of two Gaussians

Adam Tauman Kalai; Ankur Moitra; Gregory Valiant

Given data drawn from a mixture of multivariate Gaussians, a basic problem is to accurately estimate the mixture parameters. We provide a polynomial-time algorithm for this problem for the case of two Gaussians in n dimensions (even if they overlap), with provably minimal assumptions on the Gaussians, and polynomial data requirements. In statistical terms, our estimator converges at an inverse polynomial rate, and no such estimator (even exponential time) was known for this problem (even in one dimension). Our algorithm reduces the n-dimensional problem to the one-dimensional problem, where the method of moments is applied. One technical challenge is proving that noisy estimates of the first six moments of a univariate mixture suffice to recover accurate estimates of the mixture parameters, as conjectured by Pearson (1894), and in fact these estimates converge at an inverse polynomial rate. As a corollary, we can efficiently perform near-optimal clustering: in the case where the overlap between the Gaussians is small, one can accurately cluster the data, and when the Gaussians have partial overlap, one can still accurately cluster those data points which are not in the overlap region. A second consequence is a polynomial-time density estimation algorithm for arbitrary mixtures of two Gaussians, generalizing previous work on axis-aligned Gaussians (Feldman et al., 2006).
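
The univariate core is moment matching in the spirit of Pearson (1894): estimate the first six raw moments from the sample and find mixture parameters whose moments agree. A rough numerical sketch of that idea (the paper's analysis is exact and noise-robust; here we simply least-squares the six moment equations, and the initialization and bounds are ad hoc):

```python
import numpy as np
from scipy.optimize import least_squares

def gaussian_raw_moments(mu, var, k_max=6):
    """Raw moments E[X^1..k_max] of N(mu, var) via the recursion
    E[X^k] = mu*E[X^(k-1)] + (k-1)*var*E[X^(k-2)]."""
    m = [1.0, mu]
    for k in range(2, k_max + 1):
        m.append(mu * m[k - 1] + (k - 1) * var * m[k - 2])
    return np.array(m[1:])

def residuals(params, sample_moments):
    w, mu1, v1, mu2, v2 = params
    model = (w * gaussian_raw_moments(mu1, v1)
             + (1 - w) * gaussian_raw_moments(mu2, v2))
    return model - sample_moments

def fit_two_gaussians(x):
    """x: 1-D numpy array of samples from a mixture of two Gaussians."""
    sm = np.array([np.mean(x ** k) for k in range(1, 7)])
    x0 = [0.5, x.mean() - x.std(), x.var(), x.mean() + x.std(), x.var()]
    lo = [0.01, -np.inf, 1e-6, -np.inf, 1e-6]
    hi = [0.99, np.inf, np.inf, np.inf, np.inf]
    return least_squares(residuals, x0, args=(sm,), bounds=(lo, hi)).x
```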


symposium on the theory of computing | 2013

An information complexity approach to extended formulations

Mark Braverman; Ankur Moitra

We prove an unconditional lower bound: any linear program that achieves an O(n^(1-ε)) approximation for clique has size 2^(Ω(n^ε)). There has been considerable recent interest in proving unconditional lower bounds against any linear program. Fiorini et al. proved that there is no polynomial-sized linear program for traveling salesman. Braun et al. proved that there is no polynomial-sized O(n^(1/2-ε))-approximate linear program for clique. Here we prove an optimal and unconditional lower bound against linear programs for clique that matches Håstad's celebrated hardness result. Interestingly, the techniques used to prove such lower bounds have closely followed the progression of techniques used in communication complexity. Here we develop an information-theoretic framework to approach these questions, and we use it to prove our main result. We also resolve a related question: how many bits of communication are needed to get ε-advantage over random guessing for disjointness? Kalyanasundaram and Schnitger proved that a protocol that gets constant advantage requires Ω(n) bits of communication. This result, in conjunction with amplification, implies that any protocol that gets ε-advantage requires Ω(ε^2 n) bits of communication. Here we improve this bound to Ω(ε n), which is optimal for any ε > 0.
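
For intuition about the last improvement, here is the standard amplification calculation behind the earlier Ω(ε^2 n) bound (a folklore sketch, not taken from the paper): run an ε-advantage protocol t times independently and output the majority answer. By a Hoeffding bound,

```latex
\Pr[\text{majority errs}] \le e^{-2\varepsilon^2 t},
\qquad\text{so } t = \Theta(1/\varepsilon^2) \text{ runs give constant advantage.}
```

Since constant advantage for disjointness costs Ω(n) bits in total, each of the t runs must cost at least Ω(n)/t = Ω(ε^2 n); the paper's contribution is improving this to the optimal Ω(ε n).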


foundations of computer science | 2011

Efficient and Explicit Coding for Interactive Communication

Ran Gelles; Ankur Moitra; Amit Sahai



foundations of computer science | 2009

Approximation Algorithms for Multicommodity-Type Problems with Guarantees Independent of the Graph Size

Ankur Moitra



neural information processing systems | 2012

Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders

Sanjeev Arora; Rong Ge; Ankur Moitra; Sushant Sachdeva



symposium on the theory of computing | 2015

Super-resolution, Extremal Functions and the Condition Number of Vandermonde Matrices

Ankur Moitra


Collaboration


Dive into Ankur Moitra's collaborations.

Top Co-Authors

Ilias Diakonikolas

University of Southern California

Alexander S. Wein

Massachusetts Institute of Technology

Ryan O'Donnell

Carnegie Mellon University
