

Publication


Featured research published by Gregory Valiant.


foundations of computer science | 2010

Settling the Polynomial Learnability of Mixtures of Gaussians

Ankur Moitra; Gregory Valiant

Given data drawn from a mixture of multivariate Gaussians, a basic problem is to accurately estimate the mixture parameters. We give an algorithm for this problem that has running time and data requirements polynomial in the dimension and the inverse of the desired accuracy, with provably minimal assumptions on the Gaussians. As a simple consequence of our learning algorithm, we give the first polynomial time algorithm for proper density estimation for mixtures of k Gaussians that needs no assumptions on the mixture. It was open whether proper density estimation was even statistically possible (with no assumptions) given only polynomially many samples, let alone whether it could be computationally efficient. The building blocks of our algorithm are based on the work of Kalai et al. (STOC 2010), which gives an efficient algorithm for learning mixtures of two Gaussians by considering a series of projections down to one dimension and applying the method of moments to each univariate projection. A major technical hurdle in the previous work is showing that one can efficiently learn univariate mixtures of two Gaussians. In contrast, because pathological scenarios can arise when considering projections of mixtures of more than two Gaussians, the bulk of the work in this paper concerns how to leverage a weaker algorithm for learning univariate mixtures (of many Gaussians) to learn in high dimensions. Our algorithm employs hierarchical clustering and rescaling, together with methods for backtracking and recovering from the failures that can occur in our univariate algorithm. Finally, while the running time and data requirements of our algorithm depend exponentially on the number of Gaussians in the mixture, we prove that such a dependence is necessary.
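The projection strategy described here can be illustrated with a short sketch. The following Python snippet is a hypothetical simplification of ours, not the paper's algorithm: it projects high-dimensional mixture samples onto random unit directions and fits a univariate mixture to each projection, substituting scikit-learn's EM-based GaussianMixture for the paper's method-of-moments subroutine; the function name fit_univariate_projections is invented for illustration.

```python
# Hypothetical sketch: learn a mixture via random 1-D projections.
# The paper's algorithm adds hierarchical clustering, rescaling, and
# backtracking; here EM stands in for the method of moments.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_univariate_projections(X, k, num_directions=10, seed=0):
    """Fit a k-component univariate mixture along random projections of X."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    fits = []
    for _ in range(num_directions):
        v = rng.normal(size=d)
        v /= np.linalg.norm(v)           # random unit direction
        proj = (X @ v).reshape(-1, 1)    # project every sample to 1-D
        gm = GaussianMixture(n_components=k).fit(proj)
        fits.append((v, gm.means_.ravel(), gm.covariances_.ravel(), gm.weights_))
    return fits  # per-direction univariate parameter estimates
```

A full reconstruction would then combine the per-direction univariate estimates back into high-dimensional parameters, which is where the paper's clustering and backtracking machinery enters.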


symposium on the theory of computing | 2010

Efficiently learning mixtures of two Gaussians

Adam Tauman Kalai; Ankur Moitra; Gregory Valiant

Given data drawn from a mixture of multivariate Gaussians, a basic problem is to accurately estimate the mixture parameters. We provide a polynomial-time algorithm for this problem for the case of two Gaussians in n dimensions (even if they overlap), with provably minimal assumptions on the Gaussians, and polynomial data requirements. In statistical terms, our estimator converges at an inverse polynomial rate, and no such estimator (even exponential time) was known for this problem (even in one dimension). Our algorithm reduces the n-dimensional problem to the one-dimensional problem, where the method of moments is applied. One technical challenge is proving that noisy estimates of the first six moments of a univariate mixture suffice to recover accurate estimates of the mixture parameters, as conjectured by Pearson (1894), and in fact these estimates converge at an inverse polynomial rate. As a corollary, we can efficiently perform near-optimal clustering: in the case where the overlap between the Gaussians is small, one can accurately cluster the data, and when the Gaussians have partial overlap, one can still accurately cluster those data points which are not in the overlap region. A second consequence is a polynomial-time density estimation algorithm for arbitrary mixtures of two Gaussians, generalizing previous work on axis-aligned Gaussians (Feldman et al., 2006).
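The role of the first six moments, conjectured by Pearson and proved here, can be made concrete with a sketch. The code below is an illustrative moment-matching fit of ours, not the paper's estimator: it matches six empirical raw moments with a generic least-squares solver; the function names, the initialization, and the choice of scipy.optimize.least_squares are assumptions for the example.

```python
# Illustrative moment matching for a univariate mixture of two Gaussians:
# estimate the first six raw moments from samples, then search for mixture
# parameters whose theoretical moments match.
import numpy as np
from scipy.optimize import least_squares

def gaussian_raw_moments(mu, sigma, k_max=6):
    """Raw moments E[X^k], k = 1..k_max, of N(mu, sigma^2) via the
    standard recursion m_k = mu*m_{k-1} + (k-1)*sigma^2*m_{k-2}."""
    m = [1.0, mu]
    for k in range(2, k_max + 1):
        m.append(mu * m[k - 1] + (k - 1) * sigma**2 * m[k - 2])
    return np.array(m[1:])

def fit_two_gaussians_by_moments(samples):
    """samples: 1-D numpy array drawn from the mixture."""
    empirical = np.array([np.mean(samples**k) for k in range(1, 7)])

    def residual(theta):
        mu1, s1, mu2, s2, w = theta
        model = (w * gaussian_raw_moments(mu1, s1)
                 + (1 - w) * gaussian_raw_moments(mu2, s2))
        return model - empirical

    # crude single initialization; a careful implementation would use many
    # starts, since the moment equations can have spurious local optima
    x0 = [np.mean(samples) - np.std(samples), np.std(samples),
          np.mean(samples) + np.std(samples), np.std(samples), 0.5]
    bounds = ([-np.inf, 1e-6, -np.inf, 1e-6, 0.0],
              [np.inf, np.inf, np.inf, np.inf, 1.0])
    return least_squares(residual, x0, bounds=bounds).x  # (mu1, s1, mu2, s2, w)
```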


SIAM Journal on Computing | 2010

Designing Network Protocols for Good Equilibria

Ho-Lin Chen; Tim Roughgarden; Gregory Valiant

Designing and deploying a network protocol determines the rules by which end users interact with each other and with the network. We consider the problem of designing a protocol to optimize the equilibrium behavior of a network with selfish users. We consider network cost-sharing games, where the set of Nash equilibria depends fundamentally on the choice of an edge cost-sharing protocol. Previous research focused on the Shapley protocol, in which the cost of each edge is shared equally among its users. We systematically study the design of optimal cost-sharing protocols for undirected and directed graphs, single-sink and multicommodity networks, and different measures of the inefficiency of equilibria. Our primary technical tool is a precise characterization of the cost-sharing protocols that induce only network games with pure-strategy Nash equilibria. We use this characterization to prove, among other results, that the Shapley protocol is optimal in directed graphs and that simple priority protocols are essentially optimal in undirected graphs.
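To make the cost-sharing setup concrete, here is a toy network cost-sharing game under the Shapley (equal-split) protocol, solved by best-response dynamics; the two-hop-versus-direct-edge graph, the costs, and the strategy sets are invented for this example, and convergence is guaranteed because these games are potential games.

```python
# Toy Shapley cost-sharing game: two players each pick an s->t path, and
# every edge's fixed cost is split equally among the players using it.
from itertools import count

edge_cost = {("s", "a"): 1.0, ("a", "t"): 1.0, ("s", "t"): 3.0}
strategies = [
    [("s", "a"), ("a", "t")],  # two-hop path, total cost 2 if shared
    [("s", "t")],              # direct edge, total cost 3
]
num_players = 2

def player_cost(choices, i):
    """Shapley share paid by player i: each edge's cost / number of users."""
    usage = {}
    for path_idx in choices:
        for e in strategies[path_idx]:
            usage[e] = usage.get(e, 0) + 1
    return sum(edge_cost[e] / usage[e] for e in strategies[choices[i]])

def best_response_dynamics(choices):
    for _ in count():  # potential-game argument: this loop terminates
        stable = True
        for i in range(num_players):
            best = min(range(len(strategies)),
                       key=lambda s: player_cost(choices[:i] + [s] + choices[i + 1:], i))
            if best != choices[i]:
                choices[i], stable = best, False
        if stable:
            return choices

print(best_response_dynamics([1, 0]))  # converges to the shared path: [0, 0]
```

Note that [1, 1] (both players on the direct edge) is also a pure Nash equilibrium here, which illustrates why the choice of cost-sharing protocol, and not just the network, determines equilibrium quality.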


foundations of computer science | 2012

Finding Correlations in Subquadratic Time, with Applications to Learning Parities and Juntas

Gregory Valiant

Given a set of n d-dimensional Boolean vectors with the promise that the vectors are chosen uniformly at random with the exception of two vectors that have Pearson correlation ρ (Hamming distance d(1-ρ)/2), how quickly can one find the two correlated vectors? We present an algorithm which, for any constants ε, ρ > 0 and d ≫ log n/ρ^2, finds the correlated pair with high probability, and runs in time O(n^{(3ω/4)+ε}) < O(n^{1.8}), where ω < 2.38 is the exponent of matrix multiplication. Provided that d is sufficiently large, this runtime can be further reduced. These are the first subquadratic-time algorithms for this problem for which ρ does not appear in the exponent of n, and they improve upon the O(n^{2-O(ρ)}) runtimes given by Paturi et al. [15], Locality Sensitive Hashing (LSH) [11], and the Bucketing Codes approach [6]. Applications and extensions of this basic algorithm yield improved algorithms for several other problems. Approximate Closest Pair: for any sufficiently small constant ε > 0, given n vectors in R^d, our algorithm returns a pair of vectors whose Euclidean distance differs from that of the closest pair by a factor of at most 1+ε, and runs in time O(n^{2-Θ(√ε)}); the best previous algorithms (including LSH) have runtime O(n^{2-O(ε)}). Learning Sparse Parities with Noise: given samples from an instance of the learning parity with noise problem where each example has length n, the true parity set has size at most k ≪ n, and the noise rate is η, our algorithm identifies the set of k indices in time n^{((ω+ε)/3)k} poly(1/(1-2η)) < n^{0.8k} poly(1/(1-2η)); this is the first algorithm with no dependence on η in the exponent of n, aside from the trivial brute-force algorithm. Learning k-Juntas with Noise: given uniformly random length-n Boolean vectors, together with a label that is some function of just k ≪ n of the bits, perturbed by noise rate η, return the set of relevant indices; leveraging the reduction of Feldman et al. [7], our result for learning k-parities implies an algorithm for this problem with runtime n^{((ω+ε)/3)k} poly(1/(1-2η)) < n^{0.8k} poly(1/(1-2η)), which improves on the previous best of n^{k(1-2/2k)} poly(1/(1-2η)) from [8]. Learning k-Juntas without Noise: our results for learning sparse parities with noise imply an algorithm for learning juntas without noise with runtime n^{((ω+ε)/4)k} poly(n) < n^{0.6k} poly(n), which improves on the runtime n^{(ω/(ω+1))k} poly(n) ≈ n^{0.7k} poly(n) of Mossel et al. [13].
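A hedged sketch of the vector-aggregation idea at the heart of this algorithm: sum the ±1 vectors within groups, locate an outlying group-pair inner product with a single matrix multiplication, then brute-force only inside the flagged groups. The group size, the argmax-based detection, and the assumption that the planted pair falls into two different groups are simplifications of ours, not the paper's tuned parameters.

```python
# Find a planted correlated pair among n random +/-1 vectors by aggregating
# groups and using one big matrix product (numpy's @; the paper exploits
# fast matrix multiplication at this step).
import numpy as np

def find_correlated_pair(X, group_size):
    """X: (n, d) array of +/-1 entries; n must be divisible by group_size.
    Assumes the planted pair lies in two *different* groups."""
    n, d = X.shape
    groups = X.reshape(n // group_size, group_size, d)
    sums = groups.sum(axis=1)            # one aggregated vector per group
    gram = sums @ sums.T                 # inner products between group sums
    np.fill_diagonal(gram, 0)            # ignore within-group products
    gi, gj = np.unravel_index(np.abs(gram).argmax(), gram.shape)
    # brute-force search restricted to the two flagged groups
    cand = np.vstack([groups[gi], groups[gj]])
    idx = np.concatenate([np.arange(gi * group_size, (gi + 1) * group_size),
                          np.arange(gj * group_size, (gj + 1) * group_size)])
    inner = cand @ cand.T
    np.fill_diagonal(inner, 0)
    a, b = np.unravel_index(np.abs(inner).argmax(), inner.shape)
    return idx[a], idx[b]
```

With G = n/group_size groups, the dominant cost is the G x G Gram matrix rather than all n^2 pairwise inner products, which is the source of the subquadratic savings when fast matrix multiplication is used.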


foundations of computer science | 2011

The Power of Linear Estimators

Gregory Valiant; Paul Valiant

For a broad class of practically relevant distribution properties, which includes entropy and support size, nearly all of the proposed estimators have an especially simple form. Given a set of independent samples from a discrete distribution, these estimators tally the vector of summary statistics -- the number of domain elements seen once, twice, etc. in the sample -- and output the dot product between these summary statistics and a fixed vector of coefficients. We term such estimators linear. This historical proclivity towards linear estimators is slightly perplexing, since, despite many efforts over nearly 60 years, all proposed such estimators have significantly suboptimal convergence compared to the bounds shown in [VV11]. Our main result, in some sense vindicating this insistence on linear estimators, is that for any property in this broad class, there exists a near-optimal linear estimator. Additionally, we give a practical and polynomial-time algorithm for constructing such estimators for any given parameters. While this result does not yield explicit bounds on the sample complexities of these estimation tasks, we leverage the insights provided by this result to give explicit constructions of near-optimal linear estimators for three properties: entropy, L_1 distance to uniformity, and, for pairs of distributions, L_1 distance.
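The "linear estimator" form described in this abstract fits in a few lines of code. The sketch below tallies the fingerprint (how many domain elements appear exactly i times) and dots it with the coefficients of the naive plug-in entropy estimator; this particular coefficient vector is the classical, suboptimal choice, and the paper's contribution, constructing near-optimal coefficients, is not reproduced here.

```python
# A linear estimator: dot product of the sample fingerprint with a fixed
# coefficient vector. Plug-in entropy is the textbook (suboptimal) instance.
import numpy as np
from collections import Counter

def fingerprint(samples):
    counts = Counter(samples)        # element -> number of occurrences
    return Counter(counts.values())  # multiplicity i -> number of elements seen i times

def plugin_entropy(samples):
    n = len(samples)
    F = fingerprint(samples)
    # coefficient for "seen exactly i times" under the plug-in estimator
    return sum(F[i] * (-(i / n) * np.log(i / n)) for i in F)

rng = np.random.default_rng(0)
samples = rng.integers(0, 1000, size=500)  # 500 draws, uniform over 1000 symbols
print(plugin_entropy(samples))             # well below ln(1000) ~ 6.9 nats
```

The sample regime above (500 draws from a support of 1000) is exactly where plug-in estimators fail and where near-optimal linear coefficients matter.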


symposium on discrete algorithms | 2014

Optimal algorithms for testing closeness of discrete distributions

Siu On Chan; Ilias Diakonikolas; Gregory Valiant; Paul Valiant



Journal of the ACM | 2017

Estimating the Unseen: Improved Estimators for Entropy and Other Properties

Gregory Valiant; Paul Valiant



algorithmic game theory | 2010

On learning algorithms for Nash equilibria

Constantinos Daskalakis; Rafael M. Frongillo; Christos H. Papadimitriou; George Pierrakos; Gregory Valiant



symposium on the theory of computing | 2017

Learning from untrusted data

Moses Charikar; Jacob Steinhardt; Gregory Valiant



Communications of The ACM | 2012

Disentangling Gaussians

Adam Tauman Kalai; Ankur Moitra; Gregory Valiant


Collaboration


Dive into Gregory Valiant's collaborations.

Top Co-Authors

Ankur Moitra

Massachusetts Institute of Technology


Weihao Kong

Shanghai Jiao Tong University
