Publications


Featured research published by Jerry Li.


symposium on the theory of computing | 2018

Mixture models, robustness, and sum of squares proofs

Samuel B. Hopkins; Jerry Li

We use the Sum of Squares method to develop new efficient algorithms for learning well-separated mixtures of Gaussians and robust mean estimation, both in high dimensions, that substantially improve upon the statistical guarantees achieved by previous efficient algorithms. Our contributions are:

Mixture models with separated means: We study mixtures of poly(k)-many k-dimensional distributions where the means of every pair of distributions are separated by at least \(k^{\varepsilon}\). In the special case of spherical Gaussian mixtures, we give a \(k^{O(1/\varepsilon)}\)-time algorithm that learns the means assuming separation at least \(k^{\varepsilon}\), for any \(\varepsilon > 0\). This is the first algorithm to improve on greedy (“single-linkage”) and spectral clustering, breaking a long-standing barrier for efficient algorithms at separation \(k^{1/4}\).

Robust estimation: When an unknown \((1-\varepsilon)\)-fraction of \(X_1, \ldots, X_n\) are chosen from a sub-Gaussian distribution with mean \(\mu\) but the remaining points are chosen adversarially, we give an algorithm recovering \(\mu\) to error \(\varepsilon^{1-1/t}\) in time \(k^{O(t)}\), so long as sub-Gaussianity up to \(O(t)\) moments can be certified by a Sum of Squares proof. This is the first polynomial-time algorithm with guarantees approaching the information-theoretic limit for non-Gaussian distributions. Previous algorithms could not achieve error better than \(\varepsilon^{1/2}\). As a corollary, we achieve similar results for robust covariance estimation.

Both of these results are based on a unified technique. Inspired by recent algorithms of Diakonikolas et al. in robust statistics, we devise an SDP based on the Sum of Squares method for the following setting: given \(X_1, \ldots, X_n \in \mathbb{R}^k\) for large \(k\) and \(n = \mathrm{poly}(k)\), with the promise that a subset of \(X_1, \ldots, X_n\) were sampled from a probability distribution with bounded moments, recover some information about that distribution.
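
The contamination model behind the robust-estimation result is easy to make concrete. Below is a minimal Python sketch of the setting only, not of the Sum of Squares algorithm: the function names are invented for illustration, and the robust baseline is a naive coordinate-wise median, whose \(\varepsilon\sqrt{k}\)-scale error is exactly what the algorithms above improve on.

    import numpy as np

    def contaminate(samples, eps, rng):
        """Adversarially overwrite an eps-fraction of the sample rows."""
        n = samples.shape[0]
        bad = rng.choice(n, size=int(eps * n), replace=False)
        corrupted = samples.copy()
        corrupted[bad] = 100.0  # a crude adversary: push the points far away
        return corrupted

    rng = np.random.default_rng(0)
    k, n, eps = 50, 5000, 0.1
    mu = np.zeros(k)  # true mean of the inliers
    dirty = contaminate(rng.standard_normal((n, k)) + mu, eps, rng)

    # The empirical mean moves by about eps * 100 in every coordinate,
    # while the coordinate-wise median stays near mu.
    print(np.linalg.norm(dirty.mean(axis=0) - mu))        # large
    print(np.linalg.norm(np.median(dirty, axis=0) - mu))  # small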


ACM Transactions on Database Systems | 2017

Exact Model Counting of Query Expressions: Limitations of Propositional Methods

Paul Beame; Jerry Li; Sudeepa Roy; Dan Suciu

We prove exponential lower bounds on the running time of the state-of-the-art exact model counting algorithms—algorithms for exactly computing the number of satisfying assignments, or the satisfying probability, of Boolean formulas. These algorithms can be seen, either directly or indirectly, as building Decision-Decomposable Negation Normal Form (decision-DNNF) representations of the input Boolean formulas. Decision-DNNFs are a special case of d-DNNFs where d stands for deterministic.

We show that any knowledge compilation representations from a class (called DLDDs in this article) that contain decision-DNNFs can be converted into equivalent Free Binary Decision Diagrams (FBDDs), also known as Read-Once Branching Programs, with only a quasi-polynomial increase in representation size. Leveraging known exponential lower bounds for FBDDs, we then obtain similar exponential lower bounds for decision-DNNFs, which imply exponential lower bounds for model-counting algorithms. We also separate the power of decision-DNNFs from d-DNNFs and a generalization of decision-DNNFs known as AND-FBDDs.

We then prove new lower bounds for FBDDs that yield exponential lower bounds on the running time of these exact model counters when applied to the problem of query evaluation in tuple-independent probabilistic databases—computing the probability of an answer to a query given independent probabilities of the individual tuples in a database instance. This approach to the query evaluation problem, in which one first obtains the lineage for the query and database instance as a Boolean formula and then performs weighted model counting on the lineage, is known as grounded inference.

A second approach, known as lifted inference or extensional query evaluation, exploits the high-level structure of the query as a first-order formula. Although it has been widely believed that lifted inference is strictly more powerful than grounded inference on the lineage alone, no formal separation has previously been shown for query evaluation. In this article, we show such a formal separation for the first time. In particular, we exhibit a family of database queries for which polynomial-time extensional query evaluation techniques were previously known but for which query evaluation via grounded inference using the state-of-the-art exact model counters requires exponential time.
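
For contrast with the compiled representations discussed above, brute-force exact model counting fits in a few lines: enumerate all \(2^n\) assignments and count the satisfying ones. The sketch below (names are illustrative) is exponential by construction; decision-DNNF-based counters exist to beat this enumeration on structured formulas, and the lower bounds above exhibit query lineages on which they provably cannot.

    from itertools import product

    def count_models(formula, variables):
        """Exact model counting by exhaustive enumeration: always O(2^n)."""
        return sum(
            1
            for values in product([False, True], repeat=len(variables))
            if formula(dict(zip(variables, values)))
        )

    # (x1 OR x2) AND (NOT x1 OR x3) has 4 satisfying assignments out of 8.
    f = lambda a: (a["x1"] or a["x2"]) and (not a["x1"] or a["x3"])
    print(count_models(f, ["x1", "x2", "x3"]))  # 4

Replacing the plain count with a sum, over satisfying assignments, of products of per-tuple probabilities gives the weighted model count that grounded inference computes over the lineage.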


international colloquium on automata, languages and programming | 2015

Replacing Mark Bits with Randomness in Fibonacci Heaps

Jerry Li; John Peebles

A Fibonacci heap is a deterministic data structure implementing a priority queue with optimal amortized operation costs. An unfortunate aspect of Fibonacci heaps is that they must maintain a “mark bit” which serves only to ensure efficiency of heap operations, not correctness. Karger proposed a simple randomized variant of Fibonacci heaps in which mark bits are replaced by coin flips. This variant still has expected amortized cost \(O(1)\) for insert, decrease-key, and merge. Karger conjectured that this data structure has expected amortized cost \(O(\log s)\) for delete-min, where s is the number of heap operations.
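
Karger's variant changes a single decision in decrease-key: where a classical Fibonacci heap cuts an ancestor that has already lost a child (recorded in its mark bit), the randomized variant cuts each ancestor with an independent coin flip. A minimal sketch of just that cut loop, with node fields and helpers invented for illustration (not a complete heap):

    import random

    class Node:
        def __init__(self, key):
            self.key = key
            self.parent = None
            self.children = []

    def cut(roots, node):
        """Detach node from its parent and make it a root."""
        node.parent.children.remove(node)
        node.parent = None
        roots.append(node)

    def decrease_key(roots, node, new_key):
        node.key = new_key
        parent = node.parent
        if parent is None or parent.key <= new_key:
            return  # heap order is intact; nothing to do
        cut(roots, node)
        # Classical version: cut marked ancestors, then mark the first
        # unmarked one. Randomized version: no mark bits at all; keep
        # cutting ancestors while a fair coin comes up heads.
        while parent.parent is not None and random.random() < 0.5:
            grandparent = parent.parent
            cut(roots, parent)
            parent = grandparent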


uncertainty in artificial intelligence | 2013

Lower bounds for exact model counting and applications in probabilistic databases

Paul Beame; Jerry Li; Sudeepa Roy; Dan Suciu


international conference on machine learning | 2017

Being Robust (in High Dimensions) Can Be Practical

Ilias Diakonikolas; Gautam Kamath; Daniel M. Kane; Jerry Li; Ankur Moitra; Alistair Stewart


arXiv: Learning | 2016

QSGD: Randomized Quantization for Communication-Optimal Stochastic Gradient Descent

Dan Alistarh; Jerry Li; Ryota Tomioka; Milan Vojnovic


international conference on database theory | 2014

Model Counting of Query Expressions: Limitations of Propositional Methods

Paul Beame; Jerry Li; Sudeepa Roy; Dan Suciu


symposium on discrete algorithms | 2018

Robustly learning a Gaussian: getting optimal error, efficiently

Ilias Diakonikolas; Gautam Kamath; Daniel M. Kane; Jerry Li; Ankur Moitra; Alistair Stewart


arXiv: Learning | 2017

Towards Understanding the Dynamics of Generative Adversarial Networks

Jerry Li; Aleksander Madry; John Peebles; Ludwig Schmidt


arXiv: Learning | 2017

Robust Sparse Estimation Tasks in High Dimensions

Jerry Li

Collaboration


Dive into Jerry Li's collaborations.

Top Co-Authors

Dan Alistarh (Institute of Science and Technology Austria)
Ilias Diakonikolas (University of Southern California)
Ludwig Schmidt (Massachusetts Institute of Technology)
Dan Suciu (University of Washington)
Gautam Kamath (Massachusetts Institute of Technology)
Paul Beame (University of Washington)
Sudeepa Roy (University of Washington)
Alistair Stewart (University of Southern California)
Daniel M. Kane (University of California)