Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where David P. Woodruff is active.

Publication


Featured research published by David P. Woodruff.


Symposium on Principles of Database Systems | 2010

An optimal algorithm for the distinct elements problem

Daniel M. Kane; Jelani Nelson; David P. Woodruff

We give the first optimal algorithm for estimating the number of distinct elements in a data stream, closing a long line of theoretical research on this problem begun by Flajolet and Martin in their seminal paper in FOCS 1983. This problem has applications to query optimization, Internet routing, network topology, and data mining. For a stream of indices in {1, ..., n}, our algorithm computes a (1 ± ε)-approximation using an optimal O(1/ε^2 + log n) bits of space with 2/3 success probability, where 0 < ε < 1 is given. This probability can be amplified by independent repetition. Furthermore, our algorithm processes each stream update in O(1) worst-case time, and can report an estimate at any point midstream in O(1) worst-case time, thus settling both the space and time complexities simultaneously. We also give an algorithm to estimate the Hamming norm of a stream, a generalization of the number of distinct elements, which is useful in data cleaning, packet tracing, and database auditing. Our algorithm uses nearly optimal space, and has optimal O(1) update and reporting times.
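The optimal algorithm itself is intricate; as a simple, deliberately suboptimal illustration of (1 ± ε)-approximate distinct counting, here is a k-minimum-values sketch (the function name and parameters are for illustration only, not from the paper):

```python
import hashlib
import random

def kmv_estimate(stream, k=256):
    """Estimate the number of distinct elements (F_0) with a
    k-minimum-values sketch: hash each item to [0, 1), keep the k
    smallest distinct hashes, and estimate F_0 as (k - 1) divided by
    the k-th smallest hash.  Illustrative only; not the optimal
    algorithm of the paper."""
    def h(x):
        digest = hashlib.sha1(str(x).encode()).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    smallest = sorted({h(x) for x in stream})[:k]
    if len(smallest) < k:
        return len(smallest)  # fewer than k distinct items: exact count
    return int((k - 1) / smallest[-1])

random.seed(0)
stream = [random.randrange(10_000) for _ in range(100_000)]
print(len(set(stream)), kmv_estimate(stream))  # estimate close to the true count
```

The standard-error of such a sketch scales like 1/√k, which is why the paper's O(1/ε^2) space dependence is natural.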


Symposium on the Theory of Computing | 2005

Optimal approximations of the frequency moments of data streams

Piotr Indyk; David P. Woodruff

We give a 1-pass Õ(m^(1−2/k))-space algorithm for computing the k-th frequency moment of a data stream for any real k > 2. Together with the lower bounds of [1, 2, 4], this resolves the main problem left open by Alon et al. in 1996 [1]. Our algorithm also works for streams with deletions and thus gives an Õ(m^(1−2/p))-space algorithm for the L_p difference problem for any p > 2. This essentially matches the known Ω(m^(1−2/p−o(1))) lower bound of [12, 2]. Finally, the update time of our algorithms is Õ(1).
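For reference, the k-th frequency moment is F_k = Σ_i f_i^k, where f_i is the frequency of element i. A direct, non-streaming computation (with made-up example values) makes the definition concrete:

```python
from collections import Counter

def frequency_moment(stream, k):
    """Exact k-th frequency moment F_k = sum_i f_i^k.  Streaming
    algorithms like the one above approximate this in sublinear space;
    this exact version is for reference only."""
    freqs = Counter(stream)
    return sum(f ** k for f in freqs.values())

stream = [1, 2, 2, 3, 3, 3]          # hypothetical example stream
print(frequency_moment(stream, 0))   # F_0 = 3, the number of distinct elements
print(frequency_moment(stream, 1))   # F_1 = 6, the stream length
print(frequency_moment(stream, 2))   # F_2 = 1 + 4 + 9 = 14
```

Note how F_0 and F_1 fall out as special cases, which is why frequency moments unify several streaming problems on this page.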


Foundations and Trends in Theoretical Computer Science | 2014

Sketching as a Tool for Numerical Linear Algebra

David P. Woodruff

This survey highlights the recent advances in algorithms for numerical linear algebra that have come from the technique of linear sketching, whereby, given a matrix, one first compresses it to a much smaller matrix by multiplying it by a (usually) random matrix with certain properties. Much of the expensive computation can then be performed on the smaller matrix, thereby accelerating the solution for the original problem. In this survey we consider least squares as well as robust regression problems, low-rank approximation, and graph sparsification. We also discuss a number of variants of these problems. Finally, we discuss the limitations of sketching methods.
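A minimal sketch of the idea, using NumPy and a random sign matrix (one of several sketching matrices such a survey covers; the sizes here are arbitrary): solve a least-squares problem on a compressed version of the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overdetermined least-squares problem: minimize ||A x - b||_2.
n, d = 10_000, 20
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

# Linear sketching: multiply by a small random sign matrix S (m << n),
# then solve the much smaller m x d problem instead of the n x d one.
m = 400
S = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_sketch - x_exact))  # small: the sketch preserves the solution
```

The sketched solve touches an m × d matrix rather than n × d, which is where the speedup comes from when n is very large.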


Foundations of Computer Science | 2003

Tight lower bounds for the distinct elements problem

Piotr Indyk; David P. Woodruff

We prove strong lower bounds for the space complexity of (ε, δ)-approximating the number of distinct elements F_0 in a data stream. Let m be the size of the universe from which the stream elements are drawn. We show that any one-pass streaming algorithm for (ε, δ)-approximating F_0 must use Ω(1/ε^2) space when ε = Ω(m^(−1/(9+k))), for any k > 0, improving upon the known lower bound of Ω(1/ε) for this range of ε. This lower bound is tight up to a factor of log log m for small ε and log(1/ε) for large ε. Our lower bound is derived from a reduction from the one-way communication complexity of approximating a Boolean function in Euclidean space. The reduction makes use of a low-distortion embedding from an l_2 to an l_1 norm.


Symposium on the Theory of Computing | 2012

Tight bounds for distributed functional monitoring

David P. Woodruff; Qin Zhang

We resolve several fundamental questions in the area of distributed functional monitoring, initiated by Cormode, Muthukrishnan, and Yi (SODA, 2008), and receiving recent attention. In this model there are k sites, each tracking its input stream and communicating with a central coordinator. The coordinator's task is to continuously maintain an approximate output to a function computed over the union of the k streams. The goal is to minimize the number of bits communicated. Let the p-th frequency moment be defined as F_p = Σ_i f_i^p, where f_i is the frequency of element i. We show the randomized communication complexity of estimating the number of distinct elements (that is, F_0) up to a 1+ε factor is Ω(k/ε^2), improving upon the previous Ω(k + 1/ε^2) bound and matching known upper bounds. For F_p, p > 1, we improve the previous Ω(k + 1/ε^2) communication bound to Ω(k^(p−1)/ε^2). We obtain similar improvements for heavy hitters, empirical entropy, and other problems. Our lower bounds are the first of any kind in distributed functional monitoring to depend on the product of k and 1/ε^2. Moreover, the lower bounds are for the static version of the distributed functional monitoring model, where the coordinator only needs to compute the function at the time when all k input streams end; surprisingly, they almost match what is achievable in the (dynamic version of the) distributed functional monitoring model, where the coordinator needs to keep track of the function continuously at any time step. We also show that we can estimate F_p, for any p > 1, using O(k^(p−1) poly(ε^(−1))) communication.
This drastically improves upon the previous O(k^(2p+1) N^(1−2/p) poly(ε^(−1))) bound of Cormode, Muthukrishnan, and Yi for general p, and their O(k^2/ε + k^(1.5)/ε^3) bound for p = 2. For p = 2, our bound resolves their main open question. Our lower bounds are based on new direct sum theorems for approximate majority, and yield improvements to classical problems in the standard data stream model. First, we improve the known lower bound for estimating F_p, p > 2, in t passes from Ω(n^(1−2/p)/(ε^(2/p) t)) to Ω(n^(1−2/p)/(ε^(4/p) t)), giving the first bound that matches what we expect when p = 2 for any constant number of passes. Second, we give the first lower bound for estimating F_0 in t passes with Ω(1/(ε^2 t)) bits of space that does not use the hardness of the gap-hamming problem.
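One way to picture the coordinator model for F_0: each site summarizes its stream with a small mergeable sketch and sends only that, so communication is independent of the stream lengths. A toy k-minimum-values version (illustrative only; not the protocols of the paper, and the function names are made up):

```python
import hashlib

def hash01(x):
    """Hash an item to a float in [0, 1)."""
    d = hashlib.sha1(str(x).encode()).digest()
    return int.from_bytes(d[:8], "big") / 2**64

def site_sketch(stream, k):
    """Each site sends the coordinator only its k smallest distinct hashes."""
    return sorted({hash01(x) for x in stream})[:k]

def coordinator_estimate(sketches, k):
    """The coordinator merges the site sketches and estimates F_0 of the
    union of all streams from the k smallest hashes overall."""
    merged = sorted(set().union(*sketches))[:k]
    if len(merged) < k:
        return len(merged)
    return int((k - 1) / merged[-1])

k = 512
streams = [range(0, 5_000), range(2_500, 7_500), range(5_000, 10_000)]
sketches = [site_sketch(s, k) for s in streams]
print(coordinator_estimate(sketches, k))  # estimates the 10,000 distinct elements
```

Each site communicates O(k) values regardless of how long its stream is, which is the flavor of savings the communication bounds above quantify.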


Theory of Cryptography Conference | 2006

Polylogarithmic private approximations and efficient matching

Piotr Indyk; David P. Woodruff

In [12], a private approximation of a function f is defined to be another function F that approximates f in the usual sense, but does not reveal any information about the input x other than what can be deduced from f(x). We give the first two-party private approximation of the l_2 distance with polylogarithmic communication. This, in particular, resolves the main open question of [12]. We then look at the private near neighbor problem, in which Alice has a query point in {0,1}^d and Bob a set of n points in {0,1}^d, and Alice should privately learn the point closest to her query. We improve upon existing protocols, resolving open questions of [13, 10]. Then, we relax the problem by defining the private approximate near neighbor problem, which requires introducing a notion of secure computation of approximations for functions that return sets of points rather than values. For this problem we give several protocols with sublinear communication.


Symposium on Discrete Algorithms | 2011

Optimal bounds for Johnson-Lindenstrauss transforms and streaming problems with sub-constant error

T. S. Jayram; David P. Woodruff

The Johnson-Lindenstrauss transform is a dimensionality reduction technique with a wide range of applications to theoretical computer science. It is specified by a distribution over projection matrices from R^n to R^k, where k ≪ n, and states that k = O(ε^(−2) log(1/Δ)) dimensions suffice to approximate the norm of any fixed vector in R^n to within a factor of 1 ± ε with probability at least 1 − Δ. In this article, we show that this bound on k is optimal up to a constant factor, improving upon a previous Ω((ε^(−2) log(1/Δ))/log(1/ε)) dimension bound of Alon. Our techniques are based on lower bounding the information cost of a novel one-way communication game and yield the first space lower bounds in a data stream model that depend on the error probability Δ. For many streaming problems, the most naive way of achieving error probability Δ is to first achieve constant probability, then take the median of O(log(1/Δ)) independent repetitions. Our techniques show that for a wide range of problems, this is in fact optimal! As an example, we show that estimating the l_p-distance for any p ∈ [0,2] requires Ω(ε^(−2) log n log(1/Δ)) space, even for vectors in {0,1}^n. This is optimal in all parameters and closes a long line of work on this problem. We also show the number of distinct elements requires Ω(ε^(−2) log(1/Δ) + log n) space, which is optimal if ε^(−2) = Ω(log n). We also improve previous lower bounds for entropy in the strict turnstile and general turnstile models by a multiplicative factor of Ω(log(1/Δ)).
Finally, we give an application to one-way communication complexity under product distributions, showing that, unlike the case of constant Δ, the VC-dimension does not characterize the complexity when Δ = o(1).
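The median amplification the abstract mentions is easy to state in code. A toy version, with a hypothetical constant-success estimator standing in for a real streaming algorithm:

```python
import math
import random
import statistics

def amplify(estimator, delta):
    """Boost an estimator that is correct with probability >= 2/3 to
    success probability >= 1 - delta by taking the median of
    O(log 1/delta) independent repetitions (Chernoff-sized constant)."""
    reps = math.ceil(48 * math.log(1 / delta)) | 1  # force an odd count
    return statistics.median(estimator() for _ in range(reps))

# Hypothetical estimator: the right answer 100 with probability 2/3,
# otherwise wildly wrong.
random.seed(1)
noisy = lambda: 100 if random.random() < 2 / 3 else random.choice([0, 10_000])
print(amplify(noisy, delta=1e-6))  # 100, except with probability <= 1e-6
```

The point of the paper is that, for many streaming problems, this log(1/Δ) blow-up is not an artifact of the reduction but genuinely necessary.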


Journal of the ACM | 2012

Sublinear optimization for machine learning

Kenneth L. Clarkson; Elad Hazan; David P. Woodruff

We give sublinear-time approximation algorithms for some optimization problems arising in machine learning, such as training linear classifiers and finding minimum enclosing balls. Our algorithms can be extended to some kernelized versions of these problems, such as SVDD, hard margin SVM, and L_2-SVM, for which sublinear-time algorithms were not known before. These new algorithms use a combination of novel sampling techniques and a new multiplicative update algorithm. We give lower bounds which show the running times of many of our algorithms to be nearly best possible in the unit-cost RAM model. We also give implementations of our algorithms in the semi-streaming setting, obtaining the first low-pass, polylogarithmic-space, and sublinear-time algorithms achieving arbitrary approximation factor.


Foundations of Computer Science | 2006

Lower Bounds for Additive Spanners, Emulators, and More

David P. Woodruff


Conference on Computational Complexity | 2005

A geometric approach to information-theoretic private information retrieval

David P. Woodruff; Sergey Yekhanin

Collaboration


Dive into David P. Woodruff's collaborations.

Top Co-Authors

Yi Li

University of Michigan

Piotr Indyk

Massachusetts Institute of Technology

Grigory Yaroslavtsev

Pennsylvania State University