
Publications


Featured research published by Jelani Nelson.


Symposium on Principles of Database Systems | 2010

An optimal algorithm for the distinct elements problem

Daniel M. Kane; Jelani Nelson; David P. Woodruff

We give the first optimal algorithm for estimating the number of distinct elements in a data stream, closing a long line of theoretical research on this problem begun by Flajolet and Martin in their seminal paper in FOCS 1983. This problem has applications to query optimization, Internet routing, network topology, and data mining. For a stream of indices in {1,...,n}, our algorithm computes a (1 ± ε)-approximation using an optimal O(1/ε<sup>2</sup> + log(n)) bits of space with 2/3 success probability, where 0 < ε < 1 is given. This probability can be amplified by independent repetition. Furthermore, our algorithm processes each stream update in O(1) worst-case time, and can report an estimate at any point midstream in O(1) worst-case time, thus settling both the space and time complexities simultaneously. We also give an algorithm to estimate the Hamming norm of a stream, a generalization of the number of distinct elements, which is useful in data cleaning, packet tracing, and database auditing. Our algorithm uses nearly optimal space, and has optimal O(1) update and reporting times.
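The paper's optimal algorithm is intricate; for intuition only, here is a minimal sketch of a simpler (non-optimal) small-space distinct-elements estimator, the k-minimum-values sketch. All names and parameters below are illustrative choices, not the authors' construction:

```python
import bisect
import hashlib

def _hash01(item):
    # Deterministic hash of an item to a point in (0, 1].
    h = int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")
    return (h + 1) / 2.0**64

class KMVSketch:
    """k-minimum-values distinct-elements estimator (illustrative only;
    the optimal algorithm described above uses far less space for the
    same guarantee)."""

    def __init__(self, k=256):
        self.k = k
        self.mins = []  # sorted list of the k smallest distinct hash values

    def update(self, item):
        v = _hash01(item)
        i = bisect.bisect_left(self.mins, v)
        if i < len(self.mins) and self.mins[i] == v:
            return  # this hash value is already recorded (duplicate item)
        self.mins.insert(i, v)
        if len(self.mins) > self.k:
            self.mins.pop()  # keep only the k smallest values

    def estimate(self):
        if len(self.mins) < self.k:
            return len(self.mins)  # fewer than k distinct items: exact count
        # The k-th smallest of d uniform points in (0,1] is about k/d.
        return int(round((self.k - 1) / self.mins[-1]))
```

With k = 256 the relative standard error is roughly 1/√k ≈ 6%, which is far from the paper's optimal space/accuracy trade-off but shows the hashing idea behind distinct counting.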


Foundations of Computer Science | 2013

OSNAP: Faster Numerical Linear Algebra Algorithms via Sparser Subspace Embeddings

Jelani Nelson; Huy L. Nguyen

An oblivious subspace embedding (OSE) given some parameters ε, d is a distribution D over matrices Π ∈ R<sup>m×n</sup> such that for any linear subspace W ⊆ R<sup>n</sup> with dim(W) = d, P<sub>Π~D</sub>(∀x ∈ W ||Πx||<sub>2</sub> ∈ (1 ± ε)||x||<sub>2</sub>) > 2/3. We show that a certain class of distributions, Oblivious Sparse Norm-Approximating Projections (OSNAPs), provides OSEs with m = O(d<sup>1+γ</sup>/ε<sup>2</sup>), and where every matrix Π in the support of the OSE has only s = O<sub>γ</sub>(1/ε) non-zero entries per column, for γ > 0 any desired constant. Plugging OSNAPs into known algorithms for approximate least squares regression, ℓ<sub>p</sub> regression, low rank approximation, and approximating leverage scores implies faster algorithms for all these problems. Our main result is essentially a Bai-Yin type theorem in random matrix theory and is likely to be of independent interest: we show that for any fixed U ∈ R<sup>n×d</sup> with orthonormal columns and random sparse Π, all singular values of ΠU lie in [1 - ε, 1 + ε] with good probability. This can be seen as a generalization of the sparse Johnson-Lindenstrauss lemma, which was concerned with d = 1. Our methods also recover a slightly sharper version of a main result of [Clarkson-Woodruff, STOC 2013], with a much simpler proof. That is, we show that OSNAPs give an OSE with m = O(d<sup>2</sup>/ε<sup>2</sup>), s = 1.
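As a rough numerical plausibility check (not the paper's hash-based, limited-independence construction), one can build a sparse sign matrix with s nonzeros per column and observe that the singular values of ΠU concentrate near 1; the dimensions below are arbitrary:

```python
import numpy as np

def sparse_sketch(m, n, s, rng):
    """OSNAP-style sparse sketching matrix: each column has exactly s
    nonzero entries, each ±1/sqrt(s), placed in s distinct random rows.
    (The paper draws row indices and signs with limited independence;
    here they are fully random for simplicity.)"""
    Pi = np.zeros((m, n))
    for j in range(n):
        rows = rng.choice(m, size=s, replace=False)
        Pi[rows, j] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
    return Pi

rng = np.random.default_rng(0)
n, d, m, s = 1000, 8, 300, 8
# Orthonormal basis U of a random d-dimensional subspace of R^n.
U, _ = np.linalg.qr(rng.standard_normal((n, d)))
Pi = sparse_sketch(m, n, s, rng)
# Per the Bai-Yin-type theorem above, all singular values of Pi @ U
# should lie close to 1.
sv = np.linalg.svd(Pi @ U, compute_uv=False)
```

Because each column of Π has only s = 8 nonzeros, applying Π to a matrix costs time proportional to its number of nonzero entries times s, which is what makes the downstream regression and low-rank algorithms faster.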


Journal of the ACM | 2014

Sparser Johnson-Lindenstrauss Transforms

Daniel M. Kane; Jelani Nelson

We give two different and simple constructions for dimensionality reduction in <i>ℓ</i><sub>2</sub> via linear mappings that are sparse: only an <i>O</i>(<i>ϵ</i>)-fraction of entries in each column of our embedding matrices are non-zero to achieve distortion 1 + <i>ϵ</i> with high probability, while still achieving the asymptotically optimal number of rows. These are the first constructions to provide subconstant sparsity for all values of parameters, improving upon previous works of Achlioptas [2003] and Dasgupta et al. [2010]. Such distributions can be used to speed up applications where <i>ℓ</i><sub>2</sub> dimensionality reduction is used.
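For intuition, a quick hypothetical experiment in the spirit of the result: embed a few random vectors with a sparse sign matrix having s nonzeros per column and measure the worst ℓ<sub>2</sub>-norm distortion. Parameters are ad hoc and this is not the paper's exact distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, s = 2000, 256, 8  # ambient dim, target dim, nonzeros per column

# Sparse embedding: each column gets s distinct random rows,
# with entries ±1/sqrt(s); only an s/m fraction of each column is nonzero.
Pi = np.zeros((m, n))
for j in range(n):
    rows = rng.choice(m, size=s, replace=False)
    Pi[rows, j] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)

# Measure how well ℓ2 norms survive the embedding for a few test vectors.
worst = 0.0
for _ in range(5):
    x = rng.standard_normal(n)
    worst = max(worst, abs(np.linalg.norm(Pi @ x) / np.linalg.norm(x) - 1.0))
```

The point of sparsity is speed: computing Πx touches only s entries per nonzero coordinate of x, so embedding time scales with the sparsity of the input rather than with m.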


ACM Symposium on Parallel Algorithms and Architectures | 2007

Cache-oblivious streaming B-trees

Michael A. Bender; Jeremy T. Fineman; Yonatan R. Fogel; Bradley C. Kuszmaul; Jelani Nelson

A <b><i>streaming B-tree</i></b> is a dictionary that efficiently implements insertions and range queries. We present two cache-oblivious streaming B-trees, the <b><i>shuttle tree</i></b>, and the <b><i>cache-oblivious lookahead array (COLA)</i></b>. For block-transfer size <i>B</i> and on <i>N</i> elements, the shuttle tree implements searches in optimal <i>O</i>(log <sub><i>B</i>+1</sub><i>N</i>) transfers, range queries of <i>L</i> successive elements in optimal <i>O</i>(log <sub><i>B</i>+1</sub><i>N</i> +<i>L/B</i>) transfers, and insertions in <i>O</i>((log <sub><i>B</i>+1</sub><i>N</i>)/<i>B</i><sup>Θ(1/(log log <i>B</i>)<sup>2</sup>)</sup>+(log<sup>2</sup><i>N</i>)/<i>B</i>) transfers, which is an asymptotic speedup over traditional B-trees if <i>B</i> ≥ (log <i>N</i>)<sup>1+<i>c</i> log log log<sup>2</sup> <i>N</i></sup> for any constant <i>c</i> >1. A COLA implements searches in <i>O</i>(log <i>N</i>) transfers, range queries in O(log <i>N</i> + <i>L/B</i>) transfers, and insertions in amortized <i>O</i>((log <i>N</i>)/<i>B</i>) transfers, matching the bounds for a (cache-aware) buffered repository tree. A partially deamortized COLA matches these bounds but reduces the worst-case insertion cost to <i>O</i>(log <i>N</i>) if memory size <i>M</i> = Ω(log <i>N</i>). We also present a cache-aware version of the COLA, the <b><i>lookahead array</i></b>, which achieves the same bounds as Brodal and Fagerberg's (cache-aware) B<sup>ε</sup>-tree. We compare our COLA implementation to a traditional B-tree. Our COLA implementation runs 790 times faster for random insertions, 3.1 times slower for insertions of sorted data, and 3.5 times slower for searches.
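The COLA's array-of-doubling-arrays layout can be sketched in a few lines. This toy version (insert and membership only, with no lookahead pointers, deletions, or deamortization) is meant to show the merge-on-overflow pattern behind the amortized insertion bound, not to reproduce the paper's data structure:

```python
import bisect

class COLA:
    """Toy cache-oblivious lookahead array: level k holds either 0 or 2**k
    sorted keys. An insertion cascades full levels downward by merging,
    which is what yields cheap amortized insertions in the external-memory
    analysis above."""

    def __init__(self):
        self.levels = []  # levels[k] is a sorted list of 0 or 2**k keys

    def insert(self, key):
        carry = [key]  # sorted run of size 2**k when it reaches level k
        k = 0
        while True:
            if k == len(self.levels):
                self.levels.append(carry)
                return
            if not self.levels[k]:
                self.levels[k] = carry
                return
            # Level k is full: merge it with the carry and push down.
            carry = sorted(self.levels[k] + carry)  # merge of two sorted runs
            self.levels[k] = []
            k += 1

    def contains(self, key):
        # Without fractional cascading, search each level independently.
        for level in self.levels:
            i = bisect.bisect_left(level, key)
            if i < len(level) and level[i] == key:
                return True
        return False
```

In the real COLA each merge is a sequential scan, so a batch of 2<sup>k</sup> keys moves in about 2<sup>k</sup>/<i>B</i> block transfers, giving the amortized O((log N)/B) insertion cost.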


Foundations of Computer Science | 2008

Sketching and Streaming Entropy via Approximation Theory

Nicholas Scott Harvey; Jelani Nelson; Krzysztof Onak

We give near-optimal sketching and streaming algorithms for estimating Shannon entropy in the most general streaming model, with arbitrary insertions and deletions. This improves on prior results that obtain suboptimal space bounds in the general model, and near-optimal bounds in the insertion-only model without sketching. Our high-level approach is simple: we give algorithms to estimate Tsallis entropy, and use them to extrapolate an estimate of Shannon entropy. The accuracy of our estimates is proven using approximation theory arguments and extremal properties of Chebyshev polynomials. Our work also yields the best-known and near-optimal additive approximations for entropy, and hence also for conditional entropy and mutual information.
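The extrapolation idea can be demonstrated directly on a known distribution. The sketch below (with arbitrarily chosen interpolation points q) is only a plausibility check of the high-level approach, not the streaming algorithm:

```python
import math

def tsallis(p, q):
    # Tsallis entropy T_q(p) = (1 - sum_i p_i**q) / (q - 1); as q -> 1
    # it converges to the Shannon entropy (in nats).
    return (1.0 - sum(pi**q for pi in p)) / (q - 1.0)

def shannon_via_tsallis(p, qs=(1.05, 1.10, 1.15)):
    """Estimate Shannon entropy by evaluating Tsallis entropy at a few
    q > 1 and extrapolating to q = 1 via Lagrange interpolation. The
    choice of qs here is ad hoc; the paper chooses evaluation points and
    proves accuracy using Chebyshev-polynomial arguments."""
    est = 0.0
    for i, qi in enumerate(qs):
        w = 1.0
        for j, qj in enumerate(qs):
            if i != j:
                w *= (1.0 - qj) / (qi - qj)  # Lagrange weight at q = 1
        est += w * tsallis(p, qi)
    return est
```

For the uniform distribution on 4 outcomes, the extrapolated value lands very close to the true Shannon entropy ln 4, because the Tsallis entropy is a smooth function of q and quadratic extrapolation cancels the low-order error terms.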


Foundations of Computer Science | 2010

Bounded Independence Fools Degree-2 Threshold Functions

Ilias Diakonikolas; Daniel M. Kane; Jelani Nelson



Symposium on the Theory of Computing | 2011

Fast moment estimation in data streams in optimal space

Daniel M. Kane; Jelani Nelson; Ely Porat; David P. Woodruff



Symposium on the Theory of Computing | 2015

Toward a Unified Theory of Sparse Dimensionality Reduction in Euclidean Space

Jean Bourgain; Sjoerd Dirksen; Jelani Nelson



Symposium on Principles of Database Systems | 2010

Fast Manhattan sketches in data streams

Jelani Nelson; David P. Woodruff



International Workshop on Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques | 2011

Almost optimal explicit Johnson-Lindenstrauss families

Daniel M. Kane; Raghu Meka; Jelani Nelson


Collaboration


Dive into Jelani Nelson's collaborations.

Top Co-Authors

David P. Woodruff (Carnegie Mellon University)
Daniel M. Kane (University of California)
Eric Price (University of Texas at Austin)
Erik D. Demaine (Massachusetts Institute of Technology)
Jean Bourgain (Institute for Advanced Study)
Martin L. Demaine (Massachusetts Institute of Technology)