Piotr Indyk
Massachusetts Institute of Technology
Publications
Featured research published by Piotr Indyk.
Communications of the ACM | 2008
Alexandr Andoni; Piotr Indyk
We present an algorithm for the c-approximate nearest neighbor problem in a d-dimensional Euclidean space, achieving query time of $O(dn^{1/c^2+o(1)})$ and space $O(dn + n^{1+1/c^2+o(1)})$. This almost matches the lower bound for hashing-based algorithms recently obtained in (R. Motwani et al., 2006). We also obtain a space-efficient version of the algorithm, which uses $dn + n \log^{O(1)} n$ space, with a query time of $dn^{O(1/c^2)}$. Finally, we discuss practical variants of the algorithms that utilize fast bounded-distance decoders for the Leech lattice.
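The general idea of a hashing-based nearest-neighbor index can be illustrated with a much simpler scheme than the one the abstract describes. The sketch below uses the classic random-projection hash $h(x) = \lfloor (a \cdot x + b)/w \rfloor$ with Gaussian $a$; it is not the algorithm of the paper and does not achieve its $1/c^2$ exponent. The class name `EuclideanLSH` and all parameter defaults are assumptions chosen for illustration.

```python
import math
import random
from collections import defaultdict


class EuclideanLSH:
    """Toy hashing-based near-neighbor index (illustrative, not the paper's scheme)."""

    def __init__(self, dim, num_tables=8, hashes_per_table=4, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        self.points = []
        # Each table concatenates several (a, b) hash functions into one bucket key.
        self.tables = []
        for _ in range(num_tables):
            funcs = [
                ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, bucket_width))
                for _ in range(hashes_per_table)
            ]
            self.tables.append((funcs, defaultdict(list)))

    def _key(self, funcs, x):
        # h(x) = floor((a . x + b) / w) for every hash function in the table.
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / self.w)
            for a, b in funcs
        )

    def add(self, x):
        idx = len(self.points)
        self.points.append(tuple(x))
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, x)].append(idx)
        return idx

    def query(self, q):
        # Collect every point sharing a bucket with q in at least one table,
        # then return the index of the closest candidate (or None if no collision).
        candidates = set()
        for funcs, buckets in self.tables:
            candidates.update(buckets.get(self._key(funcs, q), ()))
        if not candidates:
            return None
        return min(candidates, key=lambda i: math.dist(self.points[i], q))


index = EuclideanLSH(dim=2)
for p in [(0.0, 0.0), (1.0, 1.0), (9.0, 9.0)]:
    index.add(p)
# An indexed point always collides with itself, so this prints 1.
print(index.query((1.0, 1.0)))
```

More tables raise the chance that a true near neighbor collides with the query; more hashes per table make buckets smaller and cut down the candidates that must be checked. The abstract's contribution concerns how good this trade-off can be made.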
International Conference on Management of Data | 1998
Soumen Chakrabarti; Byron Dom; Piotr Indyk
A major challenge in indexing unstructured hypertext databases is to automatically extract meta-data that enables structured search using topic taxonomies, circumvents keyword ambiguity, and improves the quality of search and profile-based routing and filtering. Therefore, an accurate classifier is an essential component of a hypertext database. Hyperlinks pose new problems not addressed in the extensive text classification literature. Links clearly contain high-quality semantic clues that are lost on a purely term-based classifier, but exploiting link information is non-trivial because it is noisy. Naive use of terms in the link neighborhood of a document can even degrade accuracy. Our contribution is to propose robust statistical models and a relaxation labeling technique for better classification by exploiting link information in a small neighborhood around documents. Our technique also adapts gracefully to the fraction of neighboring documents having known topics. We experimented with pre-classified samples from Yahoo! and the US Patent Database. In previous work, we developed a text classifier that misclassified only 13% of the documents in the well-known Reuters benchmark; this was comparable to the best results ever obtained. This classifier misclassified 36% of the patents, indicating that classifying hypertext can be more difficult than classifying text. Naively using terms in neighboring documents increased error to 38%; our hypertext classifier reduced it to 21%. Results with the Yahoo! sample were more dramatic: the text classifier showed 68% error, whereas our hypertext classifier reduced this to only 21%.
SIAM Journal on Computing | 2002
Mayur Datar; Aristides Gionis; Piotr Indyk; Rajeev Motwani
We consider the problem of maintaining aggregates and statistics over data streams, with respect to the last N data elements seen so far. We refer to this model as the sliding window model. We consider the following basic problem: Given a stream of bits, maintain a count of the number of 1s in the last N elements seen from the stream. We show that, using $O(\frac{1}{\epsilon} \log^2 N)$ bits of memory, we can estimate the number of 1s to within a factor of $1 + \epsilon$. We also give a matching lower bound of $\Omega(\frac{1}{\epsilon} \log^2 N)$ memory bits for any deterministic or randomized algorithms. We extend our scheme to maintain the sum of the last N positive integers and provide matching upper and lower bounds for this more general problem as well. We also show how to efficiently compute the Lp norms ($p \in [1,2]$
Foundations of Computer Science | 2006
Alexandr Andoni; Piotr Indyk
Journal of the ACM | 2006
Piotr Indyk
Allerton Conference on Communication, Control, and Computing | 2008
Radu Berinde; Anna C. Gilbert; Piotr Indyk; Howard J. Karloff; M. Strauss
Symposium on the Theory of Computing | 2002
Mihai Bădoiu; Sariel Har-Peled; Piotr Indyk
Proceedings of the IEEE | 2010
Anna C. Gilbert; Piotr Indyk
International Conference on Cluster Computing | 2001
Piotr Indyk
Symposium on the Theory of Computing | 2002
Anna C. Gilbert; Sudipto Guha; Piotr Indyk; S. Muthukrishnan; M. Strauss
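The basic counting problem from the sliding-window abstract by Datar, Gionis, Indyk, and Motwani (count the 1s among the last N bits in small memory) can be sketched with a bucket-merging structure of the kind often called an exponential histogram. What follows is a simplified variant written for illustration, not the paper's exact procedure: the class name `SlidingWindowCounter`, the merge threshold `max_same`, and the midpoint estimator are all assumptions.

```python
import math


class SlidingWindowCounter:
    """Approximate count of 1s in the last `window` bits (simplified sketch)."""

    def __init__(self, window, eps):
        self.window = window
        # Assumed threshold: allow at most this many buckets of one size.
        self.max_same = math.ceil(1.0 / (2.0 * eps)) + 1
        self.buckets = []  # [timestamp of newest 1 in bucket, size], oldest first
        self.time = 0
        self.total = 0     # sum of all bucket sizes

    def add(self, bit):
        self.time += 1
        # Drop buckets whose newest 1 has left the window.
        while self.buckets and self.buckets[0][0] <= self.time - self.window:
            self.total -= self.buckets[0][1]
            self.buckets.pop(0)
        if bit != 1:
            return
        self.buckets.append([self.time, 1])
        self.total += 1
        self._merge()

    def _merge(self):
        # Walk from the smallest size upward; whenever one size has too many
        # buckets, merge its two oldest into one bucket of double the size.
        size, end = 1, len(self.buckets)
        while True:
            start = end
            while start > 0 and self.buckets[start - 1][1] == size:
                start -= 1
            if end - start <= self.max_same:
                return
            newer_timestamp = self.buckets[start + 1][0]
            self.buckets[start:start + 2] = [[newer_timestamp, 2 * size]]
            end = start + 1
            size *= 2

    def estimate(self):
        if not self.buckets:
            return 0.0
        # Only the oldest bucket may straddle the window boundary: between 1
        # and all of its 1s are still inside, so take the midpoint.
        oldest_size = self.buckets[0][1]
        return self.total - (oldest_size - 1) / 2.0


counter = SlidingWindowCounter(window=1000, eps=0.1)
for t in range(5000):
    counter.add(1 if t % 3 == 0 else 0)
print(counter.estimate(), len(counter.buckets))
```

Because bucket sizes double, the number of buckets grows only logarithmically with the window length for a fixed `eps`, which is the source of the small memory footprint; the uncertainty is confined to the single oldest bucket.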