Nina Mishra | Researchain

Publication

Featured research published by Nina Mishra.

IEEE Transactions on Knowledge and Data Engineering | 2003

Clustering data streams: Theory and practice

Sudipto Guha; Adam Meyerson; Nina Mishra; Rajeev Motwani; Liadan O'Callaghan

The data stream model has recently attracted attention for its applicability to numerous types of data, including telephone records, Web documents, and clickstreams. For analysis of such data, the ability to process the data in a single pass, or a small number of passes, while using little memory, is crucial. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm's performance on synthetic and real data streams.

International Conference on Data Engineering | 2002

Streaming-data algorithms for high-quality clustering

Liadan O'Callaghan; Nina Mishra; Adam Meyerson; Sudipto Guha; Rajeev Motwani

Streaming data analysis has recently attracted attention in numerous applications including telephone records, Web documents and click streams. For such analysis, single-pass algorithms that consume a small amount of memory are critical. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm's performance on synthetic and real data streams.

International World Wide Web Conferences | 2009

Releasing search queries and clicks privately

Aleksandra Korolova; Krishnaram Kenthapadi; Nina Mishra; Alexandros Ntoulas

The question of how to publish an anonymized search log was brought to the forefront by a well-intentioned, but privacy-unaware AOL search log release. Since then a series of ad-hoc techniques have been proposed in the literature, though none are known to be provably private. In this paper, we take a major step towards a solution: we show how queries, clicks and their associated perturbed counts can be published in a manner that rigorously preserves privacy. Our algorithm is decidedly simple to state, but non-trivial to analyze. On the opposite side of privacy is the question of whether the data we can safely publish is of any use. Our findings offer a glimmer of hope: we demonstrate that a non-negligible fraction of queries and clicks can indeed be safely published via a collection of experiments on a real search log. In addition, we select an application, keyword generation, and show that the keyword suggestions generated from the perturbed data resemble those generated from the original data.
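The abstract does not say how the counts are perturbed. As a minimal sketch only, assuming Laplace noise and a release threshold (both assumptions, as are the names `laplace_noise`, `publish_counts`, `threshold`, and `scale`), publishing perturbed query counts could look like this:

```python
import random


def laplace_noise(rng, scale):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def publish_counts(counts, threshold, scale, seed=None):
    """Return {query: noisy_count} for queries whose noisy count exceeds threshold.

    Assumption: the noise distribution, threshold, and scale are illustrative
    choices, not values taken from the paper. Any real privacy guarantee
    depends on how they are set and on bounding each user's contribution.
    """
    rng = random.Random(seed)
    released = {}
    for query, count in counts.items():
        noisy = count + laplace_noise(rng, scale)
        # Rare queries tend to fall below the threshold and are suppressed.
        if noisy > threshold:
            released[query] = noisy
    return released


if __name__ == "__main__":
    log = {"weather": 1000, "a very rare query": 1}
    print(publish_counts(log, threshold=50, scale=2.0, seed=0))
```

Frequent queries survive with slightly perturbed counts, while queries issued only a handful of times are dropped.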

Theory and Application of Cryptographic Techniques | 2004

Secure Computation of the kth-Ranked Element

Gagan Aggarwal; Nina Mishra; Benny Pinkas

Given two or more parties possessing large, confidential datasets, we consider the problem of securely computing the kth-ranked element of the union of the datasets, e.g. the median of the values in the datasets. We investigate protocols with sublinear computation and communication costs. In the two-party case, we show that the kth-ranked element can be computed in log k rounds, where the computation and communication costs of each round are O(log M), where log M is the number of bits needed to describe each element of the input data. The protocol can be made secure against a malicious adversary, and can hide the sizes of the original datasets. In the multi-party setting, we show that the kth-ranked element can be computed in log M rounds, with O(s log M) overhead per round, where s is the number of parties. The multi-party protocol can be used in the two-party case and can also be made secure against a malicious adversary.
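To give a feel for why roughly log k rounds can suffice in the two-party case, here is a plaintext, non-secure sketch of the standard "discard about half of k per round" search over two sorted lists. It is not the paper's protocol: in a secure version the single comparison per round would be carried out with a secure comparison primitive, and the function name `kth_ranked` is hypothetical.

```python
def kth_ranked(a, b, k):
    """Return (value, rounds): the k-th smallest (1-indexed) element of a + b.

    Each round compares one candidate from each party and lets one party
    discard up to k // 2 of its smallest remaining elements.
    """
    a, b = sorted(a), sorted(b)
    if not 1 <= k <= len(a) + len(b):
        raise ValueError("k is out of range")
    i = j = 0          # how many elements each party has discarded so far
    rounds = 0
    while True:
        if i == len(a):              # party A has nothing left
            return b[j + k - 1], rounds
        if j == len(b):              # party B has nothing left
            return a[i + k - 1], rounds
        if k == 1:
            return min(a[i], b[j]), rounds
        half = k // 2
        ia = min(i + half, len(a)) - 1   # A's candidate index
        jb = min(j + half, len(b)) - 1   # B's candidate index
        rounds += 1
        # The only cross-party information used is this one comparison.
        if a[ia] <= b[jb]:
            k -= ia - i + 1
            i = ia + 1
        else:
            k -= jb - j + 1
            j = jb + 1


if __name__ == "__main__":
    print(kth_ranked([1, 3, 5, 7], [2, 4, 6, 8], 4))
```

Because k shrinks by about half whenever both parties still hold at least k // 2 elements, the number of comparisons grows logarithmically in k in that case.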

Workshop on Algorithms and Models for the Web Graph | 2007

Clustering social networks

Nina Mishra; Robert Schreiber; Isabelle Stanton; Robert Endre Tarjan

Social networks are ubiquitous. The discovery of close-knit clusters in these networks is of fundamental and practical interest. Existing clustering criteria are limited in that clusters typically do not overlap, all vertices are clustered and/or external sparsity is ignored. We introduce a new criterion that overcomes these limitations by combining internal density with external sparsity in a natural way. An algorithm is given for provably finding the clusters, provided there is a sufficiently large gap between internal density and external sparsity. Experiments on real social networks illustrate the effectiveness of the algorithm.
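The abstract does not spell out the criterion, so the following is only one plausible way to measure "internal density" and "external sparsity" for a candidate vertex set. The definitions (minimum fraction of the cluster adjacent to any member, counting the member itself, and maximum fraction of the cluster adjacent to any outsider) and the name `density_profile` are assumptions made for illustration.

```python
def density_profile(adj, cluster):
    """Return (internal_density, external_sparsity) for a candidate cluster.

    adj maps each vertex to the set of its neighbours (undirected graph).
    Assumed definitions, not taken from the abstract:
      internal_density  = min over members v of |(N(v) + v) & C| / |C|
      external_sparsity = max over outsiders u of |N(u) & C| / |C|
    """
    cluster = set(cluster)
    size = len(cluster)
    if size == 0:
        raise ValueError("cluster must be non-empty")
    internal = min(
        len((adj.get(v, set()) | {v}) & cluster) / size for v in cluster
    )
    outsiders = set(adj) - cluster
    external = max(
        (len(adj[u] & cluster) / size for u in outsiders), default=0.0
    )
    return internal, external


if __name__ == "__main__":
    # A triangle a-b-c with one extra vertex d attached to c.
    graph = {
        "a": {"b", "c"},
        "b": {"a", "c"},
        "c": {"a", "b", "d"},
        "d": {"c"},
    }
    print(density_profile(graph, {"a", "b", "c"}))
```

A large gap between the two numbers is the kind of separation the abstract says the algorithm relies on. Under these definitions, overlapping clusters and unclustered vertices are both allowed.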

International Cryptology Conference | 2006

When random sampling preserves privacy

Kamalika Chaudhuri; Nina Mishra

Many organizations such as the U.S. Census publicly release samples of data that they collect about private citizens. These datasets are first anonymized using various techniques and then a small sample is released so as to enable “do-it-yourself” calculations. This paper investigates the privacy of the second step of this process: sampling. We observe that rare values – values that occur with low frequency in the table – can be problematic from a privacy perspective. To our knowledge, this is the first work that quantitatively examines the relationship between the number of rare values in a table and the privacy in a released random sample. If we require ε-privacy (where the larger ε is, the worse the privacy guarantee) with probability at least 1 – δ, we say that a value is rare if it occurs in at most $\tilde{O}(\frac{1}{\epsilon})$ rows of the table (ignoring log factors). If there are no rare values, then we establish a direct connection between sample size that is safe to release and privacy. Specifically, if we select each row of the table with probability at most ε then the sample is O(ε)-private with high probability. In the case that there are t rare values, then the sample is

Very Large Data Bases | 2004

Vision paper: enabling privacy for the paranoids

Gagan Aggarwal; Mayank Bawa; Prasanna Ganesan; Hector Garcia-Molina; Krishnaram Kenthapadi; Nina Mishra; Rajeev Motwani; Utkarsh Srivastava; Dilys Thomas; Jennifer Widom; Ying Xu

Machine Learning | 2004

A New Conceptual Clustering Framework

Nina Mishra; Dana Ron; Ram Swaminathan

Symposium on Principles of Database Systems | 2006