Kamalika Chaudhuri
University of California, San Diego
Publications
Featured research published by Kamalika Chaudhuri.
international conference on machine learning | 2009
Kamalika Chaudhuri; Sham M. Kakade; Karen Livescu; Karthik Sridharan
Clustering data in high dimensions is believed to be a hard problem in general. A number of efficient clustering algorithms developed in recent years address this problem by projecting the data into a lower-dimensional subspace, e.g. via Principal Components Analysis (PCA) or random projections, before clustering. Here, we consider constructing such projections using multiple views of the data, via Canonical Correlation Analysis (CCA). Under the assumption that the views are uncorrelated given the cluster label, we show that the separation conditions required for the algorithm to be successful are significantly weaker than prior results in the literature. We provide results for mixtures of Gaussians and mixtures of log-concave distributions. We also provide empirical support from audio-visual speaker clustering (where we desire the clusters to correspond to speaker ID) and from hierarchical Wikipedia document clustering (where one view is the words in the document and the other is the link structure).
principles of distributed computing | 2004
Byung-Gon Chun; Kamalika Chaudhuri; Hoeteck Wee; Marco Barreno; Christos H. Papadimitriou; John Kubiatowicz
We analyze replication of resources by server nodes that act selfishly, using a game-theoretic approach. We refer to this as the selfish caching problem. In our model, nodes incur either a cost for replicating resources or a cost for access to a remote replica. We show the existence of pure-strategy Nash equilibria and investigate the price of anarchy, which is the relative cost of the lack of coordination. The price of anarchy can be high due to undersupply problems, but with certain network topologies it has better bounds. With a payment scheme that gives servers an incentive to replicate, the game can always implement the social optimum.
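A toy simulation makes the game concrete: each node either replicates the resource (paying a placement cost alpha) or pays the distance to its nearest replica, and nodes take turns playing best responses until no one wants to switch. The line topology, cost values, and function names below are illustrative assumptions, not from the paper.

```python
# Toy best-response dynamics for the selfish caching game (illustrative).
# Strategy of node i: replicate (cost alpha) or access nearest replica
# (cost = distance). A fixed point of best responses is a pure-strategy
# Nash equilibrium.

def best_response_equilibrium(dist, alpha, max_rounds=100):
    """Iterate best responses until no node can improve unilaterally."""
    n = len(dist)
    caches = {0}  # seed with an arbitrary replica so access costs are finite
    for _ in range(max_rounds):
        changed = False
        for i in range(n):
            others = caches - {i}
            access = min(dist[i][j] for j in others) if others else float("inf")
            if i in caches and access < alpha:
                caches.remove(i); changed = True   # cheaper to fetch remotely
            elif i not in caches and alpha < access:
                caches.add(i); changed = True      # cheaper to replicate
        if not changed:
            return caches  # pure-strategy Nash equilibrium reached
    return caches

# Five nodes on a line with unit spacing; replication cost 1.5.
n = 5
dist = [[abs(i - j) for j in range(n)] for i in range(n)]
eq = best_response_equilibrium(dist, alpha=1.5)
print("replicating nodes:", sorted(eq))  # -> [0, 2, 4]
```

In the equilibrium reached here, every cached node would pay distance 2 > 1.5 to drop its replica, and every uncached node can fetch at distance 1 < 1.5, so no unilateral deviation helps; a social planner might place fewer replicas, which is the undersupply gap the price of anarchy measures.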
foundations of computer science | 2003
Kamalika Chaudhuri; Brighten Godfrey; Satish Rao; Kunal Talwar
We give improved approximation algorithms for a variety of latency minimization problems. In particular, we give a 3.59-approximation to the minimum latency problem, improving on previous algorithms by a multiplicative factor of 2. Our techniques also give similar improvements for related problems like k-traveling repairmen and its multiple-depot variant. We also observe that standard techniques can be used to speed up both the previous algorithm and ours by a factor of Õ(n).
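The minimum latency objective differs from the traveling salesman objective: it sums client *arrival times* rather than total tour length, so early clients count more. A brute-force check on a tiny metric (an illustrative example, not the paper's algorithm) makes the objective concrete:

```python
# Brute-force illustration of the minimum latency objective (toy metric).
# Latency of a visiting order = sum of arrival times at each client.
from itertools import permutations

def latency(order, dist, start=0):
    """Sum of arrival times when visiting `order` from `start`."""
    t, total, here = 0, 0, start
    for v in order:
        t += dist[here][v]   # travel to the next client
        total += t           # that client waits t time units
        here = v
    return total

# Four points on a line at coordinates 0 (depot), 1, 2, 10.
coords = [0, 1, 2, 10]
dist = [[abs(a - b) for b in coords] for a in coords]

best = min(permutations([1, 2, 3]), key=lambda o: latency(o, dist))
print("best order:", best, "latency:", latency(best, dist))
```

Brute force is exponential in the number of clients, which is why constant-factor approximations such as the 3.59-approximation above matter for larger instances.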
Journal of the American Medical Informatics Association | 2012
Lucila Ohno-Machado; Vineet Bafna; Aziz A. Boxwala; Brian E. Chapman; Wendy W. Chapman; Kamalika Chaudhuri; Michele E. Day; Claudiu Farcas; Nathaniel D. Heintzman; Xiaoqian Jiang; Hyeoneui Kim; Jihoon Kim; Michael E. Matheny; Frederic S. Resnic; Staal A. Vinterbo
iDASH (integrating data for analysis, anonymization, and sharing) is the newest National Center for Biomedical Computing funded by the NIH. It focuses on algorithms and tools for sharing data in a privacy-preserving manner. Foundational privacy technology research performed within iDASH is coupled with innovative engineering for collaborative tool development and data-sharing capabilities in a private Health Insurance Portability and Accountability Act (HIPAA)-certified cloud. Driving Biological Projects, which span different biological levels (from molecules to individuals to populations) and focus on various health conditions, help guide research and development within this Center. Furthermore, training and dissemination efforts connect the Center with its stakeholders and educate data owners and data consumers on how to share and use clinical and biological data. Through these various mechanisms, iDASH implements its goal of providing biomedical and behavioral researchers with access to data, software, and a high-performance computing environment, thus enabling them to generate and test new hypotheses.
international cryptology conference | 2006
Kamalika Chaudhuri; Nina Mishra
Many organizations such as the U.S. Census publicly release samples of data that they collect about private citizens. These datasets are first anonymized using various techniques, and then a small sample is released so as to enable "do-it-yourself" calculations. This paper investigates the privacy of the second step of this process: sampling. We observe that rare values – values that occur with low frequency in the table – can be problematic from a privacy perspective. To our knowledge, this is the first work that quantitatively examines the relationship between the number of rare values in a table and the privacy of a released random sample. If we require ε-privacy (where the larger ε is, the worse the privacy guarantee) with probability at least 1 − δ, we say that a value is rare if it occurs in at most Õ(1/ε) rows of the table (ignoring log factors). If there are no rare values, then we establish a direct connection between the sample size that is safe to release and privacy. Specifically, if we select each row of the table with probability at most ε, then the sample is O(ε)-private with high probability. In the case that there are t rare values, the sample is Õ(εδ/t)-private with probability at least 1 − δ.
ieee global conference on signal and information processing | 2013
Shuang Song; Kamalika Chaudhuri; Anand D. Sarwate
Differential privacy is a recent framework for computation on sensitive data, which has shown considerable promise in the regime of large datasets. Stochastic gradient methods are a popular approach for learning in the data-rich regime because they are computationally tractable and scalable. In this paper, we derive differentially private versions of stochastic gradient descent and test them empirically. Our results show that standard SGD experiences high variability due to differential privacy, but a moderate increase in the batch size can improve performance significantly.
IEEE Signal Processing Magazine | 2013
Anand D. Sarwate; Kamalika Chaudhuri
symposium on discrete algorithms | 2006
Kamalika Chaudhuri; Kevin C. Chen; Radu Mihaescu; Satish Rao
international workshop on approximation, randomization, and combinatorial optimization: algorithms and techniques | 2005
Kamalika Chaudhuri; Satish Rao; Samantha Riesenfeld; Kunal Talwar
acm symposium on parallel algorithms and architectures | 2005
Eric Anderson; Dirk Beyer; Kamalika Chaudhuri; Terence Kelly; Norman Salazar; Cipriano A. Santos; Ram Swaminathan; Robert Endre Tarjan; Janet L. Wiener; Yunhong Zhou
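The differentially private SGD approach described above can be sketched with numpy: clip each per-example gradient to bound sensitivity, average, and add Gaussian noise scaled to the clip norm. The model (logistic regression), clip bound, noise scale, and batch size below are illustrative assumptions, not the paper's exact construction or privacy accounting.

```python
# Minimal sketch of noisy (differentially private) SGD for logistic
# regression. Illustrative parameters; no formal privacy accounting here.
import numpy as np

def dp_sgd_step(w, X, y, lr, clip, sigma, rng):
    """One SGD step with per-example gradient clipping and Gaussian noise."""
    # Per-example gradients of the logistic loss.
    p = 1.0 / (1.0 + np.exp(-X @ w))
    grads = (p - y)[:, None] * X                              # (batch, dim)
    # Clip each example's gradient to L2 norm <= clip (bounds sensitivity).
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    # Average, then add noise calibrated to the clip bound and batch size.
    noisy = grads.mean(axis=0) + rng.normal(0.0, sigma * clip / len(X), size=w.shape)
    return w - lr * noisy

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.standard_normal((n, d))
w_true = np.ones(d)
y = (X @ w_true + 0.1 * rng.standard_normal(n) > 0).astype(float)

w = np.zeros(d)
for _ in range(200):
    idx = rng.integers(0, n, size=256)   # moderate batch size, as the paper
    w = dp_sgd_step(w, X[idx], y[idx], lr=0.5, clip=1.0, sigma=1.0, rng=rng)

acc = np.mean((X @ w > 0) == (y > 0.5))
print(f"train accuracy: {acc:.2f}")
```

The batch size enters the noise term as sigma · clip / batch, which is one way to see the paper's empirical finding: a moderately larger batch shrinks the per-step noise and tames the variability of private SGD.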