Aaron Roth | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Aaron Roth is active.

Explore More

Publication

Featured researches published by Aaron Roth.

symposium on the theory of computing | 2008

A learning theory approach to non-interactive database privacy

Avrim Blum; Katrina Ligett; Aaron Roth

We demonstrate that, ignoring computational constraints, it is possible to release privacy-preserving databases that are useful for all queries over a discretized domain from any given concept class with polynomial VC-dimension. We show a new lower bound for releasing databases that are useful for halfspace queries over a continuous domain. Despite this, we give a privacy-preserving polynomial time algorithm that releases information useful for all halfspace queries, for a slightly relaxed definition of usefulness. Inspired by learning theory, we introduce a new notion of data privacy, which we call distributional privacy, and show that it is strictly stronger than the prevailing privacy notion, differential privacy.

symposium on the theory of computing | 2010

Interactive privacy via the median mechanism

Aaron Roth; Tim Roughgarden

We define a new interactive differentially private mechanism --- the median mechanism --- for answering arbitrary predicate queries that arrive online. Given fixed accuracy and privacy constraints, this mechanism can answer exponentially more queries than the previously best known interactive privacy mechanism (the Laplace mechanism, which independently perturbs each query result). With respect to the number of queries, our guarantee is close to the best possible, even for non-interactive privacy mechanisms. Conceptually, the median mechanism is the first privacy mechanism capable of identifying and exploiting correlations among queries in an interactive setting. We also give an efficient implementation of the median mechanism, with running time polynomial in the number of queries, the database size, and the domain size. This efficient implementation guarantees privacy for all input databases, and accurate query results for almost all input distributions. The dependence of the privacy on the number of queries in this mechanism improves over that of the best previously known efficient mechanism by a super-polynomial factor, even in the non-interactive setting.

Journal of the ACM | 2013

A learning theory approach to noninteractive database privacy

Avrim Blum; Katrina Ligett; Aaron Roth

In this article, we demonstrate that, ignoring computational constraints, it is possible to release synthetic databases that are useful for accurately answering large classes of queries while preserving differential privacy. Specifically, we give a mechanism that privately releases synthetic data useful for answering a class of queries over a discrete domain with error that grows as a function of the size of the smallest net approximately representing the answers to that class of queries. We show that this in particular implies a mechanism for counting queries that gives error guarantees that grow only with the VC-dimension of the class of queries, which itself grows at most logarithmically with the size of the query class. We also show that it is not possible to release even simple classes of queries (such as intervals and their generalizations) over continuous domains with worst-case utility guarantees while preserving differential privacy. In response to this, we consider a relaxation of the utility guarantee and give a privacy preserving polynomial time algorithm that for any halfspace query will provide an answer that is accurate for some small perturbation of the query. This algorithm does not release synthetic data, but instead another data structure capable of representing an answer for each query. We also give an efficient algorithm for releasing synthetic data for the class of interval queries and axis-aligned rectangles of constant dimension over discrete domains.

theory of cryptography conference | 2012

Iterative constructions and private data release

Anupam Gupta; Aaron Roth; Jonathan Ullman

In this paper we study the problem of approximately releasing the cut function of a graph while preserving differential privacy, and give new algorithms (and new analyses of existing algorithms) in both the interactive and non-interactive settings. Our algorithms in the interactive setting are achieved by revisiting the problem of releasing differentially private, approximate answers to a large number of queries on a database. We show that several algorithms for this problem fall into the same basic framework, and are based on the existence of objects which we call iterative database construction algorithms. We give a new generic framework in which new (efficient) IDC algorithms give rise to new (efficient) interactive private query release mechanisms. Our modular analysis simplifies and tightens the analysis of previous algorithms, leading to improved bounds. We then give a new IDC algorithm (and therefore a new private, interactive query release mechanism) based on the Frieze/Kannan low-rank matrix decomposition. This new release mechanism gives an improvement on prior work in a range of parameters where the size of the database is comparable to the size of the data universe (such as releasing all cut queries on dense graphs). We also give a non-interactive algorithm for efficiently releasing private synthetic data for graph cuts with error O(|V|1.5). Our algorithm is based on randomized response and a non-private implementation of the SDP-based, constant-factor approximation algorithm for cut-norm due to Alon and Naor. Finally, we give a reduction based on the IDC framework showing that an efficient, private algorithm for computing sufficiently accurate rank-1 matrix approximations would lead to an improved efficient algorithm for releasing private synthetic data for graph cuts. We leave finding such an algorithm as our main open problem.

Science | 2015

The reusable holdout: Preserving validity in adaptive data analysis

Cynthia Dwork; Vitaly Feldman; Moritz Hardt; Toniann Pitassi; Omer Reingold; Aaron Roth

Testing hypotheses privately Large data sets offer a vast scope for testing already-formulated ideas and exploring new ones. Unfortunately, researchers who attempt to do both on the same data set run the risk of making false discoveries, even when testing and exploration are carried out on distinct subsets of data. Based on ideas drawn from differential privacy, Dwork et al. now provide a theoretical solution. Ideas are tested against aggregate information, whereas individual data set components remain confidential. Preserving that privacy also preserves statistical inference validity. Science, this issue p. 636 A statistical approach allows large data sets to be reanalyzed to test new hypotheses. Misapplication of statistical data analysis is a common cause of spurious discoveries in scientific research. Existing approaches to ensuring the validity of inferences drawn from data assume a fixed procedure to be performed, selected before the data are examined. In common practice, however, data analysis is an intrinsically adaptive process, with new analyses generated on the basis of data exploration, as well as the results of previous analyses on the same data. We demonstrate a new approach for addressing the challenges of adaptivity based on insights from privacy-preserving data analysis. As an application, we show how to safely reuse a holdout data set many times to validate the results of adaptively chosen analyses.

symposium on the theory of computing | 2015

Preserving Statistical Validity in Adaptive Data Analysis

Cynthia Dwork; Vitaly Feldman; Moritz Hardt; Toniann Pitassi; Omer Reingold; Aaron Roth

A great deal of effort has been devoted to reducing the risk of spurious scientific discoveries, from the use of sophisticated validation techniques, to deep statistical methods for controlling the false discovery rate in multiple hypothesis testing. However, there is a fundamental disconnect between the theoretical results and the practice of data analysis: the theory of statistical inference assumes a fixed collection of hypotheses to be tested, or learning algorithms to be applied, selected non-adaptively before the data are gathered, whereas in practice data is shared and reused with hypotheses and new analyses being generated on the basis of data exploration and the outcomes of previous analyses. In this work we initiate a principled study of how to guarantee the validity of statistical inference in adaptive data analysis. As an instance of this problem, we propose and investigate the question of estimating the expectations of m adaptively chosen functions on an unknown distribution given n random samples. We show that, surprisingly, there is a way to estimate an exponential in n number of expectations accurately even if the functions are chosen adaptively. This gives an exponential improvement over standard empirical estimators that are limited to a linear number of estimates. Our result follows from a general technique that counter-intuitively involves actively perturbing and coordinating the estimates, using techniques developed for privacy preservation. We give additional applications of this technique to our question.

workshop on internet and network economics | 2010

Constrained non-monotone submodular maximization: offline and secretary algorithms

Anupam Gupta; Aaron Roth; Grant Schoenebeck; Kunal Talwar

Constrained submodular maximization problems have long been studied, most recently in the context of auctions and computational advertising, with near-optimal results known under a variety of constraints when the submodular function is monotone. In this paper, we give constant approximation algorithms for the non-monotone case that work for p-independence systems (which generalize constraints given by the intersection of p matroids that had been studied previously), where the running time is poly(n, p). Our algorithms and analyses are simple, and essentially reduce non-monotone maximization to multiple runs of the greedy algorithm previously used in the monotone case. We extend these ideas to give a simple greedy-based constant factor algorithms for non-monotone submodular maximization subject to a knapsack constraint, and for (online) secretary setting (where elements arrive one at a time in random order and the algorithm must make irrevocable decisions) subject to uniform matroid or a partition matroid constraint. Finally, we give an O(log k) approximation in the secretary setting subject to a general matroid constraint of rank k.

ieee computer security foundations symposium | 2014

Differential Privacy: An Economic Method for Choosing Epsilon

Justin Hsu; Marco Gaboardi; Andreas Haeberlen; Sanjeev Khanna; Arjun Narayan; Benjamin C. Pierce; Aaron Roth

Differential privacy is becoming a gold standard notion of privacy; it offers a guaranteed bound on loss of privacy due to release of query results, even under worst-case assumptions. The theory of differential privacy is an active research area, and there are now differentially private algorithms for a wide range of problems. However, the question of when differential privacy works in practice has received relatively little attention. In particular, there is still no rigorous method for choosing the key parameter ε, which controls the crucial tradeoff between the strength of the privacy guarantee and the accuracy of the published results. In this paper, we examine the role of these parameters in concrete applications, identifying the key considerations that must be addressed when choosing specific values. This choice requires balancing the interests of two parties with conflicting objectives: the data analyst, who wishes to learn something abou the data, and the prospective participant, who must decide whether to allow their data to be included in the analysis. We propose a simple model that expresses this balance as formulas over a handful of parameters, and we use our model to choose ε on a series of simple statistical studies. We also explore a surprising insight: in some circumstances, a differentially private study can be more accurate than a non-private study for the same cost, under our model. Finally, we discuss the simplifying assumptions in our model and outline a research agenda for possible refinements.

symposium on the theory of computing | 2012

Beating randomized response on incoherent matrices

Moritz Hardt; Aaron Roth

Computing accurate low rank approximations of large matrices is a fundamental data mining task. In many applications however the matrix contains sensitive information about individuals. In such case we would like to release a low rank approximation that satisfies a strong privacy guarantee such as differential privacy. Unfortunately, to date the best known algorithm for this task that satisfies differential privacy is based on naive input perturbation or randomized response: Each entry of the matrix is perturbed independently by a sufficiently large random noise variable, a low rank approximation is then computed on the resulting matrix. We give (the first) significant improvements in accuracy over randomized response under the natural and necessary assumption that the matrix has low coherence. Our algorithm is also very efficient and finds a constant rank approximation of an m x n matrix in time O(mn). Note that even generating the noise matrix required for randomized response already requires time O(mn).

symposium on the theory of computing | 2013

Beyond worst-case analysis in private singular vector computation

Moritz Hardt; Aaron Roth

We consider differentially private approximate singular vector computation. Known worst-case lower bounds show that the error of any differentially private algorithm must scale polynomially with the dimension of the singular vector. We are able to replace this dependence on the dimension by a natural parameter known as the coherence of the matrix that is often observed to be significantly smaller than the dimension both theoretically and empirically. We also prove a matching lower bound showing that our guarantee is nearly optimal for every setting of the coherence parameter. Notably, we achieve our bounds by giving a robust analysis of the well-known power iteration algorithm, which may be of independent interest. Our algorithm also leads to improvements in worst-case settings and to better low-rank approximations in the spectral norm.

Explore More