Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Jonathan Ullman is active.

Publication


Featured research published by Jonathan Ullman.


theory of cryptography conference | 2012

Iterative constructions and private data release

Anupam Gupta; Aaron Roth; Jonathan Ullman

In this paper we study the problem of approximately releasing the cut function of a graph while preserving differential privacy, and give new algorithms (and new analyses of existing algorithms) in both the interactive and non-interactive settings. Our algorithms in the interactive setting are achieved by revisiting the problem of releasing differentially private, approximate answers to a large number of queries on a database. We show that several algorithms for this problem fall into the same basic framework, and are based on the existence of objects which we call iterative database construction (IDC) algorithms. We give a new generic framework in which new (efficient) IDC algorithms give rise to new (efficient) interactive private query release mechanisms. Our modular analysis simplifies and tightens the analysis of previous algorithms, leading to improved bounds. We then give a new IDC algorithm (and therefore a new private, interactive query release mechanism) based on the Frieze/Kannan low-rank matrix decomposition. This new release mechanism gives an improvement on prior work in a range of parameters where the size of the database is comparable to the size of the data universe (such as releasing all cut queries on dense graphs). We also give a non-interactive algorithm for efficiently releasing private synthetic data for graph cuts with error O(|V|^1.5). Our algorithm is based on randomized response and a non-private implementation of the SDP-based, constant-factor approximation algorithm for cut-norm due to Alon and Naor. Finally, we give a reduction based on the IDC framework showing that an efficient, private algorithm for computing sufficiently accurate rank-1 matrix approximations would lead to an improved efficient algorithm for releasing private synthetic data for graph cuts. We leave finding such an algorithm as our main open problem.
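The non-interactive randomized-response step is simple enough to sketch. Below is a minimal, illustrative Python sketch (not the paper's full algorithm): flip each potential edge of the adjacency matrix independently, debias, and answer cut queries from the noisy matrix. The flip probability, debiasing details, and function names are our own assumptions.

```python
import numpy as np

def randomized_response_graph(adj: np.ndarray, eps: float, rng=np.random.default_rng()):
    """Release a noisy, debiased adjacency matrix via per-edge randomized response."""
    p = 1.0 / (1.0 + np.exp(eps))             # flip probability for eps-DP per edge (assumed form)
    flips = rng.random(adj.shape) < p
    noisy = np.where(flips, 1 - adj, adj).astype(float)
    # Debias so that cut estimates are unbiased: E[noisy] = (1 - 2p) * adj + p.
    return (noisy - p) / (1.0 - 2.0 * p)

def cut_query(synth: np.ndarray, s_mask: np.ndarray) -> float:
    """Estimate the number of edges crossing the cut (S, V \\ S); s_mask is 0/1 over vertices."""
    return float(s_mask @ synth @ (1 - s_mask))
```

A union bound over all 2^|V| cuts then gives error roughly O(|V|^1.5) uniformly, matching the bound quoted above.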


symposium on principles of database systems | 2015

Private Multiplicative Weights Beyond Linear Queries

Jonathan Ullman

A wide variety of fundamental data analyses in machine learning, such as linear and logistic regression, require minimizing a convex function defined by the data. Since the data may contain sensitive information about individuals, and these analyses can leak that sensitive information, it is important to be able to solve convex minimization in a privacy-preserving way. A series of recent results show how to accurately solve a single convex minimization problem in a differentially private manner. However, the same data is often analyzed repeatedly, and little is known about solving multiple convex minimization problems with differential privacy. For simpler data analyses, such as linear queries, there are remarkable differentially private algorithms such as the private multiplicative weights mechanism (Hardt and Rothblum, FOCS 2010) that accurately answer exponentially many distinct queries. In this work, we extend these results to the case of convex minimization and show how to give accurate and differentially private solutions to exponentially many convex minimization problems on a sensitive dataset.
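The private multiplicative weights mechanism mentioned above, for ordinary linear queries, is short enough to sketch. A minimal illustration follows, assuming a normalized histogram over a small data universe; the threshold test and noise scale are simplified placeholders (the real mechanism uses the sparse vector technique for its privacy accounting).

```python
import numpy as np

def pmw(true_hist, queries, eps_per_round, threshold, eta=0.1, rng=np.random.default_rng()):
    """true_hist: normalized histogram over the universe; queries: 0/1 numpy vectors."""
    n_universe = len(true_hist)
    synth = np.full(n_universe, 1.0 / n_universe)    # start from the uniform distribution
    answers = []
    for q in queries:
        true_ans = float(q @ true_hist)
        noisy_ans = true_ans + rng.laplace(scale=1.0 / eps_per_round)
        synth_ans = float(q @ synth)
        if abs(noisy_ans - synth_ans) <= threshold:
            answers.append(synth_ans)                # "lazy" round: answer from the synthetic data
        else:
            # Multiplicative update pushes the synthetic distribution toward the noisy answer.
            direction = 1.0 if noisy_ans > synth_ans else -1.0
            synth *= np.exp(eta * direction * q)
            synth /= synth.sum()
            answers.append(noisy_ans)
    return answers
```

The key point, which the paper extends beyond linear queries, is that only the (few) update rounds consume significant privacy budget.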


foundations of computer science | 2015

Robust Traceability from Trace Amounts

Cynthia Dwork; Adam D. Smith; Thomas Steinke; Jonathan Ullman; Salil P. Vadhan

The privacy risks inherent in the release of a large number of summary statistics were illustrated by Homer et al. (PLoS Genetics, 2008), who considered the case of 1-way marginals of SNP allele frequencies obtained in a genome-wide association study: Given a large number of minor allele frequencies from a case group of individuals diagnosed with a particular disease, together with the genomic data of a single target individual and statistics from a sizable reference dataset independently drawn from the same population, an attacker can determine with high confidence whether or not the target is in the case group. In this work we describe and analyze a simple attack that succeeds even if the summary statistics are significantly distorted, whether due to measurement error or noise intentionally introduced to protect privacy. Our attack only requires that the vector of distorted summary statistics is close to the vector of true marginals in ℓ1 norm. Moreover, the reference pool required by previous attacks can be replaced by a single sample drawn from the underlying population. The new attack, which is not specific to genomics and which handles Gaussian as well as Bernoulli data, significantly generalizes recent lower bounds on the noise needed to ensure differential privacy (Bun, Ullman, and Vadhan, STOC 2014; Steinke and Ullman, 2015), obviating the need for the attacker to control the exact distribution of the data.
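A minimal sketch of the attack's shape, as described in the abstract: correlate the difference between the target and a single reference sample with the released (possibly distorted) vector of 1-way marginals. The threshold and the attribute encoding are illustrative assumptions.

```python
import numpy as np

def trace(target, reference, released_stats, threshold):
    """Return True if the target looks like a member of the case group.

    target, reference: attribute vectors of one individual each.
    released_stats: distorted per-attribute means of the case group.
    """
    score = float(np.dot(target - reference, released_stats))
    return score > threshold
```

Intuitively, membership in the case group biases the score upward, while for a non-member the score concentrates near zero.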


international colloquium on automata, languages and programming | 2014

Privately Solving Linear Programs

Justin Hsu; Aaron Roth; Tim Roughgarden; Jonathan Ullman

In this paper, we initiate the systematic study of solving linear programs under differential privacy. The first step is simply to define the problem: to this end, we introduce several natural classes of private linear programs that capture different ways sensitive data can be incorporated into a linear program. For each class of linear programs we give an efficient, differentially private solver based on the multiplicative weights framework, or we give an impossibility result.
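As one concrete reading of the multiplicative-weights approach, here is a minimal, heavily simplified sketch of a private feasibility solver for Ax ≤ b when the constraint data are sensitive. The oracle interface, noise scale, and step size are our own placeholder assumptions, not the paper's calibrated solver.

```python
import numpy as np

def private_lp_feasibility(A, b, oracle, rounds, eps_per_round, eta=0.1,
                           rng=np.random.default_rng()):
    """oracle(avg_constraint, avg_bound) -> x satisfying the single aggregated constraint."""
    m = len(b)
    w = np.ones(m) / m                               # weights over the m constraints
    iterates = []
    for _ in range(rounds):
        x = oracle(w @ A, float(w @ b))              # solve the weighted-average constraint
        # Noisy, clipped violations keep the weight update private; the Laplace scale is a
        # placeholder, since the true calibration depends on the LP's sensitivity.
        viol = np.clip(A @ x - b, -1.0, 1.0) + rng.laplace(scale=1.0 / eps_per_round, size=m)
        w *= np.exp(eta * viol)                      # upweight constraints that x violates
        w /= w.sum()
        iterates.append(x)
    return np.mean(iterates, axis=0)                 # averaged iterate approximately satisfies Ax <= b
```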


information theory and applications | 2016

Interactive fingerprinting codes and the hardness of preventing false discovery

Thomas Steinke; Jonathan Ullman

We show an essentially tight bound on the number of adaptively chosen statistical queries that a computationally efficient algorithm can answer accurately given n samples from an unknown distribution. A statistical query asks for the expectation of a predicate over the underlying distribution, and an answer to a statistical query is accurate if it is “close” to the correct expectation over the distribution. This question was recently studied by Dwork et al. [DFH+15], who showed how to answer Ω(n^2) queries efficiently, and also by Hardt and Ullman [HU14], who showed that answering Õ(n^3) queries is hard. We close the gap between the two bounds and show that, under a standard hardness assumption, there is no computationally efficient algorithm that, given n samples from an unknown distribution, can give valid answers to O(n^2) adaptively chosen statistical queries. An implication of our results is that computationally efficient algorithms for answering arbitrary, adaptively chosen statistical queries may as well be differentially private. We obtain our results using a new connection between the problem of answering adaptively chosen statistical queries and a combinatorial object called an interactive fingerprinting code [FT01]. In order to optimize our hardness result, we give a new Fourier-analytic approach to analyzing fingerprinting codes that is simpler, more flexible, and yields better parameters than previous constructions.
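For concreteness, here is a minimal sketch of the interaction model being studied: an analyst issues statistical queries that may depend on earlier answers, and the mechanism answers each from its n samples with added Gaussian noise. The noise scale is a placeholder, not a tuned privacy parameter.

```python
import numpy as np

def answer_adaptive_queries(samples, analyst, num_queries, noise_scale,
                            rng=np.random.default_rng()):
    """analyst(prev_answers) -> a predicate mapping one sample to {0, 1}."""
    answers = []
    for _ in range(num_queries):
        predicate = analyst(answers)                 # the next query may depend on past answers
        empirical = float(np.mean([predicate(x) for x in samples]))
        answers.append(empirical + rng.normal(scale=noise_scale))
    return answers
```

The paper's result says that, computationally, no efficient mechanism of any form can do much better than roughly n^2 such rounds.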


symposium on discrete algorithms | 2017

Make up your mind: the price of online queries in differential privacy

Mark Bun; Thomas Steinke; Jonathan Ullman

We consider the problem of answering queries about a sensitive dataset subject to differential privacy. The queries may be chosen adversarially from a larger set Q of allowable queries in one of three ways, which we list in order from easiest to hardest to answer:

Offline: The queries are chosen all at once and the differentially private mechanism answers the queries in a single batch.

Online: The queries are chosen all at once, but the mechanism only receives the queries in a streaming fashion and must answer each query before seeing the next query.

Adaptive: The queries are chosen one at a time and the mechanism must answer each query before the next query is chosen. In particular, each query may depend on the answers given to previous queries.

Many differentially private mechanisms are just as efficient in the adaptive model as they are in the offline model. Meanwhile, most lower bounds for differential privacy hold in the offline setting. This suggests that the three models may be equivalent. We prove that these models are all, in fact, distinct. Specifically, we show that there is a family of statistical queries such that exponentially more queries from this family can be answered in the offline model than in the online model. We also exhibit a family of search queries such that exponentially more queries from this family can be answered in the online model than in the adaptive model. We also investigate whether such separations might hold for simple queries like threshold queries over the real line.
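The three models can be made concrete as interfaces; the class names below are ours, purely to restate the definitions above in code.

```python
from typing import Callable, Iterator, List

class OfflineMechanism:
    def answer_batch(self, queries: List[Callable]) -> List[float]:
        """Sees the whole batch of queries before producing any answer."""
        ...

class OnlineMechanism:
    def answer_stream(self, queries: Iterator[Callable]) -> Iterator[float]:
        """Queries are fixed in advance, but each must be answered before the next is seen."""
        ...

class AdaptiveMechanism:
    def answer_next(self, query: Callable) -> float:
        """Each answer is returned before the adversary chooses the next query,
        which may depend on all previous answers."""
        ...
```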


symposium on principles of database systems | 2016

Space Lower Bounds for Itemset Frequency Sketches

Edo Liberty; Michael Mitzenmacher; Justin Thaler; Jonathan Ullman

Given a database, computing the fraction of rows that contain a query itemset or determining whether this fraction is above some threshold are fundamental operations in data mining. A uniform sample of rows is a good sketch of the database in the sense that all sufficiently frequent itemsets and their approximate frequencies are recoverable from the sample, and the sketch size is independent of the number of rows in the original database. For many seemingly similar problems there are better sketching algorithms than uniform sampling. In this paper we show that for itemset frequency sketching this is not the case. That is, we prove that there exist classes of databases for which uniform sampling is a space optimal sketch for approximate itemset frequency analysis, up to constant or iterated-logarithmic factors.
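For concreteness, a minimal sketch of the uniform-sampling sketch in question: keep a random subset of rows and answer frequency queries from it. The Chernoff-style sample size below (additive error eps with failure probability delta over a fixed family of itemsets) is the standard calculation, with illustrative constants.

```python
import math
import random

def build_sketch(rows, eps, delta, num_itemsets):
    """rows: a list of item collections; returns a uniform sample of rows."""
    sample_size = math.ceil(math.log(2 * num_itemsets / delta) / (2 * eps ** 2))
    return random.sample(rows, min(sample_size, len(rows)))

def itemset_frequency(sketch, itemset):
    """Fraction of sampled rows containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(row) for row in sketch) / len(sketch)
```

Note that the sample size depends only on eps, delta, and the number of itemsets, not on the number of rows; the paper shows this is essentially optimal for some databases.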


information theory and applications | 2011

On the zero-error capacity threshold for deletion channels

Ian A. Kash; Michael Mitzenmacher; Justin Thaler; Jonathan Ullman

We consider the zero-error capacity of deletion channels. Specifically, we consider the setting where we choose a codebook C consisting of strings of n bits, and our model of the channel corresponds to an adversary who may delete up to pn of these bits for a constant p. Our goal is to decode correctly without error regardless of the actions of the adversary. We consider what values of p allow non-zero capacity in this setting. We suggest multiple approaches, one of which makes use of the natural connection between this problem and the problem of finding the expected length of the longest common subsequence of two random sequences.
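The longest-common-subsequence connection can be explored empirically. Below is a minimal Monte Carlo sketch estimating the expected LCS length of two random n-bit strings with the classic O(n^2) dynamic program; the paper's analysis is analytic, and this simulation is purely illustrative.

```python
import random

def lcs_length(a, b):
    """Classic dynamic program for the longest common subsequence."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def expected_lcs(n, trials=100, rng=random.Random(0)):
    """Monte Carlo estimate of E[LCS] for two uniformly random n-bit strings."""
    total = 0
    for _ in range(trials):
        a = [rng.randint(0, 1) for _ in range(n)]
        b = [rng.randint(0, 1) for _ in range(n)]
        total += lcs_length(a, b)
    return total / trials   # grows like gamma * n for a constant gamma (Chvatal-Sankoff)
```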


foundations of computer science | 2017

Tight Lower Bounds for Differentially Private Selection

Thomas Steinke; Jonathan Ullman

A pervasive task in the differential privacy literature is to select the k items of highest quality out of a set of d items, where the quality of each item depends on a sensitive dataset that must be protected. Variants of this task arise naturally in fundamental problems like feature selection and hypothesis testing, and also as subroutines for many sophisticated differentially private algorithms. The standard approaches to these tasks (repeated use of the exponential mechanism or the sparse vector technique) approximately solve this problem given a dataset of n = O(√k log d) samples. We provide a tight lower bound for some very simple variants of the private selection problem. Our lower bound shows that a sample of size n = Ω(√k log d) is required even to achieve a very minimal accuracy guarantee. Our results are based on an extension of the fingerprinting method to sparse selection problems. Previously, the fingerprinting method has been used to provide tight lower bounds for answering an entire set of d queries, but often only a much smaller set of k queries is relevant. Our extension allows us to prove lower bounds that depend on both the number of relevant queries and the total number of queries.
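The standard upper-bound approach referenced above has a compact form: adding Gumbel noise to each quality score and keeping the top k is equivalent to k successive invocations of the exponential mechanism (the Gumbel-max trick). A minimal sketch, with an illustrative sensitivity/epsilon calibration:

```python
import numpy as np

def private_top_k(quality, k, eps_per_item, sensitivity=1.0,
                  rng=np.random.default_rng()):
    """quality: length-d array of data-dependent scores, assumed sensitivity-1."""
    noisy = quality + rng.gumbel(scale=2.0 * sensitivity / eps_per_item, size=len(quality))
    return np.argsort(noisy)[-k:][::-1]              # indices of the k best noisy scores
```

A single argsort over the noisy scores implements the peeling without k separate sampling rounds; the paper shows no mechanism of any form can beat the √k log d sample complexity this achieves.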


economics and computation | 2015

Inducing Approximately Optimal Flow Using Truthful Mediators

Ryan M. Rogers; Aaron Roth; Jonathan Ullman; Zhiwei Steven Wu

We revisit a classic coordination problem from the perspective of mechanism design: how can we coordinate a social welfare maximizing flow in a network congestion game with selfish players? The classical approach, which computes tolls as a function of known demands, fails when the demands are unknown to the mechanism designer, and naively eliciting them does not necessarily yield a truthful mechanism. Instead, we introduce a weak mediator that can provide suggested routes to players and set tolls as a function of reported demands. However, players may ignore the mediator entirely or misreport their type to it. Using techniques from differential privacy, we show how to design a weak mediator such that it is an asymptotic ex-post Nash equilibrium for all players to truthfully report their types to the mediator and faithfully follow its suggestion, and that when they do, they end up playing a nearly optimal flow. Notably, our solution works in settings of incomplete information even in the absence of a prior distribution on player types. Along the way, we develop new techniques for privately solving convex programs which may be of independent interest.

Collaboration


Dive into Jonathan Ullman's collaborations.

Top Co-Authors

Aaron Roth
University of Pennsylvania

Adam D. Smith
Pennsylvania State University

Mallesh M. Pai
University of Pennsylvania

Michael J. Kearns
University of Pennsylvania

Kobbi Nissim
Ben-Gurion University of the Negev

Anupam Gupta
Carnegie Mellon University