Thomas Steinke | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Thomas Steinke is active.

Explore More

Publication

Featured researches published by Thomas Steinke.

foundations of computer science | 2015

Robust Traceability from Trace Amounts

Cynthia Dwork; Adam D. Smith; Thomas Steinke; Jonathan Ullman; Salil P. Vadhan

The privacy risks inherent in the release of a large number of summary statistics were illustrated by Homer et al. (PLoS Genetics, 2008), who considered the case of 1-way marginals of SNP allele frequencies obtained in a genome-wide association study: Given a large number of minor allele frequencies from a case group of individuals diagnosed with a particular disease, together with the genomic data of a single target individual and statistics from a sizable reference dataset independently drawn from the same population, an attacker can determine with high confidence whether or not the target is in the case group. In this work we describe and analyze a simple attack that succeeds even if the summary statistics are significantly distorted, whether due to measurement error or noise intentionally introduced to protect privacy. Our attack only requires that the vector of distorted summary statistics is close to the vector of true marginals in ℓ1 norm. Moreover, the reference pool required by previous attacks can be replaced by a single sample drawn from the underlying population. The new attack, which is not specific to genomics and which handles Gaussian as well as Bernouilli data, significantly generalizes recent lower bounds on the noise needed to ensure differential privacy (Bun, Ullman, and Vadhan, STOC 2014, Steinke and Ullman, 2015), obviating the need for the attacker to control the exact distribution of the data.

international workshop and international workshop on approximation, randomization, and combinatorial optimization. algorithms and techniques | 2013

Pseudorandomness for Regular Branching Programs via Fourier Analysis

Omer Reingold; Thomas Steinke; Salil P. Vadhan

We present an explicit pseudorandom generator for oblivious, read-once, permutation branching programs of constant width that can read their input bits in any order. The seed length is O(log2 n), where n is the length of the branching program. The previous best seed length known for this model was n 1/2 + o(1), which follows as a special case of a generator due to Impagliazzo, Meka, and Zuckerman (FOCS 2012) (which gives a seed length of s 1/2 + o(1) for arbitrary branching programs of size s). Our techniques also give seed length n 1/2 + o(1) for general oblivious, read-once branching programs of width \(2^{n^{o(1)}}\), which is incomparable to the results of Impagliazzo et al.

conference on innovations in theoretical computer science | 2012

Learning hurdles for sleeping experts

Varun Kanade; Thomas Steinke

We study the online decision problem where the set of available actions varies over time, also called the sleeping experts problem. We consider the setting where the performance comparison is made with respect to the best ordering of actions in hindsight. In this paper, both the payoff function and the availability of actions is adversarial. Kleinberg et al. (2008) gave a computationally efficient no-regret algorithm in the setting where payoffs are stochastic. Kanade et al. (2009) gave an efficient no-regret algorithm in the setting where action availability is stochastic. However, the question of whether there exists a computationally efficient no-regret algorithm in the adversarial setting was posed as an open problem by Kleinberg et al. (2008). We show that such an algorithm would imply an algorithm for PAC learning DNF, a long standing important open problem. We also show that a related problem, the gambling problem, posed as an open problem by Abernethy (2010) is related to agnostically learning halfspaces, albeit under restricted distributions.

information theory and applications | 2016

Interactive fingerprinting codes and the hardness of preventing false discovery

Thomas Steinke; Jonathan Ullman

We show an essentially tight bound on the number of adaptively chosen statistical queries that a computationally efficient algorithm can answer accurately given n samples from an unknown distribution. A statistical query asks for the expectation of a predicate over the underlying distribution, and an answer to a statistical query is accurate if it is “close” to the correct expectation over the distribution. This question was recently studied by Dwork et al. [DFH+ 15], who showed how to answer Ω(n2) queries efficiently, and also by Hardt and Ullman [HU14], who showed that answering Õ(n3) queries is hard. We close the gap between the two bounds and show that, under a standard hardness assumption, there is no computationally efficient algorithm that, given n samples from an unknown distribution, can give valid answers to O(n2) adaptively chosen statistical queries. An implication of our results is that computationally efficient algorithms for answering arbitrary, adaptively chosen statistical queries may as well be differentially private. We obtain our results using a new connection between the problem of answering adaptively chosen statistical queries and a combinatorial object called an interactive fingerprinting code [FT01]. In order to optimize our hardness result, we give a new Fourier-analytic approach to analyzing fingerprinting codes that is simpler, more flexible, and yields better parameters than previous constructions.

symposium on discrete algorithms | 2017

Make up your mind: the price of online queries in differential privacy

Mark Bun; Thomas Steinke; Jonathan Ullman

We consider the problem of answering queries about a sensitive dataset subject to differential privacy. The queries may be chosen adversarially from a larger set Q of allowable queries in one of three ways, which we list in order from easiest to hardest to answer: Offline: The queries are chosen all at once and the differentially private mechanism answers the queries in a single batch. Online: The queries are chosen all at once, but the mechanism only receives the queries in a streaming fashion and must answer each query before seeing the next query. Adaptive: The queries are chosen one at a time and the mechanism must answer each query before the next query is chosen. In particular, each query may depend on the answers given to previous queries. Many differentially private mechanisms are just as efficient in the adaptive model as they are in the offline model. Meanwhile, most lower bounds for differential privacy hold in the offline setting. This suggests that the three models may be equivalent. We prove that these models are all, in fact, distinct. Specifically, we show that there is a family of statistical queries such that exponentially more queries from this family can be answered in the offline model than in the online model. We also exhibit a family of search queries such that exponentially more queries from this family can be answered in the online model than in the adaptive model. We also investigate whether such separations might hold for simple queries like threshold queries over the real line.

foundations of computer science | 2017

Tight Lower Bounds for Differentially Private Selection

Thomas Steinke; Jonathan Ullman

A pervasive task in the differential privacy literature is to select the k items of highest quality out of a set of d items, where the quality of each item depends on a sensitive dataset that must be protected. Variants of this task arise naturally in fundamental problems like feature selection and hypothesis testing, and also as subroutines for many sophisticated differentially private algorithms.The standard approaches to these tasks—repeated use of the exponential mechanism or the sparse vector technique—approximately solve this problem given a dataset of n = O(√{k}\log d) samples. We provide a tight lower bound for some very simple variants of the private selection problem. Our lower bound shows that a sample of size n = Ω(√{k} \log d) is required even to achieve a very minimal accuracy guarantee.Our results are based on an extension of the fingerprinting method to sparse selection problems. Previously, the fingerprinting method has been used to provide tight lower bounds for answering an entire set of d queries, but often only some much smaller set of k queries are relevant. Our extension allows us to prove lower bounds that depend on both the number of relevant queries and the total number of queries.

international workshop and international workshop on approximation, randomization, and combinatorial optimization. algorithms and techniques | 2015

Weighted Polynomial Approximations: Limits for Learning and Pseudorandomness.

Mark Bun; Thomas Steinke

Polynomial approximations to boolean functions have led to many positive results in computer science. In particular, polynomial approximations to the sign function underly algorithms for agnostically learning halfspaces, as well as pseudorandom generators for halfspaces. In this work, we investigate the limits of these techniques by proving inapproximability results for the sign function. Firstly, the polynomial regression algorithm of Kalai et al. (SIAM J. Comput. 2008) shows that halfspaces can be learned with respect to log-concave distributions on

symposium on the theory of computing | 2018

Composable and versatile privacy via truncated CDP

Mark Bun; Cynthia Dwork; Guy N. Rothblum; Thomas Steinke

\mathbb{R}^n

innovations in theoretical computer science | 2014

Learning Hurdles for Sleeping Experts

Varun Kanade; Thomas Steinke

in the challenging agnostic learning model. The power of this algorithm relies on the fact that under log-concave distributions, halfspaces can be approximated arbitrarily well by low-degree polynomials. We ask whether this technique can be extended beyond log-concave distributions, and establish a negative result. We show that polynomials of any degree cannot approximate the sign function to within arbitrarily low error for a large class of non-log-concave distributions on the real line, including those with densities proportional to

symposium on the theory of computing | 2016