Order-Revealing Encryption and the Hardness of Private Learning
Mark Bun ∗ Mark Zhandry † May 2, 2015
Abstract
An order-revealing encryption scheme gives a public procedure by which two ciphertexts can be compared to reveal the ordering of their underlying plaintexts. We show how to use order-revealing encryption to separate computationally efficient PAC learning from efficient (ε, δ)-differentially private PAC learning. That is, we construct a concept class that is efficiently PAC learnable, but for which every efficient learner fails to be differentially private. This answers a question of Kasiviswanathan et al. (FOCS '08, SIAM J. Comput. '11).

To prove our result, we give a generic transformation from an order-revealing encryption scheme into one with strongly correct comparison, which enables the consistent comparison of ciphertexts that are not obtained as the valid encryption of any message. We believe this construction may be of independent interest.
Keywords: differential privacy, learning theory, order-revealing encryption

∗ School of Engineering & Applied Sciences, Harvard University. [email protected]. Supported by an NDSEG fellowship and NSF grant CNS-1237235.
† Stanford University. [email protected]. Supported by the DARPA PROCEED program.

1 Introduction
Many agencies hold sensitive information about individuals, and statistical analysis of this data could yield great societal benefit. The line of work on differential privacy [DMNS06] aims to enable such analysis while giving a strong formal guarantee on the privacy afforded to individuals. Noting that the framework of computational learning theory captures many of these statistical tasks, Kasiviswanathan et al. [KLN+11] initiated the study of differentially private learning. Roughly speaking, a differentially private learner is required to output a classification of labeled examples that is accurate, but does not change significantly based on the presence or absence of any individual example.

The early positive results in private learning established that, ignoring computational complexity, any concept class is privately learnable with a number of samples logarithmic in the size of the concept class [KLN+11]. Our focus in this work is instead on the computational price of differential privacy for learning tasks, for which much less is known. The initial work of Kasiviswanathan et al. [KLN+11] identified the important question of whether every efficiently PAC learnable concept class is also efficiently privately learnable, but only limited progress has been made on this question since then [BKN10, Nis14].

Our main result gives a strong negative answer to this question. We exhibit a concept class that is efficiently PAC learnable, but under plausible cryptographic assumptions cannot be learned efficiently and privately. To prove this result, we establish a connection between private learning and order-revealing encryption. We construct a new order-revealing encryption scheme with strong correctness properties that may be of independent learning-theoretic and cryptographic interest.
We first recall Valiant's (distribution-free) PAC model for learning [Val84]. Let C be a concept class consisting of concepts c : X → {0, 1} for a data universe X. A learner L is given n samples of the form (x_i, c(x_i)), where the x_i's are drawn i.i.d. from an unknown distribution and are labeled according to an unknown concept c. The goal of the learner is to output a hypothesis h : X → {0, 1} from a hypothesis class H that approximates c well on the unknown distribution. That is, the probability that h disagrees with c on a fresh example from the unknown distribution should be small – say, less than 0.05. The hypothesis class H may be different from C, but in the case where H ⊆ C we call L a proper learner. Moreover, we say a learner is efficient if it runs in time polynomial in the description size of c and the size of its examples.
Kasiviswanathan et al. [KLN+11] defined a private learner to be a PAC learner that is also differentially private. Two samples S = {(x_1, b_1), . . . , (x_n, b_n)} and S′ = {(x′_1, b′_1), . . . , (x′_n, b′_n)} are said to be neighboring if they differ on exactly one example, which we think of as corresponding to one individual's information. A randomized learner L : (X × {0, 1})^n → H is (ε, δ)-differentially private if for all neighboring datasets S and S′ and all sets T ⊆ H,

Pr[L(S) ∈ T] ≤ e^ε · Pr[L(S′) ∈ T] + δ.

The case δ = 0 is called pure differential privacy. The definition with positive δ, called approximate differential privacy, first appeared in [DKM+06] and has since been shown to enable substantial accuracy gains. Throughout this introduction, we will think of ε as a small constant, e.g. ε = 0.1, and δ = o(1/n).

Kasiviswanathan et al. [KLN+11] gave a generic "Private Occam's Razor" algorithm, showing that any concept class C can be privately (properly) learned using O(log |C|) samples. Unfortunately, this algorithm runs in time Ω(|C|), which is exponential in the description size of each concept. With an eye toward designing efficient private learners, Blum et al. [BDMN05] made the powerful observation that any efficient learning algorithm in the statistical queries (SQ) framework of Kearns [Kea98] can be efficiently simulated with differential privacy. Moreover, Kasiviswanathan et al. [KLN+11] showed that the efficient learner for the concept class of parity functions based on Gaussian elimination can also be implemented efficiently with differential privacy. These two techniques – SQ learning and Gaussian elimination – are essentially the only methods known for computationally efficient PAC learning. The fact that both can be implemented privately led Kasiviswanathan et al. [KLN+11] to ask whether all efficiently learnable concept classes could also be efficiently learned with differential privacy.

Beimel et al. [BKN10] made partial progress toward this question in the special case of pure differential privacy with proper learning, showing that the sample complexity of efficient learners can be much higher than that of inefficient ones. Specifically, they showed that, assuming the existence of pseudorandom generators with exponential stretch, there exists for every ℓ(d) = ω(log d) a concept class over {0, 1}^d for which every efficient proper private learner requires Ω(d) samples, but an inefficient proper private learner requires only O(ℓ(d)) examples. Nissim [Nis14] strengthened this result substantially for "representation learning," where a proper learner is further restricted to output a canonical representation of its hypothesis. He showed that, assuming the existence of one-way functions, there exists a concept class that is efficiently representation learnable, but not efficiently privately representation learnable (even with approximate differential privacy). With Nissim's kind permission, we give the details of this construction in Section 5.

Despite these negative results for proper learning, one might still have hoped that any efficiently learnable concept class could be efficiently improperly learned with privacy. Indeed, a number of works have shown that, especially with differential privacy, improper learning can be much more powerful than proper learning. For instance, Beimel et al. [BKN10] showed that under pure differential privacy, the simple class of point functions (indicators of a single domain element) requires Ω(d) samples to privately learn properly, but only O(log d) samples to privately learn improperly. Moreover, computational separations are known between proper and improper learning even without privacy considerations. Pitt and Valiant [PV88] showed that unless NP = RP, k-term DNF are not efficiently properly learnable, but they are efficiently improperly learnable [Val84].

Under plausible cryptographic assumptions, we resolve the question of Kasiviswanathan et al. [KLN+11] in the negative, even for improper learners. The assumption we need is the existence of "strongly correct" order-revealing encryption (ORE) schemes, described in Section 1.3.
Theorem 1.1 (Informal). Assuming the existence of strongly correct ORE, there exists an efficiently computable concept class EncThresh that is efficiently PAC learnable, but not efficiently learnable by any (ε, δ)-differentially private algorithm.

We stress that this result holds even for improper learners and for the relaxed notion of approximate differential privacy. We remark that cryptography has played a major role in shaping our understanding of the computational complexity of learning in a number of models (e.g. [Val84, KV94, Kha95, Ser00]). It has also been used before to show separations between what is efficiently learnable in different models (e.g. [Blu94, SG04]).
We give an informal overview of the construction and analysis of the concept class EncThresh. We first describe the concept class of thresholds Thresh and its simple PAC learning algorithm. Consider the domain [N] = {1, . . . , N}. Given a number t ∈ [N], a threshold concept c_t is defined by c_t(x) = 1 if and only if x ≤ t. The concept class of thresholds admits a simple and efficient proper PAC learning algorithm L_Thresh. Given a sample {(x_1, c_t(x_1)), . . . , (x_n, c_t(x_n))} labeled by an unknown concept c_t, the learner L_Thresh identifies the largest positive example x_{i*} and outputs the hypothesis h = c_{x_{i*}}. That is, L_Thresh chooses the threshold concept that minimizes the empirical error on its sample. To achieve a small constant error on any underlying distribution on examples, it suffices to take n = O(1) samples.

A simple but important observation about L_Thresh is that it is completely oblivious to the actual numeric values of its examples, or even to the fact that the domain is [N]. In fact, L_Thresh works equally well on any totally-ordered domain on which it can efficiently compare examples. In an extreme case, the learner L_Thresh still works when its examples are encrypted under an order-revealing encryption (ORE) scheme, which guarantees that L_Thresh is able to learn the order of its examples, but nothing else about them. Up to small technical modifications, our concept class EncThresh is exactly the class Thresh where examples are encrypted under an ORE scheme.

For EncThresh to be efficiently PAC learnable, it must be learnable even under distributions that place arbitrary weight on examples corresponding to invalid ciphertexts. To this end, we require a "strong correctness" condition on our ORE scheme. The strong correctness condition ensures that all ciphertexts, even those that are not obtained as encryptions of messages, can be compared in a consistent fashion. This condition is not met by current constructions of ORE, and one of the technical contributions of this work is a generic transformation from weakly correct ORE schemes to strongly correct ones.

While a learner similar to L_Thresh is able to efficiently PAC learn the concept class EncThresh, we argue that it cannot do so while preserving differential privacy with respect to its examples. Intuitively, the security of the ORE scheme ensures that essentially the only thing a learner for EncThresh can do is output a hypothesis that compares an example to one it already has. We make this intuition precise by giving an algorithm that traces the hypothesis output by any efficient learner back to one of the examples used to produce it. This formalization builds conceptually on the connection between differential privacy and traitor-tracing schemes (see Section 1.4), but requires new ideas to adapt to the PAC learning model.
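To make the key observation concrete, here is a minimal Python sketch (ours, not from the paper) of L_Thresh written against an abstract comparison oracle; the names learn_threshold and compare are our own. Because the learner only ever invokes compare, the same code learns thresholds whether the examples are plaintext integers or ORE ciphertexts compared via the public Comp procedure.

```python
def learn_threshold(examples, compare):
    """PAC learner for thresholds over any totally ordered domain.

    examples: list of (x, label) pairs, label = 1 iff x <= t for the
              unknown threshold t.
    compare:  compare(a, b) returns '<', '=', or '>' -- e.g. integer
              comparison, or an ORE scheme's public Comp on ciphertexts.
    Returns a hypothesis h(x) predicting the label of a fresh example.
    """
    positives = [x for x, label in examples if label == 1]
    if not positives:
        return lambda x: 0  # all-zeroes hypothesis

    # Largest positive example, found using only the comparison oracle.
    top = positives[0]
    for x in positives[1:]:
        if compare(top, x) == '<':
            top = x

    # Predict 1 exactly on examples at or below the largest positive one.
    return lambda x: 1 if compare(x, top) in ('<', '=') else 0


# Usage on plaintext integers (unknown threshold t = 42):
sample = [(x, int(x <= 42)) for x in [3, 77, 41, 90, 12]]
h = learn_threshold(
    sample, lambda a, b: '<' if a < b else ('=' if a == b else '>'))
assert h(40) == 1 and h(50) == 0
```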
1.3 Order-Revealing Encryption

Motivated by the task of answering range queries on encrypted databases, an order-revealing encryption (ORE) scheme [BCO11, BLR+15] is a special type of symmetric key encryption scheme where it is possible to publicly sort ciphertexts according to the order of the plaintexts. More precisely, the plaintext space of the scheme is the set of integers [N] = {1, . . . , N} (more generally, any totally-ordered plaintext space can be considered), and in addition to the private encryption and decryption procedures Enc, Dec, there is a public comparison procedure Comp that takes as input two ciphertexts and reveals the order of the corresponding plaintexts. The security goal is best-possible semantic security, defined in Boneh et al. [BLR+15], which requires that ciphertexts reveal nothing about the plaintexts except for the ordering.

Known constructions of order-revealing encryption.
Order-revealing encryption can be seen as a special case of 2-input functional encryption. In such a scheme, there are several functions f_1, . . . , f_k, and given two ciphertexts c_1, c_2 encrypting m_1, m_2, it is possible to learn f_i(m_1, m_2) for all i ∈ [k]. General multi-input functional encryption schemes can be obtained from indistinguishability obfuscation [GGG+14] or multilinear maps [BLR+15]. ORE can also be obtained from single-input functional encryption with function privacy, which means that f is kept secret. Such schemes can be built from regular single-input schemes without function privacy by work of Brakerski and Segev [BS15], and such single-input schemes can in turn be built from obfuscation [GGH+13].

For our application, we need the comparison procedure to behave consistently on any distribution on ciphertexts, even distributions whose support includes malformed ciphertexts. Unfortunately, previous constructions only achieve a weak form of correctness, which guarantees that encrypting two messages and then comparing the ciphertexts using Comp produces the same result (with overwhelming probability) as comparing the plaintexts directly. This requirement only specifies how Comp works on valid ciphertexts, namely actual encryptions of messages. Moreover, correctness is only guaranteed for these messages with overwhelming probability, meaning even some valid ciphertexts may cause Comp to misbehave. For our learner, this weak form of correctness means that, for some distributions placing significant weight on bad ciphertexts, the comparison procedure is completely useless, and thus the learner will fail on these distributions.

We therefore need a stronger correctness guarantee: for any two ciphertexts, the comparison procedure must be consistent with decrypting the two ciphertexts and comparing the resulting plaintexts. This correctness guarantee is meaningful even for improperly generated ciphertexts. We note that none of the existing constructions of order-revealing encryption outlined above satisfy this stronger notion. For the obfuscation-based schemes, ciphertexts consist of obfuscated programs. In these schemes, it is easy to describe invalid ciphertexts where the obfuscated program performs incorrectly, causing the comparison procedure to output the wrong result. In the multilinear map-based schemes, the underlying instantiations use current "noisy" multilinear maps, such as [GGH13a]. An invalid ciphertext could, for example, have too much noise, which will cause the comparison procedure to behave unpredictably.
Obtaining strong correctness. We first argue that, for all existing ORE schemes, the scheme can be modified so that Comp is correct for all valid ciphertexts. We then give a generic conversion from any ORE scheme with weakly correct comparison, including the tweaked existing schemes, into a strongly correct scheme. We simply modify the ciphertext by adding a non-interactive zero-knowledge (NIZK) proof that the ciphertext is well-formed, with the common reference string added to the public comparison key. Then the decryption and comparison procedures check the proof(s), and only output the result (either decryption or comparison) if the proof(s) are valid. The (computational) zero-knowledge property of the NIZK implies that the addition of the proof to the ciphertext does not affect security. Meanwhile, NIZK soundness implies that any ciphertext accepted by the decryption and comparison procedures must be valid, and the weak correctness property of the underlying ORE implies that for valid ciphertexts, decryption and comparison are consistent. The result is that comparisons are consistent with decryption for all ciphertexts, giving strong correctness.

As we need strong correctness for every ciphertext, even hard-to-generate ones, we need the NIZK proofs to have perfect soundness, as opposed to computational soundness. Such NIZK proofs were built in [GOS12]. We note also that the conversion outlined above is not specific to ORE, and applies more generally to functional encryption schemes.
1.4 Related Work

Hardness of Private Query Release. One of the most basic and well-studied statistical tasks in differential privacy is the problem of releasing answers to counting queries. A counting query asks, "what fraction of the records in a dataset D satisfy the predicate q?". Given a collection of k counting queries q_1, . . . , q_k from a family Q, the goal of a query release algorithm is to release approximate answers to these queries while preserving differential privacy. A remarkable result of Blum et al. [BLR08], with subsequent improvements by [DNR+09, DRV10, RR10, HR10, GRU12, HLM12], showed that an arbitrary sequence of counting queries can be answered accurately with differential privacy even when k is exponential in the dataset size n. Unfortunately, all of these algorithms capable of answering more than n² queries are inefficient, running in time exponential in the dimensionality of the data. Moreover, several works [DNR+09, Ull13, BZ14] have gone on to show that this inefficiency is likely inherent.

These computational lower bounds for private query release rely on a connection between the hardness of private query release and traitor-tracing schemes, which was first observed by Dwork et al. [DNR+09]. Dwork et al. [DNR+09] exhibited a family of 2^{Õ(√n)} queries for which no efficient algorithm can produce a data structure which could be used to answer all queries in this family. Very recently, Boneh and Zhandry [BZ14] constructed a new traitor-tracing scheme based on indistinguishability obfuscation that yields the same infeasibility result for a family of n · O(d) queries on records of size d. Extending this connection, Ullman [Ull13] constructed a specialized traitor-tracing scheme to show that no efficient private algorithm can answer more than Õ(n²) arbitrary queries that are given as input to the algorithm.

Dwork et al. [DNR+09] also showed strong lower bounds against private algorithms for producing synthetic data. Synthetic data generation algorithms produce a new "fake" dataset, whose rows are of the same type as those in the original dataset, with the promise that the answers to some restricted set of queries on the synthetic dataset well-approximate the answers on the original dataset. Assuming the existence of one-way functions, Dwork et al. [DNR+09] exhibited an efficiently computable collection of queries for which no efficient private algorithm can produce useful synthetic data. Ullman and Vadhan [UV11] refined this result to hold even for extremely simple classes of queries.

Nevertheless, the restriction to synthetic data is significant to these results, and they do not rule out the possibility that other privacy-preserving data structures can be used to answer large families of restricted queries. In fact, when the synthetic data restriction is lifted, there are algorithms (e.g. [HRS12, TUV12, CTUW14, DNT14]) that answer queries from certain exponentially large families in subexponential time. One can view the problem of synthetic data generation as analogous to proper learning. In both cases, placing natural syntactic restrictions on the output of an algorithm may come at the expense of utility or computational efficiency.
Efficiency of SQ Learning.
Feldman and Kanade [FK12] addressed the question of whether information-theoretically efficient SQ learners – i.e., those making polynomially many queries – could be made computationally efficient. One of their main negative results showed that unless NP = RP, there exists a concept class with polynomial query complexity that is not efficiently SQ learnable. Moreover, this concept class is efficiently PAC learnable, which suggests that the restriction to SQ learning can introduce an inherent computational cost.

We show that the concept class EncThresh can be learned (inefficiently) with polynomially many statistical queries. The result of Blum et al. [BDMN05] discussed above, showing that SQ learning algorithms can be efficiently simulated by differentially private algorithms, thus shows that EncThresh also separates SQ learners making polynomially many queries from computationally efficient SQ learners.
Corollary 1.2 (Informal). Assuming the existence of strongly correct ORE, the concept class EncThresh is efficiently PAC learnable and has polynomial SQ query complexity, but is not efficiently SQ learnable.
While our proof relies on much stronger hardness assumptions, it reveals ORE as a new barrier to efficient SQ learning. As discussed in more detail in Section 3.3, even though their result is about computational hardness, Feldman and Kanade's choice of a concept class relies crucially on the fact that parities are hard to learn in the SQ model even information-theoretically. By contrast, our concept class EncThresh is computationally hard to SQ learn for a reason that appears fundamentally different from the information-theoretic hardness of SQ learning parities.
Learning from Encrypted Data.
Several works have developed schemes for training, testing, and classifying machine learning models over encrypted data (e.g. [GLN13, BPTG14]). In a model use case, a client holds a sensitive dataset and uploads an encrypted version of the dataset to a cloud computing service. The cloud service then trains a model over the encrypted data and produces an encrypted classifier it can send back to the client, ideally without learning anything about the examples it received. The notion of privacy afforded to the individuals in the dataset here is complementary to differential privacy. While the cloud service does not learn anything about the individuals in the dataset, its output might still depend heavily on the data of certain individuals.

In fact, our non-differentially private PAC learner for the class EncThresh exactly performs the task of learning over encrypted data, producing a classifier without learning anything about its examples beyond their order (this addresses the difficulty of implementing comparisons noted in prior work [GLN13]). Thus one can interpret our results as showing that these two notions of privacy for machine learning are not only complementary, but may actually be in conflict. Moreover, the strong correctness guarantee we provide for ORE (which applies more generally to multi-input functional encryption) may help enable the theoretical study of learning from encrypted data in other PAC-style settings.
2 Preliminaries and Definitions
For each k ∈ ℕ, let X_k be an instance space (such as {0, 1}^k), where the parameter k represents the size of the elements in X_k. Let C_k be a set of boolean functions {c : X_k → {0, 1}}. The sequence (X_1, C_1), (X_2, C_2), . . . represents an infinite sequence of learning problems defined over instance spaces of increasing dimension. We will generally suppress the parameter k, and refer to the problem of learning C as the problem of learning C_k for every k.

A learner L is given examples sampled from an unknown probability distribution D over X, where the examples are labeled according to an unknown target concept c ∈ C. The learner must select a hypothesis h from a hypothesis class H that approximates the target concept with respect to the distribution D. More precisely:

Definition 2.1. The generalization error of a hypothesis h : X → {0, 1} (with respect to a target concept c and distribution D) is defined by error_D(c, h) = Pr_{x∼D}[h(x) ≠ c(x)]. If error_D(c, h) ≤ α, we say that h is an α-good hypothesis for c on D.

Definition 2.2 (PAC Learning [Val84]). An algorithm L : (X × {0, 1})^n → H is an (α, β)-accurate PAC learner for the concept class C using hypothesis class H with sample complexity n if for all target concepts c ∈ C and all distributions D on X, given an input of n samples S = ((x_1, c(x_1)), . . . , (x_n, c(x_n))), where each x_i is drawn i.i.d. from D, algorithm L outputs a hypothesis h ∈ H satisfying Pr[error_D(c, h) ≤ α] ≥ 1 − β. The probability here is taken over the random choice of the examples in S and the coin tosses of the learner L.

The learner L is efficient if it runs in time polynomial in the size parameter k, the representation size of the target concept c, and the accuracy parameters 1/α and 1/β. Note that a necessary (but not sufficient) condition for L to be efficient is that its sample complexity n is polynomial in the learning parameters.

If H ⊆ C then L is called a proper learner. Otherwise, it is called an improper learner.

Kasiviswanathan et al. [KLN+11] defined a private learner as a PAC learner that is also differentially private. Recall the definition of differential privacy:
Definition 2.3.
A learner L : (X × {0, 1})^n → H is (ε, δ)-differentially private if for all sets T ⊆ H and all neighboring sets of examples S ∼ S′,

Pr[L(S) ∈ T] ≤ e^ε · Pr[L(S′) ∈ T] + δ.

The technical object that we will use to show our hardness results for differential privacy is what we call an example reidentification scheme. It is analogous to the hard-to-sanitize database distributions [DNR+09, UV11] and re-identifiable database distributions [BUV14] used in prior works to prove hardness results for private query release, but is adapted to the setting of computational learning. In the first step, an algorithm Gen_ex chooses a concept and a sample S labeled according to that concept. In the second step, a learner L receives either the sample S or the sample S_{−i}, where an appropriately chosen example i is replaced by a junk example, and learns a hypothesis h. Finally, an algorithm Trace_ex attempts to use h to identify one of the rows given to L. If Trace_ex succeeds at identifying such a row with high probability, then it must be able to distinguish L(S) from L(S_{−i}), showing that L cannot be differentially private. We formalize these ideas below.

Definition 2.4. An (α, ξ)-example reidentification scheme for a concept class C consists of a pair of algorithms (Gen_ex, Trace_ex) with the following properties.

Gen_ex(k, n): Samples a concept c ∈ C_k and an associated distribution D. Draws i.i.d. examples x_1, . . . , x_n ←_R D, and a fixed value x_0. Let S denote the labeled sample ((x_1, c(x_1)), . . . , (x_n, c(x_n))), and for any index i ∈ [n], let S_{−i} denote the sample with the pair (x_i, c(x_i)) replaced with (x_0, c(x_0)).

Trace_ex(h): Takes state shared with Gen_ex as well as a hypothesis h, and identifies an index in [n] (or ⊥ if none is found).

The scheme obeys the following "completeness" and "soundness" criteria on the ability of Trace_ex to identify an example given to a learner L.

Completeness.
A good hypothesis can be traced to some example. That is, for every efficient learner L,

Pr[error_D(c, h) ≤ α ∧ Trace_ex(h) = ⊥] ≤ ξ.

Here, the probability is taken over h ←_R L(S) and the coins of Gen_ex and Trace_ex.

Soundness. For every efficient learner L, Trace_ex cannot trace i from the sample S_{−i}. That is, for all i ∈ [n], Pr[Trace_ex(h) = i] ≤ ξ for h ←_R L(S_{−i}).

We may sometimes relax the completeness condition to hold only under certain restrictions on L's output (e.g. L is a proper learner or L is a representation learner). In this case, we say that (Gen_ex, Trace_ex) is an example reidentification scheme for (properly, representation) learning a class C.
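As an illustration of how these two criteria are used, the following Python harness (our own sketch; all names are ours, and gen_ex is assumed to return the concept, the distribution, the labeled sample S, and a function producing S_{−i}) runs one round of each experiment:

```python
def completeness_round(gen_ex, trace_ex, learner, error_D, alpha):
    """One run of the completeness experiment: an alpha-good hypothesis
    must be traced back to some example.  The event 'good but not accused'
    may occur with probability at most xi."""
    c, D, S, S_minus = gen_ex()
    h = learner(S)
    good = error_D(c, D, h) <= alpha
    accused = trace_ex(h)            # an index in [n], or None for ⊥
    return good and accused is None  # the "failure" event

def soundness_round(gen_ex, trace_ex, learner, i):
    """One run of the soundness experiment for index i: Trace_ex must not
    accuse an example the learner never received (probability at most xi)."""
    c, D, S, S_minus = gen_ex()
    h = learner(S_minus(i))          # sample with example i replaced by junk
    return trace_ex(h) == i          # the "violation" event
```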
Theorem 2.5. Let (Gen_ex, Trace_ex) be an (α, ξ)-example reidentification scheme for a concept class C. Then for every β > 0 and polynomial n(k), there is no efficient (ε, δ)-differentially private (α, β)-PAC learner for C using n samples when

δ < (1 − β − ξ)/n − e^ε · ξ.

In a typical setting of parameters, we will take α, β, ε = O(1) and δ, ξ = o(1/n), in which case the inequality in Theorem 2.5 will be satisfied for sufficiently large n.

Proof. Suppose instead that there were a computationally efficient (ε, δ)-differentially private (α, β)-PAC learner L for C using n samples. Then there exists an i ∈ [n] such that Pr[Trace_ex(L(S)) = i] ≥ (1 − β − ξ)/n. (Indeed, by PAC accuracy and completeness, Trace_ex(L(S)) outputs some index with probability at least 1 − β − ξ; averaging over the n indices yields such an i.) However, since L is differentially private,

Pr[Trace_ex(L(S_{−i})) = i] ≥ e^{−ε} ((1 − β − ξ)/n − δ) > ξ(n),

which contradicts the soundness of (Gen_ex, Trace_ex).

2.2 Order-Revealing Encryption

Definition 2.6.
An Order-Revealing Encryption (ORE) scheme is a tuple (Gen, Enc, Dec, Comp) of algorithms where:

• Gen(1^λ, ℓ) is a randomized procedure that takes as inputs a security parameter λ and plaintext length ℓ, and outputs a secret encryption/decryption key sk and public parameters params.

• Enc(sk, m) is a potentially randomized procedure that takes as input a secret key sk and a message m ∈ {0, 1}^ℓ, and outputs a ciphertext c.

• Dec(sk, c) is a deterministic procedure that takes as input a secret key sk and a ciphertext c, and outputs a plaintext message m ∈ {0, 1}^ℓ or a special symbol ⊥.

• Comp(params, c_0, c_1) is a deterministic procedure that "compares" two ciphertexts, outputting either ">", "<", "=", or ⊥.
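For concreteness, the syntax of Definition 2.6 can be transcribed as a small Python interface (our own rendering; the class and method names are ours, with None standing in for ⊥):

```python
from abc import ABC, abstractmethod
from typing import Optional, Tuple

class ORE(ABC):
    """Syntax of an order-revealing encryption scheme (Definition 2.6)."""

    @abstractmethod
    def gen(self, lam: int, ell: int) -> Tuple[bytes, bytes]:
        """Return (sk, params) for security parameter lam and
        plaintext length ell."""

    @abstractmethod
    def enc(self, sk: bytes, m: int) -> bytes:
        """Encrypt a message m in {0,1}^ell under sk."""

    @abstractmethod
    def dec(self, sk: bytes, c: bytes) -> Optional[int]:
        """Decrypt c, returning the plaintext or None (standing in for ⊥)."""

    @abstractmethod
    def comp(self, params: bytes, c0: bytes, c1: bytes) -> Optional[str]:
        """Publicly compare c0, c1: '<', '>', '=', or None (⊥)."""
```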
Correctness. An ORE scheme must satisfy two separate correctness requirements:

• Correct Decryption: This is the standard notion of correctness for an encryption scheme, which says that decryption succeeds. We will only consider strongly correct decryption, which requires that decryption always succeeds: for all security parameters λ and message lengths ℓ,

Pr[Dec(sk, Enc(sk, m)) = m : (sk, params) ← Gen(1^λ, ℓ)] = 1.

• Correct Comparison: We require that the comparison function succeeds. We will consider two notions, namely strong and weak. In order to define these notions, we first define two auxiliary functions:

– Comp_plain(m_0, m_1) is just the plaintext comparison function. That is, for m_0 < m_1, Comp_plain(m_0, m_1) = "<", Comp_plain(m_1, m_0) = ">", and Comp_plain(m_0, m_0) = "=".

– Comp_ciph(sk, c_0, c_1) is a ciphertext comparison function which uses the secret key. It first computes m_b = Dec(sk, c_b) for b = 0, 1. If either m_0 = ⊥ or m_1 = ⊥ (in other words, if either decryption failed), then Comp_ciph outputs ⊥. If both m_0, m_1 ≠ ⊥, then the output is Comp_plain(m_0, m_1).

Now we define our comparison correctness notions:

– Weakly Correct Comparison: This informally requires that comparison is consistent with encryption. For all security parameters λ, message lengths ℓ, and messages m_0, m_1 ∈ {0, 1}^ℓ,

Pr[Comp(params, c_0, c_1) = Comp_plain(m_0, m_1) : (sk, params) ← Gen(1^λ, ℓ); c_b ← Enc(sk, m_b)] = 1.

In particular, for correctly generated ciphertexts, Comp never outputs ⊥.

– Strongly Correct Comparison: This informally requires that comparison is consistent with decryption. For all security parameters λ, message lengths ℓ, and ciphertexts c_0, c_1,

Pr[Comp(params, c_0, c_1) = Comp_ciph(sk, c_0, c_1) : (sk, params) ← Gen(1^λ, ℓ)] = 1.
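A minimal executable rendering of the two auxiliary functions and of the strong-correctness condition for a single pair of ciphertexts (our own sketch; dec and comp stand in for the scheme's Dec and Comp):

```python
def comp_plain(m0, m1):
    """Plaintext comparison: '<', '>', or '='."""
    if m0 < m1:
        return '<'
    if m0 > m1:
        return '>'
    return '='

def comp_ciph(dec, sk, c0, c1):
    """Ciphertext comparison via the secret key: decrypt, then compare.
    Outputs None (standing in for ⊥) if either decryption fails."""
    m0, m1 = dec(sk, c0), dec(sk, c1)
    if m0 is None or m1 is None:
        return None
    return comp_plain(m0, m1)

def strongly_correct_on(comp, dec, params, sk, c0, c1):
    """Strong comparison correctness, checked on one pair of ciphertexts:
    the public Comp must agree with decrypt-then-compare -- even when
    c0, c1 are malformed and were never output by Enc."""
    return comp(params, c0, c1) == comp_ciph(dec, sk, c0, c1)
```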
Security. For security, we will consider a relaxation of the "best possible" security notion of Boneh et al. [BLR+15], in which the adversary must commit to its challenge messages statically, before seeing the public parameters or any ciphertexts.

Definition 2.7. An ORE scheme (Gen, Enc, Dec, Comp) is statically secure if, for all efficient adversaries A, |Pr[W_0] − Pr[W_1]| is negligible, where W_b is the event that A outputs 1 in the following experiment:

• A produces two message sequences m^{(L)}_1 < m^{(L)}_2 < · · · < m^{(L)}_q and m^{(R)}_1 < m^{(R)}_2 < · · · < m^{(R)}_q.

• The challenger runs (sk, params) ← Gen(1^λ, ℓ). It then responds to A with params, as well as c_1, . . . , c_q, where c_i = Enc(sk, m^{(L)}_i) if b = 0 and c_i = Enc(sk, m^{(R)}_i) if b = 1.

• A outputs a guess b′ for b.

We also consider a weaker definition, which only allows the sequences m^{(L)}_i and m^{(R)}_i to differ at a single point:

Definition 2.8. An ORE scheme (Gen, Enc, Dec, Comp) is statically single-challenge secure if, for all efficient adversaries A, |Pr[W_0] − Pr[W_1]| is negligible, where W_b is the event that A outputs 1 in the following experiment:

• A produces a sequence of messages m_1 < m_2 < · · · < m_q, and challenge messages m_L, m_R such that m_i < m_L < m_R < m_{i+1} for some i ∈ [q − 1].

• The challenger runs (sk, params) ← Gen(1^λ, ℓ). It then responds to A with params, as well as c_1, . . . , c_q, where c_i = Enc(sk, m_i), together with c* = Enc(sk, m_L) if b = 0 and c* = Enc(sk, m_R) if b = 1.

• A outputs a guess b′ for b.

We now argue that these two definitions are equivalent up to some polynomial loss in security.

Theorem 2.9. (Gen, Enc, Dec, Comp) is statically secure if and only if it is statically single-challenge secure.

Proof. We prove that single-challenge security implies many-challenge security through a sequence of hybrids. Each hybrid will only differ in the messages m_i that are encrypted, and each pair of adjacent hybrids will only differ in a single message. The first hybrid will encrypt the m^{(L)}_i, and the last hybrid will encrypt the m^{(R)}_i. Thus, by applying single-challenge security to each pair of adjacent hybrids, we conclude that the first and last hybrids are indistinguishable, thus showing many-challenge security.

Hybrid j, for j ≤ q: m_i = min(m^{(L)}_i, m^{(R)}_i) if i ≤ j, and m_i = m^{(L)}_i if i > j.

First, notice that all the m_i are in order, since both sequences m^{(L)}_i and m^{(R)}_i are in order. Second, the only difference between Hybrid (j − 1) and Hybrid j is that m_j = m^{(L)}_j in Hybrid (j − 1) and m_j = min(m^{(L)}_j, m^{(R)}_j) in Hybrid j. Thus, single-challenge security implies that each pair of adjacent hybrids is indistinguishable. Moreover, for j where m^{(L)}_j < m^{(R)}_j, the two hybrids are actually identical.

Hybrid j, for j > q: m_i = min(m^{(L)}_i, m^{(R)}_i) if i ≤ 2q − j, and m_i = m^{(R)}_i if i > 2q − j.

Again, notice that all the m_i are in order. Moreover, the only difference between Hybrid (2q − j) and Hybrid (2q − j + 1) is that m_j = min(m^{(L)}_j, m^{(R)}_j) in Hybrid (2q − j) and m_j = m^{(R)}_j in Hybrid (2q − j + 1). Thus, single-challenge security implies that each pair of adjacent hybrids is indistinguishable. Moreover, for j where m^{(L)}_j > m^{(R)}_j, the two hybrids are actually identical.
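The hybrid message sequences used in this proof can be generated mechanically. The following helper (our own illustration) produces the j-th hybrid for 0 ≤ j ≤ 2q and checks the two properties the proof relies on: the endpoints are the left and right sequences, and adjacent hybrids differ in at most one position.

```python
def hybrid_messages(mL, mR, j):
    """Messages encrypted in Hybrid j of the proof of Theorem 2.9.

    mL, mR: the two increasing message sequences (length q).
    Hybrid 0 encrypts mL, Hybrid 2q encrypts mR, and adjacent hybrids
    differ in at most one position, as the proof requires.
    """
    q = len(mL)
    if j <= q:   # positions 1..j lowered to min(mL, mR)
        return [min(l, r) if i < j else l
                for i, (l, r) in enumerate(zip(mL, mR))]
    cut = 2 * q - j   # positions above 2q - j raised to mR
    return [min(l, r) if i < cut else r
            for i, (l, r) in enumerate(zip(mL, mR))]

# Sanity checks: endpoints and single-position differences.
mL, mR = [1, 5, 9], [2, 6, 9]
assert hybrid_messages(mL, mR, 0) == mL
assert hybrid_messages(mL, mR, 2 * len(mL)) == mR
for j in range(2 * len(mL)):
    a, b = hybrid_messages(mL, mR, j), hybrid_messages(mL, mR, j + 1)
    assert sum(x != y for x, y in zip(a, b)) <= 1
```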
3 EncThresh and its Learnability

Let (Gen, Enc, Dec, Comp) be a statically secure ORE scheme with strongly correct comparison. We define a concept class EncThresh, which intuitively captures the class of threshold functions where examples are encrypted under the ORE scheme. Throughout this discussion, we will take N = 2^ℓ and regard the plaintext space of the ORE scheme to be [N] = {1, . . . , N}. Ideally we would like, for each threshold t ∈ [N + 1] and each (sk, params) ← Gen(1^λ), to define a concept

f_{t,sk,params}(c) = 1 if Dec_sk(c) < t, and 0 otherwise.

However, two issues must be addressed to ensure that EncThresh is efficiently PAC learnable.

1. In order for the learner to be able to use the comparison function Comp, it must be given the public parameters params generated by the ORE scheme. We address this in the natural way by attaching a set of public parameters to each example. Moreover, we define EncThresh so that each concept is supported on the single set of public parameters that corresponds to the secret key used for encryption and decryption.

2. Only a subset of binary strings form valid (sk, params) pairs that are actually produced by Gen in the ORE scheme. To represent concepts, we need a reasonable encoding scheme for these valid pairs. The encoding scheme we choose is the polynomial-length sequence of random coin tosses used by the algorithm Gen to produce (sk, params).

We now formally describe the concept class EncThresh. Each concept is parameterized by a string r, representing the coin tosses of the algorithm Gen, and a threshold t ∈ [N + 1] for N = 2^ℓ. In what follows, let (sk_r, params_r) be the keys output by Gen(1^λ, ℓ) when run on the sequence of coin tosses r. Let

f_{t,r}(params, c) = 1 if (params = params_r) ∧ (Dec(sk_r, c) ≠ ⊥) ∧ (Dec(sk_r, c) < t), and 0 otherwise.

Notice that given t and r, the concept f_{t,r} can be efficiently evaluated. The description length k of the instance space X_k = {0, 1}^k is polynomial in the security parameter λ and the plaintext length ℓ.
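Concretely, a concept f_{t,r} can be evaluated as in the following sketch (our own illustration; gen_from_coins, which deterministically regenerates (sk_r, params_r) from the coin tosses r, and dec are assumed names):

```python
def make_concept(gen_from_coins, dec, r, t):
    """Return the concept f_{t,r} : (params, c) -> {0, 1} of EncThresh."""
    sk_r, params_r = gen_from_coins(r)  # deterministic given the coins r

    def f(params, c):
        if params != params_r:
            return 0            # each concept lives on a single params
        m = dec(sk_r, c)        # None stands in for ⊥
        return 1 if (m is not None and m < t) else 0

    return f
```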
3.1 Learning EncThresh

We argue that EncThresh is efficiently PAC learnable by formalizing the argument from the introduction. Because we need to include the ORE public parameters in each example, the PAC learner L (Algorithm 1) for EncThresh actually works in two stages. In the first stage, L determines whether there is significant probability mass on examples corresponding to some public parameters params. Recall that each concept in EncThresh is supported on exactly one such set of parameters. If there is no significant mass on any params, then the all-zeroes hypothesis is a good hypothesis. On the other hand, if there is a heavy set of parameters, the learner L applies Comp using those parameters to learn a good comparator.
Theorem 3.1. Let α, β > 0. There exists a PAC learning algorithm L for the concept class EncThresh achieving error α and confidence 1 − β. Moreover, L is efficient (running in time polynomial in the parameters k, 1/α, log(1/β)).

Proof. Fix a target concept f_{t,r} ∈ EncThresh_k and a distribution D on examples. First observe that the learner L always outputs a hypothesis with one-sided error, i.e. we always have h ≤ f_{t,r} pointwise. Also observe that f_{t′,r} ≤ f_{t,r} pointwise for any t′ < t. These both follow from the strong correctness of the ORE scheme. Let (sk_r, params_r) denote the keys output by Gen(1^λ, ℓ) when run on the sequence of coin tosses r. Let POS denote the set of examples (params, c) on which f_{t,r}(params, c) = 1. We divide the analysis of the learner into two cases based on the weight D places on POS.

Algorithm 1 Learner L for EncThresh

1. Request examples {(params_1, c_1, b_1), . . . , (params_n, c_n, b_n)} for n = ⌈log(1/β)/α⌉.

2. Identify an i for which b_i = 1 and set params* = params_i; if no such i exists, return h ≡ 0.

3. Let G = {j : params_j = params*, b_j = 1}. Let j* ∈ G be an index with Comp(params*, c_j, c_{j*}) ∈ {<, =, ⊥} for all j ∈ G.

4. Return h defined by h(params, c) = 1 if (params = params*) ∧ (Comp(params*, c, c_{j*}) ∈ {<, =}), and 0 otherwise.

Case 1: D places weight at least α on POS. Define t̂ ∈ [N + 1] as the largest t̂ ≤ t such that error_D(f_{t̂,r}, f_{t,r}) ≥ α. Such a t̂ is guaranteed to exist since f_{1,r} is the all-zeros function, and therefore error_D(f_{1,r}, f_{t,r}) is equal to the weight D places on POS, which is at least α.

Suppose f_{t̂+1,r} ≤ h pointwise. Since h has one-sided error (that is, h ≤ f_{t,r} pointwise), we have error_D(f_{t̂+1,r}, f_{t,r}) = error_D(f_{t̂+1,r}, h) + error_D(h, f_{t,r}), or

error_D(h, f_{t,r}) = error_D(f_{t̂+1,r}, f_{t,r}) − error_D(f_{t̂+1,r}, h) ≤ error_D(f_{t̂+1,r}, f_{t,r}) < α.

Therefore, it suffices to show that f_{t̂+1,r} ≤ h with probability at least 1 − β. This is guaranteed as long as L receives a sample (params_r, c_i, 1) with t̂ ≤ Dec(sk_r, c_i) < t. In other words, f_{t,r}(params_r, c_i) = 1 and f_{t̂,r}(params_r, c_i) = 0. Since f_{t̂,r} ≤ f_{t,r} pointwise, such samples exactly account for the error between f_{t̂,r} and f_{t,r}. Thus, since error_D(f_{t̂,r}, f_{t,r}) ≥ α, each example satisfies t̂ ≤ Dec(sk_r, c_i) < t with probability at least α. The learner L therefore receives some sample c_i with t̂ ≤ Dec(sk_r, c_i) < t with probability at least 1 − (1 − α)^n ≥ 1 − β (since we took n ≥ log(1/β)/α).

Case 2: D places less than α weight on POS. Then the identically zero hypothesis has error at most α, so the claim holds because 0 ≤ h ≤ f_{t,r}.

3.2 Hardness of Privately Learning EncThresh
We now prove the hardness of privately learning EncThresh by constructing an example reidentification scheme for this concept class. Recall that an example reidentification scheme consists of two algorithms: Gen_ex, which selects a distribution, a concept, and examples to give to a learner, and Trace_ex, which attempts to identify one of the examples the learner received. Our example reidentification scheme yields a hard distribution even for weak learning, where the error parameter α is taken to be inverse-polynomially close to 1/2.

Theorem 3.2. Let γ(n) and ξ(n) be noticeable functions. Let (Gen, Enc, Dec, Comp) be a statically single-challenge secure ORE scheme. Then there exists an (efficient) (α = 1/2 − γ, ξ)-example reidentification scheme (Gen_ex, Trace_ex) for the concept class EncThresh.
We start with an informal description of the scheme (Gen_ex, Trace_ex). The algorithm Gen_ex sets up the parameters of the ORE scheme, chooses the "middle" threshold concept corresponding to t = N/2, and sets the distribution on examples to be encryptions of uniformly random messages (together with the correct public parameters needed for comparison). Let m_1 < m_2 < · · · < m_n denote the sorted sequence of messages whose encryptions make up the sample produced by Gen_ex (with overwhelming probability, they are indeed distinct). We can thus break the plaintext space up into buckets of the form B_i = [m_i, m_{i+1}). Suppose L is a (weak) learner that produces a hypothesis h with advantage γ over random guessing. Such a hypothesis h must be able to distinguish encryptions of messages m ≤ t from encryptions of messages m > t with advantage γ. Thus, there must be a pair of adjacent buckets B_{i−1}, B_i for which h can distinguish encryptions of messages from B_{i−1} from encryptions from B_i with advantage γ/n.

This observation leads to a natural definition for Trace_ex: locate a pair of adjacent buckets B_{i−1}, B_i that h distinguishes, and output the identity i of the example separating those buckets. Completeness of the resulting scheme, i.e. the fact that some example is reidentified when L succeeds, follows immediately from the preceding discussion. We argue soundness, i.e. that an example absent from L's sample is not identified, by reducing to the static security of the ORE scheme. The intuition is that if L is not given example i, then it should not be able to distinguish encryptions from bucket B_{i−1} from encryptions from bucket B_i.

To make the security reduction somewhat more precise, suppose for the sake of contradiction that there is an efficient algorithm L that violates the soundness of (Gen_ex, Trace_ex) with noticeable probability ξ. That is, there is some i such that even without example i, the algorithm L manages to produce (with probability ξ) a hypothesis h that distinguishes B_{i−1} from B_i. A natural first attempt to violate the security of the ORE is to construct an adversary that challenges on the message sequences m_1 < · · · < m_{i−1} < m^{(L)}_i < m_{i+1} < · · · < m_n and m_1 < · · · < m_{i−1} < m^{(R)}_i < m_{i+1} < · · · < m_n, with m^{(L)}_i drawn from B_{i−1} and m^{(R)}_i drawn from B_i. The adversary we actually use (Algorithm 2 below) challenges on a pair of such messages at a time, so that evaluating the hypothesis h on the challenge ciphertexts reveals the challenger's bit.

We construct an example reidentification scheme for EncThresh as follows. The algorithm Gen_ex fixes the threshold t = N/2 and samples (sk_r, params_r) ←_R Gen(1^λ, ℓ), yielding a concept f_{t,r}. Let D be the distribution of (params_r, Enc(sk_r, m)) for uniformly random m ∈ [N]. Let m′_1, . . . , m′_n ←_R [N], and let m_1 ≤ · · · ≤ m_n be the result of sorting the m′_i. Let m_0 = 0 and m_{n+1} = N. Since n = poly(k) ≪ N, these random messages will be well-spaced; in particular, with overwhelming probability they are pairwise well-separated, so we assume this is the case in what follows. Gen_ex then sets the samples to be (x_1 = (params_r, Enc(sk_r, m′_1)), . . . , x_n = (params_r, Enc(sk_r, m′_n))). Let x_0 = (params_r, Enc(sk_r, m_0)) be a "junk" example.

The algorithm Trace_ex creates buckets B_i = [m_i, m_{i+1}). For each i, let

p_i = Pr_{m∈B_i, coins of Enc}[h(params_r, Enc(sk_r, m)) = 1].

By sampling random choices of m in each bucket, Trace_ex can efficiently compute a good estimate p̂_i ≈ p_i for each i (Lemma 3.3). It then accuses the least i for which p̂_{i−1} − p̂_i ≥ γ/(2n), and outputs ⊥ if none is found.

Lemma 3.3. Let K = (32n²/γ²) · log(9n/ξ). For each i = 0, . . . , n, let

p̂_i = (1/K) Σ_{j=1}^{K} h(x_j), where x_j = (params_r, Enc(sk_r, m_j)) for i.i.d. m_1, . . . , m_K ←_R B_i.

Then |p̂_i − p_i| ≤ γ/(8n) for every i with probability at least 1 − ξ/2.

Proof. By a Chernoff bound, the probability that any given p̂_i deviates from p_i by more than γ/(8n) is at most 2 exp(−Kγ²/(32n²)) = 2ξ/(9n) ≤ ξ/(2(n+1)). The lemma follows by a union bound.

We first verify completeness for this scheme. Let L be a learner for EncThresh using n examples. If the hypothesis h produced by L is (1/2 − γ)-good, then there exists i_0 < i_1 such that p_{i_0} − p_{i_1} ≥ γ. If this is the case, then there must be an i for which p_{i−1} − p_i ≥ γ/n. Then, with all but probability ξ(n)/2 over the estimates p̂_i, we have p̂_{i−1} − p̂_i ≥ γ/n − 2 · γ/(8n) ≥ γ/(2n), so some index is accused.

Now we verify soundness. Fix a PPT L, and let j* ∈ [n]. Suppose L violates the soundness of the scheme with respect to j*, i.e.

Pr_{h ←_R L(S_{−j*}), coins of Gen_ex}[Trace_ex(h) = j*] > ξ.

We will use L to construct an adversary A for the ORE scheme that succeeds with noticeable advantage. It suffices to build an adversary for the static (many-challenge) security of ORE, with Theorem 2.9 showing how to convert it to a single-challenge adversary. This many-challenge adversary is presented as Algorithm 2. (While not explicitly stated, the adversary should halt and output a random guess whenever the messages it samples are not well-spaced.)

Algorithm 2 ORE adversary A

1. Sample m′_1, . . . , m′_n ←_R [N], and let m_1 ≤ · · · ≤ m_n be the result of sorting the m′_j. Let π be the permutation on {1, . . . , n} such that m_{π(j)} = m′_j. Let m_0 = 0. Let i* = π(j*) so that m_{i*} = m′_{j*}.

2. Construct pairs (m^L_0, m^L_1) and (m^R_0, m^R_1) as follows. Let B_0 = (m_{i*−1}, m_{i*}) and B_1 = (m_{i*}, m_{i*+1}). Sample m^L_0 ≤ m^L_1 at random from the same B_j, for a random choice of j ∈ {0, 1}. Sample m^R_0 ←_R B_0 and m^R_1 ←_R B_1.

3. Challenge on the pair of sequences m_0, m_1, . . . , m_{i*−1}, m^L_0, m^L_1, m_{i*+1}, . . . , m_n and m_0, m_1, . . . , m_{i*−1}, m^R_0, m^R_1, m_{i*+1}, . . . , m_n, receiving ciphertexts c_0, c_1, . . . , c_{i*−1}, c*_0, c*_1, c_{i*+1}, . . . , c_n. For j ≠ j*, let c′_j = c_{π(j)} so that c′_j is an encryption of m′_j.

4. Set t = N/2 and let

S_{−j*} = {(params_r, c′_1, χ(m′_1 ≤ t)), . . . , (params_r, c′_{j*−1}, χ(m′_{j*−1} ≤ t)), (params_r, c_0, 1), (params_r, c′_{j*+1}, χ(m′_{j*+1} ≤ t)), . . . , (params_r, c′_n, χ(m′_n ≤ t))}.

Obtain h ←_R L(S_{−j*}).

5. Guess b′ = 0 if h(params_r, c*_0) = h(params_r, c*_1). Otherwise guess b′ = 1.

Let i* be such that m_{i*} = m′_{j*}. With probability at least ξ over the parameters (sk_r, params_r), the choice of messages, the choice of the hypothesis h, and the coins of Trace_ex, there is a gap p̂_{i*−1} − p̂_{i*} ≥ γ/(2n). Hence, by Lemma 3.3, there is a gap p_{i*−1} − p_{i*} ≥ γ/(4n) with probability at least ξ/2.

We now calculate the advantage of the adversary A. Fix a hypothesis h. For notational simplicity, let p = p_{i*−1} and let q = p_{i*}. Let y_0 = h(params_r, c*_0) and y_1 = h(params_r, c*_1). Then the adversary's success probability is

Pr[b′ = b] = (1/2) (Pr[y_0 = y_1 | b = 0] + Pr[y_0 ≠ y_1 | b = 1])
= (1/2) ((1/2)(p² + (1 − p)² + q² + (1 − q)²) + (1 − pq − (1 − p)(1 − q)))
= 1/2 + (1/2)(p − q)².
Thus, if p − q ≥ γ/(4n), then the adversary's advantage is at least γ²/(32n²). On the other hand, even for arbitrary values of p, q, the advantage is still nonnegative. Therefore, the advantage of the strategy is at least ξγ²/(64n²) − negl(k) (the negl(k) term coming from the assumption that the sampled m′_i were distinct), which is a noticeable function of the parameter k. This contradicts the static security of the ORE scheme.

3.3 SQ Learning EncThresh

The statistical query (SQ) model is a natural restriction of the PAC model by which a learner is able to measure statistical properties of its examples, but cannot see the individual examples themselves. We recall the definition of an SQ learner.

Definition 3.4 (SQ learning [Kea98]). Let c : X → {0, 1} be a target concept and let D be a distribution over X. In the SQ model, a learner is given access to a statistical query oracle STAT(c, D). It may make queries to this oracle of the form (ψ, τ), where ψ : X × {0, 1} → {0, 1} is a query function and τ ∈ (0, 1) is an error tolerance. The oracle STAT(c, D) responds with a value v such that |v − Pr_{x∈D}[ψ(x, c(x)) = 1]| ≤ τ. The goal of a learner is to produce, with probability at least 1 − β, a hypothesis h : X → {0, 1} such that error_D(c, h) ≤ α. The query functions must be efficiently evaluable, and the tolerance τ must be lower bounded by an inverse polynomial in k and 1/α.

The query complexity of a learner is the worst-case number of queries it issues to the statistical query oracle. An SQ learner is efficient if it also runs in time polynomial in k, 1/α, 1/β.

Feldman and Kanade [FK12] investigated the relationship between query complexity and computational complexity for SQ learners. They exhibited a concept class C which is efficiently PAC learnable and SQ learnable with polynomially many queries, but, assuming NP ≠ RP, is not efficiently SQ learnable. Concepts in this concept class take the form

g_{φ,y}(x, x′) = PAR_y(x′) if x = φ, and 0 otherwise,

where PAR_y(x′) is the inner product of y and x′ modulo 2. The concept class C consists of g_{φ,y} where φ is a satisfiable 3-CNF formula and y is the lexicographically first satisfying assignment to φ. The efficient PAC learner for parities based on Gaussian elimination shows that C is also efficiently PAC learnable. It is also (inefficiently) SQ learnable with polynomially many queries: either the all-zeroes hypothesis is good, or an SQ learner can recover the formula φ bit-by-bit and determine the satisfying assignment y by brute force. On the other hand, because parities are information-theoretically hard to SQ learn, the satisfying assignment y remains hidden to an SQ learner unless it is able to solve 3-SAT.

In this section, we show that the concept class EncThresh shares these properties with C. Namely, we know that EncThresh is efficiently PAC learnable, and because it is not efficiently privately learnable, it is not efficiently SQ learnable [BDMN05]. We can also show that EncThresh has an SQ learner with polynomial query complexity. Making this observation about EncThresh is of interest because the hardness of SQ learning EncThresh does not seem to be related to the (information-theoretic) hardness of SQ learning parities.

Proposition 3.5. The concept class EncThresh is (inefficiently) SQ learnable with polynomially many queries.

As with C, there are two cases. In the first case, the target distribution places nearly zero weight on examples with params = params_r, and so the all-zeroes hypothesis is good. In the second case, the target distribution places noticeable weight on these examples, and our learner can use statistical queries to recover the comparison parameters params_r bit-by-bit. Once the public parameters are recovered, our learner can determine a corresponding secret key by brute force. Lemma 3.6 below shows that any corresponding secret key – even one that is not actually sk_r – suffices. The learner can then use binary search to determine the threshold value t.

Proof. Let f_{t,r} be the target concept, D be the target distribution, and α be the target error rate. With the statistical query ((x, b) ↦ b, α/4), we obtain an estimate v_0 of the weight that f_{t,r} places on positive examples. If v_0 ≤ α/2, then Pr_{x∈D}[f_{t,r}(x) = 1] ≤ α. If not, then we know that Pr_{x∈D}[f_{t,r}(x) = 1] ≥ α/4, so D places significant weight on examples prefixed with params_r. Suppose now that we are in the latter case.

Let m = |params|. For i = 1, . . . , m, define ψ_i(params, c, b) = 1 if params_i = 1 and b = 1, and ψ_i(params, c, b) = 0 otherwise. Then, by asking the queries (ψ_i, α/16), we can determine each bit params^r_i of params_r. Now, by brute-force search, we determine a secret key sk for which (sk, params_r) ∈ Range(Gen). The recovered secret key sk may not necessarily be the same as sk_r. However, the following lemma shows that sk and sk_r are functionally equivalent:

Lemma 3.6. Suppose (Gen, Enc, Dec, Comp) is a strongly correct ORE scheme. Then for any pair (sk_0, params), (sk_1, params) ∈ Range(Gen), we have that Dec_{sk_0}(c) = Dec_{sk_1}(c) for all ciphertexts c.

With the secret key sk in hand, we now conduct a binary search for the threshold t. Recall that we have an estimate v_0 for the weight that f_{t,r} places on positive examples, i.e. |v_0 − Pr_{x∈D}[f_{t,r}(x) = 1]| ≤ α/4. Starting at t_1 = N/2, we issue the query (ϕ_{t_1}, α/4), where ϕ_t(params, c, b) = 1 iff params = params_r and Dec(sk, c) < t. Let h_t denote the hypothesis

h_t(params, c) = 1 if (params = params_r) ∧ (Dec(sk, c) ≠ ⊥) ∧ (Dec(sk, c) < t), and 0 otherwise.

Thus, the query (ϕ_{t_1}, α/4) approximates the weight h_{t_1} places on positive examples. Let the answer to this query be v_1. If |v_1 − v_0| ≤ α/2, then we can halt and output the good hypothesis h_{t_1}. Otherwise, if v_1 < v_0 − α/2, we set the next threshold to t_2 = 3N/4, and if v_1 > v_0 + α/2, we set the next threshold to t_2 = N/4. We recurse up to log N = ℓ = poly(k) times, yielding a good hypothesis for f_{t,r}.

Proof of Lemma 3.6. Suppose the lemma is not true. First suppose that there exists a ciphertext c such that Dec(sk_0, c) = p_0 < p_1 = Dec(sk_1, c). Let c′ ∈ Enc(sk_0, p_1). Then, by strong correctness applied to the parameters (sk_0, params), we must have Comp(params, c, c′) = "<". Now, by strong correctness applied to (sk_1, params), we must have Dec(sk_1, c′) > p_1. Thus, p_0 < Dec(sk_0, c′) = p_1 < Dec(sk_1, c′). Repeating this argument, we obtain a contradiction because the message space is finite.

Now suppose instead that there is a ciphertext c for which Dec(sk_0, c) = p ∈ [N], but Dec(sk_1, c) = ⊥. Let c′ ∈ Enc(sk_0, p′) for some p′ > p. Then Comp(params, c, c′) = "<" by strong correctness applied to (params, sk_0). But Comp(params, c, c′) = ⊥ by strong correctness applied to (params, sk_1), again yielding a contradiction.
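A schematic Python rendering of this SQ learner (our own sketch, with illustrative tolerances; stat_query stands in for the oracle STAT(c, D), and find_sk for the brute-force search over Gen's coin tosses):

```python
def sq_learn_encthresh(stat_query, m, N, find_sk, dec, alpha):
    """Inefficient SQ learner for EncThresh (sketch of Proposition 3.5).

    stat_query(psi, tau): returns Pr[psi(x, c(x)) = 1] up to tolerance tau,
        where x = (params, ciphertext) is drawn from the target distribution.
    m: bit length of the public parameters.
    """
    # Estimate the weight on positive examples; if tiny, h = 0 is alpha-good.
    v0 = stat_query(lambda x, b: b, alpha / 4)
    if v0 <= alpha / 2:
        return lambda x: 0

    # Recover params_r bit by bit: positive examples all carry params_r.
    bits = []
    for i in range(m):
        vi = stat_query(lambda x, b, i=i: int(b == 1 and x[0][i] == 1),
                        alpha / 16)
        bits.append(1 if vi > alpha / 8 else 0)
    params_r = tuple(bits)

    sk = find_sk(params_r)  # brute-force search for a matching secret key

    def hypothesis(t):
        return lambda x: int(x[0] == params_r
                             and dec(sk, x[1]) is not None
                             and dec(sk, x[1]) < t)

    # Binary search for a threshold whose positive weight matches v0.
    lo, hi = 1, N + 1
    h = hypothesis((lo + hi) // 2)
    for _ in range(N.bit_length() + 1):
        t = (lo + hi) // 2
        h = hypothesis(t)
        v = stat_query(lambda x, b, h=h: h(x), alpha / 4)
        if abs(v - v0) <= alpha / 2:
            break                # h_t is a good hypothesis
        if v < v0 - alpha / 2:
            lo = t               # h_t misses positives: raise the threshold
        else:
            hi = t               # h_t over-covers: lower the threshold
    return h
```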
The lack of strong correctness is easiest to see withthe scheme of Boneh et al. [BLR + + 15] is computed by performing multilinearoperations, and for correctly generated ciphertexts, the operations will give the right answer. How-ever, there exist ciphertexts, namely those with very large noise, for which the comparison function18ives an incorrect output. The result is that the comparison operation is not guaranteed to beconsistent with decrypting the ciphertexts and comparing the plaintexts.As described in the introduction, we give a generic conversion from any ORE scheme with weaklycorrect comparison into a strongly correct scheme. We simply modify the encryption algorithm byadding a non-interactive zero-knowledge (NIZK) proof that the resulting ciphertext is well-formed.Then the decryption and comparison procedures check the proof(s), and only output a non- ⊥ result(either decryption or comparison) if the proof(s) are valid. Instantiating our scheme. In our construction, we need the (weak) correctness of the underly-ing ORE scheme to hold with probability one. However, the existing protocols only have correctnesswith overwhelming probability, so some minor adjustments need to be made to the protocols. Thisis easiest to see in the ORE scheme of Boneh et al. [BLR + + 15] only achieves the (weak) correctness property with overwhelming probability, whereaswe will require (weak) correctness with probability 1 for the conversion. However, it is straightfor-ward to generate the parameters for the protocol in such a way as to completely eliminate errors.Essentially, the parameters in the protocol have an error term that is generated by a (discrete)Gaussian distribution, which has unbounded support. Instead, we truncate the Gaussian, resultingin a noise distribution with bounded support. By truncating sufficiently far from the center, theresulting distribution is also statistically close to the full Gaussian, so security of the protocol withtruncated noise follows from the security of the protocol with un-truncated noise. By truncatingthe noise distribution, it is straightforward to set parameters so that no errors can occur.It is similarly straightforward to modify current obfuscation candidates, which are also builtfrom multilinear maps, to obtain perfect (weak) correctness by truncating the noise distributions.Thus, our scheme has instantiations using multilinear maps or iO. We describe our generic conversion from an order-revaling encryption scheme with weak correctnessusing NIZKs. We will need the following additional tools: Perfectly binding commitments. A perfectly binding commitment Com is a randomized al-gorithm with two properties. The first is perfect binding, which states that if Com ( m ; r ) = Com ( m (cid:48) ; r (cid:48) ), then m = m (cid:48) . The second requirement is computational hiding, which states thatthe distributions Com ( m ) and Com ( m (cid:48) ) are computationally indistinguishable for any messages m, m (cid:48) . Such commitments can be built, say, from any injective one-way function. Perfectly sound NIZK. A NIZK protocol consists of three algorithms: • Setup (1 λ ) is a randomized algorithm that outputs a common reference string crs . • Prove ( crs , x, w ) takes as input a common reference string crs , an NP statement x , and awitness w , and produces a proof π . • Ver ( crs , x, π ) takes as input a common reference string crs , statement x , and a proof π , andoutputs either accept or reject . 19e make three requirements for a NIZK: • Perfect Completeness. 
We now describe our generic conversion from an order-revealing encryption scheme with weak correctness using NIZKs. We will need the following additional tools.

Perfectly binding commitments. A perfectly binding commitment Com is a randomized algorithm with two properties. The first is perfect binding, which states that if $\mathsf{Com}(m; r) = \mathsf{Com}(m'; r')$, then $m = m'$. The second is computational hiding, which states that the distributions $\mathsf{Com}(m)$ and $\mathsf{Com}(m')$ are computationally indistinguishable for any messages $m, m'$. Such commitments can be built, say, from any injective one-way function.

Perfectly sound NIZK. A NIZK protocol consists of three algorithms:
• $\mathsf{Setup}(1^\lambda)$ is a randomized algorithm that outputs a common reference string $\mathsf{crs}$.
• $\mathsf{Prove}(\mathsf{crs}, x, w)$ takes as input a common reference string $\mathsf{crs}$, an NP statement $x$, and a witness $w$, and produces a proof $\pi$.
• $\mathsf{Ver}(\mathsf{crs}, x, \pi)$ takes as input a common reference string $\mathsf{crs}$, a statement $x$, and a proof $\pi$, and outputs either accept or reject.

We make three requirements for a NIZK:
• Perfect Completeness. For all security parameters $\lambda$ and any true statement $x$ with witness $w$,
$$\Pr[\mathsf{Ver}(\mathsf{crs}, x, \pi) = \mathsf{accept} : \mathsf{crs} \leftarrow \mathsf{Setup}(1^\lambda); \pi \leftarrow \mathsf{Prove}(\mathsf{crs}, x, w)] = 1.$$
• Perfect Soundness. For all security parameters $\lambda$, any false statement $x$, and any (invalid) proof $\pi$,
$$\Pr[\mathsf{Ver}(\mathsf{crs}, x, \pi) = \mathsf{accept} : \mathsf{crs} \leftarrow \mathsf{Setup}(1^\lambda)] = 0.$$
• Computational Zero Knowledge. There exists a simulator $S = (S_1, S_2)$ such that for any computationally bounded adversary $A$, the quantity
$$\left| \Pr[A^{\mathsf{Prove}(\mathsf{crs}, \cdot, \cdot)}(\mathsf{crs}) = 1 : \mathsf{crs} \leftarrow \mathsf{Setup}(1^\lambda)] - \Pr[A^{\mathsf{Sim}(\mathsf{crs}, \tau, \cdot, \cdot)}(\mathsf{crs}) = 1 : (\mathsf{crs}, \tau) \leftarrow S_1(1^\lambda)] \right|$$
is negligible, where $\mathsf{Sim}(\mathsf{crs}, \tau, x, w)$ outputs $S_2(\mathsf{crs}, \tau, x)$ if $w$ is a valid witness for $x$, and $\mathsf{Sim}(\mathsf{crs}, \tau, x, w) = \bot$ if $w$ is invalid.

NIZKs satisfying these requirements can be built from bilinear maps [GOS12].

We now give our conversion. Let $(\mathsf{Setup}, \mathsf{Prove}, \mathsf{Ver})$ be a perfectly sound NIZK and $(\mathsf{Gen}', \mathsf{Enc}', \mathsf{Dec}', \mathsf{Comp}')$ an ORE with weakly correct comparison. We will assume that $\mathsf{Enc}'$ is deterministic; if not, we can derandomize $\mathsf{Enc}'$ using a pseudorandom function. Let Com be a perfectly binding commitment. We construct a new ORE scheme $(\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec}, \mathsf{Comp})$ with strongly correct comparison:
• $\mathsf{Gen}(1^\lambda, \ell)$: run $(\mathsf{sk}', \mathsf{params}') \leftarrow \mathsf{Gen}'(1^\lambda, \ell)$. Let $\sigma = \mathsf{Com}(\mathsf{sk}'; r)$ for randomness $r$, and run $\mathsf{crs} \leftarrow \mathsf{Setup}(1^\lambda)$. Then the secret key is $\mathsf{sk} = (\mathsf{sk}', r, \mathsf{crs})$ and the public parameters are $\mathsf{params} = (\mathsf{params}', \sigma, \mathsf{crs})$.
• $\mathsf{Enc}(\mathsf{sk}, m)$: compute $c' = \mathsf{Enc}'(\mathsf{sk}', m)$. Let $x_{c'}$ be the statement $\exists\, \hat{m}, \hat{\mathsf{sk}}', \hat{r} : \sigma = \mathsf{Com}(\hat{\mathsf{sk}}'; \hat{r}) \wedge c' = \mathsf{Enc}'(\hat{\mathsf{sk}}', \hat{m})$. Run $\pi_{c'} = \mathsf{Prove}(\mathsf{crs}, x_{c'}, (m, \mathsf{sk}', r))$. Output the ciphertext $c = (c', \pi_{c'})$.
• $\mathsf{Dec}(\mathsf{sk}, c)$: write $c = (c', \pi_{c'})$. If $\mathsf{Ver}(\mathsf{crs}, x_{c'}, \pi_{c'}) = \mathsf{reject}$, output $\bot$. Otherwise, output $m = \mathsf{Dec}'(\mathsf{sk}', c')$.
• $\mathsf{Comp}(\mathsf{params}, c_0, c_1)$: write $c_b = (c'_b, \pi_{c'_b})$ and $\mathsf{params} = (\mathsf{params}', \sigma, \mathsf{crs})$. If $\mathsf{Ver}(\mathsf{crs}, x_{c'_b}, \pi_{c'_b}) = \mathsf{reject}$ for either $b = 0, 1$, then output $\bot$. Otherwise, output $\mathsf{Comp}'(\mathsf{params}', c'_0, c'_1)$.
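The following Python sketch shows the wiring of the conversion. The three primitives are insecure toy stand-ins we invented purely to exercise the control flow; only the structure (commit to sk', attach a proof to each ciphertext, and have Dec and Comp output $\bot$ on unverified inputs) follows the construction above.

```python
# Structural sketch of the conversion.  ToyORE, ToyCom, and ToyNIZK are
# insecure stand-ins invented for illustration; only the wiring mirrors
# the construction in the text.
import hashlib
import os

BOT = "bot"

class ToyORE:  # NOT a real ORE: "encryption" is the identity map
    def gen(self, lam, ell):
        return os.urandom(16), b"params'"
    def enc(self, sk, m):
        return m                      # deterministic, as required of Enc'
    def dec(self, sk, c):
        return c
    def comp(self, params, a, b):
        return "<" if a < b else (">" if a > b else "=")

class ToyCom:  # NOT hiding: a hash-based stand-in for a commitment
    def commit(self, msg, r):
        return hashlib.sha256(repr(msg).encode() + r).digest()

class ToyNIZK:  # NOT a real NIZK: "proofs" are tags recomputable from the crs
    def setup(self, lam):
        return os.urandom(16)
    def prove(self, crs, statement, witness):
        return hashlib.sha256(crs + repr(statement).encode()).digest()
    def verify(self, crs, statement, proof):
        return proof == hashlib.sha256(crs + repr(statement).encode()).digest()

class StronglyCorrectORE:
    def __init__(self):
        self.ore, self.com, self.nizk = ToyORE(), ToyCom(), ToyNIZK()

    def gen(self, lam, ell):
        sk_in, params_in = self.ore.gen(lam, ell)
        r = os.urandom(16)
        sigma = self.com.commit(sk_in, r)       # sigma = Com(sk'; r)
        crs = self.nizk.setup(lam)
        return (sk_in, r, crs), (params_in, sigma, crs)

    def enc(self, sk, m):
        sk_in, r, crs = sk
        c_in = self.ore.enc(sk_in, m)
        # Statement x_{c'}: sigma commits to some sk' and c' = Enc'(sk', m).
        pi = self.nizk.prove(crs, c_in, witness=(m, sk_in, r))
        return (c_in, pi)

    def dec(self, sk, c):
        sk_in, r, crs = sk
        c_in, pi = c
        if not self.nizk.verify(crs, c_in, pi):
            return BOT                          # reject unproven ciphertexts
        return self.ore.dec(sk_in, c_in)

    def comp(self, params, ca, cb):
        params_in, sigma, crs = params
        if any(not self.nizk.verify(crs, c_in, pi) for c_in, pi in (ca, cb)):
            return BOT
        return self.ore.comp(params_in, ca[0], cb[0])

scheme = StronglyCorrectORE()
sk, params = scheme.gen(128, 10)
c5, c9 = scheme.enc(sk, 5), scheme.enc(sk, 9)
assert scheme.comp(params, c5, c9) == "<"
assert scheme.comp(params, (7, b"garbage"), c9) == BOT   # invalid proof
```

The last assertion is the point of the whole conversion: a ciphertext that does not carry a valid well-formedness proof is rejected consistently by both Dec and Comp, so the two procedures can never disagree on it.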
Correctness. Notice that, for each plaintext $m$, the ciphertext component $c' = \mathsf{Enc}'(\mathsf{sk}', m)$ is the unique value such that $\mathsf{Dec}(\mathsf{sk}, (c', \pi)) = m$ for some proof $\pi$. Moreover, the completeness of the zero-knowledge proof implies that $\mathsf{Enc}(\mathsf{sk}, m)$ outputs a valid proof. Decryption correctness follows.

For strong comparison correctness, consider two ciphertexts $c_0, c_1$ where $c_b = (c'_b, \pi_{c'_b})$. Suppose both proofs $\pi_{c'_b}$ are valid, which means that verification passes when running $\mathsf{Comp}$, and so $\mathsf{Comp}(\mathsf{params}, c_0, c_1) = \mathsf{Comp}'(\mathsf{params}', c'_0, c'_1)$. Verification also passes when decrypting $c_b$, and so $\mathsf{Dec}(\mathsf{sk}, c_b) = \mathsf{Dec}'(\mathsf{sk}', c'_b)$. Since the proofs are valid, $c'_b = \mathsf{Enc}'(\mathsf{sk}', m_b)$ for some $m_b$, for both $b = 0, 1$. The weak correctness of comparison for $(\mathsf{Gen}', \mathsf{Enc}', \mathsf{Dec}', \mathsf{Comp}')$ implies that $\mathsf{Comp}'(\mathsf{params}', c'_0, c'_1) = \mathsf{Comp}_{\mathrm{plain}}(m_0, m_1)$. The decryption correctness of $(\mathsf{Gen}', \mathsf{Enc}', \mathsf{Dec}', \mathsf{Comp}')$ then implies that $\mathsf{Dec}'(\mathsf{sk}', c'_b) = m_b$, and therefore $\mathsf{Dec}(\mathsf{sk}, c_b) = m_b$. Thus $\mathsf{Comp}_{\mathrm{ciph}}(\mathsf{sk}, c_0, c_1) = \mathsf{Comp}_{\mathrm{plain}}(m_0, m_1)$. Putting it all together, $\mathsf{Comp}(\mathsf{params}, c_0, c_1) = \mathsf{Comp}_{\mathrm{ciph}}(\mathsf{sk}, c_0, c_1)$, as desired.

Now suppose one of the proofs $\pi_{c'_b}$ is invalid. Then $\mathsf{Comp}(\mathsf{params}, c_0, c_1) = \bot$ and $\mathsf{Dec}(\mathsf{sk}, c_b) = \bot$. This means $\mathsf{Comp}_{\mathrm{ciph}}(\mathsf{sk}, c_0, c_1) = \bot = \mathsf{Comp}(\mathsf{params}, c_0, c_1)$, as desired.

Security. To prove security, we first use the zero-knowledge simulator to simulate the proofs $\pi_{c'}$ without using a witness (in particular, without the secret decryption key). Then we use the hiding property of the commitment to replace $\sigma$ with a commitment to 0. At this point, the entire game can be simulated using an $\mathsf{Enc}'$ oracle, and so security reduces to the security of $\mathsf{Enc}'$.

Theorem 4.1. If $(\mathsf{Gen}', \mathsf{Enc}', \mathsf{Dec}', \mathsf{Comp}')$ is a (statically) secure ORE, $(\mathsf{Setup}, \mathsf{Prove}, \mathsf{Ver})$ is computationally zero knowledge, and Com is computationally hiding, then $(\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec}, \mathsf{Comp})$ is a statically secure ORE.

Proof. We prove security through a sequence of hybrids. Let $A$ be an adversary with advantage $\epsilon$ in breaking the static security of $(\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec}, \mathsf{Comp})$.

Hybrid 0. This is the real experiment, where $\sigma \leftarrow \mathsf{Com}(\mathsf{sk}')$, $\mathsf{crs} \leftarrow \mathsf{Setup}(1^\lambda)$, and the proofs $\pi_{c'}$ are generated using $\mathsf{Prove}$ and valid witnesses. $A$ has advantage $\epsilon$ in distinguishing the left and right ciphertexts.

Hybrid 1. This is the same as Hybrid 0, except that $\mathsf{crs}$ is generated as $(\mathsf{crs}, \tau) \leftarrow S_1(1^\lambda)$, and all proofs are generated using $S_2(\mathsf{crs}, \tau, \cdot)$. The zero-knowledge property of $(\mathsf{Setup}, \mathsf{Prove}, \mathsf{Ver})$ shows that this is indistinguishable from Hybrid 0.

Hybrid 2. This is the same as Hybrid 1, except that $\sigma \leftarrow \mathsf{Com}(0)$. Since the randomness for computing $\sigma$ is no longer needed for simulation, this change is undetectable by the hiding of Com. Thus the advantage of $A$ in Hybrid 2 is at least $\epsilon - \mathrm{negl}$ for some negligible function negl.

Now consider the following adversary $B$ that attempts to break the security of $(\mathsf{Gen}', \mathsf{Enc}', \mathsf{Dec}', \mathsf{Comp}')$. $B$ simulates $A$, and forwards the message sequences $m^{(L)}_1 < m^{(L)}_2 < \cdots < m^{(L)}_q$ and $m^{(R)}_1 < m^{(R)}_2 < \cdots < m^{(R)}_q$ produced by $A$ to its own challenger. In response, it receives $\mathsf{params}'$ and ciphertexts $c'_i$, where $c'_i$ encrypts either $m^{(L)}_i$ if $b = 0$ or $m^{(R)}_i$ if $b = 1$, for a random bit $b$ chosen by the challenger. $B$ now generates $\sigma \leftarrow \mathsf{Com}(0)$ and $(\mathsf{crs}, \tau) \leftarrow S_1(1^\lambda)$, and lets $\mathsf{params} = (\mathsf{params}', \sigma, \mathsf{crs})$. It also computes $\pi_{c'_i} \leftarrow S_2(\mathsf{crs}, \tau, x_{c'_i})$, defines $c_i = (c'_i, \pi_{c'_i})$, and gives $\mathsf{params}$ and the $c_i$ to $A$. Finally, when $A$ outputs a guess $b'$ for $b$, $B$ outputs the same guess $b'$.

We see that the view of $A$ as a subroutine of $B$ is exactly its view in Hybrid 2. Thus, $B$ guesses $b$ correctly with advantage at least $\epsilon - \mathrm{negl}$. The security of $(\mathsf{Gen}', \mathsf{Enc}', \mathsf{Dec}', \mathsf{Comp}')$ implies that this quantity, and hence $\epsilon$, must be negligible. Thus $A$ has negligible advantage in breaking the security of $(\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec}, \mathsf{Comp})$.
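As an illustration of the final reduction, the toy harness below traces the message flow of $B$. Everything cryptographic here is a stand-in of our own invention (the inner "scheme" and the "simulator" do nothing real); only the flow is meaningful: forward $A$'s two ordered message sequences, receive the inner ciphertexts, dress them up with a commitment to 0 and simulated proofs, and forward $A$'s guess.

```python
# Schematic toy (ours) of the reduction B in the proof of Theorem 4.1.
# Everything cryptographic is a stub; only B's message flow is meaningful.
import hashlib
import os
import random

def S1():                      # simulated CRS together with a trapdoor tau
    return os.urandom(8), os.urandom(8)

def S2(crs, tau, statement):   # simulated proof; note that no witness is used
    return hashlib.sha256(crs + tau + repr(statement).encode()).digest()

def commit(msg, r):            # stand-in commitment
    return hashlib.sha256(repr(msg).encode() + r).digest()

class ToyChallenger:           # static security game for the inner ORE
    def challenge(self, left, right):
        self.b = random.randrange(2)
        return b"params'", list(right if self.b else left)  # toy Enc' = identity

class ToyAdversary:            # trivially distinguishes in this toy game
    def choose_message_sequences(self):
        return [1, 2, 3], [4, 5, 6]
    def guess(self, params, cts):
        return int(cts[0][0] >= 4)   # inspects the first inner ciphertext

def reduction_B(A, challenger):
    left, right = A.choose_message_sequences()       # forward A's sequences
    params_in, inner = challenger.challenge(left, right)
    sigma = commit(0, os.urandom(16))                # Hybrid 2: sigma <- Com(0)
    crs, tau = S1()
    cts = [(c, S2(crs, tau, c)) for c in inner]      # simulated proofs
    return A.guess((params_in, sigma, crs), cts)     # forward A's guess

chal = ToyChallenger()
print("B's guess:", reduction_B(ToyAdversary(), chal), "| challenge bit:", chal.b)
```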
5 A Separation for Representation Learning

In this section, we show how to construct a concept class ValidSig that separates efficient representation learning from efficient private representation learning, assuming only the existence of one-way functions. By "representation learning" we mean a restricted form of proper learning in which a learner must output a particular representation (i.e. encoding) of a hypothesis $h$ in the concept class $C$. As with proper learning, this is a natural syntactic restriction to place on a learner: for instance, if one wants to learn linear threshold functions (LTFs), it makes sense to require a learner to produce the actual coefficients of an LTF, rather than an arbitrary circuit that happens to compute an LTF.

The construction is based on the following elegant idea due to Kobbi Nissim [Nis14]. Suppose $H : D \to R$ is a cryptographic hash function with the property that given $x_1, \ldots, x_n$ with $y = H(x_1) = \cdots = H(x_n)$, it is infeasible for an efficient adversary to find another $x$ for which $H(x) = y$. Consider the concept class HashPoint consisting of the concepts
$$f_x(x') = \begin{cases} 1 & \text{if } H(x) = H(x') \\ 0 & \text{otherwise} \end{cases}$$
for every $x \in D$. The representation of a concept $f_x$ is the point $x$. The concept class HashPoint is very easy to learn (by representation) without privacy: a learner can identify any positive example $x_i$ and output the representation $x_i$. Since $H(x_i) = H(x)$, the concept $f_{x_i}$ is actually equal to the target concept $f_x$. On the other hand, a learner that identifies a representation $x^*$ for which $f_{x^*} = f_x$ cannot be differentially private, since the security of the hash function means it is infeasible to produce such an $x^*$ that is not already present in the sample.

Note that this argument breaks down if one tries to show that HashPoint is not privately properly learnable. While it is infeasible to privately produce a representation $x^*$ for which $f_{x^*}$ is a good hypothesis, the hypothesis $h(x') = \chi(H(x') = H(x_i))$ is equal as a function to every good $f_{x^*}$. Moreover, this hypothesis can be constructed privately as long as the sample contains sufficiently many positive examples.
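Nissim's idea is easy to state in code. In the sketch below (our illustration; SHA-256 stands in for the hash $H$), the non-private learner simply copies the representation out of any positive example, which is exactly what a differentially private learner is forbidden from doing.

```python
# Toy illustration (ours) of the HashPoint idea, with SHA-256 standing in
# for the hash function H.
import hashlib

def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def concept(x: bytes):
    """The concept f_x: accepts exactly the preimages of H(x)."""
    y = H(x)
    return lambda z: int(H(z) == y)

def nonprivate_learner(sample):
    """Output the representation of any positive example (not private!)."""
    for x_i, label in sample:
        if label == 1:
            return x_i            # f_{x_i} is equal to the target concept f_x
    return None

x = b"target point"
f_x = concept(x)
sample = [(b"negative example", 0), (x, 1)]
rep = nonprivate_learner(sample)
assert concept(rep)(x) == 1       # the learned concept agrees with the target
# A private learner cannot do this: outputting any x* with H(x*) = H(x) that
# is absent from the sample would contradict the security of the hash.
```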
We make this discussion formal by constructing a concept class ValidSig based on super-secure digital signature schemes, which can be constructed from one-way functions. Our use of signatures to derive hardness results for private proper learning is very analogous to prior hardness results for synthetic data generation [DNR+09, UV11].

Definition 5.1. A digital signature scheme is a triple of algorithms $(\mathsf{Gen}, \mathsf{Sign}, \mathsf{Ver})$ where
• $\mathsf{Gen}(1^\lambda)$ produces a key pair $(\mathsf{sk}, \mathsf{vk})$.
• $\mathsf{Sign}(\mathsf{sk}, m)$ takes the private signing key $\mathsf{sk}$ and a message $m \in \{0,1\}^*$ and produces a signature $\sigma$ for the message $m$.
• $\mathsf{Ver}(\mathsf{vk}, m, \sigma)$ takes the public verification key $\mathsf{vk}$, a message $m$, and a signature $\sigma$, and (deterministically) outputs a bit indicating whether $\sigma$ is a valid signature for $m$.
The correctness property of a digital signature scheme is that for every $(\mathsf{sk}, \mathsf{vk}) \leftarrow_R \mathsf{Gen}(1^\lambda)$, every message $m \in \{0,1\}^*$, and every signature $\sigma \leftarrow_R \mathsf{Sign}(\mathsf{sk}, m)$, we have $\mathsf{Ver}(\mathsf{vk}, m, \sigma) = 1$.

Definition 5.2. A digital signature scheme is super-secure under adaptive chosen-message attacks if every efficient adversary $A$ wins the following weak forgery game with only negligible probability:
• The challenger samples $(\mathsf{sk}, \mathsf{vk}) \leftarrow_R \mathsf{Gen}(1^\lambda)$.
• The adversary $A$ is given $\mathsf{vk}$ and oracle access to $\mathsf{Sign}(\mathsf{sk}, \cdot)$. It adaptively queries the signing oracle, obtaining a sequence of message-signature pairs $\{(m_i, \sigma_i)\}$. It then outputs a candidate forgery $(m^*, \sigma^*)$.
• The value of the game is 1 iff $\mathsf{Ver}(\mathsf{vk}, m^*, \sigma^*) = 1$ and $(m^*, \sigma^*) \notin \{(m_i, \sigma_i)\}$.

It is known that super-secure digital signature schemes can be constructed from one-way functions [NY89, Rom90, KK05, Gol04].

We now describe our concept class ValidSig. Let $(\mathsf{Gen}, \mathsf{Sign}, \mathsf{Ver})$ be a super-secure digital signature scheme, and fix the message length $\ell$. For every $(\mathsf{vk}, m, \sigma)$ with $m \in \{0,1\}^\ell$ and $\mathsf{Ver}(\mathsf{vk}, m, \sigma) = 1$, define the concept
$$f_{\mathsf{vk},m,\sigma}(\mathsf{vk}', m', \sigma') = \begin{cases} 1 & \text{if } (\mathsf{vk} = \mathsf{vk}') \wedge (\mathsf{Ver}(\mathsf{vk}, m', \sigma') = 1) \\ 0 & \text{otherwise.} \end{cases}$$
For convenience, we also include the all-zeroes hypothesis in ValidSig, with representation $\bot$.

Theorem 5.3. Let $\alpha, \beta > 0$. There exists a proper PAC learning algorithm $L$ for the concept class ValidSig achieving error $\alpha$ and confidence $1 - \beta$. Moreover, $L$ is efficient (running in time polynomial in the parameters $k$, $1/\alpha$, $\log(1/\beta)$).

Algorithm 3 Learner $L$ for ValidSig
1. Request examples $\{((\mathsf{vk}'_1, m'_1, \sigma'_1), b_1), \ldots, ((\mathsf{vk}'_n, m'_n, \sigma'_n), b_n)\}$ for $n = \lceil \log(1/\beta)/\alpha \rceil$.
2. Identify an $i$ for which $b_i = 1$ and return the representation $(\mathsf{vk}'_i, m'_i, \sigma'_i)$. If no such $i$ exists, return $\bot$, representing the all-zeroes hypothesis.

Proof. Fix a target concept $f_{\mathsf{vk},m,\sigma} \in \mathrm{ValidSig}_k$ and a distribution $D$ on examples. Let POS denote the set of examples $(\mathsf{vk}', m', \sigma')$ on which $f_{\mathsf{vk},m,\sigma}(\mathsf{vk}', m', \sigma') = 1$. We divide the analysis of the learner into two cases based on the weight $D$ places on the set POS.

Case 1: $D$ places at least $\alpha$ weight on POS. Then $L$ receives a positive example with probability at least $1 - (1 - \alpha)^n \ge 1 - \beta$, and is thus able to identify a concept that equals the target concept.

Case 2: $D$ places less than $\alpha$ weight on POS. If $L$ gets a positive example, then the analysis of Case 1 applies. Otherwise, the all-zeroes hypothesis is $\alpha$-good.
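A toy run of Algorithm 3 is below (our own illustration). We use Ed25519 from the third-party `cryptography` package as a stand-in for the super-secure signature scheme; key objects are compared by identity, which suffices in this toy but would be a serialized comparison in a real implementation.

```python
# Toy run of Algorithm 3 (ours).  Ed25519 from the third-party `cryptography`
# package stands in for the super-secure signature scheme.
import math
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def ver(vk, m, sigma):
    try:
        vk.verify(sigma, m)
        return 1
    except InvalidSignature:
        return 0

def concept(vk, m, sigma):
    """f_{vk,m,sigma}: accepts (vk', m', s') iff vk' = vk and s' verifies."""
    assert ver(vk, m, sigma) == 1
    # Keys are compared by object identity in this toy demo.
    return lambda vk2, m2, s2: int(vk2 is vk and ver(vk, m2, s2) == 1)

def learner_L(sample):
    """Algorithm 3: return the representation of any positive example, else bot."""
    for (vk_i, m_i, s_i), b_i in sample:
        if b_i == 1:
            return (vk_i, m_i, s_i)
    return None                          # represents the all-zeroes hypothesis

alpha, beta = 0.1, 0.01
n = math.ceil(math.log(1 / beta) / alpha)   # sample size from Algorithm 3

sk = Ed25519PrivateKey.generate()
vk = sk.public_key()
target = concept(vk, b"m0", sk.sign(b"m0"))

# Positive examples drawn as (vk, m_i, Sign(sk, m_i)); three suffice here.
sample = [((vk, m, sk.sign(m)), 1) for m in (b"m1", b"m2", b"m3")]
rep = learner_L(sample)
assert concept(*rep)(vk, b"m0", sk.sign(b"m0")) == 1  # equals the target
```

Note that, exactly as in the proof above, any positive example yields a representation of a concept equal to the target, since both accept precisely the valid signatures under vk.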
We now prove the hardness of privately learning ValidSig by representation, by constructing an example reidentification scheme for representation learning this concept class. Our example reidentification scheme yields a hard distribution even when the error parameter $\alpha$ is taken to be inverse-polynomially close to 1.

Theorem 5.4. Let $\gamma(n)$ and $\xi(n)$ be noticeable functions. Let $(\mathsf{Gen}, \mathsf{Sign}, \mathsf{Ver})$ be a super-secure digital signature scheme. Then there exists an (efficient) $(\alpha = 1 - \gamma, \xi)$-example reidentification scheme $(\mathsf{Gen}_{\mathrm{ex}}, \mathsf{Trace}_{\mathrm{ex}})$ for representation learning the concept class ValidSig.

We now give the proof of Theorem 5.4.

Proof. We construct an example reidentification scheme for ValidSig as follows. The algorithm $\mathsf{Gen}_{\mathrm{ex}}$ samples $(\mathsf{sk}, \mathsf{vk}) \leftarrow_R \mathsf{Gen}(1^\lambda)$, a message $m \leftarrow_R \{0,1\}^\ell$, and a signature $\sigma \leftarrow_R \mathsf{Sign}(\mathsf{sk}, m)$, yielding a concept $f_{\mathsf{vk},m,\sigma}$. Let $D$ be the distribution of $(\mathsf{vk}, m, \mathsf{Sign}(\mathsf{sk}, m))$ for random $m \leftarrow_R \{0,1\}^\ell$. $\mathsf{Gen}_{\mathrm{ex}}$ then samples $x_1, x_2, \ldots, x_n$ i.i.d. from $D$. Given a representation $(\mathsf{vk}^*, m^*, \sigma^*)$, the algorithm $\mathsf{Trace}_{\mathrm{ex}}$ simply identifies an index $i$ for which $x_i = (\mathsf{vk}^*, m^*, \sigma^*)$, and outputs $\bot$ if none is found.

We first verify completeness for this scheme. Let $L$ be a learner for ValidSig using $n$ examples. If the representation $(\mathsf{vk}^*, m^*, \sigma^*)$ produced by $L$ represents a $(1 - \gamma)$-good hypothesis, then it must be the case that $\mathsf{vk}^* = \mathsf{vk}$ and $\mathsf{Ver}(\mathsf{vk}, m^*, \sigma^*) = 1$. Thus, if $L$ violates the completeness condition, it can be used to construct the weak forgery adversary $A$ (Algorithm 4) that succeeds with noticeable probability $\xi$.

Algorithm 4 Weak forgery adversary $A$
1. Query the signing oracle on random messages $m'_1, \ldots, m'_n \leftarrow_R \{0,1\}^\ell$, obtaining signatures $\sigma'_1, \ldots, \sigma'_n$.
2. Run $L$ on the labeled examples $((\mathsf{vk}, m'_1, \sigma'_1), 1), \ldots, ((\mathsf{vk}, m'_n, \sigma'_n), 1)$, obtaining a representation $(\mathsf{vk}^*, m^*, \sigma^*)$.
3. Output the forgery $(m^*, \sigma^*)$.

Now we verify soundness for the scheme. Observe that for any $i$, the sample $S_{-i}$ contains no information about the message $m_i$. Therefore, the learner has only a $2^{-\ell} = \mathrm{negl}(k)$ probability of producing a representation containing the message $m_i$, proving soundness.
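In code, Algorithm 4 is just a thin wrapper around the learner. The sketch below is our illustration: the callable `sign_oracle` models oracle access to $\mathsf{Sign}(\mathsf{sk}, \cdot)$, and `learner_L` is any representation learner, e.g. the one sketched after the proof of Theorem 5.3.

```python
# Sketch (ours) of Algorithm 4: wrapping a representation learner for ValidSig
# into a weak forgery adversary.  `sign_oracle` models oracle access to
# Sign(sk, .); `learner_L` is any learner, e.g. the one sketched above.
import os

def weak_forgery_adversary(learner_L, vk, sign_oracle, n, ell_bytes=16):
    # 1. Query the signing oracle on n random messages.
    pairs = [(m, sign_oracle(m))
             for m in (os.urandom(ell_bytes) for _ in range(n))]
    # 2. Run L on the all-positive labeled examples ((vk, m_i, sigma_i), 1).
    rep = learner_L([((vk, m, s), 1) for m, s in pairs])
    if rep is None:
        return None
    _, m_star, sigma_star = rep
    # 3. Output the candidate forgery (m*, sigma*).  It wins the weak forgery
    # game iff it verifies and was not among the oracle's answers.
    return m_star, sigma_star
```

Run against the learner of Algorithm 3, this adversary merely gets back a pair replayed from its own sample, which is precisely why that learner is not private; a differentially private learner would be forced to output a fresh, unqueried pair, i.e. a forgery.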
Acknowledgements. We gratefully acknowledge Kobbi Nissim and Salil Vadhan for helpful discussions about this work, and also thank Salil Vadhan for suggestions on its presentation.

References

[BCO11] Alexandra Boldyreva, Nathan Chenette, and Adam O'Neill. Order-preserving encryption revisited: Improved security analysis and alternative solutions. In CRYPTO, 2011.

[BDMN05] Avrim Blum, Cynthia Dwork, Frank McSherry, and Kobbi Nissim. Practical privacy: the SuLQ framework. In Chen Li, editor, PODS, pages 128–138. ACM, 2005.

[BKN10] Amos Beimel, Shiva Prasad Kasiviswanathan, and Kobbi Nissim. Bounds on the sample complexity for private learning and private data release. In TCC, pages 437–454, 2010.

[BLR08] Avrim Blum, Katrina Ligett, and Aaron Roth. A learning theory approach to non-interactive database privacy. In Cynthia Dwork, editor, STOC, pages 609–618. ACM, 2008.

[BLR+15] Dan Boneh, Kevin Lewi, Mariana Raykova, Amit Sahai, Mark Zhandry, and Joe Zimmerman. Semantically secure order-revealing encryption: Multi-input functional encryption without obfuscation. In Proc. of EUROCRYPT, 2015.

[Blu94] Avrim Blum. Separating distribution-free and mistake-bound learning models over the boolean domain. SIAM J. Comput., 23(5):990–1000, 1994.

[BNS13] Amos Beimel, Kobbi Nissim, and Uri Stemmer. Private learning and sanitization: Pure vs. approximate differential privacy. In Prasad Raghavendra, Sofya Raskhodnikova, Klaus Jansen, and José D. P. Rolim, editors, APPROX-RANDOM, volume 8096 of Lecture Notes in Computer Science, pages 363–378. Springer, 2013.

[BNSV15] Mark Bun, Kobbi Nissim, Uri Stemmer, and Salil P. Vadhan. Differentially private release and learning of threshold functions. CoRR, abs/1504.07553, 2015.

[BPTG14] Raphael Bost, Raluca Ada Popa, Stephen Tu, and Shafi Goldwasser. Machine learning classification over encrypted data. IACR Cryptology ePrint Archive, 2014:331, 2014.

[BS15] Zvika Brakerski and Gil Segev. Function-private functional encryption in the private-key setting. In TCC, 2015.

[BSW06] Dan Boneh, Amit Sahai, and Brent Waters. Fully collusion resistant traitor tracing with short ciphertexts and private keys. In Proceedings of the 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, EUROCRYPT '06, pages 573–592, Berlin, Heidelberg, 2006. Springer-Verlag.

[BUV14] Mark Bun, Jonathan Ullman, and Salil P. Vadhan. Fingerprinting codes and the price of approximate differential privacy. In Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 - June 3, 2014, pages 1–10, 2014.

[BZ14] Dan Boneh and Mark Zhandry. Multiparty key exchange, efficient traitor tracing, and more from indistinguishability obfuscation. In Advances in Cryptology - CRYPTO 2014 - 34th Annual Cryptology Conference, Santa Barbara, CA, USA, August 17-21, 2014, Proceedings, Part I, pages 480–499, 2014.

[CFN94] Benny Chor, Amos Fiat, and Moni Naor. Tracing traitors. In Yvo Desmedt, editor, CRYPTO, volume 839 of Lecture Notes in Computer Science, pages 257–270. Springer, 1994.

[CH11] Kamalika Chaudhuri and Daniel Hsu. Sample complexity bounds for differentially private learning. In Sham M. Kakade and Ulrike von Luxburg, editors, COLT, volume 19 of JMLR Proceedings, pages 155–186. JMLR.org, 2011.

[CHS14] Kamalika Chaudhuri, Daniel Hsu, and Shuang Song. The large margin mechanism for differentially private maximization. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pages 1287–1295, 2014.

[CTUW14] Karthekeyan Chandrasekaran, Justin Thaler, Jonathan Ullman, and Andrew Wan. Faster private release of marginals on small databases. ITCS 2014, 2014.

[DKM+06] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In Serge Vaudenay, editor, EUROCRYPT, volume 4004 of Lecture Notes in Computer Science, pages 486–503. Springer, 2006.

[DMNS06] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Shai Halevi and Tal Rabin, editors, TCC, volume 3876 of Lecture Notes in Computer Science, pages 265–284. Springer, 2006.

[DNR+09] Cynthia Dwork, Moni Naor, Omer Reingold, Guy N. Rothblum, and Salil P. Vadhan. On the complexity of differentially private data release: efficient algorithms and hardness results. In Michael Mitzenmacher, editor, STOC, pages 381–390. ACM, 2009.

[DNT14] Cynthia Dwork, Aleksandar Nikolov, and Kunal Talwar. Using convex relaxations for efficiently and privately releasing marginals. In Proceedings of the Thirtieth Annual Symposium on Computational Geometry, SOCG '14, pages 261:261–261:270, New York, NY, USA, 2014. ACM.

[DRV10] Cynthia Dwork, Guy N. Rothblum, and Salil P. Vadhan. Boosting and differential privacy. In FOCS, pages 51–60. IEEE Computer Society, 2010.

[FK12] Vitaly Feldman and Varun Kanade. Computational bounds on statistical query learning. In COLT 2012 - The 25th Annual Conference on Learning Theory, June 25-27, 2012, Edinburgh, Scotland, pages 16.1–16.22, 2012.

[FX14] Vitaly Feldman and David Xiao. Sample complexity bounds on differentially private learning via communication complexity. CoRR, abs/1402.6278, 2014.

[GGG+14] Shafi Goldwasser, S. Dov Gordon, Vipul Goyal, Abhishek Jain, Jonathan Katz, Feng-Hao Liu, Amit Sahai, Elaine Shi, and Hong-Sheng Zhou. Multi-input functional encryption. In EUROCRYPT, pages 578–602, 2014.

[GGH13a] Sanjam Garg, Craig Gentry, and Shai Halevi. Candidate multilinear maps from ideal lattices. In Proc. of EUROCRYPT, 2013.

[GGH+13b] Sanjam Garg, Craig Gentry, Shai Halevi, Mariana Raykova, Amit Sahai, and Brent Waters. Candidate indistinguishability obfuscation and functional encryption for all circuits. In Proc. of FOCS, 2013.

[GGHZ14] Sanjam Garg, Craig Gentry, Shai Halevi, and Mark Zhandry. Fully secure functional encryption without obfuscation, 2014.

[GLN13] Thore Graepel, Kristin Lauter, and Michael Naehrig. ML confidential: Machine learning on encrypted data. In Taekyoung Kwon, Mun-Kyu Lee, and Daesung Kwon, editors, Information Security and Cryptology - ICISC 2012, volume 7839 of Lecture Notes in Computer Science, pages 1–21. Springer Berlin Heidelberg, 2013.
[Gol04] Oded Goldreich. Foundations of Cryptography: Volume 2, Basic Applications. Cambridge University Press, 2004.

[GOS12] Jens Groth, Rafail Ostrovsky, and Amit Sahai. New techniques for noninteractive zero-knowledge. J. ACM, 59(3):11:1–11:35, June 2012.

[GRU12] Anupam Gupta, Aaron Roth, and Jonathan Ullman. Iterative constructions and private data release. In TCC, pages 339–356, 2012.

[HLM12] Moritz Hardt, Katrina Ligett, and Frank McSherry. A simple and practical algorithm for differentially private data release. In Peter L. Bartlett, Fernando C. N. Pereira, Christopher J. C. Burges, Léon Bottou, and Kilian Q. Weinberger, editors, NIPS, pages 2348–2356, 2012.

[HR10] Moritz Hardt and Guy N. Rothblum. A multiplicative weights mechanism for privacy-preserving data analysis. In FOCS, pages 61–70. IEEE Computer Society, 2010.

[HRS12] Moritz Hardt, Guy N. Rothblum, and Rocco A. Servedio. Private data release via learning thresholds. In Dana Randall, editor, SODA, pages 168–187. SIAM, 2012.

[Kea98] Michael Kearns. Efficient noise-tolerant learning from statistical queries. J. ACM, 45(6):983–1006, November 1998.

[Kha95] Michael Kharitonov. Cryptographic lower bounds for learnability of boolean functions on the uniform distribution. J. Comput. Syst. Sci., 50(3):600–610, 1995.

[KK05] Jonathan Katz and Chiu-Yuen Koo. On constructing universal one-way hash functions from arbitrary one-way functions. IACR Cryptology ePrint Archive, 2005:328, 2005.

[KLN+11] Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? SIAM J. Comput., 40(3):793–826, 2011.

[KV94] Michael Kearns and Leslie Valiant. Cryptographic limitations on learning boolean formulae and finite automata. J. ACM, 41(1):67–95, January 1994.

[Nis14] Kobbi Nissim. Personal communication, July 2014.

[NY89] M. Naor and M. Yung. Universal one-way hash functions and their cryptographic applications. In Proceedings of the Twenty-first Annual ACM Symposium on Theory of Computing, STOC '89, pages 33–43, New York, NY, USA, 1989. ACM.

[PV88] Leonard Pitt and Leslie G. Valiant. Computational limitations on learning from examples. J. ACM, 35(4):965–984, October 1988.

[Rom90] J. Rompel. One-way functions are necessary and sufficient for secure signatures. In Proceedings of the Twenty-second Annual ACM Symposium on Theory of Computing, STOC '90, pages 387–394, New York, NY, USA, 1990. ACM.

[RR10] Aaron Roth and Tim Roughgarden. Interactive privacy via the median mechanism. In STOC, pages 765–774, 2010.

[Ser00] Rocco A. Servedio. Computational sample complexity and attribute-efficient learning. J. Comput. Syst. Sci., 60(1):161–178, 2000.

[SG04] Rocco A. Servedio and Steven J. Gortler. Equivalences and separations between quantum and classical learnability. SIAM J. Comput., 33(5):1067–1092, 2004.

[TUV12] Justin Thaler, Jonathan Ullman, and Salil P. Vadhan. Faster algorithms for privately releasing marginals. In ICALP (1), pages 810–821, 2012.

[Ull13] Jonathan Ullman. Answering $n^{2+o(1)}$ counting queries with differential privacy is hard. In STOC, pages 361–370, 2013.

[UV11] Jonathan Ullman and Salil P. Vadhan. PCPs and the hardness of generating private synthetic data. In Yuval Ishai, editor, TCC, volume 6597 of Lecture Notes in Computer Science, pages 400–416. Springer, 2011.

[Val84] L. G. Valiant.
A theory of the learnable. Commun. ACM, 27(11):1134–1142, 1984.