Free Gap Estimates from the Exponential Mechanism, Sparse Vector, Noisy Max and Related Algorithms
Zeyu Ding, Yuxin Wang, Yingtai Xiao, Guanhong Wang, Danfeng Zhang, Daniel Kifer
NNoname manuscript No. (will be inserted by the editor)
Free Gap Estimates from the Exponential Mechanism, SparseVector, Noisy Max and Related Algorithms
Zeyu Ding · Yuxin Wang · Yingtai Xiao · Guanhong Wang · Danfeng Zhang · Daniel Kifer Received: date / Accepted: date
Abstract
Private selection algorithms, such as the Ex-ponential Mechanism, Noisy Max and Sparse Vector,are used to select items (such as queries with large an-swers) from a set of candidates, while controlling pri-vacy leakage in the underlying data. Such algorithmsserve as building blocks for more complex differentiallyprivate algorithms. In this paper we show that thesealgorithms can release additional information relatedto the gaps between the selected items and the othercandidates for free (i.e., at no additional privacy cost).This free gap information can improve the accuracy ofcertain follow-up counting queries by up to 66%. Weobtain these results from a careful privacy analysis ofthese algorithms. Based on this analysis, we further pro-pose novel hybrid algorithms that can dynamically saveadditional privacy budget.
Keywords
Differential Privacy · Exponential Mecha-nism · Noisy Max · Sparse Vector
Zeyu DingE-mail: [email protected] WangE-mail: [email protected] XiaoE-mail: [email protected] WangE-mail: [email protected] ZhangE-mail: [email protected] KiferE-mail: [email protected] Department of Computer Science and Engineering, Pennsyl-vania State University, University Park, PA 16802, USA
Industry and government agencies are increasingly adopt-ing differential privacy [18] to protect the confidentialityof users who provide data. Current and planned majorapplications include data gathering by Google [21,7],Apple [43], and Microsoft [13]; database querying byUber [28]; and publication of population statistics atthe U.S. Census Bureau [34,9,26,2].The accuracy of differentially private data releasesis very important in these applications. One way to im-prove accuracy is to increase the value of the privacyparameter (cid:15) , known as the privacy loss budget, as itprovides a tradeoff between an algorithm’s utility andits privacy protections. However, values of (cid:15) that aredeemed too high can subject a company to criticismsof not providing enough privacy [42]. For this reason,researchers invest significant effort in tuning algorithms[11,47,29,1,40,22] and privacy analyses [8,38,40,20] toprovide better utility while using smaller privacy bud-gets.Differentially private algorithms are built on smallercomponents called mechanisms [37]. Popular mecha-nisms include the Laplace Mechanism [18], GeometricMechanism [24], Noisy Max [19], Sparse Vector Tech-nique (SVT) [19,33], and the Exponential Mechanism[36]. As we will explain in this paper, some of thesemechanisms, such as the Exponential Mechanism, NoisyMax and SVT, inadvertently throw away informationthat is useful for designing accurate algorithms. Ourcontribution is to present novel variants of these mech-anisms that provide more functionality at the same pri-vacy cost (under pure differential privacy).Given a set of queries, Noisy Max returns the iden-tity (not value) of the query that is likely to have thelargest value – it adds noise to each query answer and a r X i v : . [ c s . D B ] D ec Zeyu Ding et al. returns the index of the query with the largest noisyvalue. The Exponential Mechanism is a replacement forNoisy Max in situations where query answers have util-ity scores. Meanwhile, SVT is an online algorithm thattakes a stream of queries and a predefined public thresh-old T . It tries to return the identities (not values) of thefirst k queries that are likely larger than the threshold.To do so, it adds noise to the threshold. Then, as it se-quentially processes each query, it outputs “ (cid:62) ” or “ ⊥ ”,depending on whether the noisy value of the currentquery is larger or smaller than the noisy threshold. Themechanism terminates after k “ (cid:62) ” outputs.In recent work [45], using program verification tools,Wang et al. showed that SVT can provide additionalinformation at no additional cost to privacy . That is,when SVT returns “ (cid:62) ” for a query, it can also returnthe gap between its noisy value and the noisy thresh-old. We refer to their algorithm as SVT with Gap.Inspired by this program verification work, we pro-pose novel variations of Exponential Mechanism, SVTand Noisy Max that add new functionality. For SVT, weshow that in addition to releasing this gap information,even stronger improvements are possible – we presentan adaptive version that can answer more queries thanbefore by controlling how much privacy budget it usesto answer each query. The intuition is that we wouldlike to spend less of our privacy budget for queries thatare probably much larger than the threshold (comparedto queries that are probably closer to the threshold).A careful accounting of the privacy impact shows thatthis is possible. 
Our experiments confirm that AdaptiveSVT with Gap can answer many more queries than theprior versions [33,19,45] at the same privacy cost.For Noisy Max, we show that it too inadvertentlythrows away information. Specifically, at no additionalcost to privacy , it can release an estimate of the gapbetween the largest and second largest queries (we callthe resulting mechanism Noisy Max with Gap). We gen-eralize this result to Noisy Top-K – showing that onecan release an estimate of the identities of the k largestqueries and, at no extra privacy cost, release noisy esti-mates of the pairwise gaps (differences) among the top k + 1 queries.For Exponential Mechanism, we show that there isalso a concept of a gap, which corresponds to the noisydifference in utility between the selected query and thebest non-selected query. One of the challenges with theExponential Mechanism is that for efficiency purposesit can use complex sampling algorithms to select the This was a surprising result given the number of incorrectattempts at improving SVT based on flawed manual proofs[33] and shows the power of automated program verificationtechniques. chosen candidate. We show that it is possible to releasethe noisy gap information even if the sampling algo-rithms are treated as black boxes (i.e., without accessto its intermediate computations).The extra noisy gap information opens up new di-rections in the construction of differentially private al-gorithms and can be used to improve accuracy of cer-tain subsequent queries. For instance, one common taskis to use Noisy Max to select the approximate top k queries and then use additional privacy loss budget toobtain noisy answers to these queries. We show thata postprocessing step can combine these noisy answerswith gap information to improve accuracy by up to 66%for counting queries. We provide similar applications forthe free gap information in SVT.This paper is an extension of a conference paper [14].For this extension we have added the following results:(a) free gap results for the Exponential Mechanism, (b)free gap results when Noisy Max and SVT are usedwith one-sided noise, which improves on the accuracyreported in [14] for two-sided noise, (c) novel hybridalgorithms that combine SVT and Noisy Max into anoffline selection procedure; these algorithms return the identities of the approximate top- k queries, but only ifthey are larger than a pre-specified threshold. These al-gorithms save privacy budget if fewer than k queries areapproximately over the threshold, in which case theyalso provide free estimates of the query answers (if all k queries are approximately over the threshold, then weobtain information about the gaps between them).We prove most of our results using the alignmentof random variables framework [33,11,45,46], which isbased on the following question: if we change the inputto a program, how must we change its random vari-ables so that output remains the same? This techniqueis used to prove the correctness of almost all pure dif-ferential privacy mechanisms [19] but needs to be usedin sophisticated ways to prove the correctness of themore advanced algorithms [33,11,19,45,46]. Neverthe-less, alignment of random variables is often used incor-rectly (as discussed by Lyu et al. [33]). 
Thus a secondarycontribution of our work is to lay out the precise stepsand conditions that must be checked and to providehelpful lemmas that ensure these conditions are met.The Exponential Mechanism does not fit in this frame-work and requires its own proof techniques, which weexplain in Section 8. To summarize, our contributionsare as follows: – We provide a simplified template for writing cor-rectness proofs for intricate differentially private al-gorithms. ree Gap Estimates from the Exponential Mechanism, Sparse Vector, Noisy Max and Related Algorithms 3 – Using this technique, we propose and prove the cor-rectness of two new mechanisms: Noisy Top-K withGap and Adaptive SVT with Gap. These algorithmsimprove on the original versions of Noisy Max andSVT by taking advantage of free information (i.e.,information that can be released at no additionalprivacy cost) that those algorithms inadvertentlythrow away. We also show that the free gap informa-tion can be maintained even when these algorithmsuse one-sided noise. This variation improves the ac-curacy of the gap information. – We demonstrate some of the uses of the gap infor-mation that is provided by these new mechanisms.When an algorithm needs to use Noisy Max or SVTto select some queries and then measure them (i.e.,obtain their noisy answers), we show how the gapinformation from our new mechanisms can be usedto improve the accuracy of the noisy measurements.We also show how the gap information in SVT canbe used to estimate the confidence that a query’strue answer really is larger than the threshold. – We show that the Exponential Mechanism can alsorelease free gap information. Noting that the freegap extensions of Noisy Max and SVT required ac-cess to the internal state of those algorithms, weshow that this is unnecessary for Exponential Mech-anism. This is useful because implementations of Ex-ponential Mechanism can be very complex and usea variety of different sampling routines. – We propose two novel hybridizations of Noisy Maxand SVT. These algorithms can release the identitiesof the approximate top- k queries as long as they arelarger than a pre-specified threshold. If fewer than k queries are returned, the algorithms save privacybudget and the gap information they release directlyturns into estimates of the query answers (i.e., thealgorithm returns the query identities and their an-swers for free). If k queries are returned then thealgorithms still return the gaps between their an-swers. – We empirically evaluate the mechanisms on a varietyof datasets to demonstrate their improved utility.In Section 2, we discuss related work. We presentbackground and notation in Section 3. We present sim-plified proof templates for randomness alignment inSection 4. We present Adaptive SVT with Gap in Sec-tion 5 and Noisy Top-K with Gap in Section 6. Wepresent the novel algorithms that combine elements ofNoisy Max and SVT in 7. We present Exponential Mech-anism with Gap algorithms in Section 8. We present ex-periments in Section 9, proofs underlying the alignment of randomness framework in Section 10 and conclusionsin Section 11. Other proofs appear in the Appendix.
Selection algorithms, such as Exponential Mechanism[36,41], Sparse Vector Technique (SVT) [19,33], andNoisy Max [19] are used to select a set of items (typi-cally queries) from a much larger set. They have appli-cations in hyperparameter tuning [11,32], iterative con-struction of microdata [27], feature selection [44], fre-quent itemset mining [6], exploring a privacy/accuracytradeoff [31], data pre-processing [12], etc. Various gen-eralizations have been proposed [31,5,44,41,10,32]. Liuand Talwar [32] and Raskhodnikova and Smith [41] ex-tend the exponential mechanism for arbitrary sensitiv-ity queries. Beimel et al. [5] and Thakurta and Smith[44] use the propose-test-release framework [17] to finda gap between the best and second best queries and,if the gap is large enough, release the identity of thebest query. These two algorithms rely on a relaxation ofdifferential privacy called approximate ( (cid:15), δ )-differentialprivacy [16] and can fail to return an answer (in whichcase they return ⊥ ). Our algorithms work with pure (cid:15) -differential privacy. Chaudhuri et al. [10] also proposeda large margin mechanism (with approximate differ-ential privacy) which finds a large gap separating topqueries from the rest and returns one of them.There have also been unsuccessful attempts to gen-eralize selection algorithms such as SVT (incorrect ver-sions are catalogued by Lyu et al. [33]), which has sparkedinnovations in program verification for differential pri-vacy (e.g., [4,3,46,45]) with techniques such as proba-bilistic coupling [4] and a simplification based on ran-domness alignment [46]. These are similar to ideas be-hind handwritten proofs [11,19,33] – they consider whatchanges need to be made to random variables in orderto make two executions of a program, with differentinputs, produce the same output. It is a powerful tech-nique that is behind almost all proofs of differential pri-vacy, but is very easy to apply incorrectly [33]. In thispaper, we state and prove a more general version ofthis technique in order to prove correctness of our algo-rithms and also provide additional results that simplifythe application of this technique. In this paper, we use the following notation. D and D (cid:48) refer to databases. We use the notation D ∼ D (cid:48) to Zeyu Ding et al. represent adjacent databases. M denotes a random-ized algorithm whose input is a database. Ω denotesthe range of M and ω ∈ Ω denotes a specific output of M . We use E ⊆ Ω to denote a set of possible outputs.Because M is randomized, it also relies on a randomnoise vector H ∈ R ∞ . This noise sequence is infinite,but of course M will only use a finite-length prefix of H . Some of the commonly used noise distributions forthis vector H include the Laplace distribution, the Ex-ponential distribution and the Geometric distribution.Their properties are summarized in Table 1.Table 1: Noise Distributions Symbol Support Density/Mass Mean Variance
Lap( β ) R β exp( − | x | β ) 0 2 β Exp( β ) [0 , ∞ ) β exp( − xβ ) β β Geo( p ) { , , . . . } p (1 − p ) n p − pp When we need to draw attention to the noise, weuse the notation M ( D, H ) to indicate the executionof M with database D and randomness coming from H . Otherwise we use the notation M ( D ). We define H MD:E = { H | M ( D, H ) ∈ E } to be the set of noise vec-tors that allow M , on input D , to produce an outputin the set E ⊆ Ω . To avoid overburdening the notation,we write H D:E for H MD:E and H D (cid:48) :E for H MD (cid:48) :E when M is clear from the context. When E consists of a singlepoint ω , we write these sets as H D: ω and H D (cid:48) : ω . Thisnotation is summarized in Table 2.Table 2: Notation Symbol Meaning M randomized algorithm D, D (cid:48) database D ∼ D (cid:48) D is adjacent to D (cid:48) H = ( η , η , . . . ) input noise vector Ω the space of all output of Mω a possible output; ω ∈ ΩE a set of possible outputs; E ⊆ Ω H D:E = H MD:E { H | M ( D, H ) ∈ E }H D: ω = H MD: ω { H | M ( D, H ) = ω } The notion of adjacency depends on the application. Somepapers define it as D can be obtained from D (cid:48) by modifyingone record [18] or by adding/deleting one record [15]. a database. It has a parameter (cid:15) > (cid:15) , theprobability of any output is barely affected by any per-son’s record). Definition 1 (Pure Differential Privacy [15])
Let (cid:15) >
0. A randomized algorithm M with output space Ω satisfies (pure) (cid:15) -differential privacy if for all E ⊆ Ω andall pairs of adjacent databases D ∼ D (cid:48) , the followingholds: P [ M ( D, H ) ∈ E ] ≤ e (cid:15) P [ M ( D (cid:48) , H (cid:48) ) ∈ E ] (1)where the probability is only over the randomness of H .With the notation in Table 2, the differential privacycondition from Equation (1) is P [ H D:E ] ≤ e (cid:15) P [ H D (cid:48) :E ].Differential privacy enjoys the following properties: – Resilience to Post-Processing. If we apply an algo-rithm A to the output of an (cid:15) -differentially privatealgorithm M , then the composite algorithm A ◦ M still satisfies (cid:15) -differential privacy. In other words,privacy is not reduced by post-processing. – Composition. If M , M , . . . , M k satisfy differentialprivacy with privacy loss budgets (cid:15) , . . . , (cid:15) k , the al-gorithm that runs all of them and releases their out-puts satisfies ( (cid:80) i (cid:15) i )-differential privacy.Many differentially private algorithms take advan-tage of the Laplace mechanism [36], which provides anoisy answer to a vector-valued query q based on its L global sensitivity ∆ q , defined as follows: Definition 2 ( L Global Sensitivity [19])
The ( L )global sensitivity of a query q is ∆ q = sup D ∼ D (cid:48) (cid:107) q ( D ) − q ( D (cid:48) ) (cid:107) . Theorem 1 (Laplace Mechanism [18])
Given a pri-vacy loss budget (cid:15) , consider the mechanism that returns q ( D ) + H , where H is a vector of independent randomsamples from the Lap( ∆ q /(cid:15) ) distribution. This Laplacemechanism satisfies (cid:15) -differential privacy. Other kinds of additive noise distributions that canbe used in place of Laplace in Theorem 1 include Dis-crete Laplace [24] (when all query answers are integersor multiples of a common base) and Staircase [23].In some cases, queries may have additional struc-ture, such as monotonicity , that can allow algorithmsto provide privacy with less noise (such as one-sidedNoisy Max [19]). ree Gap Estimates from the Exponential Mechanism, Sparse Vector, Noisy Max and Related Algorithms 5
Definition 3 (Monotonicity)
A list of queries q =( q , q , . . . ) with numerical values is monotonic if forall pair of adjacent databases D ∼ D (cid:48) we have either ∀ i : q i ( D ) ≤ q i ( D (cid:48) ), or ∀ i : q i ( D ) ≥ q i ( D (cid:48) ).Monotonicity is a natural property that is satisfiedby counting queries – when a person is added to adatabase, the value of each query either stays the sameor increases by 1. To establish that the algorithms we propose are differ-entially private, we use an idea called randomness align-ment that previously had been used to prove the pri-vacy of a variety of sophisticated algorithms [19,33,11]and incorporated into verification/synthesis tools [46,45,3]. While powerful, this technique is also easy to useincorrectly [33], as there are many technical conditionsthat need to be checked. In this section, we present re-sults (namely Lemma 1) that significantly simplify thisprocess and make it easy to prove the correctness of ourproposed algorithms.In general, to prove (cid:15) -differential privacy for an al-gorithm M , one needs to show P [ M ( D, H ) ∈ E ] ≤ e (cid:15) P [ M ( D (cid:48) , H (cid:48) ) ∈ E ] for all pairs of adjacent databases D ∼ D (cid:48) and sets of possible outputs E ⊆ Ω . In ournotation, this inequality is represented as P [ H D:E ] ≤ e (cid:15) P [ H D (cid:48) :E ]. Establishing such inequalities is often donewith the help of a function φ D , D (cid:48) , called a random-ness alignment (there is a function φ D , D (cid:48) for every pair D ∼ D (cid:48) ), that maps noise vectors H into noise vec-tors H (cid:48) so that M ( D (cid:48) , H (cid:48) ) produces the same outputas M ( D, H ). Formally,
Definition 4 (Randomness Alignment)
Let M bea randomized algorithm. Let D ∼ D (cid:48) be a pair of adja-cent databases. A randomness alignment is a function φ D , D (cid:48) : R ∞ → R ∞ such that1. The alignment does not output invalid noise vec-tors (e.g., it cannot produce negative numbers forrandom variables that should have the exponentialdistribution).2. For all H on which M ( D, H ) terminates, M ( D, H ) = M ( D (cid:48) , φ D , D (cid:48) ( H )). Example 1
Let D be a database that records the salaryof every person, which is guaranteed to be between 0and 100. Let q ( D ) be the sum of the salaries in D .The sensitivity of q is thus 100. Let H = ( η , η , . . . )be a vector of independent Lap(100 /(cid:15) ) random vari-ables. The Laplace mechanism outputs q ( D ) + η (andignores the remaining variables in H ). For every pair of adjacent databases D ∼ D (cid:48) , one can define the cor-responding randomness alignment φ D , D (cid:48) ( H ) = H (cid:48) =( η (cid:48) , η (cid:48) , . . . ), where η (cid:48) = η + q ( D ) − q ( D (cid:48) ) and η (cid:48) i = η i for i >
1. Note that q ( D ) + η = q ( D (cid:48) ) + η (cid:48) , so theoutput of M remains the same.In practice, φ D , D (cid:48) is constructed locally (piece bypiece) as follows. For each possible output ω ∈ Ω , onedefines a function φ D , D (cid:48) ,ω that maps noise vectors H into noise vectors H (cid:48) with the following properties: if M ( D, H ) = ω then M ( D (cid:48) , H (cid:48) ) = ω (that is, φ D , D (cid:48) ,ω only cares about what it takes to produce the specificoutput ω ). We obtain our randomness alignment φ D , D (cid:48) in the obvious way by piecing together the φ D , D (cid:48) ,ω asfollows: φ D , D (cid:48) ( H ) = φ D , D (cid:48) ,ω * ( H ), where ω ∗ is the out-put of M ( D, H ). Formally,
Definition 5 (Local Alignment)
Let M be a ran-domized algorithm. Let D ∼ D (cid:48) be a pair of adjacentdatabases and ω a possible output of M . A local align-ment for M is a function φ D , D (cid:48) ,ω : H D: ω → H D (cid:48) : ω (seenotation in Table 2) such that for all H ∈ H D: ω , wehave M ( D, H ) = M ( D (cid:48) , φ D , D (cid:48) ,ω ( H )). Example 2
Continuing the setup from Example 1, con-sider the mechanism M that, on input D , outputs (cid:62) if q ( D ) + η ≥ ,
000 (i.e. if the noisy total salary is atleast 10 , ⊥ if q ( D ) + η < , D (cid:48) bea database that differs from D in the presence/absenceof one record. Consider the local alignments φ D,D (cid:48) , (cid:62) and φ D,D (cid:48) , ⊥ defined as follows. φ D,D (cid:48) , (cid:62) ( H ) = H (cid:48) =( η (cid:48) , η (cid:48) , . . . ) where η (cid:48) = η + 100 and η (cid:48) i = η i for i > φ D,D (cid:48) , ⊥ ( H ) = H (cid:48)(cid:48) = ( η (cid:48)(cid:48) , η (cid:48)(cid:48) , . . . ) where η (cid:48)(cid:48) = η −
100 and η (cid:48)(cid:48) i = η i for i >
1. Clearly, if M ( D, H ) = (cid:62) then M ( D (cid:48) , H (cid:48) ) = (cid:62) and if M ( D, H ) = ⊥ then M ( D (cid:48) , H (cid:48)(cid:48) ) = ⊥ . We piece these two local alignmentstogether to create a randomness alignment φ D , D (cid:48) ( H ) = H ∗ = ( η ∗ , η ∗ , . . . ) where: η ∗ = η + 100 if M ( D, H ) = (cid:62) (i.e. q ( D ) + η ≥ , η −
100 if M ( D, H ) = ⊥ (i.e. q ( D ) + η < , η ∗ i = η i for i > Special properties of alignments.
Not all alignmentscan be used to prove differential privacy. In this sectionwe discuss some additional properties that help provedifferential privacy. We first make two mild assump-tions about the mechanism M : (1) it terminates withprobability one and (2) based on the output of M , we That is, for each input D , there might be some randomvectors H for which M does not terminate, but the totalprobability of these vectors is 0, so we can ignore them. Zeyu Ding et al. can determine how many random variables it used. Thevast majority of differentially private algorithms in theliterature satisfy these properties.We next define two properties of a local alignment:whether it is acyclic and what its cost is. Definition 6 (Acyclic)
Let M be a randomized al-gorithm. Let φ D , D (cid:48) ,ω be a local alignment for M . Forany H = ( η , η , . . . ), let H (cid:48) = ( η (cid:48) , η (cid:48) , . . . ) denote φ D , D (cid:48) ,ω ( H ). We say that φ D , D (cid:48) ,ω is acyclic if there existsa permutation π and piecewise differentiable functions ψ ( j ) D,D (cid:48) ,ω such that: η (cid:48) π (1) = η π (1) + constant that only depends on D , D (cid:48) , ωη (cid:48) π ( j ) = η π ( j ) + ψ ( j ) D,D (cid:48) ,ω ( η π (1) , . . . , η π ( j − ) for j ≥ φ D , D (cid:48) ,ω is acyclic ifthere is some ordering of the variables so that η (cid:48) j is thesum of η j and a function of the variables that came ear-lier in the ordering. The local alignments φ D,D (cid:48) , (cid:62) and φ D,D (cid:48) , ⊥ from Example 2 are both acyclic (in general,each local alignment function is allowed to have its ownspecific ordering and differentiable functions ψ ( j ) D,D (cid:48) ,ω ).The pieced-together randomness alignment φ D , D (cid:48) itselfneed not be acyclic. Definition 7 (Alignment Cost)
Let M be a ran-domized algorithm that uses H as its source of ran-domness. Let φ D , D (cid:48) ,ω be a local alignment for M . Forany H = ( η , η , . . . ), let H (cid:48) = ( η (cid:48) , η (cid:48) , . . . ) denote φ D , D (cid:48) ,ω ( H ). Suppose each η i is generated independentlyfrom a distribution f i with the property that ln( f i ( x ) f i ( y ) ) ≤ c i | x − y | for all x, y in the domain of f i – this includesthe Lap( β ), Exp( β ), Geo( p ) distributions along withDiscrete Laplace [24] and Staircase [23]. Then the costof φ D , D (cid:48) ,ω is defined as: cost( φ D , D (cid:48) ,ω ) = (cid:80) i c i | η i − η (cid:48) i | . The following lemma uses those properties to estab-lish that M satisfies (cid:15) -differential privacy. Lemma 1
Let M be a randomized algorithm with in-put randomness H = ( η , η , . . . ) . If the following con-ditions are satisfied, then M satisfies (cid:15) -differential pri-vacy.1. M terminates with probability 1.2. The number of random variables used by M can bedetermined from its output.3. Each η i is generated independently from a distri-bution f i with the property that ln( f i ( x ) /f i ( y )) ≤ c i | x − y | for all x, y in the domain of f i .4. For every D ∼ D (cid:48) and ω there exists a local align-ment φ D , D (cid:48) ,ω that is acyclic with cost( φ D , D (cid:48) ,ω ) ≤ (cid:15) . 5. For each D ∼ D (cid:48) the number of distinct local align-ments is countable. That is, the set { φ D , D (cid:48) ,ω | ω ∈ Ω } is countable (i.e., for many choices of ω we getthe same exact alignment function). We defer the proof to Section 10.
Example 3
Consider the randomness alignment φ D , D (cid:48) from Example 1. We can define all of the local align-ments φ D , D (cid:48) ,ω to be the same function: φ D , D (cid:48) ,ω ( H ) = φ D , D (cid:48) ( H ). Clearly cost( φ D , D (cid:48) ,ω ) = (cid:80) ∞ i =0 (cid:15) | η (cid:48) i − η i | = (cid:15) | q ( D (cid:48) ) − q ( D ) | ≤ (cid:15) . For Example 2, there are twoacyclic local alignments φ D,D (cid:48) (cid:62) and φ D,D (cid:48) ⊥ , both havecost = 100 · (cid:15) = (cid:15) . The other conditions in Lemma 1are trivial to check. Thus both mechanisms satisfy (cid:15) -differential privacy by Lemma 1. In this section we propose an adaptive variant of SVTthat can answer more queries than both the originalSVT [19,33] and the SVT with Gap of Wang et al. [45].We explain how to tune its privacy budget allocation.We further show that using other types of random noise,such as exponential and geometric random variables, inplace of the Laplace, makes the free gap informationmore accurate at the same cost to privacy. Finally, wediscuss how the free gap information can be used forimproved utility of data analysis.5.1 Adaptive SVT with GapThe Sparse Vector Technique (SVT) is designed to solvethe following problem in a privacy-preserving way: givena stream of queries (with sensitivity 1), find the first k queries whose answers are larger than a public thresh-old T . This is done by adding noise to the queries andthreshold and finding the first k queries whose noisy an-swers exceed the noisy threshold. Sometimes this pro-cedure creates a feeling of regret – if these k queriesare much larger than the threshold, we could have usedmore noise (hence consumed less privacy budget) toachieve the same result. In this section, we show thatSparse Vector can be made adaptive – so that it will probably use more noise (less privacy budget) for thelarger queries. This means if the first k queries are verylarge, it will still have privacy budget left over to findadditional queries that are likely to be over the thresh-old. Adaptive SVT is shown in Algorithm 1.The main idea behind this algorithm is that, given atarget privacy budget (cid:15) and an integer k , the algorithmwill create three budget parameters: (cid:15) (budget for thethreshold), (cid:15) (baseline budget for each query) and (cid:15) ree Gap Estimates from the Exponential Mechanism, Sparse Vector, Noisy Max and Related Algorithms 7 Algorithm 1:
Adaptive SVT with Gap. Thehyperparameter θ ∈ (0 ,
1) controls the budgetallocation between threshold and queries. input : q : a list of queries of global sensitivity 1 D : database, (cid:15) : privacy budget, T : threshold k : minimum number of above-thresholdqueries algorithm is able to output function AdaptiveSparse ( q , D , T , k , (cid:15) ) : (cid:15) ← θ(cid:15) ; (cid:15) ← (1 − θ ) (cid:15)/k ; (cid:15) ← (cid:15) / σ ← √ /(cid:15) η ← Lap(1 /(cid:15) ); (cid:101) T ← T + η cost ← (cid:15) foreach i ∈ { , · · · , len( q ) } do ξ i ← Lap(2 /(cid:15) ); ˜ q i ← q i ( D ) + ξ i η i ← Lap(2 /(cid:15) ); ˆ q i ← q i ( D ) + η i if ˜ q i − (cid:101) T ≥ σ then output: ( (cid:62) , ˜ q i − (cid:101) T , bud used = (cid:15) ) cost ← cost + (cid:15) else if ˆ q i − (cid:101) T ≥ then output: ( (cid:62) , ˆ q i − (cid:101) T , bud used = (cid:15) ) cost ← cost + (cid:15) else output: ( ⊥ , bud used = ) if cost > (cid:15) − (cid:15) then break (smaller alternative budget for each query, (cid:15) < (cid:15) ).The privacy budget allocation between threshold andqueries is controlled by a hyperparameter θ ∈ (0 , /(cid:15) ) noise to thethreshold and consumes (cid:15) of the privacy budget. Then,when a query comes in, the algorithm first adds a lotof noise (i.e., Lap(2 /(cid:15) )) to the query. The first “if”branch checks if this value is much larger than the noisythreshold (i.e. checks if the gap is ≥ σ for some σ ). Ifso, then it outputs the following three items: (1) (cid:62) , (2)the noisy gap, and (3) the amount of privacy budgetused for this query (which is (cid:15) ). The use of alignmentswill show that failing this “if” branch consumes no pri-vacy budget. If the first “if” branch fails, then the al-gorithm adds more moderate noise (i.e., Lap(2 /(cid:15) )) tothe query answer. If this noisy value is larger than thenoisy threshold, the algorithm outputs: (1 (cid:48) ) (cid:62) , (2 (cid:48) ) thenoisy gap, and (3 (cid:48) ) the amount of privacy budget con-sumed (i.e., (cid:15) ). If this “if” condition also fails, then thealgorithm outputs: (1 (cid:48)(cid:48) ) ⊥ and (2 (cid:48)(cid:48) ) the privacy budgetconsumed (0 in this case).To summarize, there is a one-time cost for addingnoise to the threshold. Then, for each query, if the topbranch succeeds the privacy budget consumed is (cid:15) , ifthe middle branch succeeds, the privacy cost is (cid:15) , and In our algorithm, we set σ to be the standard deviationof the noise distribution. if the bottom branch succeeds, there is no additionalprivacy cost. These properties can be easily seen byfocusing on the local alignment – if M ( D, H ) producesa certain output, how much does H need to change toget a noise vector H (cid:48) so that M ( D (cid:48) , H (cid:48) ) returns thesame exact output. Local alignment.
To create a local alignment for eachpair D ∼ D (cid:48) , let H = ( η, ξ , η , ξ , η , . . . ) where η is thenoise added to the threshold T , and ξ i (resp. η i ) is thenoise that should be added to the i th query q i in Line 7(resp. Line 8), if execution ever reaches that point. Weview the output ω = ( w , . . . , w s ) as a variable-lengthsequence where each w i is either ⊥ or a nonnegativegap (we omit the (cid:62) as it is redundant), together with atag ∈ { , (cid:15) , (cid:15) } indicating which branch w i is from (andthe privacy budget consumed to output w i ). Let I ω = { i | tag( w i ) = (cid:15) } and J ω = { i | tag( w i ) = (cid:15) } . That is, I ω is the set of indexes where the output is a gap fromthe top branch, and J ω is the set of indexes where theoutput is a gap from the middle branch. For H ∈ H D: ω define φ D , D (cid:48) ,ω ( H ) = H (cid:48) = ( η (cid:48) , ξ (cid:48) , η (cid:48) , ξ (cid:48) , η (cid:48) , . . . ) where η (cid:48) = η + 1 , ( ξ (cid:48) i , η (cid:48) i ) = ( ξ i + 1 + q i − q (cid:48) i , η i ) , i ∈ I ω ( ξ i , η i + 1 + q i − q (cid:48) i ) , i ∈ J ω ( ξ i , η i ) , otherwise (2)In other words, we add 1 to the noise that was addedto the threshold (thus if the noisy q ( D ) failed a specificbranch, the noisy q ( D (cid:48) ) will continue to fail it becauseof the higher noisy threshold). If a noisy q ( D ) succeededin a specific branch, we adjust the query’s noise so thatthe noisy version of q ( D (cid:48) ) will succeed in that samebranch. Lemma 2
Let M be the Adaptive SVT with Gap al-gorithm. For all D ∼ D (cid:48) and ω , the functions φ D , D (cid:48) ,ω defined above are acyclic local alignments for M . Fur-thermore, for every pair D ∼ D (cid:48) , there are countablymany distinct φ D , D (cid:48) ,ω .Proof. Pick an adjacent pair D ∼ D (cid:48) and an ω =( w , . . . , w s ). For a given H = ( η, ξ , η , . . . ) such that M ( D, H ) = ω , let H (cid:48) = ( η (cid:48) , ξ (cid:48) , η (cid:48) , . . . ) = φ D , D (cid:48) ,ω ( H ).Suppose M ( D (cid:48) , H (cid:48) ) = ω (cid:48) = ( w (cid:48) , . . . , w (cid:48) t ). Our goal is toshow ω (cid:48) = ω . Choose an i ≤ min( s, t ). – If i ∈ I ω , then by (2) we have q (cid:48) i + ξ (cid:48) i − ( T + η (cid:48) )= q (cid:48) i + ξ i + 1 + q i − q (cid:48) i − ( T + η + 1)= q i + ξ i − ( T + η ) ≥ σ. Zeyu Ding et al. This means the first “if” branch succeeds in bothexecutions and the gaps are the same. Therefore, w (cid:48) i = w i . – If i ∈ J ω , then by (2) we have q (cid:48) i + ξ (cid:48) i − ( T + η (cid:48) )= q (cid:48) i + ξ i − ( T + η + 1) = q (cid:48) i − ξ i − ( T + η ) ≤ q i + ξ i − ( T + η ) < σ,q (cid:48) i + η (cid:48) i − ( T + η (cid:48) )= q (cid:48) i + η i + 1 + q i − q (cid:48) i − ( T + η + 1)= q i + η i − ( T + η ) ≥ . The first inequality is due to the sensitivity restric-tion: | q i − q (cid:48) i | ≤ ⇒ q (cid:48) i − ≤ q i . These twoequations mean that the first “if” branch fails andthe second “if” branch succeeds in both executions,and the gaps are the same. Hence w (cid:48) i = w i . – If i (cid:54)∈ I ω ∪ J ω , then by a similar argument we have q (cid:48) i + ξ (cid:48) i − ( T + η (cid:48) ) ≤ q i + ξ i − ( T + η ) < σ,q (cid:48) i + η (cid:48) i − ( T + η (cid:48) ) ≤ q i + η i − ( T + η ) < . Hence both executions go to the last “else” branchand w (cid:48) i = ( ⊥ ,
0) = w i .Therefore for all 1 ≤ i ≤ min( s, t ), we have w (cid:48) i = w i .That is, either ω (cid:48) is a prefix of ω , or vice versa. Let q be the vector of queries passed to the algorithm and letlen( q ) be the number of queries it contains (which canbe finite or infinity). By the termination condition ofAlgorithm 1 we have two possibilities. – s = len( q ): in this case there is still enough privacybudget left after answering s − t = len( q ) too because M ( D (cid:48) , H (cid:48) ) will also run through all the queries (itcannot stop until it has exhausted the privacy bud-get or hits the end of the query sequence). – s < len( q ): in this case the privacy budget is ex-hausted after outputting w s and we must also have t = s .Thus t = s and hence ω (cid:48) = ω . The local alignmentsare clearly acyclic (e.g., use the identity permutation).Note that φ D , D (cid:48) ,ω only depends on ω through I ω and J ω (the sets of queries whose noisy values were larger thanthe noisy threshold). There are only countably manypossibilities for I ω and J ω and thus countably manydistinct φ D , D (cid:48) ,ω . Alignment cost and privacy.
Now we establish the align-ment cost and the privacy property of Algorithm 1.
Theorem 2
The Adaptive SVT with Gap satisfies (cid:15) -differential privacy. Proof.
First we bound the cost of the alignment func-tion defined by Equation (2). We use the (cid:15) , (cid:15) , (cid:15) and (cid:15) defined in Algorithm 1. From (2) we havecost( φ D , D (cid:48) ,ω )= (cid:15) | η (cid:48) − η | + ∞ (cid:88) i =1 (cid:16) (cid:15) | ξ (cid:48) i − ξ i | + (cid:15) | η (cid:48) i − η i | (cid:17) = (cid:15) + (cid:88) i ∈I ω (cid:15) | q i − q (cid:48) i | + (cid:88) i ∈J ω (cid:15) | q i − q (cid:48) i |≤ (cid:15) + (cid:15) |I ω | + (cid:15) |J ω | ≤ (cid:15). The first inequality is from the assumption on sensi-tivity: | q i − q (cid:48) i | ≤ | q i − q (cid:48) i | ≤
2. The second in-equality is from loop invariant on Line 17: (cid:15) + (cid:15) |I ω | + (cid:15) |J ω | = cost ≤ (cid:15) − (cid:15) + max( (cid:15) , (cid:15) ) = (cid:15) .Conditions 1 through 3 of Lemma 1 are trivial tocheck, 4 and 5 follow from Lemma 2 and the abovebound on cost. Thus Theorem 2 follows from Lemma1. Algorithm 1 can be easily extended with multipleadditional “if” branches. For simplicity we do not in-clude such variations. In our setting, (cid:15) = (cid:15) / /(cid:15) )in Line 7 and Lap(1 /(cid:15) ) noises in Line 8 instead. Choice of θ . We can optimize the budget allocationbetween threshold noise and query noises by followingthe methodology of [33], which is equivalent to mini-mizing the variance of the gap between a noisy queryand the threshold. If the majority of gaps are expectedto be returned from the top branch, then we optimizeVar(˜ q i − (cid:101) T ) = (cid:15) + (cid:15) = (cid:15) ( θ + k (1 − θ ) ). This varianceattains its minimum value of 2(1 + √ k ) /(cid:15) when θ = 1 / (1 + √ k ). If on the other hand the majorityof gaps are expected to be returned from the middlebranch, then we optimize Var(ˆ q i − (cid:101) T ) = (cid:15) + (cid:15) = (cid:15) ( θ + k (1 − θ ) ). In this case, the minimum value is2(1 + √ k ) /(cid:15) when θ = 1 / (1 + √ k ). If all queriesare monotone, then the optimal variance further re-duces to 2(1 + √ k ) /(cid:15) in the top branch when θ = In the case of monotonic queries, if ∀ i : q i ≥ q (cid:48) i , thenthe alignment changes slightly: we set η (cid:48) = η (the randomvariable added to the threshold) and set the adjustment tonoise in the winning “if” branches to q i − q (cid:48) i instead of 1+ q i − q (cid:48) i (hence cost terms become | q i − q (cid:48) i | instead of | q i − q (cid:48) i | ).If ∀ i : q i ≤ q (cid:48) i then we keep the original alignment but in thecost calculation we note that | q i − q (cid:48) i | ≤ / (1 + √ k ), and 2(1 + √ k ) /(cid:15) in the middle branchwhen θ = 1 / (1 + √ k ).These allocation strategies also extend to SVT withGap (originally proposed in [45]). SVT with Gap canbe obtained by removing the first branch of Algorithm1 (Line 9 through 11) or setting σ = ∞ . For reference,we show its pseudocode below as Algorithm 2. In [45], θ is set to 0.5, which is suboptimal. The optimal valueis θ = 1 / (1 + √ k ). Algorithm 2:
SVT with Gap [45] input : same as Algorithm 1 function GapSparse ( q , D , T , k , (cid:15) ) : (cid:15) ← θ(cid:15) ; (cid:15) ← (1 − θ ) (cid:15)/k ; η ← Lap(1 /(cid:15) ); (cid:101) T ← T + η cost ← (cid:15) foreach i ∈ { , · · · , len( q ) } do η i ← Lap(2 /(cid:15) ); ˜ q i ← q i ( D ) + η i if ˜ q i − (cid:101) T ≥ then output: ( (cid:62) , ˜ q i − (cid:101) T , bud used = (cid:15) ) cost ← cost + (cid:15) else output: ( ⊥ , bud used = ) if cost > (cid:15) − (cid:15) then break Exponential noise.
When using random noise from theexponential distribution, we need to subtract off the ex-pected value of the noise from the queries and threshold.The details are shown in Algorithm 3. Compared withAlgorithm 1, Algorithm 3 makes the following changes: – Line 3: the algorithm stores the expected value ofExp(1 /(cid:15) ), Exp(2 /(cid:15) ), Exp(2 /(cid:15) ) in b , b , b respec-tively. It also changes the value of σ from 2 √ /(cid:15) ,the standard deviation of Lap(2 /(cid:15) ), to 2 /(cid:15) , thestandard deviation of Exp(2 /(cid:15) ). – Lines 4, 7 and 8: change Laplace noise to exponen-tial noise of the same scale, and then subtracts theexpected values of the noise. If all queries are counting queries, we further reduce thenoise to Exp(1 /(cid:15) ) in Line 7 and Exp(1 /(cid:15) ) in Line 8,and set b = 1 /(cid:15), b = 1 /(cid:15) , σ = 1 /(cid:15) in Line 4. Algorithm 3:
Adaptive SVT with Gap withexponential noise input : same as Algorithm 1 function AdaptiveSparse ( q , D , T , k , (cid:15) ) : (cid:15) ← θ(cid:15) ; (cid:15) ← (1 − θ ) (cid:15)/k ; (cid:15) ← (cid:15) / b ← /(cid:15) ; b ← /(cid:15) ; b ← /(cid:15) ; σ ← /(cid:15) η ← Exp(1 /(cid:15) ); (cid:101) T ← T + η − b cost ← (cid:15) foreach i ∈ { , · · · , len( q ) } do ξ i ← Exp(2 /(cid:15) ); ˜ q i ← q i ( D ) + ξ i − b η i ← Exp(2 /(cid:15) ); ˆ q i ← q i ( D ) + η i − b if ˜ q i − (cid:101) T ≥ σ then output: ( (cid:62) , ˜ q i − (cid:101) T , bud used = (cid:15) ) cost ← cost + (cid:15) else if ˆ q i − (cid:101) T ≥ then output: ( (cid:62) , ˆ q i − (cid:101) T , bud used = (cid:15) ) cost ← cost + (cid:15) else output: ( ⊥ , bud used = ) if cost > (cid:15) − (cid:15) then break Geometric noise.
When all queries have integer val-ues (e.g. counting queries), we could utilize geometricnoise to make sure that the gap is also an integer. Touse geometric noise we make the following changes toAlgorithm 3: – Line 3: set b = 1 / (1 − e − (cid:15) ), b = 1 / (1 − e − (cid:15) / ) and b = 1 / (1 − e − (cid:15) / ), which are the expected values ofGeo(1 − e − (cid:15) ), Geo(1 − e − (cid:15) / ) and Geo(1 − e − (cid:15) / )respectively. Set σ = e (cid:15) / / ( e (cid:15) / − − e − (cid:15) / ). – Line 4, 7 and 8: changes Exp(1 /(cid:15) ), Exp(2 /(cid:15) ) andExp(2 /(cid:15) ) noise to Geo(1 − e − (cid:15) ), Geo(1 − e − (cid:15) / )and Geo(1 − e − (cid:15) / ) noise respectively.If all queries are counting queries, we further reduce thenoise to Geo(1 − e − (cid:15) ) in Line 7 and Geo(1 − e − (cid:15) ) inLine 8, and set b = 1 / (1 − e − (cid:15) ) , b = 1 / (1 − e − (cid:15) ) , σ = e (cid:15) / / ( e (cid:15) −
1) in Line 4.
Local alignment and privacy.
The alignment in Equa-tion 2 for the Adaptive SVT with Gap with Laplacenoise also works for both exponential noise and geo-metric noise, because η (cid:48) − η = 1 and ξ (cid:48) i − ξ i , η (cid:48) i − η i ∈{ , q i − q (cid:48) i } . The value 1 + q i − q (cid:48) i is always ≥ q i , q (cid:48) i are integers.Recall that if f ( x ) is the probability density functionof Exp( β ), then ln f ( x ) f ( y ) ≤ β | x − y | . Similarly, if g ( x ) is et al. the probability mass function of Geo( p ), then ln g ( x ) g ( y ) =ln p (1 − p ) x p (1 − p ) y ≤ − ln(1 − p ) | x − y | . Therefore, our choice ofthe parameters ensures that the alignment cost is thesame as that of Laplace noise, which is bounded by (cid:15) .Thus both variants are (cid:15) -differentially private. Choice of θ . As before, we choose the θ that mini-mizes the variance of the gap to make the result mostaccurate. Note that exponential distribution has halfthe variance of the Laplace distribution of the samescale. Thus, when exponential noise is used, the mini-mum variance of the gap is (1 + √ k ) /(cid:15) in the topbranch when θ = 1 / (1 + √ k ), and (1 + √ k ) /(cid:15) in the middle branch when θ = 1 / (1 + √ k ). If allqueries are monotone, then the optimal variance fur-ther reduces to (1 + √ k ) /(cid:15) in the top branch when θ = 1 / (1 + √ k ), and (1 + √ k ) /(cid:15) in the middlebranch when θ = 1 / (1 + √ k ).Since the geometric distribution is the discrete ana-logue of the exponential distribution, the above resultsapply to geometric noise as well. For example, whenall queries are counting queries and geometric noise isused, then Var( ˆ q i − (cid:101) T ) = e (cid:15) ( e (cid:15) − + e (cid:15) ( e (cid:15) − = e θ(cid:15) ( e θ(cid:15) − + e (1 − θ ) (cid:15)/k ( e (1 − θ ) (cid:15)/k − in the middle branch. The variance of thegap, albeit complicated, is a convex function of θ on(0 , θ where the variance is minimum, and foundthat those values are almost the same as those for ex-ponential noise (See Fig. 1). Therefore, we can use thebudget allocation strategy for exponential noise as thestrategy for geometric noise too. k 𝜃 m i n Curve of 𝜃 = + √ k Values of 𝜃 min Fig. 1: The blue dots are values of θ min =argmin( e θ(cid:15) ( e θ(cid:15) − + e (1 − θ ) (cid:15)/k ( e (1 − θ ) (cid:15)/k − ) for k from 1 to 50. Theorange curve is the function θ = 1 / (1 + √ k ). 5.3 Utilizing Gap InformationWhen SVT with Gap or Adaptive SVT with Gap re-turns a gap γ i for a query q i , we can add to it the publicthreshold T . This means γ i + T is an estimate of thevalue of q i ( D ). We can ask two questions: how can weimprove the accuracy of this estimate and how can webe confident that the true answer q i ( D ) is really largerthan the threshold T ? Lower confidence interval.
Recall that the randomnessin the gap in Adaptive SVT with Gap (Algorithm 1)is of the form η i − η where η and η i are independentzero mean Laplace variables with scale 1 /(cid:15) and 1 /(cid:15) ∗ ,where (cid:15) ∗ is either (cid:15) or (cid:15) , depending on the branch.The random variable η i − η has the following lower tailbound: Lemma 3
For any t ≥ we have P ( η i − η ≥ − t ) = (cid:40) − (cid:15) e − (cid:15) ∗ t − (cid:15) ∗ e − (cid:15) t (cid:15) − (cid:15) ∗ ) (cid:15) (cid:54) = (cid:15) ∗ − ( (cid:15) t ) e − (cid:15) t (cid:15) = (cid:15) ∗ For proof see the Appendix. For any confidence level,say 95%, we can use this result to find a number t . such that P (( η i − η ) ≥ − t . ) = .
95. This is a lowerconfidence bound, so that the true value q i ( D ) is ≥ ourestimated value γ i + T minus t . with probability 0 . Improving accuracy.
To improve accuracy, one can splitthe privacy budget (cid:15) in half. The first half (cid:15) (cid:48) ≡ (cid:15)/ (cid:15) (cid:48)(cid:48) ≡ (cid:15)/ k queries, we add Lap( k/(cid:15) (cid:48)(cid:48) )noise to each one). Denote the k selected queries by q , . . . , q k , the noisy gaps by γ , . . . , γ k and the inde-pendent noisy measurements by α , . . . , α k . The noisyestimates can be combined together with the gaps toget improved estimates β i of q i ( D ) in the standard way(inverse-weighting by variance): β i = (cid:18) α i Var( α i ) + γ i + T Var( γ i ) (cid:19) (cid:30) (cid:18) α i ) + 1Var( γ i ) (cid:19) . Note that
Var( β i )Var( α i ) = Var( γ i )Var( α i )+Var( γ i ) < within SVT with Gap is the ratio 1 : √ k . Under this set-ting, we have Var( γ i ) = 8(1+ √ k ) /(cid:15) . Also, we knowVar( α i ) = 8 k /(cid:15) . Therefore, E ( | β i − q i | ) E ( | α i − q i | ) = Var( β i )Var( α i ) = (1 + √ k ) (1 + √ k ) + k . ree Gap Estimates from the Exponential Mechanism, Sparse Vector, Noisy Max and Related Algorithms 11 Since lim k →∞ (1+ √ k ) (1+ √ k ) + k = , the improvement in ac-curacy approaches 20% as k increases. For monotonicqueries, the optimal budget allocation within SVT withGap is 1 : √ k . Then we have Var( γ i ) = 8(1+ √ k ) /(cid:15) and therefore Var( β i )Var( α i ) = (1+ √ k ) (1+ √ k ) + k which is close to50% when k is large. When the algorithm uses expo-nential noise, the variance of the gap further reducesto Var( γ i ) = 4(1 + √ k ) /(cid:15) and therefore Var( β i )Var( α i ) = (1+ √ k ) (1+ √ k ) +2 k which is close to a 66% reduction of meansquared errors when k is large. Our experiments in Sec-tion 9 confirm this improvement. In this section, we present novel variations of the NoisyMax mechanism [19]. Given a list of queries with sen-sitivity 1, the purpose of Noisy Max is to estimate theidentity (i.e., index) of the largest query. We show that,in addition to releasing this index, it is possible to re-lease a numerical estimate of the gap between the val-ues of the largest and second largest queries. This ex-tra information comes at no additional cost to privacy,meaning that the original Noisy Max mechanism threwaway useful information. This result can be generalizedto the setting in which one wants to estimate the iden-tities of the top k queries - we can release (for free)all of the gaps between each top k query and the nextbest query (i.e., the gap between the best and secondbest queries, the gap between the second and third bestqueries, etc). When a user subsequently asks for a noisyanswer to each of the returned queries, we show how thegap information can be used to reduce squared error byup to 66% (for counting queries).6.1 Noisy Top-K with GapOur proposed Noisy Top-K with Gap mechanism isshown in Algorithm 4 (the function arg max c returnsthe top c items). We can obtain the classical Noisy Maxalgorithm [19] from it by setting k = 1 and throwingaway the gap information (the boxed items on Lines6 and 7). The Noisy Top-K with Gap algorithm takesas input a sequence of n queries q , . . . , q n , each hav-ing sensitivity 1. It adds Laplace noise to each query.It returns the indexes j , . . . , j k of the k queries withthe largest noisy values in descending order. Further-more, for each of these top k queries q j i , it releasesthe noisy gap between the value of q j i and the value ofthe next best query. Our key contribution in this sec-tion is the observation that these gaps can be released for free. That is, the classical Top-K algorithm, whichdoes not release the gaps, satisfies (cid:15) -differential privacy.But, our improved version has exactly the same privacycost yet is strictly better because of the extra infor-mation it can release. We emphasize that keeping the Algorithm 4:
Noisy Top-K with Gap input: q : a list of n queries of global sensitivity 1 D : database, k : (cid:15) : privacy budget function NoisyTopK ( q , D , k , (cid:15) ) : foreach i ∈ { , · · · , n } do η i ← Lap(2 k/(cid:15) ); (cid:101) q i ← q i ( D ) + η i ( j , . . . , j k +1 ) ← arg max k +1 ( (cid:101) q , . . . , (cid:101) q n ) foreach i ∈ { , · · · , k } do g i ← (cid:101) q j i − (cid:101) q j i +1 // i th gap return (( j , g ) , . . . , ( j k , g k )) noisy gaps hidden does not decrease the privacy cost.Furthermore, this algorithm gives estimates of the pair-wise gaps between any pair of the k queries it selects.For example, suppose we are interested in estimatingthe gap between the a th largest and b th largest queries(where a < b ≤ k ). This is equal to (cid:80) b − i = a g i because: (cid:80) b − i = a g i = (cid:80) b − i = a ( (cid:101) q j i − (cid:101) q j i +1 ) = (cid:101) q j a − (cid:101) q j b and hence itsvariance is Var( (cid:101) q j a − (cid:101) q j b ) = 16 k /(cid:15) .The original Noisy Top-K mechanism satisfies (cid:15) -differential privacy. In the special case that all the q i are counting queries then it satisfies (cid:15)/ Local alignment.
To prove the privacy of Algorithm4, we need to create a local alignment function foreach possible pair D ∼ D (cid:48) and output ω . Note thatour mechanism uses precisely n random variables. Let H = ( η , η , . . . ) where η i is the noise that shouldbe added to the i th query. We view the output ω =(( j , g ) , . . . , ( j k , g k )) as k pairs where in the i th pair( j i , g i ), the first component j i is the index of i th largestnoisy query and the second component g i is the gap innoisy value between the i th and ( i + 1) th largest noisyqueries. As in prior work [19], we will base our analy-sis on continuous noise so that the probability of tiesamong the top k + 1 noisy queries is 0. Thus each gapis positive: g i > I ω = { j , . . . , j k } and I cω = { , . . . , n } \ I ω .I.e., I ω is the index set of the k largest noisy queriesselected by the algorithm and I cω is the index set of allunselected queries. For H ∈ H D: ω define φ D , D (cid:48) ,ω ( H ) = et al. H (cid:48) = ( η (cid:48) , η (cid:48) , . . . ) as η (cid:48) i = η i i ∈ I cω η i + q i − q (cid:48) i +max l ∈I cω ( q (cid:48) l + η l ) − max l ∈I cω ( q l + η l ) i ∈ I ω (3)The idea behind this local alignment is simple: we wantto keep the noise of the losing queries the same (whenthe input is D or its neighbor D (cid:48) ). But, for each of the k selected queries, we want to align its noise to makesure it wins by the same amount when the input is D or its neighbor D (cid:48) . Lemma 4
Let M be the Noisy Top-K with Gap algo-rithm. For all D ∼ D (cid:48) and ω , the functions φ D , D (cid:48) ,ω defined above are acyclic local alignments for M . Fur-thermore, for every pair D ∼ D (cid:48) , there are countablymany distinct φ D , D (cid:48) ,ω .Proof. Given D ∼ D (cid:48) and ω = (( j , g ) , . . . , ( j k , g k )),for any H = ( η , η , . . . ) such that M ( D, H ) = ω ,let H (cid:48) = ( η (cid:48) , η (cid:48) , . . . ) = φ D , D (cid:48) ,ω ( H ). We show that M ( D (cid:48) , H (cid:48) ) = ω . Since φ D , D (cid:48) ,ω is identity on compo-nents i ∈ I cω , we have max l ∈I cω ( q (cid:48) l + η (cid:48) l ) = max l ∈I cω ( q (cid:48) l + η l ).From (3) we have that when i ∈ I ω , η (cid:48) i = η i + q i − q (cid:48) i + max l ∈I cω ( q (cid:48) l + η l ) − max l ∈I cω ( q l + η l )= ⇒ q (cid:48) i + η (cid:48) i − max l ∈I cω ( q (cid:48) l + η l ) = q i + η i − max l ∈I cω ( q l + η l )= ⇒ q (cid:48) i + η (cid:48) i − max l ∈I cω ( q (cid:48) l + η (cid:48) l ) = q i + η i − max l ∈I cω ( q l + η l )So, for the k th selected query,( q (cid:48) j k + η (cid:48) j k ) − max l ∈I cω ( q (cid:48) l + η (cid:48) l )= ( q j k + η j k ) − max l ∈I cω ( q l + η l ) = g k > D (cid:48) the noisy query with index j k is largerthan the best of the unselected noisy queries by thesame margin as it is on D . Furthermore, for all 1 ≤ i
To establish the alignmentcost, we need the following lemma.
Lemma 5
Let ( x , . . . , x m ) , ( x (cid:48) , . . . , x (cid:48) m ) ∈ R m be suchthat ∀ i, | x i − x (cid:48) i | ≤ . Then | max i ( x i ) − max i ( x (cid:48) i ) | ≤ .Proof. Let s be an index that maximizes x i and let t bean index that maximizes x (cid:48) i . Without loss of generality,assume x s ≥ x (cid:48) t . Then x s ≥ x (cid:48) t ≥ x (cid:48) s ≥ x s −
1. Hence | x s − x (cid:48) t | = x s − x (cid:48) t ≤ x s − ( x s −
1) = 1 . Theorem 3
The Noisy Top-K with Gap mechanismsatisfies (cid:15) -differential privacy. If all of the queries arecounting queries, then it satisfies (cid:15)/ -differential pri-vacy.Proof. First we bound the cost of the alignment func-tion defined in (3). Recall that the η i ’s are independentLap(2 k/(cid:15) ) random variables. By Definition 7cost( φ D , D (cid:48) ,ω ) = ∞ (cid:88) i =1 | η (cid:48) i − η i | (cid:15) k = (cid:15) k (cid:88) i ∈I ω (cid:12)(cid:12)(cid:12)(cid:12) q i − q (cid:48) i + max l ∈I cω ( q (cid:48) l + η l ) − max l ∈I cω ( q l + η l ) (cid:12)(cid:12)(cid:12)(cid:12) . By the global sensitivity assumption we have | q i − q (cid:48) i | ≤
1. Apply Lemma 5 to the vectors ( q l + η l ) l ∈I cω and( q (cid:48) l + η l ) l ∈I cω , we have (cid:12)(cid:12)(cid:12)(cid:12) max l ∈I cω ( q (cid:48) l + η l ) − max l ∈I cω ( q l + η l ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12)(cid:12)(cid:12) q i − q (cid:48) i + max l ∈I cω ( q (cid:48) l + η l ) − max l ∈I cω ( q l + η l ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ | q i − q (cid:48) i | + (cid:12)(cid:12)(cid:12)(cid:12) max l ∈I cω ( q (cid:48) l + η l ) − max l ∈I cω ( q l + η l ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ . Furthermore, if q is monotonic, then – either ∀ i : q i ≤ q (cid:48) i in which case q i − q (cid:48) i ∈ [ − ,
0] andmax l ∈I cω ( q (cid:48) l + η l ) − max l ∈I cω ( q l + η l ) ∈ [0 , – or ∀ i : q i ≥ q (cid:48) i in which case q i − q (cid:48) i ∈ [0 ,
1] andmax l ∈I cω ( q (cid:48) l + η l ) − max l ∈I cω ( q l + η l ) ∈ [ − , q i − q (cid:48) i + max l ∈I cω ( q (cid:48) l + η l ) − max l ∈I cω ( q l + η l ) ∈ [ − ,
1] so | q i − q (cid:48) i +max l ∈I cω ( q (cid:48) l + η l ) − max l ∈I cω ( q l + η l ) | ≤ ree Gap Estimates from the Exponential Mechanism, Sparse Vector, Noisy Max and Related Algorithms 13 Therefore,cost( φ D , D (cid:48) ,ω )= (cid:15) k (cid:88) i ∈I ω (cid:12)(cid:12)(cid:12)(cid:12) q i − q (cid:48) i + max l ∈I cω ( q (cid:48) l + η l ) − max l ∈I cω ( q l + η l ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:15) k (cid:88) i ∈I ω (cid:15) k (cid:88) i ∈I ω q is monotonic)= (cid:15) k · |I ω | (or (cid:15) k · |I ω | if q is monotonic)= (cid:15) (or (cid:15)/ q is monotonic) . Conditions 1 through 3 of Lemma 1 are trivial to check,4 and 5 follow from Lemma 4 and the above bound oncost. Therefore, Theorem 3 follows from Lemma 1.6.2 Noisy Top-K with Exponential NoiseThe original noisy max algorithm also works with one-sided exponential noise [19] with smaller variance thanthe Laplace noise. In this subsection, we show thatthis result extends to the Noisy Top-K with Gap al-gorithm by simply changing Line 3 of Algorithm 4 to η i ← Exp(2 k/(cid:15) ) and privacy is maintained while thevariance of the gap decreases. However, the proof relieson a different local alignment.
Local alignment.
The alignment used in Section 6.1 will not work here because it might set our noise random variables to negative numbers. Thus we need a new alignment. As before, let $H = (\eta_1, \eta_2, \dots)$ where $\eta_i$ is the noise that should be added to the $i$th query. We view the output $\omega = ((j_1,g_1),\dots,(j_k,g_k))$ as $k$ pairs where, in the $i$th pair $(j_i, g_i)$, the first component $j_i$ is the index of the $i$th largest noisy query and the second component $g_i > 0$ is the gap between the $i$th and $(i+1)$th largest noisy queries. Let $I_\omega = \{j_1,\dots,j_k\}$ and $I^c_\omega = \{1,\dots,n\} \setminus I_\omega$; i.e., $I_\omega$ is the index set of the $k$ largest noisy queries selected by the algorithm and $I^c_\omega$ is the index set of all unselected queries. For $H \in \mathcal{H}_{D:\omega}$ we will use $\phi_{D,D',\omega}(H) = H' = (\eta'_1, \eta'_2, \dots)$ to refer to the aligned noise. In order to define the alignment, we need the following quantities:
\begin{align*}
s &= \operatorname*{argmax}_{l \in I^c_\omega}(q_l + \eta_l), \qquad t = \operatorname*{argmax}_{l \in I^c_\omega}(q'_l + \eta_l),\\
i^* &= \operatorname*{argmin}_{i \in I_\omega} \Big\{ q_i - q'_i + \max_{l \in I^c_\omega}(q'_l + \eta_l) - \max_{l \in I^c_\omega}(q_l + \eta_l) \Big\} = \operatorname*{argmin}_{i \in I_\omega} \{ q_i - q'_i \} \quad (\text{the other terms do not depend on } i),\\
\delta^* &= \min_{i \in I_\omega} \Big\{ q_i - q'_i + \max_{l \in I^c_\omega}(q'_l + \eta_l) - \max_{l \in I^c_\omega}(q_l + \eta_l) \Big\} = q_{i^*} - q'_{i^*} + (q'_t + \eta_t) - (q_s + \eta_s).
\end{align*}
Note that $i^* \in I_\omega$ and $s, t \in I^c_\omega$. We define the alignment according to the value of $\delta^*$. When $\delta^* \ge 0$, we use the same alignment as in the Laplace version of the algorithm:
\[
\eta'_i = \begin{cases} \eta_i & i \in I^c_\omega \\ \eta_i + q_i - q'_i + (q'_t + \eta_t) - (q_s + \eta_s) & i \in I_\omega \end{cases} \tag{4}
\]
When $\delta^* < 0$, this alignment would produce a negative $\eta'_i$ for some $i \in I_\omega$. So instead, we take that alignment and further add the positive quantity $-\delta^*$ in several places so that overall we are adding nonnegative numbers to each $\eta_i$ to get $\eta'_i$ (this ensures that $\eta'_i$ is nonnegative for each $i$). Thus, when $\delta^* < 0$, define
\[
\eta'_i = \begin{cases} \eta_i & i \in I^c_\omega \setminus \{t\} \\ \eta_i - \delta^* & i = t \\ \eta_i + q_i - q'_i + (q'_t + \eta_t) - (q_s + \eta_s) - \delta^* & i \in I_\omega \end{cases}
\;=\; \begin{cases} \eta_i & i \in I^c_\omega \setminus \{t\} \\ \eta_i - \delta^* & i = t \\ \eta_i + q_i - q'_i - q_{i^*} + q'_{i^*} & i \in I_\omega \end{cases} \tag{5}
\]

Lemma 6
Let $M$ be the Noisy Top-K with Gap algorithm that uses exponential noise. For all $D \sim D'$ and $\omega$, the functions $\phi_{D,D',\omega}$ defined above are acyclic local alignments for $M$. Furthermore, for every pair $D \sim D'$, there are countably many distinct $\phi_{D,D',\omega}$.

Proof. First we show that $\forall i,\ \eta'_i \ge \eta_i$. Recall that $\delta^* = \min_{i \in I_\omega} \{ q_i - q'_i + (q'_t + \eta_t) - (q_s + \eta_s) \}$. When $\delta^* \ge 0$, we have $\eta'_i - \eta_i = q_i - q'_i + (q'_t + \eta_t) - (q_s + \eta_s) \ge \delta^* \ge 0$ for $i \in I_\omega$ (and $\eta'_i = \eta_i$ otherwise). When $\delta^* < 0$, we have $\eta'_t - \eta_t = -\delta^* > 0$ and $\eta'_i - \eta_i = (q_i - q'_i) - (q_{i^*} - q'_{i^*}) \ge 0$ for $i \in I_\omega$, by the definition of $i^*$ as the minimizer. Therefore, all the $\eta'_i$ are nonnegative.

The proof that (4) is an alignment when $\delta^* \ge 0$ is the same as in Lemma 4. When $\delta^* < 0$, first note that since $t = \operatorname*{argmax}_{l \in I^c_\omega}(q'_l + \eta_l)$ and $-\delta^* > 0$, we have $t = \operatorname*{argmax}_{l \in I^c_\omega}(q'_l + \eta'_l)$. Then from (5), we have that when $i \in I_\omega$,
\begin{align*}
&\eta'_i = \eta_i + q_i - q'_i + (q'_t + \eta_t) - (q_s + \eta_s) - \delta^*\\
\Longrightarrow\;& q'_i + \eta'_i - (q'_t + (\eta_t - \delta^*)) = q_i + \eta_i - (q_s + \eta_s)\\
\Longrightarrow\;& q'_i + \eta'_i - (q'_t + \eta'_t) = q_i + \eta_i - (q_s + \eta_s)\\
\Longrightarrow\;& q'_i + \eta'_i - \max_{l \in I^c_\omega}(q'_l + \eta'_l) = q_i + \eta_i - \max_{l \in I^c_\omega}(q_l + \eta_l).
\end{align*}
Thus, by an argument similar to that in Lemma 4, all relative orders among the $k$ largest noisy queries and their associated gaps are preserved. The facts that $\phi_{D,D',\omega}$ is acyclic and that there are finitely many distinct $\phi_{D,D',\omega}$ are clear. □

Alignment cost and privacy.
Recall from Table 1 that if $f(x)$ is the density of $\mathrm{Exp}(\beta)$, then for $x, y \ge 0$ we have $\frac{f(x)}{f(y)} = e^{(y-x)/\beta} \le e^{|y-x|/\beta}$. When $\delta^* \ge 0$, the alignment cost computation is the same as with the Laplace version of the algorithm. When $\delta^* < 0$, we have
\[
\mathrm{cost}(\phi_{D,D',\omega}) = \sum_{i=1}^{\infty} |\eta'_i - \eta_i| \cdot \frac{\epsilon}{2k} = \frac{\epsilon}{2k}|\delta^*| + \frac{\epsilon}{2k} \sum_{i \in I_\omega} \big| q_i - q'_i - q_{i^*} + q'_{i^*} \big| = \frac{\epsilon}{2k}|\delta^*| + \frac{\epsilon}{2k} \sum_{i \in I_\omega \setminus \{i^*\}} \big| q_i - q'_i - q_{i^*} + q'_{i^*} \big|,
\]
and note that there are $k-1$ terms in the last sum, each satisfying $| q_i - q'_i - q_{i^*} + q'_{i^*} | \le 2$ (or $\le 1$ if $q$ is monotonic). Also, it is shown in the proof of Theorem 3 that
\[
|\delta^*| = \Big| q_{i^*} - q'_{i^*} + \max_{l \in I^c_\omega}(q'_l + \eta_l) - \max_{l \in I^c_\omega}(q_l + \eta_l) \Big| \le 2 \quad (\text{or } \le 1 \text{ if } q \text{ is monotonic}).
\]
Therefore,
\[
\mathrm{cost}(\phi_{D,D',\omega}) = \frac{\epsilon}{2k}|\delta^*| + \frac{\epsilon}{2k} \sum_{i \in I_\omega \setminus \{i^*\}} \big| q_i - q'_i - q_{i^*} + q'_{i^*} \big| \quad (\text{note that there are } 1 + (k-1) \text{ terms above}) \le \frac{\epsilon}{2k} \cdot 2 \cdot k \ \Big(\text{or } \frac{\epsilon}{2k} \cdot k \text{ if } q \text{ is monotonic}\Big) = \epsilon \ (\text{or } \epsilon/2 \text{ if } q \text{ is monotonic}).
\]
Thus, Algorithm 4 with $\mathrm{Exp}(2k/\epsilon)$ noise on Line 3 instead of $\mathrm{Lap}(2k/\epsilon)$ noise satisfies $\epsilon$-differential privacy. If all of the queries are counting queries, then it satisfies $\epsilon/2$-differential privacy.

6.3 Utilizing Gap Information

Suppose a data analyst is interested in the answers of the top $k$ queries. A typical approach would be to split the privacy budget $\epsilon$ in half – use $\epsilon/2$ to select the top $k$ queries using Noisy Top-K with Gap, and use the remaining $\epsilon/2$ to obtain noisy measurements of the selected queries with the Laplace mechanism (i.e., adding $\mathrm{Lap}(2k/\epsilon)$ noise to each query answer). These measurements will have variance $\sigma^2 = 8k^2/\epsilon^2$. In this section we show how to use the gap information from Noisy Top-K with Gap and postprocessing to improve the accuracy of these measurements.

Problem statement.
Let $q_1,\dots,q_k$ be the true answers of the top $k$ queries that are selected by Algorithm 4. Let $\alpha_1,\dots,\alpha_k$ be their noisy measurements. Let $g_1,\dots,g_{k-1}$ be the noisy gaps between $q_1,\dots,q_k$ that are obtained from Algorithm 4 for free. Then $\alpha_i = q_i + \xi_i$, where each $\xi_i$ is a $\mathrm{Lap}(2k/\epsilon)$ random variable, and $g_i = q_i + \eta_i - q_{i+1} - \eta_{i+1}$, where each $\eta_i$ is a $\mathrm{Lap}(4k/\epsilon)$ random variable, or a $\mathrm{Lap}(2k/\epsilon)$ random variable if the query list is monotonic (recall that the mechanism was run with a privacy budget of $\epsilon/2$). Our goal is to compute the best linear unbiased estimate (BLUE) [30] $\beta_i$ of $q_i$ in terms of the measurements $\alpha_i$ and the gap information $g_i$.

Theorem 4
With notation as above, let $\mathbf{q} = [q_1,\dots,q_k]^T$, $\boldsymbol\alpha = [\alpha_1,\dots,\alpha_k]^T$ and $\mathbf{g} = [g_1,\dots,g_{k-1}]^T$. Suppose the ratio $\mathrm{Var}(\xi_i) : \mathrm{Var}(\eta_i)$ is equal to $1 : \lambda$. Then the BLUE of $\mathbf{q}$ is $\boldsymbol\beta = \frac{1}{(1+\lambda)k}(X\boldsymbol\alpha + Y\mathbf{g})$, where
\[
X = \begin{bmatrix} 1+\lambda k & 1 & \cdots & 1 \\ 1 & 1+\lambda k & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1+\lambda k \end{bmatrix}_{k \times k}
\qquad
Y = \begin{bmatrix} k-1 & k-2 & \cdots & 2 & 1 \\ -1 & k-2 & \cdots & 2 & 1 \\ -1 & -2 & \cdots & 2 & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ -1 & -2 & \cdots & -(k-2) & 1 \\ -1 & -2 & \cdots & -(k-2) & -(k-1) \end{bmatrix}_{k \times (k-1)}
\]
For the proof, see the Appendix. Even though this is a matrix multiplication, it is easy to see that it translates into the following algorithm that is linear in $k$:
1. Compute $\bar\alpha = \sum_{i=1}^{k} \alpha_i$ and $\bar p = \sum_{i=1}^{k-1} (k-i)\, g_i$.
2. Set $p_0 = 0$. For $i = 1,\dots,k-1$, set $p_i = \sum_{j=1}^{i} g_j = p_{i-1} + g_i$.
3. For $i = 1,\dots,k$, set $\beta_i = (\bar\alpha + \lambda k \alpha_i + \bar p - k p_{i-1}) \big/ \big((1+\lambda)k\big)$.
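The three steps above translate directly into code. Here is a minimal numpy sketch of the linear-time estimator (our own illustration, not pseudocode from the paper), where lam is the variance ratio $\lambda = \mathrm{Var}(\eta_i)/\mathrm{Var}(\xi_i)$; for counting queries, $\lambda = 1$ with Laplace noise and $\lambda = 1/2$ with exponential noise.

import numpy as np

def blue_estimates(alpha, g, lam):
    # alpha: the k direct noisy measurements alpha_i = q_i + xi_i
    # g:     the k-1 free noisy gaps from Noisy Top-K with Gap
    # lam:   the variance ratio Var(eta_i) / Var(xi_i)
    alpha = np.asarray(alpha, dtype=float)
    g = np.asarray(g, dtype=float)
    k = len(alpha)
    abar = alpha.sum()                             # step 1
    pbar = np.dot(np.arange(k - 1, 0, -1), g)      # step 1: sum of (k-i)*g_i
    p = np.concatenate(([0.0], np.cumsum(g)))      # step 2: p_0, ..., p_{k-1}
    # step 3: beta_i = (abar + lam*k*alpha_i + pbar - k*p_{i-1}) / ((1+lam)*k)
    return (abar + lam * k * alpha + pbar - k * p[:k]) / ((1.0 + lam) * k)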
Now, each $\beta_i$ is an estimate of the value of $q_i$. How does it compare to the direct measurement $\alpha_i$ (which has variance $\sigma^2 = 8k^2/\epsilon^2$)? The following result compares the expected error of $\beta_i$ (which uses both the direct measurements and the gap information) with the expected error of using only the direct measurements (i.e., $\alpha_i$ only).

Corollary 1 For all $i = 1,\dots,k$, we have
\[
\frac{E(|\beta_i - q_i|^2)}{E(|\alpha_i - q_i|^2)} = \frac{1 + \lambda k}{k + \lambda k} = \frac{\mathrm{Var}(\xi_i) + k\,\mathrm{Var}(\eta_i)}{k(\mathrm{Var}(\xi_i) + \mathrm{Var}(\eta_i))}.
\]
For the proof, see the Appendix. In the case of counting queries, we have $\mathrm{Var}(\xi_i) = \mathrm{Var}(\eta_i) = 8k^2/\epsilon^2$ and thus $\lambda = 1$. The error reduction rate is $\frac{k-1}{2k}$, which is close to 50% when $k$ is large. If we use exponential noise instead, i.e., replace $\eta_i \leftarrow \mathrm{Lap}(2k/\epsilon)$ with $\eta_i \leftarrow \mathrm{Exp}(2k/\epsilon)$ at Line 3 of Algorithm 4, then $\mathrm{Var}(\eta_i) = 4k^2/\epsilon^2 = \mathrm{Var}(\xi_i)/2$, so $\lambda = 1/2$. In this case, the error reduction rate is $\frac{2(k-1)}{3k}$, which is close to 66% when $k$ is large. Our experiments in Section 9 confirm these theoretical results.

7 Hybrid Algorithms

In this section, we present two hybrids of SVT with Gap and Noisy Top-K with Gap. Recall that SVT with Gap is an online algorithm that returns the identities and noisy gaps (with respect to the threshold) of the first $k$ noisy queries it sees that are larger than the noisy threshold. Its benefits are:
– Privacy budget is saved if fewer than $k$ queries are returned.
– The queries that are returned come with estimates of their noisy answers (obtained by adding the public threshold to the noisy gap).
Its drawback is:
– The returned queries are likely not to resemble the $k$ largest queries (queries that come afterwards are ignored, no matter how large their values are).
Meanwhile, Noisy Top-K with Gap returns the identities and gaps (with respect to the runner-up query) of the top $k$ noisy queries. Its benefits are:
– The queries returned are approximately the top $k$.
– The gap tells us how large the queries are compared to the best non-selected noisy query.
Its drawbacks are:
– Exactly $k$ queries are always returned, even if their values are small.
– Only gap information is returned (not estimates of the query answers).
For users who are interested in identifying the top $k$ queries that are likely to be over a threshold, we present two hybrid algorithms that try to combine the benefits of both algorithms while minimizing the drawbacks. Both algorithms take as input a number $k$, a list of answers to queries having sensitivity 1, and a public threshold $T$. They both return the subset of the top $k$ noisy queries that are larger than the noisy threshold $T$; hence the privacy cost is dynamic and is smaller if fewer than $k$ queries are returned. The difference is in the gap information.

The first hybrid (Algorithm 5) is more likely to provide accurate identity information than the second hybrid (Algorithm 6). That is, the queries it returns are more likely to be the actual queries whose true values are largest (because the first algorithm adds less noise to the query answers). However, Algorithm 6 always returns the noisy gap with the threshold (hence, by adding in the public threshold value, this gives an estimate of the query answer). Meanwhile, Algorithm 5 only returns the noisy gap with the threshold if fewer than $k$ queries are returned (if exactly $k$ queries are returned, it functions like Noisy Top-K with Gap and returns the gaps with the runner-up query).

In terms of how they work, Algorithm 5 adds the public threshold to the list of queries (it becomes Query 0) and adds the same noise to all of them (Lines 2 and 4). In Lines 5–8, it takes the top noisy queries (sorted in decreasing order) and computes their gaps with the next best noisy query, filtering out any that are smaller than the noisy Query 0. For the queries that did not get removed, it returns their identities (recall that the threshold is Query 0) and their gap with the next best query. If the last returned item is Query 0, the gap information tells us how much larger the other returned queries are compared to the noisy threshold, and this allows us to get numerical estimates for those query answers by adding in the public threshold.
Algorithm 5: Hybrid Prioritizing Identity
input: q: a list of n queries of global sensitivity 1
       D: database, ε: privacy budget
       T: public threshold, k: maximum number of queries to return
1  function NoisyTopK(q, D, T, k, ε):
2      η_0 ← Exp(2k/ε);  q̃_0 ← T + η_0
3      foreach i ∈ {1,···,n} do
4          η_i ← Exp(2k/ε);  q̃_i ← q_i(D) + η_i
5      (j_1,...,j_{k+1}) ← argmax_{k+1}(q̃_0, q̃_1,...,q̃_n)
6      foreach i ∈ {1,···,k} do
7          g_i ← q̃_{j_i} − q̃_{j_{i+1}};  t ← i
8          if j_i = 0 then break
9      return ((j_1, g_1),...,(j_t, g_t))
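A minimal Python rendering of Algorithm 5 (our own sketch; index 0 denotes the threshold, matching the pseudocode above):

import numpy as np

def hybrid_prioritizing_identity(q, T, k, eps, rng=None):
    # The public threshold T becomes "query 0"; the same Exp(2k/eps)
    # noise is added to it and to every query, and items are reported
    # with their gap to the next-best noisy value until the noisy
    # threshold itself is encountered.
    rng = rng or np.random.default_rng()
    vals = np.concatenate(([float(T)], np.asarray(q, dtype=float)))
    noisy = vals + rng.exponential(2.0 * k / eps, size=vals.shape)
    order = np.argsort(noisy)[::-1][:k + 1]   # top k+1 noisy indices
    out = []
    for i in range(k):
        j = int(order[i])
        out.append((j, float(noisy[order[i]] - noisy[order[i + 1]])))
        if j == 0:            # the noisy threshold: stop reporting
            break
    return out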
Alignment and privacy cost for Algorithm 5. By replacing the index set $I_\omega$ in Equations (4) and (5) with $I_\omega = \{j_1,\dots,j_t\}$, the same formulas can be used as the alignment function for Algorithm 5. Note that since $|I_\omega| = t \le k$, the privacy cost is $(t/k)\epsilon$.

Lemma 7 If Algorithm 5 is run with privacy budget $\epsilon$ and returns $t$ queries (and their associated gaps), then the actual privacy cost is $(t/k)\epsilon$.

The second hybrid (Algorithm 6) is essentially SVT with Gap applied to the list of queries sorted in descending order by their noisy answers. We note that it adds more noise to each query than Algorithm 5, but it always returns the noisy gap between the noisy query answer and the noisy threshold, just like SVT with Gap.
Algorithm 6: Hybrid Prioritizing Estimates
input: same as Algorithm 5
1  function NoisyTopK(q, D, T, k, ε):
2      ε_1 ← θε;  ε_2 ← (1−θ)ε/k;  b_1 ← 1/ε_1;  b_2 ← 2/ε_2
3      η_0 ← Exp(1/ε_1);  T̃ ← T + η_0 − b_1
4      foreach i ∈ {1,···,n} do
5          η_i ← Exp(2/ε_2);  q̃_i ← q_i(D) + η_i − b_2
6      (j_1,...,j_k) ← argmax_k(q̃_1,...,q̃_n)
7      t ← 0
8      foreach i ∈ {1,···,k} do
9          if q̃_{j_i} ≥ T̃ then g_i ← q̃_{j_i} − T̃;  t ← i
10         else break
11     return ((j_1, g_1),...,(j_t, g_t))   // ∅ if t = 0
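A matching sketch of Algorithm 6 (again our own illustration; following our reading of Lines 2–5, each one-sided noise term is recentered by subtracting its mean):

import numpy as np

def hybrid_prioritizing_estimates(q, T, k, eps, theta, rng=None):
    # SVT with Gap applied to the queries sorted in decreasing order
    # of their noisy answers; returns gaps with the noisy threshold.
    rng = rng or np.random.default_rng()
    eps1, eps2 = theta * eps, (1.0 - theta) * eps / k
    b1, b2 = 1.0 / eps1, 2.0 / eps2           # means of the noise terms
    T_noisy = T + rng.exponential(1.0 / eps1) - b1
    q = np.asarray(q, dtype=float)
    noisy = q + rng.exponential(2.0 / eps2, size=q.shape) - b2
    out = []
    for j in np.argsort(noisy)[::-1][:k]:     # top k noisy indices
        if noisy[j] >= T_noisy:
            out.append((int(j), float(noisy[j] - T_noisy)))
        else:
            break
    return out                                # empty if nothing passes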
Alignment and privacy cost for Algorithm 6. The alignment for Algorithm 6 is the same as the one for SVT with Gap and is hence omitted here. Note that the privacy cost is $\epsilon_1 + t\epsilon_2 = (\theta + (t/k)(1-\theta))\epsilon$, where $t$ is the number of queries returned. As discussed in Section 5.1, the optimal $\theta$ is $1/(1 + (2k)^{2/3})$.

Lemma 8 If Algorithm 6 is run with privacy budget $\epsilon$ and returns $t$ queries (and their associated gaps), then the actual privacy cost is $(\theta + (t/k)(1-\theta))\epsilon$.

8 Exponential Mechanism with Gap

The Exponential Mechanism [36] was designed to answer non-numeric queries in a differentially private way. In this setting, $\mathcal{D}$ is the set of possible input databases and $\mathcal{R} = \{\omega_1, \omega_2, \dots, \omega_n\}$ is a set of possible outcomes. There is a utility function $\mu : \mathcal{D} \times \mathcal{R} \to \mathbb{R}$, where $\mu(D, \omega_i)$ gives the utility of outputting $\omega_i$ when the true input database is $D$. The Exponential Mechanism randomly selects an output $\omega_i$ with probabilities defined by the following theorem:

Theorem 5 (The Exponential Mechanism [36])
Given $\epsilon > 0$ and a utility function $\mu : \mathcal{D} \times \mathcal{R} \to \mathbb{R}$, the mechanism $M(D, \mu, \epsilon)$ that outputs $\omega_i \in \mathcal{R}$ with probability proportional to $\exp\big(\frac{\epsilon \mu(D,\omega_i)}{2\Delta_\mu}\big)$ satisfies $\epsilon$-differential privacy, where $\Delta_\mu$, the sensitivity of $\mu$, is defined as $\Delta_\mu = \max_{D \sim D'} \max_{\omega_i \in \mathcal{R}} |\mu(D,\omega_i) - \mu(D',\omega_i)|$.

Unlike the Noisy Max and SVT variants, the Exponential Mechanism is not an algorithm – it specifies a sampling distribution, but sampling algorithms have to be designed on a case-by-case basis (depending on the utility function $\mu$). Thus, in general, we cannot make any assumptions about the intermediate state of the algorithm. This is an important observation because the intermediate state of Noisy Max and SVT was used to create the free gap information.

In order to derive the gap algorithm for the Exponential Mechanism, we first consider a general-purpose but inefficient implementation that uses the intermediate state of the algorithm, and then we show how to get gap information without this intermediate state.

8.1 An Inefficient Exponential Mechanism

There is a common folklore algorithm in the differential privacy community for sampling from the Exponential Mechanism. Its origins are based in the machine learning task known as sampling from a softmax. The algorithm is called the Gumbel-Max trick [25,35] and is very similar to Noisy Max, except that the added noise comes from the Gumbel(0) distribution. The Gumbel($\mu_i$) distribution with location parameter $\mu_i$ has density $\exp(-(x - \mu_i) - \exp(-(x - \mu_i)))$ over the real line. The main idea behind the Gumbel-Max trick is that if we have numbers $\mu_1,\dots,\mu_n$, add independent Gumbel(0) noise to each, and select the index of the largest noisy value, this is the same as sampling the $i$th item with probability proportional to $e^{\mu_i}$. Formally, let $\mathrm{Cat}\big(\frac{\exp(\mu_1)}{\sum_{j=1}^n \exp(\mu_j)}, \dots, \frac{\exp(\mu_n)}{\sum_{j=1}^n \exp(\mu_j)}\big)$ denote the categorical distribution that returns item $\omega_i$ with probability $\frac{\exp(\mu_i)}{\sum_{j=1}^n \exp(\mu_j)}$. The Gumbel-Max theorem provides distributions for the identity of the noisy maximum and the value of the noisy maximum:

Theorem 6 (The Gumbel-Max Trick [25,35]) Let $G_1,\dots,G_n$ be i.i.d. Gumbel(0) random variables and let $\mu_1,\dots,\mu_n$ be real numbers. Define $X_i = G_i + \mu_i$. Then:
1. The distribution of $\operatorname*{argmax}_i(X_1,\dots,X_n)$ is the same as $\mathrm{Cat}\big(\frac{\exp(\mu_1)}{\sum_{j=1}^n \exp(\mu_j)}, \dots, \frac{\exp(\mu_n)}{\sum_{j=1}^n \exp(\mu_j)}\big)$.
2. The distribution of $\max_i(X_1,\dots,X_n)$ is the same as $\mathrm{Gumbel}\big(\ln \sum_{i=1}^n \exp(\mu_i)\big)$.

Therefore, as is noted in folklore, the Exponential Mechanism is equivalent to the following procedure: add i.i.d. Gumbel(0) noise to each $\frac{\epsilon \mu(D,\omega_i)}{2\Delta_\mu}$, select the $i$ for which this noisy value is largest, and return $\omega_i$.

Algorithm 7:
Naive Exp. Mech. with Gap
input: μ: utility function with sensitivity Δ_μ
       D: database, ε: privacy budget
1  function GapExpMech(D, μ, ε):
2      foreach i ∈ {1,···,n} do
3          x_i ← εμ(D, ω_i)/(2Δ_μ) + Gumbel(0)
4      s, t ← argmax_2(x_1,...,x_n)   // largest and second largest
5      return ω_s, x_s − x_t

Although randomness alignment can be used to prove the privacy properties of this algorithm, we will work directly with the Gumbel distribution to get a more powerful result. The proof appears in Appendix A.4.
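A direct numpy sketch of Algorithm 7 (our own illustration):

import numpy as np

def gap_exp_mech_naive(utilities, eps, sens, rng=None):
    # utilities: mu(D, omega_i) for each candidate outcome; sens: Delta_mu.
    # Adds i.i.d. Gumbel(0) noise to eps*mu/(2*sens) and returns the index
    # of the winner together with its gap to the runner-up.
    rng = rng or np.random.default_rng()
    x = np.asarray(utilities, dtype=float) * eps / (2.0 * sens)
    x = x + rng.gumbel(0.0, 1.0, size=x.shape)
    s, t = np.argsort(x)[::-1][:2]            # winner and runner-up
    return int(s), float(x[s] - x[t])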
Theorem 7
Algorithm 7 satisfies $\epsilon$-differential privacy. Its output distribution is equivalent to selecting $\omega_s$ with probability proportional to $\exp\big(\frac{\epsilon\mu(D,\omega_s)}{2\Delta_\mu}\big)$ and then independently sampling the gap from the Logistic distribution (conditioned on only sampling nonnegative values) with location parameter $\frac{\epsilon\mu(D,\omega_s)}{2\Delta_\mu} - \ln \sum_{\omega \ne \omega_s} \exp\big(\frac{\epsilon\mu(D,\omega)}{2\Delta_\mu}\big)$.

Black-box Exponential Mechanism with Gap. Theorem 7 shows how we can improve Algorithm 7. We can first sample from the traditional Exponential Mechanism as a black box, and then independently sample a number from a logistic distribution until it is nonnegative. The resulting value is probabilistically equivalent to the gap. The details are shown in Algorithm 8.
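As a concrete illustration of this black-box construction (the pseudocode is Algorithm 8, next), here is a minimal Python sketch; exp_mech stands for any existing ε-differentially private Exponential Mechanism sampler and is a hypothetical callable, not an API defined in this paper.

import numpy as np

def gap_exp_mech_blackbox(utilities, eps, sens, exp_mech, rng=None):
    # exp_mech(utilities, eps, sens) -> index sampled by the standard
    # Exponential Mechanism (treated as a black box).  The gap is then
    # drawn from a logistic distribution with the location parameter of
    # Theorem 7, rejecting until a nonnegative value is produced.
    rng = rng or np.random.default_rng()
    u = np.asarray(utilities, dtype=float) * eps / (2.0 * sens)
    s = exp_mech(utilities, eps, sens)
    loc = u[s] - np.log(np.sum(np.exp(np.delete(u, s))))
    while True:
        x = rng.logistic(loc, 1.0)
        if x > 0:
            return s, float(x)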
Algorithm 8: Black-box Exp. Mech. with Gap
input: same as Algorithm 7
1  function GapExpMech(D, μ, ε):
2      ω ← ExpMech(D, μ, ε)
3      while true do
4          x ← Logistic( εμ(D,ω)/(2Δ_μ) − ln Σ_{ω′≠ω} exp(εμ(D,ω′)/(2Δ_μ)) )
5          if x > 0 then break
6      return ω, x

9 Experiments

We now evaluate the algorithms proposed in this paper.

9.1 Datasets

We use two real datasets from [33], BMS-POS and Kosarak, and a synthetic dataset T40I10D100K created by the generator from the IBM Almaden Quest research group. These datasets are collections of transactions (each transaction is a set of items). In our experiments, the queries correspond to the counts of each item (i.e., how many transactions contain the item). The datasets are summarized in the following table.

Dataset        # Transactions   # Distinct Items
BMS-POS        515,597          1,657
Kosarak        990,002          41,270
T40I10D100K    100,000          942

9.2 Improving Accuracy with Free Gap Information

In these experiments, the data analyst splits her privacy budget $\epsilon$ in half. She uses the first half to select $k$ queries using Noisy Top-K with Gap or SVT with Gap (or Adaptive SVT with Gap) and then uses the second half of the privacy budget to obtain independent noisy measurements of each selected query. If one were unaware that gap information comes for free, one would just use those noisy measurements as estimates for the query answers; the error of this approach is the gap-free baseline. However, since the gap information does come for free, we can use the postprocessing described in Sections 6.3 and 5.3 to improve accuracy (we call this latter approach SVT with Gap with Measures and Noisy Top-K with Gap with Measures).

We first evaluate the percentage reduction of mean squared error (MSE) of the postprocessing approach compared to the gap-free baseline and compare this improvement to our theoretical analysis. As discussed in Section 5.3, we set the budget allocation ratio within the SVT with Gap algorithm (i.e., the budget allocation between the threshold and the queries) to be $1 : k^{2/3}$ for monotonic queries and $1 : (2k)^{2/3}$ otherwise – such a ratio is recommended in [33] for the original SVT. The threshold used for SVT with Gap is randomly picked from between the top-$2k$th and the top-$8k$th query answer in each dataset for each run. (Selecting thresholds for SVT in experiments is difficult, but we feel this may be fairer than averaging the answers to the top $k$th and $(k+1)$th queries, as was done in prior work [33].) All numbers plotted are averaged over 10,000 runs. Due to space constraints, we only show experiments for counting queries (which are monotonic).

[Fig. 2 panels: (a) SVT with Gap with Measures, BMS-POS; (b) Noisy Top-K with Gap with Measures, BMS-POS. y-axis: % reduction of MSE; x-axis: k; series: Laplace, Exponential, and Geometric variants with their theoretical expected reductions.]
Fig. 2: Percent reduction of Mean Squared Error on monotonic queries, for different $k$, for SVT with Gap and Noisy Top-K with Gap, when half the privacy budget is used for query selection and the other half is used for measurement of their answers. Privacy budget $\epsilon = 0.$

[Fig. 3 panels: (a) SVT with Gap with Measures, kosarak; (b) Noisy Top-K with Gap with Measures, kosarak. y-axis: % reduction of MSE; x-axis: $\epsilon$; same series as Fig. 2.]
Fig. 3: Percent reduction of Mean Squared Error on monotonic queries, for different $\epsilon$, for SVT with Gap and Noisy Top-K with Gap, when half the privacy budget is used for query selection and the other half is used for measurement of their answers. The value of $k$ is set to 10.

Our theoretical analysis in Sections 5.3 and 6.3 suggested that, in the case of monotonic queries, the error reduction rate can reach up to 50% when Laplace noise is used and 66% when exponential or geometric noise is used, as $k$ increases. This is confirmed in Figure 2a for SVT with Gap and Figure 2b for our Top-K algorithm on the BMS-POS dataset (results for the other datasets are nearly identical). These figures plot the theoretical and empirical percent reduction of MSE as a function of $k$ and show the power of the free gap information.

We also generated corresponding plots in which $k$ is held fixed and the total privacy budget $\epsilon$ is varied. We only present the results for the Kosarak dataset, as results for the other datasets are nearly identical. For SVT with Gap, Figure 3a confirms that this improvement is stable across different $\epsilon$ values. For our Top-K algorithm, Figure 3b confirms that the improvement is likewise stable across different values of $\epsilon$.
[Fig. 4 panels: (a) BMS-POS, (b) kosarak, (c) T40I10D100K. y-axis: # of above-threshold answers; x-axis: k; series: Sparse Vector, Adaptive SVT w/ Gap (Middle), Adaptive SVT w/ Gap (Top).]
Fig. 4: Number of above-threshold answers returned under different $k$'s for monotonic queries. Privacy budget $\epsilon = 0.$ x-axis: $k$.

[Fig. 5 panels: (a) BMS-POS, (b) kosarak, (c) T40I10D100K. y-axis: precision and F-measure; x-axis: k; series: Sparse Vector and Adaptive SVT w/ Gap, each with precision and F-measure.]
Fig. 5: Precision and F-Measure of SVT and Adaptive SVT with Gap under different $k$'s for monotonic queries. Privacy budget $\epsilon = 0.$ x-axis: $k$.

9.3 Benefits of Adaptivity

In this section we present an evaluation of the budget-saving properties of our novel Adaptive SVT with Gap algorithm, to show that it can answer more queries than SVT and SVT with Gap at the same privacy cost (or, conversely, answer the same number of queries with leftover budget that can be used for other purposes). First note that SVT and SVT with Gap both answer exactly the same number of queries, so we only need to compare Adaptive SVT with Gap to the original SVT [19,33]. In both algorithms, the budget allocation between the threshold noise and the query noise is set according to the ratio $1 : k^{2/3}$ (i.e., the hyperparameter $\theta$ in Adaptive SVT with Gap is set to $1/(1 + k^{2/3})$), following the discussion in Section 5.1. The threshold is randomly picked from between the top-$2k$th and the top-$8k$th query answer in each dataset, and all reported numbers are averaged over 10,000 runs.
Number of queries answered.
We first compare the number of queries answered by each algorithm as the parameter $k$ is varied from 2 to 24, with a privacy budget of $\epsilon = 0.$ The results are shown in Figure 4 (the maximum $k$ was set to 24).

Precision and F-Measure.
Although the adaptive algorithm can answer more above-threshold queries than the original, one can still ask whether the returned queries really are above the threshold. Thus we can look at the precision of the returned results (the fraction of returned queries that are actually above the threshold) and the widely used F-Measure (the harmonic mean of precision and recall). One would expect the precision of Adaptive SVT with Gap to be less than that of SVT, because the adaptive version can use more noise when processing queries. In Figures 5a, 5b, and 5c we compare the precision and F-Measure of the two algorithms. Generally we see very little difference in precision. On the other hand, since Adaptive SVT with Gap answers more queries while maintaining high precision, its recall is much larger than that of SVT, leading to an F-Measure roughly 1.5 times that of SVT.

[Fig. 6: y-axis: % remaining privacy budget; x-axis: k; series: BMS-POS, T40I10D100K, kosarak.]
Fig. 6: Remaining privacy budget when Adaptive SVT with Gap is stopped after answering $k$ queries, for different datasets. Privacy budget $\epsilon = 0.$

Remaining Privacy Budget.
If a query is large, Adaptive SVT with Gap may only need to use a small part of the privacy budget to determine that the query is likely above the noisy threshold. That is, it may produce an output in its top branch, where a lot of noise (hence less privacy budget) is used. If we stop Adaptive SVT with Gap after $k$ returned queries, it may still have some privacy budget left over (in contrast to standard versions of Sparse Vector, which use up all of their privacy budget). This remaining privacy budget can then be used for other data analysis tasks. For all three datasets, Figure 6 shows the percentage of privacy budget that is left over when Adaptive SVT with Gap is run with parameter $k$ and stopped after $k$ queries are returned. We see that roughly 40% of the privacy budget is left over, confirming that Adaptive SVT with Gap is able to save a significant amount of privacy budget.
10 General Randomness Alignment and Proof of Lemma 1
In this section, we prove Lemma 1, which was used to establish the privacy properties of the algorithms we proposed. The proof of the lemma requires a more general theorem for working with randomness alignment functions. We explicitly list all of the conditions needed for the sake of reference (many prior works had incorrect proofs because they did not have such a list to follow). In the general setting, the method of randomness alignment requires the following steps.

1. For each pair of adjacent databases $D \sim D'$ and $\omega \in \Omega$, define a randomness alignment $\phi_{D,D'}$ or local alignment functions $\phi_{D,D',\omega} : \mathcal{H}_{D:\omega} \to \mathcal{H}_{D':\omega}$ (see notation in Table 2). In the case of local alignments, this involves proving that if $M(D,H) = \omega$ then $M(D', \phi_{D,D',\omega}(H)) = \omega$.
2. Show that $\phi_{D,D'}$ (or all the $\phi_{D,D',\omega}$) is one-to-one (it does not need to be onto). That is, if we know $D$, $D'$, $\omega$ and we are given the value $\phi_{D,D'}(H)$ (or $\phi_{D,D',\omega}(H)$), we can recover the value $H$.
3. For each pair of adjacent databases $D \sim D'$, bound the alignment cost of $\phi_{D,D'}$ ($\phi_{D,D'}$ is either given or constructed by piecing together the local alignments). Bounding the alignment cost means the following: if $f$ is the density (or probability mass) function of $H$, find a constant $a$ such that $\frac{f(H)}{f(\phi_{D,D'}(H))} \le a$ for all $H$ (except on a set of measure 0). In the case of local alignments, one can instead show the following: for all $\omega$ and adjacent $D \sim D'$, the ratio $\frac{f(H)}{f(\phi_{D,D',\omega}(H))} \le a$ for all $H$ (except on a set of measure 0).
4. Bound the change-of-variables cost of $\phi_{D,D'}$ (only necessary when $H$ is not discrete). One must show that the Jacobian of $\phi_{D,D'}$, defined as $J_{\phi_{D,D'}} = \frac{\partial \phi_{D,D'}}{\partial H}$, exists (i.e., $\phi_{D,D'}$ is differentiable) and is continuous except on a set of measure 0. Furthermore, for all pairs $D \sim D'$, show that the quantity $\big|\det J_{\phi_{D,D'}}\big|$ is lower bounded by some constant $b > 0$. If $\phi_{D,D'}$ is constructed by piecing together local alignments $\phi_{D,D',\omega}$, then this is equivalent to showing the following: (i) $\big|\det J_{\phi_{D,D',\omega}}\big|$ is lower bounded by some constant $b > 0$ for all $D \sim D'$ and $\omega$; and (ii) for each $D \sim D'$, the set $\Omega$ can be partitioned into countably many disjoint measurable sets $\Omega = \bigcup_i \Omega_i$ such that whenever $\omega$ and $\omega^*$ are in the same partition, $\phi_{D,D',\omega}$ and $\phi_{D,D',\omega^*}$ are the same function. Note that this last condition (ii) is equivalent to requiring that the local alignments be defined without using the axiom of choice (since non-measurable sets are not constructible otherwise) and that, for each $D \sim D'$, the number of distinct local alignments is countable; that is, the set $\{\phi_{D,D',\omega} \mid \omega \in \Omega\}$ is countable (i.e., for many choices of $\omega$ we get the same exact alignment function).

Theorem 8
Let $M$ be a randomized algorithm that terminates with probability 1, and suppose the number of random variables used by $M$ can be determined from its output. If, for all pairs of adjacent databases $D \sim D'$, there exist randomness alignment functions $\phi_{D,D'}$ (or local alignment functions $\phi_{D,D',\omega}$ for all $\omega \in \Omega$ and $D \sim D'$) that satisfy Conditions 1 through 4 above, then $M$ satisfies $\ln(a/b)$-differential privacy.

Proof. We need to show that for all $D \sim D'$ and $E \subseteq \Omega$, $P[\mathcal{H}_{D:E}] \le (a/b)\, P[\mathcal{H}_{D':E}]$.

First we note that if we have a randomness alignment $\phi_{D,D'}$, we can define corresponding local alignment functions as $\phi_{D,D',\omega}(H) = \phi_{D,D'}(H)$ (in other words, they are all the same). The conditions on local alignments are a superset of the conditions on randomness alignments, so for the rest of the proof we work with the $\phi_{D,D',\omega}$.

Let $\phi_1, \phi_2, \dots$ be the distinct local alignment functions (there are countably many of them by Condition 4). Let $E_i = \{\omega \in E \mid \phi_{D,D',\omega} = \phi_i\}$. By Conditions 1 and 2 we have that for each $\omega \in E_i$, $\phi_i$ is one-to-one on $\mathcal{H}_{D:\omega}$ and $\phi_i(\mathcal{H}_{D:\omega}) \subseteq \mathcal{H}_{D':\omega}$. Note that $\mathcal{H}_{D:E_i} = \bigcup_{\omega \in E_i} \mathcal{H}_{D:\omega}$ and $\mathcal{H}_{D':E_i} = \bigcup_{\omega \in E_i} \mathcal{H}_{D':\omega}$. Furthermore, the sets $\mathcal{H}_{D:\omega}$ are pairwise disjoint for different $\omega$, and the sets $\mathcal{H}_{D':\omega}$ are pairwise disjoint for different $\omega$. It follows that $\phi_i$ is one-to-one on $\mathcal{H}_{D:E_i}$ and $\phi_i(\mathcal{H}_{D:E_i}) \subseteq \mathcal{H}_{D':E_i}$. Thus for any $H' \in \phi_i(\mathcal{H}_{D:E_i})$ there exists $H \in \mathcal{H}_{D:E_i}$ such that $H = \phi_i^{-1}(H')$. By Conditions 3 and 4, we have $\frac{f(H)}{f(\phi_i(H))} = \frac{f(\phi_i^{-1}(H'))}{f(H')} \le a$ for all $H \in \mathcal{H}_{D:E_i}$, and $|\det J_{\phi_i}| \ge b$ (except on a set of measure 0). Then the following is true:
\begin{align*}
P[\mathcal{H}_{D:E_i}] &= \int_{\mathcal{H}_{D:E_i}} f(H)\, dH = \int_{\phi_i(\mathcal{H}_{D:E_i})} f(\phi_i^{-1}(H')) \frac{1}{|\det J_{\phi_i}|}\, dH'\\
&\le \int_{\phi_i(\mathcal{H}_{D:E_i})} a f(H') \frac{1}{b}\, dH' = \frac{a}{b} \int_{\phi_i(\mathcal{H}_{D:E_i})} f(H')\, dH' \le \frac{a}{b} \int_{\mathcal{H}_{D':E_i}} f(H')\, dH' = \frac{a}{b} P[\mathcal{H}_{D':E_i}].
\end{align*}
The second equation is the change-of-variables formula from calculus. The last inequality follows from the containment $\phi_i(\mathcal{H}_{D:E_i}) \subseteq \mathcal{H}_{D':E_i}$ and the fact that the density $f$ is nonnegative. In the case that $H$ is discrete, simply replace the density $f$ with a probability mass function, change the integrals into summations, ignore the Jacobian term, and set $b = 1$. Finally, since $E = \bigcup_i E_i$ and $E_i \cap E_j = \emptyset$ for $i \ne j$, we conclude that
\[
P[\mathcal{H}_{D:E}] = \sum_i P[\mathcal{H}_{D:E_i}] \le \frac{a}{b} \sum_i P[\mathcal{H}_{D':E_i}] = \frac{a}{b} P[\mathcal{H}_{D':E}]. \qquad \square
\]
We now present the proof of
Lemma 1.
Proof.
Let $\phi_{D,D',\omega}(H) = H' = (\eta'_1, \eta'_2, \dots)$. By acyclicity there is some permutation $\pi$ under which $\eta_{\pi(1)} = \eta'_{\pi(1)} - c$, where $c$ is some constant depending on $D \sim D'$ and $\omega$. Thus $\eta_{\pi(1)}$ is uniquely determined by $H'$. Now (as an induction hypothesis) assume $\eta_{\pi(1)}, \dots, \eta_{\pi(j-1)}$ are uniquely determined by $H'$ for some $j > 1$; then $\eta_{\pi(j)} = \eta'_{\pi(j)} - \psi^{(j)}_{D,D',\omega}(\eta_{\pi(1)}, \dots, \eta_{\pi(j-1)})$, so $\eta_{\pi(j)}$ is also uniquely determined by $H'$. Thus by strong induction $H$ is uniquely determined by $H'$, i.e., $\phi_{D,D',\omega}$ is one-to-one. It is easy to see that with this ordering, $J_{\phi_{D,D',\omega}}$ is an upper triangular matrix with 1's on the diagonal. Since permuting variables does not change $\big|\det J_{\phi_{D,D',\omega}}\big|$, we have $\big|\det J_{\phi_{D,D',\omega}}\big| = 1$, since that is the determinant of an upper triangular matrix with unit diagonal. Furthermore (recalling the definition of the cost of $\phi_{D,D',\omega}$), clearly
\[
\ln \frac{f(H)}{f(\phi_{D,D',\omega}(H))} = \sum_i \ln \frac{f_i(\eta_i)}{f_i(\eta'_i)} \le \sum_i c_i |\eta_i - \eta'_i| \le \epsilon.
\]
The first inequality follows from Condition 3 of Lemma 1 and the second from Condition 4. □
11 Conclusions and Future Work
In this paper we introduced variations of SVT, Noisy Max, and the Exponential Mechanism that provide additional noisy gap information for free (without affecting the privacy cost). We also presented applications of how to use this gap information. Future work includes applying the gap information in larger differentially private algorithms to increase the accuracy of privacy-preserving data analysis.
Acknowledgements
This work was supported by NSF Awards CNS-1702760 and CNS-1931686.

References
1. Abadi, M., Chu, A., Goodfellow, I., McMahan, H.B., Mironov, I., Talwar, K., Zhang, L.: Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318. ACM (2016)
2. Abowd, J.M.: The US Census Bureau adopts differential privacy. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2867–2867. ACM (2018)
3. Albarghouthi, A., Hsu, J.: Synthesizing coupling proofs of differential privacy. Proceedings of the ACM on Programming Languages (POPL), 58 (2017)
4. Barthe, G., Gaboardi, M., Gregoire, B., Hsu, J., Strub, P.Y.: Proving differential privacy via probabilistic couplings. In: IEEE Symposium on Logic in Computer Science (LICS) (2016)
5. Beimel, A., Nissim, K., Stemmer, U.: Private learning and sanitization: Pure vs. approximate differential privacy. Theory of Computing (1), 1–61 (2016)
6. Bhaskar, R., Laxman, S., Smith, A., Thakurta, A.: Discovering frequent patterns in sensitive data. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2010)
7. Bittau, A., Erlingsson, U., Maniatis, P., Mironov, I., Raghunathan, A., Lie, D., Rudominer, M., Kode, U., Tinnes, J., Seefeld, B.: Prochlo: Strong privacy for analytics in the crowd. In: Proceedings of the 26th Symposium on Operating Systems Principles, SOSP '17 (2017)
8. Bun, M., Steinke, T.: Concentrated differential privacy: Simplifications, extensions, and lower bounds. In: Proceedings of the 14th International Conference on Theory of Cryptography - Volume 9985 (2016)
9. Bureau, U.S.C.: On the map: Longitudinal employer-household dynamics. https://lehd.ces.census.gov/applications/help/onthemap.html
10. Chaudhuri, K., Hsu, D., Song, S.: The large margin mechanism for differentially private maximization. In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1 (2014)
11. Chaudhuri, K., Monteleoni, C., Sarwate, A.D.: Differentially private empirical risk minimization. Journal of Machine Learning Research (Mar), 1069–1109 (2011)
12. Chen, Y., Machanavajjhala, A., Reiter, J.P., Barrientos, A.F.: Differentially private regression diagnostics. In: IEEE 16th International Conference on Data Mining (ICDM) (2016)
13. Ding, B., Kulkarni, J., Yekhanin, S.: Collecting telemetry data privately. In: Advances in Neural Information Processing Systems (NIPS) (2017)
14. Ding, Z., Wang, Y., Zhang, D., Kifer, D.: Free gap information from the differentially private sparse vector and noisy max mechanisms. Proc. VLDB Endow. (3), 293–306 (2019)
15. Dwork, C.: Differential privacy. In: Proceedings of the 33rd International Conference on Automata, Languages and Programming - Volume Part II, ICALP'06, pp. 1–12. Springer-Verlag, Berlin, Heidelberg (2006)
16. Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., Naor, M.: Our data, ourselves: Privacy via distributed noise generation. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 486–503. Springer (2006)
17. Dwork, C., Lei, J.: Differential privacy and robust statistics. In: Proceedings of the forty-first annual ACM Symposium on Theory of Computing, pp. 371–380. ACM (2009)
18. Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Theory of Cryptography Conference, pp. 265–284. Springer (2006)
19. Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science (3–4), 211–407 (2014)
20. Erlingsson, Ú., Feldman, V., Mironov, I., Raghunathan, A., Talwar, K., Thakurta, A.: Amplification by shuffling: From local to central differential privacy via anonymity. In: Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019 (2019)
21. Erlingsson, Ú., Pihur, V., Korolova, A.: Rappor: Randomized aggregatable privacy-preserving ordinal response. In: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pp. 1054–1067. ACM (2014)
22. Fanaeepour, M., Rubinstein, B.I.P.: Histogramming privately ever after: Differentially-private data-dependent error bound optimisation. In: Proceedings of the 34th International Conference on Data Engineering, ICDE. IEEE (2018)
23. Geng, Q., Viswanath, P.: The optimal mechanism in differential privacy. In: 2014 IEEE International Symposium on Information Theory (2014)
24. Ghosh, A., Roughgarden, T., Sundararajan, M.: Universally utility-maximizing privacy mechanisms. In: STOC, pp. 351–360 (2009)
25. Gumbel, E.: Statistical Theory of Extreme Values and Some Practical Applications: A Series of Lectures. Applied Mathematics Series. U.S. Government Printing Office (1954)
26. Haney, S., Machanavajjhala, A., Abowd, J.M., Graham, M., Kutzbach, M., Vilhuber, L.: Utility cost of formal privacy for releasing national employer-employee statistics. In: Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD '17 (2017)
27. Hardt, M., Ligett, K., McSherry, F.: A simple and practical algorithm for differentially private data release. In: NIPS (2012)
28. Johnson, N., Near, J.P., Song, D.: Towards practical differential privacy for SQL queries. Proc. VLDB Endow. (5) (2018)
29. Kotsogiannis, I., Machanavajjhala, A., Hay, M., Miklau, G.: Pythia: Data dependent differentially private algorithm selection. In: Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD '17 (2017)
30. Lehmann, E., Casella, G.: Theory of Point Estimation. Springer Verlag (1998)
31. Ligett, K., Neel, S., Roth, A., Waggoner, B., Wu, S.Z.: Accuracy first: Selecting a differential privacy level for accuracy constrained ERM. In: NIPS (2017)
32. Liu, J., Talwar, K.: Private selection from private candidates. arXiv preprint arXiv:1811.07971 (2018)
33. Lyu, M., Su, D., Li, N.: Understanding the sparse vector technique for differential privacy. Proceedings of the VLDB Endowment (6), 637–648 (2017)
34. Machanavajjhala, A., Kifer, D., Abowd, J., Gehrke, J., Vilhuber, L.: Privacy: From theory to practice on the map. In: Proceedings of the IEEE International Conference on Data Engineering (ICDE), pp. 277–286 (2008)
35. Maddison, C.J., Tarlow, D., Minka, T.: A* sampling. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 27, pp. 3086–3094. Curran Associates, Inc. (2014)
36. McSherry, F., Talwar, K.: Mechanism design via differential privacy. In: Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, pp. 94–103 (2007)
37. McSherry, F.D.: Privacy integrated queries: An extensible platform for privacy-preserving data analysis. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp. 19–30 (2009)
38. Mironov, I.: Rényi differential privacy. In: 30th IEEE Computer Security Foundations Symposium, CSF (2017)
39. Nocedal, J., Wright, S.J.: Numerical Optimization, second edn. Springer, New York, NY, USA (2006)
40. Papernot, N., Song, S., Mironov, I., Raghunathan, A., Talwar, K., Erlingsson, Ú.: Scalable private learning with PATE. In: International Conference on Learning Representations (ICLR) (2018)
41. Raskhodnikova, S., Smith, A.D.: Lipschitz extensions for node-private graph statistics and the generalized exponential mechanism. In: FOCS, pp. 495–504. IEEE Computer Society (2016)
42. Tang, J., Korolova, A., Bai, X., Wang, X., Wang, X.: Privacy loss in Apple's implementation of differential privacy. In: 3rd Workshop on the Theory and Practice of Differential Privacy at CCS (2017)
43. Team, A.D.P.: Learning with privacy at scale. Apple Machine Learning Journal (8) (2017)
44. Thakurta, A.G., Smith, A.: Differentially private feature selection via stability arguments, and the robustness of the lasso. In: Proceedings of the 26th Annual Conference on Learning Theory (2013)
45. Wang, Y., Ding, Z., Wang, G., Kifer, D., Zhang, D.: Proving differential privacy with shadow execution. In: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, pp. 655–669. ACM, New York, NY, USA (2019)
46. Zhang, D., Kifer, D.: LightDP: Towards automating differential privacy proofs. In: ACM Symposium on Principles of Programming Languages (POPL), pp. 888–901 (2017)
47. Zhang, D., McKenna, R., Kotsogiannis, I., Hay, M., Machanavajjhala, A., Miklau, G.: Ektelo: A framework for defining differentially-private computations. In: Proceedings of the 2018 International Conference on Management of Data, SIGMOD '18 (2018)

A Proofs
A.1 Proof of Theorem 4 (BLUE)
Proof.
Let $q_1,\dots,q_k$ be the true answers to the $k$ queries selected by the Noisy-Top-K-with-Gap algorithm. Let $\alpha_i$ be the estimate of $q_i$ from the Laplace mechanism, and $g_i$ be the estimate of the gap between $q_i$ and $q_{i+1}$ from Noisy-Top-K-with-Gap. Recall that $\alpha_i = q_i + \xi_i$ and $g_i = q_i + \eta_i - q_{i+1} - \eta_{i+1}$, where the $\xi_i$ and $\eta_i$ are independent Laplace random variables. Assume without loss of generality that $\mathrm{Var}(\xi_i) = \sigma^2$ and $\mathrm{Var}(\eta_i) = \lambda\sigma^2$. Write in vector notation
\[
\mathbf{q} = \begin{bmatrix} q_1 \\ \vdots \\ q_k \end{bmatrix},\quad
\boldsymbol\xi = \begin{bmatrix} \xi_1 \\ \vdots \\ \xi_k \end{bmatrix},\quad
\boldsymbol\eta = \begin{bmatrix} \eta_1 \\ \vdots \\ \eta_k \end{bmatrix},\quad
\boldsymbol\alpha = \begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_k \end{bmatrix},\quad
\mathbf{g} = \begin{bmatrix} g_1 \\ \vdots \\ g_{k-1} \end{bmatrix},
\]
so that $\boldsymbol\alpha = \mathbf{q} + \boldsymbol\xi$ and $\mathbf{g} = N(\mathbf{q} + \boldsymbol\eta)$, where
\[
N = \begin{bmatrix} 1 & -1 & & \\ & \ddots & \ddots & \\ & & 1 & -1 \end{bmatrix}_{(k-1) \times k}.
\]
Our goal is then to find the best linear unbiased estimate (BLUE) $\boldsymbol\beta$ of $\mathbf{q}$ in terms of $\boldsymbol\alpha$ and $\mathbf{g}$. In other words, we need to find a $k \times k$ matrix $X$ and a $k \times (k-1)$ matrix $Y$ such that
\[
\boldsymbol\beta = X\boldsymbol\alpha + Y\mathbf{g} \tag{6}
\]
with $E(\|\boldsymbol\beta - \mathbf{q}\|^2)$ as small as possible. Unbiasedness implies that $\forall \mathbf{q},\ E(\boldsymbol\beta) = X\mathbf{q} + YN\mathbf{q} = \mathbf{q}$. Therefore $X + YN = I_k$ and thus
\[
X = I_k - YN. \tag{7}
\]
Plugging this into (6), we have $\boldsymbol\beta = (I_k - YN)\boldsymbol\alpha + Y\mathbf{g} = \boldsymbol\alpha - Y(N\boldsymbol\alpha - \mathbf{g})$. Recalling that $\boldsymbol\alpha = \mathbf{q} + \boldsymbol\xi$ and $\mathbf{g} = N(\mathbf{q} + \boldsymbol\eta)$, we have $N\boldsymbol\alpha - \mathbf{g} = N(\mathbf{q} + \boldsymbol\xi - \mathbf{q} - \boldsymbol\eta) = N(\boldsymbol\xi - \boldsymbol\eta)$. Thus
\[
\boldsymbol\beta = \boldsymbol\alpha - YN(\boldsymbol\xi - \boldsymbol\eta). \tag{8}
\]
Write $\boldsymbol\theta = N(\boldsymbol\xi - \boldsymbol\eta)$; then $\boldsymbol\beta - \mathbf{q} = \boldsymbol\alpha - \mathbf{q} - Y\boldsymbol\theta = \boldsymbol\xi - Y\boldsymbol\theta$. Therefore, finding the BLUE is equivalent to solving the optimization problem $Y = \operatorname*{argmin} \Phi$, where
\[
\Phi = E(\|\boldsymbol\xi - Y\boldsymbol\theta\|^2) = E((\boldsymbol\xi - Y\boldsymbol\theta)^T(\boldsymbol\xi - Y\boldsymbol\theta)) = E(\boldsymbol\xi^T\boldsymbol\xi - \boldsymbol\xi^T Y\boldsymbol\theta - \boldsymbol\theta^T Y^T \boldsymbol\xi + \boldsymbol\theta^T Y^T Y \boldsymbol\theta).
\]
Taking the partial derivative of $\Phi$ with respect to $Y$, we have
\[
\frac{\partial \Phi}{\partial Y} = E(-\boldsymbol\xi\boldsymbol\theta^T - \boldsymbol\xi\boldsymbol\theta^T + Y(\boldsymbol\theta\boldsymbol\theta^T + \boldsymbol\theta\boldsymbol\theta^T)).
\]
By setting $\frac{\partial \Phi}{\partial Y} = 0$ we have $Y E(\boldsymbol\theta\boldsymbol\theta^T) = E(\boldsymbol\xi\boldsymbol\theta^T)$, thus
\[
Y = E(\boldsymbol\xi\boldsymbol\theta^T)\, E(\boldsymbol\theta\boldsymbol\theta^T)^{-1}. \tag{9}
\]
Recalling that $(\boldsymbol\xi\boldsymbol\theta^T)_{ij} = \xi_i(\xi_j - \xi_{j+1} - \eta_j + \eta_{j+1})$, we have
\[
E(\boldsymbol\xi\boldsymbol\theta^T)_{ij} = \begin{cases} E(\xi_i^2) = \mathrm{Var}(\xi_i) = \sigma^2 & i = j \\ -E(\xi_i^2) = -\mathrm{Var}(\xi_i) = -\sigma^2 & i = j+1 \\ 0 & \text{otherwise} \end{cases}
\]
Hence $E(\boldsymbol\xi\boldsymbol\theta^T) = \sigma^2 N^T$. Similarly, we have
\begin{align*}
(\boldsymbol\theta\boldsymbol\theta^T)_{ij} ={}& (\xi_i - \xi_{i+1} - \eta_i + \eta_{i+1})(\xi_j - \xi_{j+1} - \eta_j + \eta_{j+1})\\
={}& \xi_i\xi_j + \xi_{i+1}\xi_{j+1} - \xi_i\xi_{j+1} - \xi_{i+1}\xi_j + \eta_i\eta_j + \eta_{i+1}\eta_{j+1} - \eta_i\eta_{j+1} - \eta_{i+1}\eta_j\\
&- (\xi_i - \xi_{i+1})(\eta_j - \eta_{j+1}) - (\eta_i - \eta_{i+1})(\xi_j - \xi_{j+1}).
\end{align*}
Thus
\[
E(\boldsymbol\theta\boldsymbol\theta^T)_{ij} = \begin{cases} E(\xi_i^2 + \xi_{i+1}^2 + \eta_i^2 + \eta_{i+1}^2) = 2(1+\lambda)\sigma^2 & i = j \\ E(-\xi_i^2 - \eta_i^2) = -(1+\lambda)\sigma^2 & i = j+1 \\ E(-\xi_j^2 - \eta_j^2) = -(1+\lambda)\sigma^2 & i = j-1 \\ 0 & \text{otherwise} \end{cases}
\]
Hence
\[
E(\boldsymbol\theta\boldsymbol\theta^T) = (1+\lambda)\sigma^2 \begin{bmatrix} 2 & -1 & & \\ -1 & 2 & \ddots & \\ & \ddots & \ddots & -1 \\ & & -1 & 2 \end{bmatrix}_{(k-1) \times (k-1)}.
\]
It can be directly computed that $E(\boldsymbol\theta\boldsymbol\theta^T)^{-1}$ is a symmetric matrix whose lower triangular part is
\[
\frac{1}{k(1+\lambda)\sigma^2} \begin{bmatrix} (k-1)\cdot 1 & & & \\ (k-2)\cdot 1 & (k-2)\cdot 2 & & \\ \vdots & \vdots & \ddots & \\ 1\cdot 1 & 1\cdot 2 & \cdots & 1\cdot(k-1) \end{bmatrix},
\]
i.e., $E(\boldsymbol\theta\boldsymbol\theta^T)^{-1}_{ij} = E(\boldsymbol\theta\boldsymbol\theta^T)^{-1}_{ji} = \frac{(k-i)\cdot j}{k(1+\lambda)\sigma^2}$ for all $1 \le j \le i \le k-1$. Therefore,
\[
Y = E(\boldsymbol\xi\boldsymbol\theta^T)\, E(\boldsymbol\theta\boldsymbol\theta^T)^{-1} = \frac{1}{k(1+\lambda)} \begin{bmatrix} k-1 & k-2 & \cdots & 2 & 1 \\ -1 & k-2 & \cdots & 2 & 1 \\ -1 & -2 & \cdots & 2 & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ -1 & -2 & \cdots & -(k-2) & 1 \\ -1 & -2 & \cdots & -(k-2) & -(k-1) \end{bmatrix}_{k \times (k-1)}.
\]
Hence
\[
X = I_k - YN = \frac{1}{k(1+\lambda)} \begin{bmatrix} 1+k\lambda & 1 & \cdots & 1 \\ 1 & 1+k\lambda & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1+k\lambda \end{bmatrix}_{k \times k}. \qquad \square
\]

A.2 Proof of Corollary 1
Recall that $\alpha_i = q_i + \xi_i$ and $g_i = q_i + \eta_i - q_{i+1} - \eta_{i+1}$, where the $\xi_i$ and $\eta_i$ are independent Laplace random variables. Assume without loss of generality that $\mathrm{Var}(\xi_i) = \sigma^2$ and $\mathrm{Var}(\eta_i) = \lambda\sigma^2$, as before. From the matrices $X$ and $Y$ in Theorem 4 we have that $\beta_i = \frac{x_i + y_i}{k(1+\lambda)}$, where
\begin{align*}
x_i &= \alpha_1 + \cdots + (1 + k\lambda)\alpha_i + \cdots + \alpha_k\\
&= (q_1 + \xi_1) + \cdots + (1 + k\lambda)(q_i + \xi_i) + \cdots + (q_k + \xi_k)
\end{align*}
and
\begin{align*}
y_i &= -1\cdot g_1 - 2\cdot g_2 - \cdots - (i-1)\, g_{i-1} + (k-i)\, g_i + \cdots + 2\, g_{k-2} + g_{k-1}\\
&= -(q_1 + \eta_1) - (q_2 + \eta_2) - \cdots - (q_{i-1} + \eta_{i-1}) + (k-1)(q_i + \eta_i) - (q_{i+1} + \eta_{i+1}) - \cdots - (q_k + \eta_k).
\end{align*}
Therefore
\begin{align*}
\mathrm{Var}(x_i) &= \sigma^2 + \cdots + (1 + k\lambda)^2\sigma^2 + \cdots + \sigma^2 = (k^2\lambda^2 + 2k\lambda + k)\,\sigma^2,\\
\mathrm{Var}(y_i) &= \lambda\sigma^2 + \cdots + (k-1)^2\lambda\sigma^2 + \cdots + \lambda\sigma^2 = (k^2 - k)\,\lambda\sigma^2,
\end{align*}
and thus
\[
\mathrm{Var}(\beta_i) = \frac{\mathrm{Var}(x_i) + \mathrm{Var}(y_i)}{k^2(1+\lambda)^2} = \frac{1 + k\lambda}{k + k\lambda}\,\sigma^2.
\]
Since $\mathrm{Var}(\alpha_i) = \mathrm{Var}(\xi_i) = \sigma^2$, we have
\[
\frac{\mathrm{Var}(\beta_i)}{\mathrm{Var}(\alpha_i)} = \frac{1 + k\lambda}{k + k\lambda}. \qquad \square
\]
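As a sanity check (not part of the original appendix), the variance ratio of Corollary 1 can be verified numerically; the following self-contained Monte Carlo sketch re-implements the linear-time estimator from Section 6.3 and compares the empirical ratio against $(1+k\lambda)/(k+k\lambda)$.

import numpy as np

def blue_estimates(alpha, g, lam):
    # same linear-time BLUE as the sketch in Section 6.3
    k = len(alpha)
    pbar = np.dot(np.arange(k - 1, 0, -1), g)
    p = np.concatenate(([0.0], np.cumsum(g)))
    return (alpha.sum() + lam * k * alpha + pbar - k * p[:k]) / ((1 + lam) * k)

rng = np.random.default_rng(0)
k, lam, sigma, trials = 10, 1.0, 1.0, 50_000
q = np.linspace(100.0, 50.0, k)               # arbitrary fixed true answers
betas = np.empty((trials, k))
for r in range(trials):
    xi = rng.laplace(0.0, sigma / np.sqrt(2.0), size=k)   # Var(xi) = sigma^2
    eta = rng.laplace(0.0, np.sqrt(lam) * sigma / np.sqrt(2.0), size=k)
    u = q + eta
    betas[r] = blue_estimates(q + xi, u[:-1] - u[1:], lam)
print(betas.var(axis=0).mean() / sigma**2,    # empirical Var(beta)/Var(alpha)
      (1 + k * lam) / (k + k * lam))          # theoretical ratio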
A.3 Proof of Lemma 3

The density function of $\eta_i - \eta$ is
\[
f_{\eta_i - \eta}(z) = \int_{-\infty}^{\infty} f_{\eta_i}(x)\, f_{\eta}(x - z)\, dx = \frac{\epsilon\,\epsilon^*}{4} \int_{-\infty}^{\infty} e^{-\epsilon^* |x|}\, e^{-\epsilon |x - z|}\, dx.
\]
First consider the case $\epsilon \ne \epsilon^*$. When $z \ge 0$, we have
\begin{align*}
f_{\eta_i - \eta}(z) &= \frac{\epsilon\,\epsilon^*}{4} \int_{-\infty}^{\infty} e^{-\epsilon^* |x|}\, e^{-\epsilon |x - z|}\, dx\\
&= \frac{\epsilon\,\epsilon^*}{4} \Big( \int_{-\infty}^{0} e^{\epsilon^* x}\, e^{\epsilon(x - z)}\, dx + \int_{0}^{z} e^{-\epsilon^* x}\, e^{\epsilon(x - z)}\, dx + \int_{z}^{\infty} e^{-\epsilon^* x}\, e^{-\epsilon(x - z)}\, dx \Big)\\
&= \frac{\epsilon\,\epsilon^*}{4} \Big( \frac{e^{-\epsilon z}}{\epsilon + \epsilon^*} + \frac{e^{-\epsilon^* z} - e^{-\epsilon z}}{\epsilon - \epsilon^*} + \frac{e^{-\epsilon^* z}}{\epsilon + \epsilon^*} \Big)
= \frac{\epsilon\,\epsilon^*\,(\epsilon\, e^{-\epsilon^* z} - \epsilon^*\, e^{-\epsilon z})}{2(\epsilon^2 - \epsilon^{*2})}.
\end{align*}
Thus by symmetry we have, for all $z \in \mathbb{R}$,
\[
f_{\eta_i - \eta}(z) = \frac{\epsilon\,\epsilon^*\,(\epsilon\, e^{-\epsilon^* |z|} - \epsilon^*\, e^{-\epsilon |z|})}{2(\epsilon^2 - \epsilon^{*2})}
\]
and
\[
P(\eta_i - \eta \ge -t) = \int_{-t}^{\infty} f_{\eta_i - \eta}(z)\, dz = \int_{-t}^{0} f_{\eta_i - \eta}(z)\, dz + \frac{1}{2} = 1 - \frac{\epsilon^2\, e^{-\epsilon^* t} - \epsilon^{*2}\, e^{-\epsilon t}}{2(\epsilon^2 - \epsilon^{*2})}.
\]
Now if $\epsilon = \epsilon^*$, then by similar computations we have
\[
f_{\eta_i - \eta}(z) = \frac{\epsilon}{4}\,(1 + \epsilon|z|)\, e^{-\epsilon|z|}
\quad\text{and}\quad
P(\eta_i - \eta \ge -t) = 1 - \Big(\frac{1}{2} + \frac{\epsilon t}{4}\Big) e^{-\epsilon t}. \qquad \square
\]

A.4 Proofs in Section 8 (Exp. Mech. with Gap)
We first need the following results.
Lemma 9
Let $\epsilon > 0$. Let $\mu : \mathcal{D} \times \mathcal{R} \to \mathbb{R}$ be a utility function of sensitivity $\Delta_\mu$. Define $\nu : \mathcal{D} \to \mathbb{R}$ and its sensitivity $\Delta_\nu$ as
\[
\nu(D) = \ln \sum_{\omega \in \mathcal{R}} e^{\frac{\epsilon \mu(D,\omega)}{2\Delta_\mu}}, \qquad \Delta_\nu = \max_{D \sim D'} |\nu(D) - \nu(D')|.
\]
Then $\Delta_\nu$, the sensitivity of $\nu$, is at most $\epsilon/2$.

Proof of Lemma 9. From the definition of $\nu$ we have
\[
|\nu(D) - \nu(D')| = \Big| \ln \sum_{\omega \in \mathcal{R}} e^{\frac{\epsilon \mu(D,\omega)}{2\Delta_\mu}} - \ln \sum_{\omega \in \mathcal{R}} e^{\frac{\epsilon \mu(D',\omega)}{2\Delta_\mu}} \Big| = \Big| \ln \Big( \sum_{\omega \in \mathcal{R}} e^{\frac{\epsilon \mu(D,\omega)}{2\Delta_\mu}} \Big) \Big/ \Big( \sum_{\omega \in \mathcal{R}} e^{\frac{\epsilon \mu(D',\omega)}{2\Delta_\mu}} \Big) \Big|.
\]
By the definition of sensitivity, we have $\mu(D',\omega) - \Delta_\mu \le \mu(D,\omega) \le \mu(D',\omega) + \Delta_\mu$, and therefore
\[
e^{-\epsilon/2} \sum_{\omega \in \mathcal{R}} e^{\frac{\epsilon \mu(D',\omega)}{2\Delta_\mu}} \le \sum_{\omega \in \mathcal{R}} e^{\frac{\epsilon \mu(D,\omega)}{2\Delta_\mu}} \le e^{\epsilon/2} \sum_{\omega \in \mathcal{R}} e^{\frac{\epsilon \mu(D',\omega)}{2\Delta_\mu}}.
\]
Thus $|\nu(D) - \nu(D')| \le \epsilon/2$, and hence $\Delta_\nu \le \epsilon/2$. □

Lemma 10
Let $f(x; \mu) = \frac{e^{-(x-\mu)}}{(1 + e^{-(x-\mu)})^2}$ be the density of the logistic distribution. Then $\Big| \ln \frac{f(x;\mu)}{f(x;\mu')} \Big| \le |\mu - \mu'|$.

Proof of Lemma 10.
Note that $\big| \ln \frac{f(x;\mu)}{f(x;\mu')} \big| = \big| \ln \frac{f(x;\mu')}{f(x;\mu)} \big|$, so without loss of generality we can assume $\mu \ge \mu'$ (i.e., the parameter in the numerator is at least the parameter in the denominator). From the formula for $f$ we have
\[
\frac{f(x;\mu)}{f(x;\mu')} = e^{\mu - \mu'} \cdot \Big( \frac{1 + e^{-x} e^{\mu'}}{1 + e^{-x} e^{\mu}} \Big)^2.
\]
It is easy to see that $e^{\mu} \ge e^{\mu'} \Longrightarrow \frac{1 + e^{-x} e^{\mu'}}{1 + e^{-x} e^{\mu}} \le 1$. Also,
\[
\frac{1 + e^{-x} e^{\mu'}}{1 + e^{-x} e^{\mu}} = \frac{e^{\mu' - \mu}\,(e^{\mu - \mu'} + e^{-x} e^{\mu})}{1 + e^{-x} e^{\mu}} \ge \frac{e^{\mu' - \mu}\,(1 + e^{-x} e^{\mu})}{1 + e^{-x} e^{\mu}} = e^{\mu' - \mu}.
\]
Therefore,
\[
e^{\mu' - \mu} = e^{\mu - \mu'} \cdot (e^{\mu' - \mu})^2 \le \frac{f(x;\mu)}{f(x;\mu')} \le e^{\mu - \mu'}.
\]
Thus $\big| \ln \frac{f(x;\mu)}{f(x;\mu')} \big| \le |\mu - \mu'|$. □

Theorem 7
Algorithm 7 satisfies $\epsilon$-differential privacy. Its output distribution is equivalent to selecting $\omega_s$ with probability proportional to $\exp\big(\frac{\epsilon\mu(D,\omega_s)}{2\Delta_\mu}\big)$ and then independently sampling the gap from the Logistic distribution (conditioned on only sampling nonnegative values) with location parameter $\frac{\epsilon\mu(D,\omega_s)}{2\Delta_\mu} - \ln \sum_{\omega \ne \omega_s} \exp\big(\frac{\epsilon\mu(D,\omega)}{2\Delta_\mu}\big)$.

Proof of Theorem 7. For $\omega_i \in \mathcal{R}$, let $\mu_i = \frac{\epsilon\mu(D,\omega_i)}{2\Delta_\mu}$ and $\mu'_i = \frac{\epsilon\mu(D',\omega_i)}{2\Delta_\mu}$. Let $X_i \sim \mathrm{Gumbel}(\mu_i)$ and $X'_i \sim \mathrm{Gumbel}(\mu'_i)$. We consider the probability of outputting the selected $\omega_s$ with gap $\gamma \ge 0$ when $D$ is the input database:
\begin{align*}
P(\omega_s \text{ is chosen with gap} \ge \gamma \mid D)
&= \int_{\mathbb{R}} \exp\!\big({-(z + \gamma - \mu_s) - e^{-(z + \gamma - \mu_s)}}\big) \prod_{i \ne s} P(X_i \le z)\, dz\\
&= \int_{\mathbb{R}} \exp\!\big({-(z + \gamma - \mu_s) - e^{-(z + \gamma - \mu_s)}}\big) \prod_{i \ne s} e^{-e^{-(z - \mu_i)}}\, dz\\
&= \int_{\mathbb{R}} e^{\mu_s - \gamma} \exp\!\big({-z - e^{\mu_s - \gamma} e^{-z}}\big) \prod_{i \ne s} \exp\!\big({-e^{\mu_i} e^{-z}}\big)\, dz\\
&= \int_{\mathbb{R}} e^{\mu_s - \gamma} \exp\!\big({-z - e^{\mu_s - \gamma} e^{-z}}\big) \exp\!\big({-e^{\mu^*} e^{-z}}\big)\, dz \qquad \Big(\text{where } \mu^* = \ln \sum_{i \ne s} e^{\mu_i}\Big)\\
&= \int_{\mathbb{R}} e^{\mu_s - \gamma} \exp\!\big({-z - (e^{\mu_s - \gamma} + e^{\mu^*}) e^{-z}}\big)\, dz\\
&= \frac{e^{\mu_s - \gamma}}{e^{\mu_s - \gamma} + e^{\mu^*}} \exp\!\big({-(e^{\mu_s - \gamma} + e^{\mu^*}) e^{-z}}\big) \Big|_{-\infty}^{+\infty}
= \frac{e^{\mu_s - \gamma}}{e^{\mu_s - \gamma} + e^{\mu^*}} = \frac{1}{1 + e^{-(\mu_s - \gamma - \mu^*)}},
\end{align*}
and so
\begin{align*}
P(\omega_s \text{ is chosen with gap} \in [0, \gamma] \mid D)
&= P(\omega_s \text{ is chosen} \mid D) - P(\omega_s \text{ is chosen with gap} \ge \gamma \mid D)\\
&= \frac{e^{\mu_s}}{e^{\mu_s} + e^{\mu^*}} - \frac{1}{1 + e^{-(\mu_s - \gamma - \mu^*)}} = \frac{1}{1 + e^{-(\mu_s - \mu^*)}} - \frac{1}{1 + e^{-(\mu_s - \gamma - \mu^*)}}.
\end{align*}
Taking the derivative with respect to $\gamma$, we get the probability density $f(\omega_s, \gamma \mid D)$ of $\omega_s$ being chosen with gap equal to $\gamma$:
\begin{align}
f(\omega_s, \gamma \mid D) &= \frac{d}{d\gamma} \Big( \frac{1}{1 + e^{-(\mu_s - \mu^*)}} - \frac{1}{1 + e^{-(\mu_s - \gamma - \mu^*)}} \Big) \nonumber\\
&= \frac{e^{-(\gamma - (\mu_s - \mu^*))}}{(e^{-(\gamma - (\mu_s - \mu^*))} + 1)^2}\, \mathbb{1}[\gamma \ge 0] \tag{10}\\
&= \frac{e^{\mu_s}}{e^{\mu_s} + e^{\mu^*}} \cdot \Big( \frac{e^{-(\gamma - (\mu_s - \mu^*))}}{(e^{-(\gamma - (\mu_s - \mu^*))} + 1)^2}\, \mathbb{1}[\gamma \ge 0] \Big) \Big/ \frac{1}{1 + e^{-(\mu_s - \mu^*)}}. \tag{11}
\end{align}
Now, in Equation (11), the term $\frac{e^{\mu_s}}{e^{\mu_s} + e^{\mu^*}} = \frac{e^{\mu_s}}{e^{\mu_s} + \sum_{i \ne s} e^{\mu_i}} = \frac{e^{\mu_s}}{\sum_i e^{\mu_i}}$ is the probability of selecting $\omega_s$. The term $\frac{e^{-(\gamma - (\mu_s - \mu^*))}}{(e^{-(\gamma - (\mu_s - \mu^*))} + 1)^2}\, \mathbb{1}[\gamma \ge 0]$ is the density of the event that a logistic random variable with location $\mu_s - \mu^*$ has value $\gamma$ and is nonnegative. Finally, the term $\frac{1}{1 + e^{-(\mu_s - \mu^*)}}$ is the probability that a logistic random variable with location $\mu_s - \mu^*$ is nonnegative. Thus their quotient is the probability density of a logistic random variable having value $\gamma$ conditioned on it being nonnegative. Therefore Equation (11) is the probability of selecting $\omega_s$ and independently sampling a nonnegative value $\gamma$ from the conditional logistic distribution with location parameter $\mu_s - \mu^*$ (i.e., conditioned on it only returning nonnegative values).

Now, we apply Lemmas 10 and 9 with the help of Equation (10) to finish the proof:
\[
\Big| \ln \frac{f(\omega_s, \gamma \mid D)}{f(\omega_s, \gamma \mid D')} \Big| \le \big| (\mu_s - \mu^*) - (\mu'_s - \mu^{*\prime}) \big| \le |\mu_s - \mu'_s| + \Big| \ln \sum_{i \ne s} e^{\mu_i} - \ln \sum_{i \ne s} e^{\mu'_i} \Big| \le \epsilon/2 + \epsilon/2 = \epsilon,
\]
where $\mu_i = \frac{\epsilon\mu(D,\omega_i)}{2\Delta_\mu}$; the first term is bounded by the sensitivity of $\mu$ and the second by Lemma 9. □