Differentially Private Quantiles
Jennifer Gillenwater (Google New York, [email protected]), Matthew Joseph (Google New York, [email protected]), Alex Kulesza (Google New York, [email protected])

February 17, 2021
Abstract
Quantiles are often used for summarizing and understanding data. If that data is sensitive, it may be necessary to compute quantiles in a way that is differentially private, providing theoretical guarantees that the result does not reveal private information. However, in the common case where multiple quantiles are needed, existing differentially private algorithms scale poorly: they compute each quantile individually, splitting their privacy budget and thus decreasing accuracy. In this work we propose an instance of the exponential mechanism that simultaneously estimates m quantiles from n data points while guaranteeing differential privacy. The utility function is carefully structured to allow for an efficient implementation that avoids exponential dependence on m and returns estimates of all m quantiles in time O(mn² + m²n). Experiments show that our method significantly outperforms the current state of the art on both real and synthetic data while remaining efficient enough to be practical.

Introduction

Quantiles are a widespread method for understanding real-world data, with example applications ranging from income [24] to birth weight [6] to standardized test scores [12]. At the same time, the individuals contributing data may require that these quantiles not reveal too much information about individual contributions. As a toy example, suppose that an individual joins a company that has exactly two salaries, and half of current employees have one salary and half have the other. In this case, publishing the exact median company salary will reveal the new employee's salary.
Differential privacy [11] offers a solution to this problem. Informally, the distribution over a differentially private algorithm's outputs must be relatively insensitive to the input of any single data contributor. Returning to the salary example, a differentially private method for computing the median company salary would have similar-looking output distributions regardless of which salary the new employee receives. The resulting uncertainty about any single contributor's data makes the algorithm "private".

In this work, we study differentially private estimation of user-specified quantiles q_1, ..., q_m ∈ [0, 1] for a one-dimensional dataset X of size n. The output quantile estimates consist of m values, which we denote o_1, ..., o_m. Ideally, the o_j are as close to the dataset's actual quantiles as possible. For example, if q_j = 0.5, then our goal is to have o_j be as close to the median of X as possible.

Several algorithms for differentially private quantiles exist [19, 8, 26, 28]. These algorithms focus on estimating a single quantile. In one sense, this is without loss of generality because of differential privacy's composition guarantee. Basic composition says that, if we estimate each of m quantiles via an (ε/m)-differentially private algorithm, then we will obtain ε-differential privacy overall for the set of m quantiles (see Section 2 for details). However, the cost of this generality is the smaller and more restrictive privacy budget ε/m (or roughly ε/√m for "advanced" composition [15] and other variants tailored to the exponential mechanism [7]). As a result, estimating m quantiles via repeated application of a single-quantile algorithm yields significantly more inaccurate outcomes as m grows. This is unfortunate, as many applications rely on multiple quantiles: returning to the opening paragraph, the income statistics use m = 4 (quintiles), the birth weight statistics use m = 9 (deciles), and the test score statistics use m > 1.
Our contributions are as follows:

1. We give an instantiation of the exponential mechanism [17], JointExp, that produces an ε-differentially private collection of m quantile estimates in a single invocation (Section 3.1). This mechanism uses a utility function that has a fixed sensitivity of 2 no matter how many quantiles are requested, and does not need to divide ε based on the number of quantiles.

2. We provide a dynamic program, related to algorithms used for inference in graphical models, to implement JointExp in time O(mn² + m²n) (Section 3.2, Section 3.3). This significantly improves on naive sampling, which requires time O(n^m).

3. We experimentally evaluate JointExp against current algorithms on synthetic and real datasets. We find that JointExp is efficient enough to be practical for moderate dataset sizes and obtains much better accuracy than the existing state of the art (Section 5).
Related Work

Several algorithms for estimating a single quantile have appeared in the differential privacy literature [19, 8, 26, 28]. Details for previous approaches appear in our experiments (Section 5.1). The main difference is that our algorithm focuses on estimating multiple quantiles at once and has sensitivity independent of the number of quantiles m. Hence, as m grows, our algorithm does not need to split ε into ever-smaller pieces as other algorithms do.

As noted in the list of contributions above, our algorithm is an instantiation of the exponential mechanism. Blocki, Datta, and Bonneau [2] studied how to release the counts (but not identities) of items in a dataset by constructing a relaxation of the exponential mechanism and sampling from it using dynamic programming. However, it is not clear how this method can be applied to quantiles. In particular, our utility function requires accounting for interactions between indices of the sampled vector, while their utility function neatly decomposes into individual terms. Blocki et al. [2] closed their paper by asking if dynamic programming could produce efficient applications of the exponential mechanism in other settings; our work is in part an affirmative answer to this question.

Finally, our dynamic program for sampling from JointExp's exponential mechanism is related to inference algorithms for graphical models. Several papers have studied differential privacy with graphical models. However, this has typically meant studying private versions of graphical modeling tasks [30, 1] or using graphical models as a step in private algorithms [16]. Our paper departs from that past work in that its dynamic program, while related to e.g. the forward-backward algorithm, does not have any conceptual dependence on graphical models themselves.
Preliminaries
We view databases X, X′ as multisets of elements from some data domain 𝒳 where each individual contributes at most one element to the database. To reason about databases that are "close", differential privacy uses neighbors.

Definition 1. Databases X, X′ ∈ 𝒳ⁿ are neighbors, denoted X ∼ X′, if they differ in at most one element.

Note that we use the swap definition of differential privacy; in contrast, the add-remove definition allows the addition or removal (rather than exchange) of one element between neighboring databases. We do this for consistent evaluation against the smooth-sensitivity framework (see Section 5.1.2), which also uses swap differential privacy. However, we emphasize that our algorithm JointExp easily adapts to the add-remove framework (in fact, its sensitivity is lower under add-remove privacy).

With the notion of neighboring databases in hand, we can now define differential privacy.
Definition 2 (Dwork, McSherry, Nissim, and Smith [11]). A randomized algorithm A : 𝒳* → 𝒴 is (ε, δ)-differentially private if, for every pair of neighboring databases X, X′ and every output subset Y ⊆ 𝒴,

P_A[A(X) ∈ Y] ≤ e^ε · P_A[A(X′) ∈ Y] + δ.

When δ > 0, we say A satisfies approximate differential privacy. If δ = 0, we say A satisfies pure differential privacy, and shorthand this as ε-differential privacy (or ε-DP). Our algorithm
JointExp will be ε-DP.

A key benefit of differential privacy is composability: an algorithm that relies on differentially private subroutines inherits an overall privacy guarantee by simply adding up the privacy guarantees of its components.

Lemma 1 (Dwork et al. [11]). Let A_1, ..., A_k be k algorithms that respectively satisfy (ε_1, δ_1)-, ..., (ε_k, δ_k)-differential privacy. Then running A_1, ..., A_k satisfies (∑_{i=1}^k ε_i, ∑_{i=1}^k δ_i)-differential privacy.

We will use composability (or its "advanced" variants) when evaluating baseline methods that estimate a set of m quantiles by estimating each quantile individually. By Lemma 1, to achieve overall ε-DP, it suffices to estimate each quantile under (ε/m)-DP, as in the sketch below. However, since our algorithm JointExp estimates all quantiles in one invocation, it does not use composition.
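To make the budget arithmetic concrete, the following sketch (ours, not part of the paper's released code; the function names are illustrative) computes the per-quantile budget under basic composition, along with the rough ε/√m scaling under a generic advanced composition bound rather than the tighter exponential-mechanism-specific analysis of [7].

import math

def per_quantile_budget_basic(eps_total: float, m: int) -> float:
    # Basic composition (Lemma 1): m calls at eps_i each compose to (m * eps_i)-DP,
    # so an even split gives each single-quantile call eps_total / m.
    return eps_total / m

def per_quantile_budget_advanced(eps_total: float, m: int, delta: float) -> float:
    # Rough inversion of generic advanced composition: for small per-call budgets,
    # the overall epsilon is about eps_i * sqrt(2 m ln(1/delta)), so
    # eps_i ~ eps_total / sqrt(2 m ln(1/delta)), i.e., roughly eps_total / sqrt(m).
    return eps_total / math.sqrt(2 * m * math.log(1 / delta))

print(per_quantile_budget_basic(1.0, 9))            # 0.111...
print(per_quantile_budget_advanced(1.0, 9, 1e-6))   # ~0.063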
We will also rely on the exponential mechanism, a common building block for differentially private algorithms.

Definition 3 (McSherry and Talwar [17], Dwork and Roth [9]). Given utility function u : 𝒳* × O → ℝ mapping (database, output) pairs to real-valued scores with sensitivity

∆u = max_{X ∼ X′, o ∈ O} |u(X, o) − u(X′, o)|,

the exponential mechanism M has output distribution

P_M[M(X) = o] ∝ exp(εu(X, o) / (2∆u)),

where ∝ elides the normalization factor.

Lemma 2 (McSherry and Talwar [17]). The mechanism described in Definition 3 is ε-DP.
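As a concrete illustration, here is a minimal sketch of the exponential mechanism over a finite output set. The helper name and the use of numpy are our own choices, not part of the paper's released code.

import numpy as np

def exponential_mechanism(utilities, sensitivity, eps, rng):
    # Sample index o with probability proportional to exp(eps * u(X, o) / (2 * sensitivity)).
    logits = eps * np.asarray(utilities, dtype=float) / (2.0 * sensitivity)
    logits -= logits.max()          # shift for numerical stability before exponentiating
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

rng = np.random.default_rng(0)
# Four candidate outputs with precomputed utilities u(X, o) and sensitivity 1.
print(exponential_mechanism([-3.0, -1.0, 0.0, -2.0], sensitivity=1.0, eps=1.0, rng=rng))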
The above material suffices to understand the bulk of our algorithm, JointExp. The algorithms used for our experimental comparisons will also require some understanding of smooth sensitivity and concentrated differential privacy, but since these concepts will be relevant only as points of experimental comparison, we discuss them in Section 5.

JointExp
This section provides an exposition of our quantiles algorithm, JointExp. Recall that our goal is to take as input quantiles q_1 < q_2 < ... < q_m ∈ [0, 1] and database X and output quantile estimates o_1, ..., o_m such that, for each j ∈ [m], P_{x ∼ U(X)}[x ≤ o_j] ≈ q_j.

In Section 3.1, we start with an instance of the exponential mechanism whose continuous output space makes sampling impractical. In Section 3.2, we construct a mechanism with the same output distribution (and, importantly, the same privacy guarantees) and a bounded but inefficient sampling procedure. Finally, in Section 3.3 we modify our sampling procedure once more to produce an equivalent and polynomial-time method, which we call JointExp.

We start by formulating an instance of the exponential mechanism for our quantiles setting. First, we will require the algorithm user to input a lower bound a and upper bound b for the data domain. (Lower and upper bounds are also necessary for the private quantile algorithms that we compare to in our experiments. We find that choosing loose bounds a and b does not greatly affect utility; see the experiments in Section 5.) We assume that all x ∈ X are in [a, b]; if this is not the case initially, then we clamp any outside points to [a, b]. The output space is O↗ = {(o_1, ..., o_m) | a ≤ o_1 ≤ ... ≤ o_m ≤ b}, the set of sequences of m nondecreasing values from [a, b]. For a given o = (o_1, ..., o_m), the utility function will compare the number of points in each proposed quantile interval [o_{j−1}, o_j) to the expected number of points in the correct quantile interval. We denote the number of data points between adjacent quantiles q_{j−1} and q_j by n_j = (q_j − q_{j−1})n. We fix q_0 = 0 and q_{m+1} = 1, so that n_1 = q_1 n and n_{m+1} = (1 − q_m)n. We also denote the number of data points from X between any two values u and v by n(u, v) = |{x ∈ X | u ≤ x < v}|. We can now define our utility function

u_Q(X, o) = −∑_{j ∈ [m+1]} |n(o_{j−1}, o_j) − n_j|,

where we fix o_0 = a and o_{m+1} = b + 1 (setting o_{m+1} to a value larger than b ensures that points equal to b are counted in the final term of the sum). u_Q thus places the highest probability on the true quantile values and lower probability on estimates that are far from the true quantile values. A sketch of this computation appears below.

Lemma 3. u_Q has sensitivity ∆u_Q = 2.

Proof. Fix an output o. Let X and X′ be neighboring databases. Since we use swap differential privacy, |X| = |X′|, so u_Q(X, o) and u_Q(X′, o) only differ in their counts n(·, ·) (respectively denoted n′(·, ·) for X′). Since X and X′ are neighbors, there are at most two intervals [o_{j−1}, o_j) and [o_{j′−1}, o_{j′}) on which n and n′ differ, each by at most one. Thus |u_Q(X, o) − u_Q(X′, o)| ≤ 2.

Under the add-remove definition of differential privacy, the sensitivity is slightly lower: ∆u_Q = 2[1 − min_{j ∈ [m+1]}(q_j − q_{j−1})]. A full proof of this and other results appears in Appendix A.

The corresponding mechanism M_Q has output density

f_Q(o) ∝ exp(ε · u_Q(X, o) / (2∆u_Q)). (1)
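The following sketch (ours; names are illustrative) computes u_Q on sorted data, using the fact that each count n(o_{j−1}, o_j) is a difference of sorted ranks.

import numpy as np

def joint_utility(sorted_x, quantiles, o, a, b):
    # u_Q(X, o) = -sum_{j in [m+1]} |n(o_{j-1}, o_j) - n_j|, with o_0 = a, o_{m+1} = b + 1.
    n = len(sorted_x)
    qs = np.concatenate(([0.0], np.asarray(quantiles, dtype=float), [1.0]))
    target = np.diff(qs) * n                                 # n_j = (q_j - q_{j-1}) * n
    edges = np.concatenate(([a], np.asarray(o, dtype=float), [b + 1]))
    ranks = np.searchsorted(sorted_x, edges, side="left")    # |{x < edge}| for each edge
    counts = np.diff(ranks)                                  # n(o_{j-1}, o_j) = |{o_{j-1} <= x < o_j}|
    return -np.sum(np.abs(counts - target))

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-5.0, 5.0, size=1000))
print(joint_utility(x, [0.25, 0.5, 0.75], o=[-2.4, 0.1, 2.6], a=-5.0, b=5.0))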
Since this is an instantiation of the exponential mechanism, we can apply Lemma 2 to get:

Lemma 4. The mechanism M_Q defined by output density f_Q satisfies ε-differential privacy.

However, as is typically a drawback of the exponential mechanism, it is not clear how to efficiently sample from this distribution, which is defined over a continuous m-dimensional space. The following sections address this issue. Since the output distribution itself remains fixed through these sampling procedure changes, these improvements will preserve the privacy guarantee of Lemma 4. The remaining proofs will therefore focus on verifying that subsequent sampling procedures still sample according to Eq. 1.

In this section, we describe how to sample from the continuous distribution defined by M_Q by first sampling from an intermediate discrete distribution. This is similar to the single-quantile sampling technique given by Smith [26] (see their Algorithm 2). The basic idea is that we split the sampling process into three steps:

1. Sample m intervals from the set of intervals between data points.
2. Take a uniform random sample from each of the m sampled intervals.
3. Output the samples in increasing order.

This will require some notation setup. Denote the elements of X in nondecreasing order by x_1 ≤ ... ≤ x_n, fix x_0 = a and x_{n+1} = b, and let I = {0, ..., n}, where we associate i ∈ I with the interval between points x_i and x_{i+1}. Define S↗ to be the set of nondecreasing sequences of m intervals,

S↗ = {(i_1, ..., i_m) | i_1, ..., i_m ∈ I, i_1 ≤ ... ≤ i_m}.

S↗ will be the discrete output space for the first sampling step above. We can define a utility function u_Q′ on s = (i_1, ..., i_m) ∈ S↗ by slightly modifying u_Q:

u_Q′(X, s) = −∑_{j ∈ [m+1]} |(i_j − i_{j−1}) − n_j|,

where we fix i_0 = 0 and i_{m+1} = n.

In order to reproduce M_Q from Section 3.1, our sequence sampler will also need to weight each sequence s by the total measure of the outputs o ∈ O↗ that can be sampled from s in the second step. This is nontrivial due to the ordering constraint on o: if an interval appears twice in s, the measure of corresponding outputs must be halved to account for the fact that the two corresponding samples in the second step can appear in either order, but will be mapped to a fixed increasing order in the third step. In general, if an interval appears k times, the measure must be scaled by a factor of 1/k!, the volume of the standard k-simplex. We account for this by dividing by the scale function

γ(s) = ∏_{i ∈ I} count_s(i)!,

where count_s(i) is the number of times i appears in s and we take 0! = 1. A sketch of u_Q′ and γ appears below.
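A minimal sketch of u_Q′ and γ on a candidate interval sequence s (ours; names are illustrative):

import math
from collections import Counter

def discrete_utility(s, targets, n):
    # u_Q'(X, s) = -sum_{j in [m+1]} |(i_j - i_{j-1}) - n_j|, with i_0 = 0 and i_{m+1} = n.
    edges = [0] + list(s) + [n]
    return -sum(abs((edges[j] - edges[j - 1]) - targets[j - 1])
                for j in range(1, len(edges)))

def gamma(s):
    # gamma(s) = prod_i count_s(i)!, the correction for repeated intervals.
    return math.prod(math.factorial(c) for c in Counter(s).values())

# m = 3 quantiles on n = 1000 points with targets n_1, ..., n_4 = 250 each.
print(discrete_utility((240, 510, 770), targets=[250, 250, 250, 250], n=1000))  # -60
print(gamma((240, 510, 510)))  # 2, since interval 510 repeats twice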
The mechanism M_Q′ is now defined as follows:

1. Draw s = (i_1, ..., i_m) according to

P_{M_Q′}[s] ∝ exp(ε · u_Q′(X, s) / (2∆u_Q)) · ∏_{j=1}^m (x_{i_j+1} − x_{i_j}) / γ(s).

2. For j ∈ [m], draw o_j uniformly at random from [x_{i_j}, x_{i_j+1}).

3. Output o_1, ..., o_m in increasing order.

It remains to verify that M_Q′ actually matches M_Q.

Lemma 5. M_Q′ has the same output distribution as M_Q.

Proof Sketch (see Appendix A for full proof). Given potential outputs o and o′, if the corresponding quantile estimates fall into the same intervals between data points in X, then the counts n(·, ·) are unchanged and u_Q(X, o) = u_Q(X, o′). Since u_Q is constant over intervals between data points, it is equivalent to sample those intervals and then sample points uniformly at random from the chosen intervals. The only complication is accounting for the γ scaling introduced by repeated intervals.

The benefit of M_Q′ over M_Q is that the first step samples from a finite space, and the second sampling step is simply uniform sampling. However, the size of the space for the first step is still O(n^m), which remains impractical for all but the smallest datasets. In the next section, we develop a dynamic programming algorithm that allows us to sample from P_{M_Q′} in time O(mn² + m²n).

Notice that the bulk of our probability distribution over sequences s = (i_1, ..., i_m) can be decomposed as a product of scores, where each score depends only on adjacent intervals i_{j−1} and i_j. In particular,

P_{M_Q′}[s] ∝ (1/γ(s)) · ∏_{j ∈ [m+1]} φ(i_{j−1}, i_j, j),

with φ defined for i ≤ i′ as follows. For j ∈ [m]:

φ(i, i′, j) = exp(−ε|(i′ − i) − n_j| / (2∆u_Q)) · (x_{i′+1} − x_{i′}).

For j = m + 1:

φ(i, n, m + 1) = exp(−ε|(n − i) − n_{m+1}| / (2∆u_Q)).

Fig. 1 illustrates this structure graphically, suggesting a dynamic programming algorithm similar to the "forward-backward" algorithm from the graphical models literature (see e.g. Chapter 15 of Russell and Norvig [22]).

Figure 1: Illustration of pairwise dependencies for interval sequence s = (i_1, ..., i_m), with i_0 = 0, i_{m+1} = n, and an edge φ(i_{j−1}, i_j, j) between consecutive intervals.

Unfortunately, γ(s) does not factor in the same way. However, it has its own special structure: since s is required to be nondecreasing, γ(s) decomposes over contiguous constant subsequences of s. We will use this to design an efficient dynamic programming algorithm for sampling from P_{M_Q′}.

Define the function α : [m] × I × [m] → ℝ so that α(j, i, k) is the total unnormalized probability mass for prefix sequences of length j that end with exactly k copies of the interval i. For all i ∈ I, let α(1, i, 1) = φ(0, i, 1) and α(1, i, k) = 0 for k > 1. Now, for j = 2, ..., m, we have the following recursion for all i ∈ I:

α(j, i, 1) = ∑_{i′ < i} φ(i′, i, j) ∑_{k < j} α(j − 1, i′, k)

α(j, i, k > 1) = φ(i, i, j) · α(j − 1, i, k − 1)/k.

Intuitively, if the sequence ends with a single i, we need to sum over all possible preceding intervals i′, which could have been repeated up to k < j times. On the other hand, if the sequence ends with more than one i, we know that the preceding interval was also i, and we simply divide by k to account for the scale function γ.

We can now sample in the reverse direction as follows. First, draw a pair

(i, k) ∝ α(m, i, k) · φ(i, n, m + 1)

(the φ term accounts for the final edge in the graph; see Appendix A for details). This determines that the last k sampled intervals are equal to i. We can then draw another pair

(i′ < i, k′) ∝ α(m − k, i′, k′) · φ(i′, i, m − k + 1),

which determines that the last k′ remaining intervals in the sequence are i′, and so on until we have a complete sample.

This dynamic program can be implemented to run in time O(mn² + m²n). Algorithm 1 provides pseudocode for the overall sampling process, including computational optimizations, and Theorem 1 verifies that Algorithm 1 samples efficiently from the correct distribution. The full proof appears in Appendix A, but we note here that the privacy proof mainly involves unrolling the products and sums to show that the algorithm correctly implements M_Q′, and the time and space proofs are straightforward from the pseudocode.
Theorem 1. JointExp satisfies ε-differential privacy, takes time O(mn² + m²n), and uses space O(mn² + m²n).
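The following is a direct, illustrative translation of Algorithm 1 into Python (ours, not the released dp_multiq implementation). It works in ordinary rather than log space, so it is only suitable for small n and moderate ε; the numerical improvements discussed next and in Appendices B and C address this.

import numpy as np

def joint_exp_sketch(data, a, b, quantiles, eps, rng):
    # Forward-backward sampler for the JointExp distribution over interval sequences.
    x = np.clip(np.sort(np.asarray(data, dtype=float)), a, b)
    n, m = len(x), len(quantiles)
    xs = np.concatenate(([a], x, [b]))                  # xs[i], i = 0..n+1
    widths = xs[1:] - xs[:-1]                           # width of interval i = 0..n
    qs = np.concatenate(([0.0], np.asarray(quantiles), [1.0]))
    targets = np.diff(qs) * n                           # n_1, ..., n_{m+1}
    sens = 2.0                                          # Delta u_Q

    def phi(i, i2, j):                                  # j = 1..m
        gap = np.abs((i2 - i) - targets[j - 1])
        return np.exp(-eps * gap / (2 * sens)) * widths[i2]

    def phi_end(i):                                     # j = m + 1, fixed i_{m+1} = n
        return np.exp(-eps * np.abs((n - i) - targets[m]) / (2 * sens))

    ii = np.arange(n + 1)
    alpha = np.zeros((m + 1, n + 1, m + 1))
    alpha[1, :, 1] = phi(0, ii, 1)
    for j in range(2, m + 1):
        hat = alpha[j - 1].sum(axis=1)                  # sum over repetition count k
        for i in range(n + 1):                          # new interval: sum over i' < i
            alpha[j, i, 1] = np.dot(phi(ii[:i], i, j), hat[:i])
        for k in range(2, j + 1):                       # repeat: divide by k for gamma
            alpha[j, :, k] = phi(ii, ii, j) * alpha[j - 1, :, k - 1] / k

    def sample_pair(w):                                 # w[i, k] unnormalized weights
        flat = w.ravel() / w.sum()
        idx = rng.choice(len(flat), p=flat)
        return idx // w.shape[1], idx % w.shape[1]

    intervals = np.empty(m, dtype=int)
    i_cur, k = sample_pair(alpha[m] * phi_end(ii)[:, None])
    intervals[m - k:] = i_cur
    j = m - k
    while j > 0:
        w = alpha[j] * phi(ii, i_cur, j + 1)[:, None]
        w[i_cur:, :] = 0.0                              # next interval is strictly smaller
        i_cur, k = sample_pair(w)
        intervals[j - k: j] = i_cur
        j -= k
    return np.sort(rng.uniform(xs[intervals], xs[intervals + 1]))

rng = np.random.default_rng(0)
print(joint_exp_sketch(rng.normal(0.0, 1.0, size=200), -10.0, 10.0,
                       [0.25, 0.5, 0.75], eps=1.0, rng=rng))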
Numerical improvements. Note that the quantities involved in computing φ and α may be quite small, so we implement JointExp using logarithmic quantities to avoid underflow errors in our experiments. This is a common trick and is a numerical rather than algorithmic change, but for completeness we include its details in Appendix B. After computing these quantities, to avoid underflow in our final sampling steps, we use a "racing" sampling method that was previously developed for single-quantile exponential mechanisms. Since this is again a numerical improvement, details appear in Appendix C.
Algorithm 1 Pseudocode for JointExp

Input: sorted X = (x_1 ≤ ... ≤ x_n) clamped to data range [a, b], quantiles q_1 < ... < q_m, privacy parameter ε
Set x_0 = a, x_{n+1} = b, and ∆u_Q = 2
Set I = {0, ..., n}, i_0 = 0, and i_{m+1} = n
for i ∈ I, i′ ≥ i, j ∈ [m] do
  Set φ(i, i′, j) = exp(−ε|(i′ − i) − n_j| / (2∆u_Q)) · (x_{i′+1} − x_{i′})
end for
for i ∈ I do
  Set φ(i, n, m + 1) = exp(−ε|(n − i) − n_{m+1}| / (2∆u_Q))
end for
for i ∈ I do
  Set α(1, i, 1) = φ(0, i, 1)
end for
for j = 2, ..., m do
  for i ∈ I do
    Set ˆα(j − 1, i) = ∑_{k < j} α(j − 1, i, k)
    Set α(j, i, 1) = ∑_{i′ < i} φ(i′, i, j) · ˆα(j − 1, i′)
    for k = 2, ..., j do
      Set α(j, i, k) = φ(i, i, j) · α(j − 1, i, k − 1)/k
    end for
  end for
end for
Sample (i, k) ∝ α(m, i, k) · φ(i, n, m + 1); set i_{m−k+1}, ..., i_m = i and j = m − k
while j > 0 do
  Sample (i < i_{j+1}, k) ∝ α(j, i, k) · φ(i, i_{j+1}, j + 1)
  Set i_{j−k+1}, ..., i_j = i, and j = j − k
end while
Output uniform samples {o_j ∼ U[x_{i_j}, x_{i_j+1})}_{j=1}^m in increasing order

Connection to graphical models. As mentioned above, the dynamic program in Algorithm 1 is similar to the forward-backward algorithm from the graphical models literature, modulo accounting for γ(s). In graphical models, it is frequently necessary to compute the probability of a sequence of hidden states. This requires normalizing by a sum of probabilities of sequences, and, naively, this sum has an exponential number of terms, like Z_Q′. However, when probabilities decompose into products of score functions of adjacent states, the forward-backward algorithm makes the process efficient. The extra γ(s) term makes our sampling process more complex in a way that is similar to semi-Markov models [31]. In graphical model terms, γ can be thought of as a prior that discourages repeats: p_prior(s) ∝ 1/γ(s). This prior can also be written as a product of n Poisson distributions, each with parameter λ = 1.

Accuracy Intuition

JointExp applies the exponential mechanism once to output m quantiles. The closest competitor algorithms also apply the exponential mechanism but use m invocations to produce m quantiles. To build intuition for why the former approach achieves better utility, we recall the standard accuracy guarantee for the exponential mechanism:

Lemma 6 (McSherry and Talwar [17]). Let M be an ε-DP instance of the exponential mechanism having score function u with sensitivity ∆u and output space 𝒴. Then for database X, with probability at least 1 − β, M produces output y such that

u(X, y) ≥ max_{y* ∈ 𝒴} u(X, y*) − (2∆u/ε) · log(|𝒴|/β).

For simplicity, suppose we have uniform data where all interval widths x_{i+1} − x_i are identical. As shown by the experiments in the next section, this is not necessary for JointExp to obtain good utility, but we assume it for easier intuition. Then (modulo the minor term γ(s) that accounts for rare repeated intervals in the output), P_{M_Q′}[s] ∝ exp(ε · u_Q′(X, s) / (2∆u_Q)). This means that JointExp's process of sampling intervals draws from a distribution whose shape is identical to an exponential mechanism with utility function u_Q′, but mismatched sensitivity term ∆u_Q = 2. Since the proof of Lemma 6 does not rely on the utility function matching the sensitivity term, we can still apply it to determine the accuracy of this interval sampling procedure. The output space 𝒴 for JointExp's interval sampling has size |S↗| ≤ n^m, so we expect to sample intervals yielding quantiles that in total misclassify O(m log(n)/ε) points.

In contrast, m invocations of a single-quantile exponential mechanism require each invocation to satisfy ε_i = ε/m-DP (standard composition) or roughly ε_i = ε/√m-DP (advanced composition). Because each invocation uses an output space of size O(n), the total error guarantee via Lemma 6 scales like O(m log(n)/ε_i). Since m/ε_i = ω(m)/ε for even the best known composition bounds for the exponential mechanism [7], these approaches incur error with a superlinear dependence on m. This contrasts with JointExp's error, which has only a linear dependence on m.
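For concreteness, the three scalings discussed above can be summarized as follows (a restatement of the bounds above with constants suppressed):

\[
\underbrace{O\!\left(\frac{m \log n}{\varepsilon}\right)}_{\textsc{JointExp}}
\quad \text{vs.} \quad
\underbrace{O\!\left(\frac{m \log n}{\varepsilon/m}\right) = O\!\left(\frac{m^2 \log n}{\varepsilon}\right)}_{\text{basic composition}}
\quad \text{vs.} \quad
\underbrace{O\!\left(\frac{m \log n}{\varepsilon/\sqrt{m}}\right) = O\!\left(\frac{m^{3/2} \log n}{\varepsilon}\right)}_{\text{advanced composition}}.
\]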
Experiments

We now empirically evaluate our algorithm JointExp against four alternative algorithms: IndExp [26], AppIndExp, Smooth [19], and CSmooth [19, 4]. (We discuss omitted private quantile algorithms in Appendix D; omitted algorithms rely on additional knowledge about the data, have large error, and/or are extremely inefficient.) Explanations of these algorithms and the reasons for comparing to them appear in Section 5.1. Dataset descriptions, accuracy experiments, and time experiments respectively appear in Section 5.2, Section 5.3, and Section 5.4. All experiment code may be found on Github [13].

IndExp and AppIndExp

Our first comparison algorithm IndExp uses independent applications of the exponential mechanism. Smith [26] introduced the basic IndExp algorithm for estimating one quantile, and it has since been incorporated into the SmartNoise [25] and IBM [14] differential privacy libraries. IndExp thus gives us a meaningful baseline for a real-world approach. IndExp uses the exponential mechanism to estimate a single quantile q via the utility function u(X, o) = −| |{x ∈ X | x ≤ o}| − qn |.

Lemma 7. u defined above has sensitivity ∆u = 1.

Proof. Consider swapping x ∈ X for x′, and fix some o. If x, x′ ≤ o or x, x′ > o, then u(X, o) = u(X′, o). If exactly one of x or x′ is ≤ o, then |u(X, o) − u(X′, o)| = 1.

IndExp takes user-provided data bounds a and b and runs on X⁺ = X ∪ {a, b}. After sorting X⁺ into intervals of adjacent data points I_1, ..., I_{n+1}, IndExp selects an interval I_j with probability proportional to

score(X, I_j) = exp(−ε|j − qn|/2) · |I_j|

and randomly samples the final quantile estimate from I_j. A sketch of this sampler appears below.

To estimate m quantiles with IndExp, we call it m times with privacy parameter ε/m and output the resulting quantile estimates in increasing order. These m invocations are a straightforward application of differential privacy's basic composition guarantee (Lemma 1).

We also consider AppIndExp, an (ε, δ)-DP variant of IndExp. The only difference between AppIndExp and IndExp is that AppIndExp uses the exponential mechanism's nonadaptive advanced composition guarantee given by Dong, Durfee, and Rogers [7]. This enables AppIndExp to spend more privacy budget estimating each of the m quantiles. Details appear in Appendix E.1, but we note here that, to the best of our knowledge, this is the tightest known composition analysis for the exponential mechanism. Since our experiments use n = 1000 data points, we always use δ = 10⁻⁶ in accordance with the common recommendation that δ ≪ 1/n (see e.g. the discussions around the definition of differential privacy from Dwork and Roth [9] and Vadhan [29]).
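A sketch of the single-quantile IndExp sampler and its m-fold composition (ours; names are illustrative, and the SmartNoise [25] and IBM [14] implementations differ in details):

import numpy as np

def ind_exp_single(data, a, b, q, eps, rng):
    # One eps-DP quantile: pick interval I_j with probability proportional to
    # exp(-eps * |j - q n| / 2) * |I_j|, then sample uniformly from it.
    n = len(data)
    x = np.sort(np.concatenate((np.clip(data, a, b), [a, b])))   # X+ = X ∪ {a, b}
    widths = np.diff(x)                                          # |I_j|, j = 1..n+1
    j = np.arange(1, n + 2)
    scores = np.exp(-eps * np.abs(j - q * n) / 2.0) * widths
    probs = scores / scores.sum()
    idx = rng.choice(len(probs), p=probs)
    return rng.uniform(x[idx], x[idx + 1])

def ind_exp(data, a, b, quantiles, eps, rng):
    # Basic composition (Lemma 1): each of the m calls gets budget eps / m.
    eps_i = eps / len(quantiles)
    return np.sort([ind_exp_single(data, a, b, q, eps_i, rng) for q in quantiles])

rng = np.random.default_rng(0)
print(ind_exp(rng.uniform(-5, 5, 1000), -5.0, 5.0, [0.25, 0.5, 0.75], eps=1.0, rng=rng))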
Smooth

The third comparison algorithm is Smooth, which relies on the smooth sensitivity framework introduced by Nissim, Raskhodnikova, and Smith [19]. The basic idea of this framework is to circumvent global sensitivity by instead using a smooth analogue of local sensitivity. This is useful for problems where the global sensitivity is large only for "bad" datasets.

Definition 4. For function f : 𝒳ⁿ → ℝ, the local sensitivity ∆f(X) of f for dataset X is

∆f(X) = max_{X′ : X ∼ X′} |f(X) − f(X′)|.

Recall that global sensitivity is defined over all possible pairs of datasets. In contrast, local sensitivity is also parameterized by a fixed dataset X and defined only over neighbors of X. It is therefore possible that ∆f(X) ≪ ∆f. For example, the local sensitivity of the median is small on a dataset whose central elements are tightly clustered, even when the data range, and hence the global sensitivity of the median, is large. However, this also shows that local sensitivity itself reveals information about the dataset. The insight of Nissim et al. [19] is that it is possible to achieve differential privacy and take advantage of lower local sensitivity by adding noise calibrated to a "smooth" approximation of ∆f(X).

Definition 5 (Nissim et al. [19]). For t > 0, the t-smooth sensitivity of f on database X of n points is

S^t_f(X) = max_{X′ ∈ 𝒳ⁿ} e^{−t · d(X, X′)} · ∆f(X′),

where d(X, X′) is the number of elements on which X and X′ differ.

For Smooth, we use the Laplace noise variant of smooth sensitivity given by Nissim et al. [19]. This only satisfies (ε, δ)-DP, but it offers the lightest (and thus most competitive) noise for computing a one-dimensional statistic. Details for computing the median's smooth sensitivity and the precise noise added appear in Appendix E.2.

CSmooth

The fourth and final comparison algorithm is CSmooth. CSmooth has not explicitly appeared in the literature, but it is a straightforward combination of the smooth sensitivity framework and concentrated differential privacy (CDP) [10, 3]. CDP is a variant of differential privacy that offers comparable privacy guarantees with often tighter privacy analyses. Bun and Steinke [4] showed how to combine CDP with the smooth sensitivity framework. Since the relevant CDP noise distributions can have even lighter tails than the Laplace noise used by Smooth, we include CSmooth as a plausibly stronger comparison. Our experiments use the Laplace Log-Normal noise distribution, which achieved the strongest accuracy results in the experiments of Bun and Steinke [4].

One complication of CSmooth is the need to select several parameters to specify the noise distribution. We attempted to select these parameters to give CSmooth the strongest utility possible. Details for this process (and CSmooth in general) appear in Appendix E.2.

To compare JointExp's pure DP guarantee to CSmooth's CDP guarantee, we use the following lemma:

Lemma 8 (Proposition 1.4 [3]). If an algorithm is ε-DP, then it is also ½ε²-CDP.

We thus evaluate our ε-DP algorithm JointExp against a ½ε²-CDP CSmooth. This comparison favors CSmooth: recalling our requirement that approximate DP algorithms have δ ≤ 10⁻⁶, the best known generic conversion from CDP to approximate DP only says that a ½-CDP algorithm is (ε, 10⁻⁶)-DP for ε ≥ 5.76 (Proposition 1.3, [3]). A more detailed discussion of DP and CDP appears in Section 4 of the work of Canonne, Kamath, and Steinke [5].

As with IndExp, to estimate m quantiles with CSmooth, we call it m times with an appropriately reduced privacy parameter. This time, we use CDP's composition guarantee:

Lemma 9 (Proposition 1.7 [3]). The composition of k ρ-CDP algorithms is kρ-CDP.

From Lemma 8 the overall desired privacy guarantee is ½ε²-CDP, so we use ε′ = ε/√m in each call.

Datasets

We evaluate all five methods JointExp, IndExp, AppIndExp, Smooth, and CSmooth on four types of data: synthetic uniform data drawn from U(−5, 5), synthetic Gaussian data, and two real-world datasets derived from the Goodreads books dataset [27].

Our error metric is the number of "missed points": for each desired quantile q_j, we take the true quantile estimate o_j and the private estimate ô_j, compute the number of data points between o_j and ô_j, and sum these counts across all m quantiles. For each dataset, we compare the number of missed points for all five algorithms as m grows. A sketch of this metric appears below.
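A sketch of the missed-points metric (ours; names are illustrative):

import numpy as np

def missed_points(sorted_x, true_estimates, private_estimates):
    # Sum, over quantiles, of the data points lying between each true and private estimate.
    total = 0
    for o, o_hat in zip(true_estimates, private_estimates):
        lo, hi = min(o, o_hat), max(o, o_hat)
        total += np.searchsorted(sorted_x, hi) - np.searchsorted(sorted_x, lo)
    return int(total)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-5, 5, 1000))
true_qs = np.quantile(x, [0.25, 0.5, 0.75])
print(missed_points(x, true_qs, private_estimates=[-2.7, 0.2, 2.4]))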
Accuracy Experiments

In each case, the requested quantiles are evenly spaced: m = 1 is median estimation, m = 2 requires estimating the 33rd and 66th percentiles, and so on. We average scores across 20 trials of 1000 random samples. For every experiment, we take [−100, 100] as the data range. Results for ε = 1 appear in Figure 3.

Figure 3: Error vs. number of quantiles m for ε = 1 on all four datasets. Note the logarithmic y-axis: at m = 15, JointExp misclassifies 10x fewer points than any of its competitors.

Across all four datasets, a clear effect appears: for a wide range of m, JointExp dominates all other algorithms. First, JointExp beats or matches its competitors even for m = 1. At m = 6 JointExp makes 2x fewer errors than its closest competitor, and at m = 15 the gap grows to 10x. JointExp's absolute accuracy is also strong: when estimating ≤ 20 quantiles, JointExp on average estimates each quantile with ≤ 10 misclassified data points. For comparison, the other algorithms cannot maintain this average misclassification rate for nearly as many quantiles. Finally, JointExp obtains the best privacy in addition to the best accuracy. This is because JointExp is ε-DP while AppIndExp and Smooth are (ε, δ)-DP, and CSmooth is ½ε²-CDP. The only other ε-DP algorithm, IndExp, is 10x worse than JointExp at m = 9 and 20x worse at m = 19.

We also evaluate a high-privacy regime where we use ε = 0.1 instead of ε = 1. We defer these plots to Appendix F but note here that a similar (but smaller) separation holds. This is because the smaller privacy budget leads all methods to converge to trivial error at moderate m.

Time Experiments

We conclude by evaluating the methods by runtime. The number of data points and quantiles are the main determinants of time, so we only include time experiments using Gaussian data. All experiments were run on a machine with two CPU cores and 100GB RAM. As seen in Fig. 4, JointExp is the slowest method, followed by Smooth and CSmooth, and finally IndExp and AppIndExp. This is a relative weakness for JointExp. However, even at its largest setting of m = 29, JointExp's runtime remains well within practical ranges.

Figure 4: Time vs. m, ε = 1, averaged across 20 trials of 1,000 samples.

Conclusion

In this work we constructed a low-sensitivity exponential mechanism for differentially private quantile estimation and designed a dynamic program to sample from it efficiently. The result is a practical algorithm that achieves much better accuracy than existing methods. A possible direction for future work is exploring other applications of the exponential mechanism where the utility function is low sensitivity and can be decomposed into "local" score functions, as in the pairwise interval terms of φ. More precisely, by analogy to the graphical models techniques generally known as belief propagation [21], any utility function whose outputs have a chain or tree dependency structure should be tractable to sample.

Acknowledgments

We thank Thomas Steinke for helpful discussion of concentrated differential privacy. We also thank Andrés Muñoz Medina for helpful comments on an early draft of this paper.

References

[1] Garrett Bernstein, Ryan McKenna, Tao Sun, Daniel Sheldon, Michael Hay, and Gerome Miklau. Differentially private learning of undirected graphical models using collective graphical models. In International Conference on Machine Learning (ICML), 2017.

[2] Jeremiah Blocki, Anupam Datta, and Joseph Bonneau. Differentially private password frequency lists. In Network and Distributed System Security (NDSS), 2016.

[3] Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography Conference (TCC), 2016.

[4] Mark Bun and Thomas Steinke.
Average-case averages: Private algorithms for smooth sensitivity and mean estimation. In Neural Information Processing Systems (NeurIPS), 2019.

[5] Clément Canonne, Gautam Kamath, and Thomas Steinke. The discrete Gaussian for differential privacy. In Neural Information Processing Systems (NeurIPS), 2020.

[6] CDC. Data table of infant weight-for-age charts, 2001. Accessed: 2021-01-02.

[7] Jinshuo Dong, David Durfee, and Ryan Rogers. Optimal differential privacy composition for exponential mechanisms and the cost of adaptivity. In International Conference on Machine Learning (ICML), 2020.

[8] Cynthia Dwork and Jing Lei. Differential privacy and robust statistics. In Symposium on the Theory of Computing (STOC), 2009.

[9] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 2014.

[10] Cynthia Dwork and Guy N. Rothblum. Concentrated differential privacy. arXiv preprint arXiv:1603.01887, 2016.

[11] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference (TCC), 2006.

[12] ETS. GRE guide to the use of scores, 2020. Accessed: 2021-01-29.

[13] Google. Differentially private quantiles code. https://github.com/google-research/google-research/tree/master/dp_multiq, 2021. Accessed: 2021-02-17.

[14] IBM. IBM differential privacy library. https://github.com/IBM/differential-privacy-library/blob/main/diffprivlib/tools/quantiles.py, 2019. Accessed: 2021-01-05.

[15] Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The composition theorem for differential privacy. In International Conference on Machine Learning (ICML), 2015.

[16] Ryan McKenna, Daniel Sheldon, and Gerome Miklau. Graphical-model based estimation and inference for differential privacy. In International Conference on Machine Learning (ICML), 2019.

[17] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In Foundations of Computer Science (FOCS), 2007.

[18] Andrés Muñoz Medina and Jenny Gillenwater. Duff: A dataset-distance-based utility function family for the exponential mechanism. arXiv preprint arXiv:2010.04235, 2020.

[19] Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Smooth sensitivity and sampling in private data analysis. In Symposium on the Theory of Computing (STOC), 2007.

[20] Stack Overflow. Numerically stable way to multiply log probability matrices in numpy. https://stackoverflow.com/questions/23630277/numerically-stable-way-to-multiply-log-probability-matrices-in-numpy, 2014. Accessed: 2021-02-08.

[21] Judea Pearl. Reverend Bayes on inference engines: A distributed hierarchical approach. In Conference on Artificial Intelligence (AAAI), 1982.

[22] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Pearson Education/Prentice-Hall, third edition, 2010.

[23] scipy. scipy.special.logsumexp. https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.logsumexp.html, 2020. Accessed: 2021-02-08.

[24] Jessica Semega, Melissa Kollar, John Creamer, and Abinash Mohanty. Income and poverty in the United States: 2018, 2020. Accessed: 2021-01-02.

[25] SmartNoise. The exponential mechanism for medians. https://github.com/opendifferentialprivacy/smartnoise-core/blob/develop/whitepapers/mechanisms/exponential_median/ExponentialMechForMedian.pdf, 2020. Accessed: 2021-01-05.

[26] Adam Smith. Privacy-preserving statistical estimation with optimal convergence rates.
In Symposium on the Theory of Computing (STOC), 2011.

[27] Soumik. Goodreads-books dataset, 2019. Accessed: 2020-12-27.

[28] Christos Tzamos, Emmanouil-Vasileios Vlatakis-Gkaragkounis, and Ilias Zadik. Optimal private median estimation under minimal distributional assumptions. In Neural Information Processing Systems (NeurIPS), 2020.

[29] Salil Vadhan. The complexity of differential privacy. In Tutorials on the Foundations of Cryptography. Springer, 2017.

[30] Oliver Williams and Frank McSherry. Probabilistic inference and differential privacy. In Neural Information Processing Systems (NIPS), 2010.

[31] Shun-Zheng Yu. Hidden semi-Markov models. Artificial Intelligence, 2010.

A Full Proofs

We start with the add-remove version of the sensitivity analysis for ∆u_Q. We proved the swap version as Lemma 3 in the main body, and this was the focus of the paper. The (slightly more favorable) add-remove version appears below for completeness.

Lemma 10. In the add-remove model, ∆u_Q = 2[1 − min_{j ∈ [m+1]}(q_j − q_{j−1})].

Proof. Consider neighboring databases X′ = X ∪ {x′} where |X′| = n′ = n + 1 and |X| = n. Let n′(·, ·) denote an interval count using X′, and let n(·, ·) denote an interval count using X. All data points are clipped to [a, b], o_0 = a, and o_{m+1} = b + 1, so there exists some [o_{j*−1}, o_{j*}) containing x′. o is nondecreasing and these intervals are half-open, so these intervals do not intersect. Thus, there is exactly one [o_{j*−1}, o_{j*}) containing x′. Then for j ≠ j*, n′(o_{j−1}, o_j) = n(o_{j−1}, o_j) and (q_j − q_{j−1})n′ − (q_j − q_{j−1})n = q_j − q_{j−1}. Thus

u_Q(X, o) = −|n(o_{j*−1}, o_{j*}) − n_{j*}| − ∑_{j ≠ j*} |n(o_{j−1}, o_j) − n_j|

and

u_Q(X′, o) = −|n(o_{j*−1}, o_{j*}) + 1 − (q_{j*} − q_{j*−1})(n + 1)| − ∑_{j ≠ j*} |n(o_{j−1}, o_j) − (q_j − q_{j−1})(n + 1)|.

The distance between u_Q(X, o) and u_Q(X′, o) contributed by the first term is

| |n(o_{j*−1}, o_{j*}) − (q_{j*} − q_{j*−1})n| − |n(o_{j*−1}, o_{j*}) + 1 − (q_{j*} − q_{j*−1})(n + 1)| | = 1 − (q_{j*} − q_{j*−1}),

and the distance contributed by the second term is

∑_{j ≠ j*} | |n(o_{j−1}, o_j) − (q_j − q_{j−1})n| − |n(o_{j−1}, o_j) − (q_j − q_{j−1})(n + 1)| | ≤ ∑_{j ≠ j*} (q_j − q_{j−1}).

Thus

|u_Q(X, o) − u_Q(X′, o)| ≤ 1 − (q_{j*} − q_{j*−1}) + ∑_{j ≠ j*} (q_j − q_{j−1}) = 2[1 − (q_{j*} − q_{j*−1})].

The last equality follows from the fact that the sum over all quantile gaps is 1, so the sum over all but the q_{j*} − q_{j*−1} gap is 1 − (q_{j*} − q_{j*−1}). The quantity 2[1 − (q_{j*} − q_{j*−1})] is maximized by minimizing q_{j*} − q_{j*−1}, which gives the final sensitivity bound.

Next, we verify that the finite sampling improvement from Section 3.2 still samples from the correct distribution.

Lemma 5. M_Q′ has the same output distribution as M_Q.

Proof. Recall that the output space for M_Q was O↗ = {o = (o_1, ..., o_m) | a ≤ o_1 ≤ ... ≤ o_m ≤ b}. Define function h on [a, b] by h(y) = |{x ∈ X | x < y}|. Then n(u, v) = h(v) − h(u), h(o_0) = h(a) = 0 = i_0, and h(o_{m+1}) = h(b + 1) = n = i_{m+1}. Thus

u_Q(X, o) = −∑_{j ∈ [m+1]} |n(o_{j−1}, o_j) − n_j| = −∑_{j ∈ [m+1]} |h(o_j) − h(o_{j−1}) − n_j| = u_Q′(X, (h(o_1), ..., h(o_m))).
Writing Z_Q for the normalizer of M_Q, we then have

Z_Q = ∫_{O↗} exp(ε · u_Q(X, o) / (2∆u_Q)) do = ∫_{O↗} exp(ε · u_Q′(X, (h(o_1), ..., h(o_m))) / (2∆u_Q)) do. (2)

Note that each o = (o_1, ..., o_m) ∈ O↗ has o_1 ∈ [x_{i_1}, x_{i_1+1}), ..., o_m ∈ [x_{i_m}, x_{i_m+1}) for exactly one s = (i_1, ..., i_m) ∈ S↗. Shorthand this by o ∈ s, and let g_O(s) = {o ∈ O↗ | o ∈ s}. Then

(2) = ∑_{s ∈ S↗} ∫_{g_O(s)} exp(ε · u_Q′(X, (h(o_1), ..., h(o_m))) / (2∆u_Q)) do
    = ∑_{s ∈ S↗} ∫_{g_O(s)} exp(ε · u_Q′(X, s) / (2∆u_Q)) do
    = ∑_{s ∈ S↗} exp(ε · u_Q′(X, s) / (2∆u_Q)) · ∫_{g_O(s)} do. (3)

We focus on the ∫_{g_O(s)} do term. If h(o_1), ..., h(o_m) are all distinct, i.e. o_1, ..., o_m come from distinct intervals between data points, then

∫_{g_O(s)} do = ∏_{j=1}^m (x_{i_j+1} − x_{i_j}).

The remaining (and more complex) case is when h(o_1), ..., h(o_m) are not distinct. Suppose h(o_1), ..., h(o_k) are not distinct but the remaining h(o_{k+1}), ..., h(o_m) are distinct and different from h(o_1). Note that the non-distinct elements are consecutive since o_1 ≤ ... ≤ o_m. Then there is some i ∈ I such that o_1, ..., o_k ∈ [x_i, x_{i+1}). Thus the set of valid o_1, ..., o_k is exactly {(o_1, ..., o_k) | x_i ≤ o_1 ≤ ... ≤ o_k < x_{i+1}}.

We need to determine the volume of this set. First, note that the collection consisting of all sets of k values from interval i has volume (x_{i+1} − x_i)^k. Then, note that the probability that k values selected at random from an interval will be perfectly sorted is 1/k!; this is the volume of the standard k-simplex, which is the set {(x_1, ..., x_k) | 0 ≤ x_1 ≤ ... ≤ x_k ≤ 1}. Hence, for the set that we are interested in, {(o_1, ..., o_k) | x_i ≤ o_1 ≤ ... ≤ o_k < x_{i+1}}, the volume is (x_{i+1} − x_i)^k / k!.

More generally, this leads us to define the scaling factor γ in Section 3.2: γ(s) = ∏_{i ∈ I} count_s(i)!, where count_s(i) is the number of times i appears in s, and we take 0! = 1. γ thus repeats the above scaling process for each interval according to its number of repetitions in h(o_1), ..., h(o_m). It follows that for any s ∈ S↗,

∫_{g_O(s)} do = ∏_{j=1}^m (x_{i_j+1} − x_{i_j}) / γ(s).

Returning to our original chain of equalities, we get

(3) = ∑_{s ∈ S↗} exp(ε · u_Q′(X, s) / (2∆u_Q)) · ∏_{j=1}^m (x_{i_j+1} − x_{i_j}) / γ(s) = Z_Q′.

Recalling the output density f_Q for M_Q, by the above

f_Q(o) = (1/Z_Q′) · exp(ε · u_Q(X, o) / (2∆u_Q)).

For any s ∈ S↗ and any o, o′ ∈ g_O(s) we have f_Q(o) = f_Q(o′) and

P_{M_Q}[M_Q(X) ∈ g_O(s)] = (1/Z_Q′) · ∫_{g_O(s)} exp(ε · u_Q′(X, (h(o_1), ..., h(o_m))) / (2∆u_Q)) do
    = (1/Z_Q′) · exp(ε · u_Q′(X, s) / (2∆u_Q)) · ∏_{j=1}^m (x_{i_j+1} − x_{i_j}) / γ(s)
    = P_{M_Q′}[M_Q′(X) ∈ g_O(s)].

u_Q is constant over o ∈ g_O(s) for any s, so conditioned on selecting a given s = (i_1, ..., i_m) ∈ S↗, M_Q has a uniform output distribution over increasing sequences from g_O(s), i.e.
f_Q(o) = f_Q(o | o ∈ g_O(s)) · P_{M_Q}[M_Q(X) ∈ g_O(s)]
    = [γ(s) / ∏_{j ∈ [m]} (x_{i_j+1} − x_{i_j})] · P_{M_Q}[M_Q(X) ∈ g_O(s)]
    = [γ(s) / ∏_{j ∈ [m]} (x_{i_j+1} − x_{i_j})] · P_{M_Q′}[M_Q′(X) ∈ g_O(s)]
    = f_Q′(o),

where the second and fourth equalities use ∫_{g_O(s)} do = ∏_{j=1}^m (x_{i_j+1} − x_{i_j}) / γ(s). Thus M_Q and M_Q′ have identical output distributions.

We now repeat this process for the efficient sampling improvement from Section 3.3.

Theorem 1. JointExp satisfies ε-differential privacy, takes time O(mn² + m²n), and uses space O(mn² + m²n).

Proof. We first verify that JointExp samples from M_Q′, which implies differential privacy. Since the uniform sampling step is unchanged, it suffices to show that the distribution over sampled sequences of intervals is correct.

Let S↗(j, i, k) = {(i_1, ..., i_j) | i_1 ≤ ... ≤ i_{j−k} < i_{j−k+1} = ... = i_j = i} denote the set of nondecreasing sequences of length j where exactly the last k intervals are equal to i. We will first show that, for all j ∈ [m], all i ∈ I, and all k ∈ [j],

α(j, i, k) = ∑_{s = (i_1, ..., i_j) ∈ S↗(j, i, k)} (1/γ(s)) ∏_{j′ ∈ [j]} φ(i_{j′−1}, i_{j′}, j′). (4)

The base case of j = 1 holds by definition, since we let α(1, i, 1) = φ(0, i, 1) and γ(s) = 1 when s is a sequence of length one. Inductively, assuming equality holds for j′ < j, we have

α(j, i, 1) = ∑_{i′ < i} φ(i′, i, j) ∑_{k < j} α(j − 1, i′, k)
    = ∑_{i′ < i} ∑_{k < j} φ(i′, i, j) ∑_{s′ ∈ S↗(j−1, i′, k)} (1/γ(s′)) ∏_{j′ ∈ [j−1]} φ(i_{j′−1}, i_{j′}, j′)
    = ∑_{s ∈ S↗(j, i, 1)} (1/γ(s)) ∏_{j′ ∈ [j]} φ(i_{j′−1}, i_{j′}, j′),
since every sequence in S↗(j, i, 1) consists of a sequence in S↗(j − 1, i′, k) for some i′ < i and k < j with an i appended to the end. Note that the appending of only a single i means that γ(s) = γ(s′). Similarly,

α(j, i, k > 1) = φ(i, i, j) · α(j − 1, i, k − 1)/k
    = (1/k) · φ(i, i, j) ∑_{s′ = (i_1, ..., i_{j−1}) ∈ S↗(j−1, i, k−1)} (1/γ(s′)) ∏_{j′ ∈ [j−1]} φ(i_{j′−1}, i_{j′}, j′)
    = ∑_{s = (i_1, ..., i_j) ∈ S↗(j, i, k)} (1/γ(s)) ∏_{j′ ∈ [j]} φ(i_{j′−1}, i_{j′}, j′),

since every sequence in S↗(j, i, k > 1) consists of a sequence in S↗(j − 1, i, k − 1) with an i appended to the end and, when i appears k − 1 times in s′ and s is equal to s′ with an i appended to the end, γ(s) = kγ(s′). Thus Eq. 4 holds, and we have the "forward" step: α(j, i, k) is the (unnormalized) mass of nondecreasing length-j sequences ending in k repetitions of i.

Now consider the backward sampling process in JointExp. In the first step, we sample a pair

(i, k) ∝ α(m, i, k) · φ(i, n, m + 1)
    = ∑_{s = (i_1, ..., i_m) ∈ S↗(m, i, k)} (1/γ(s)) ∏_{j′ ∈ [m]} φ(i_{j′−1}, i_{j′}, j′) · φ(i, n, m + 1)
    = ∑_{s = (i_1, ..., i_m) ∈ S↗(m, i, k)} (1/γ(s)) ∏_{j′ ∈ [m+1]} φ(i_{j′−1}, i_{j′}, j′)
    ∝ ∑_{s ∈ S↗(m, i, k)} P_{M_Q′}[s],

where the second equality uses the fact that we fix i_{m+1} = n. Since {S↗(m, i, k)}_{i,k} is a partition of S↗ (that is, every sequence in S↗ appears in S↗(m, i, k) for exactly one value of the pair (i, k)), we conclude that (i, k) is sampled according to the marginal probability that a sequence sampled from P_{M_Q′} ends in exactly k copies of i.

Continuing the backward recursion, if the values of s_suffix = (i_{j+1}, ..., i_m) have already been sampled, then at the next step we sample a pair

(i < i_{j+1}, k) ∝ α(j, i, k) · φ(i, i_{j+1}, j + 1)
    = ∑_{s_prefix = (i_1, ..., i_j) ∈ S↗(j, i, k)} (1/γ(s_prefix)) ∏_{j′ ∈ [j]} φ(i_{j′−1}, i_{j′}, j′) · φ(i, i_{j+1}, j + 1)
    ∝ ∑_{s_prefix ∈ S↗(j, i, k)} (1/γ(s_prefix)) ∏_{j′ ∈ [j]} φ(i_{j′−1}, i_{j′}, j′) · φ(i, i_{j+1}, j + 1) · (1/γ(s_suffix)) ∏_{j′ = j+2}^{m+1} φ(i_{j′−1}, i_{j′}, j′)
    = ∑_{s_prefix ∈ S↗(j, i, k)} (1/(γ(s_prefix)γ(s_suffix))) ∏_{j′ ∈ [m+1]} φ(i_{j′−1}, i_{j′}, j′)
    ∝ ∑_{s_prefix ∈ S↗(j, i, k)} P_{M_Q′}[s_prefix + s_suffix],

where + denotes sequence concatenation, and we use the fact that, since s_prefix and s_suffix are nondecreasing and i_j < i_{j+1}, γ(s_prefix + s_suffix) = γ(s_prefix)γ(s_suffix). Again, the set of nondecreasing sequences of length j can be partitioned into disjoint subsets {S↗(j, i, k)}_{i,k}, thus the pair (i, k) is sampled according to the marginal probability that a sequence sampled from P_{M_Q′}, conditional on having the suffix s_suffix, has a j-length prefix ending in exactly k copies of i.

Inductively, then, JointExp samples a sequence s according to P_{M_Q′}. It remains to show that Algorithm 1 uses O(mn² + m²n) space and time.
Space analysis. φ occupies O(mn²) space, ˆα occupies O(mn) space, and α occupies O(m²n) space. All other variables in the algorithm occupy a constant amount of space. Hence, the overall space usage is O(mn² + m²n).

Time analysis. The first two for-loops in Algorithm 1 compute φ. Each entry of φ can be computed in constant time, so these loops run in O(mn²) time. The next loop computes α(1, i, 1) for i ∈ I. This just copies entries of φ into α, so it runs in O(n) time. The next computation is the nested for-loop for ˆα. The equation for ˆα simply sums over at most O(m) previously-computed α entries, so computing all O(mn) entries of ˆα takes O(m²n) time. The subsequent nested for-loop computes α(j, i, 1). Each α(j, i, 1) simply sums over at most O(n) previously-computed φ and ˆα terms, so computing it for all (i, j) pairs takes time O(mn²). Turning to the nested for-loop for α(j, i, k > 1), each of these remaining α terms can be computed in constant time, since it is just a product of one previously-computed α term with one previously-computed φ term. Hence, these remaining α take O(m²n) time to compute. Finally, the "Sample (i, k)" step takes O(mn) time, and is repeated at most m times in the for-loop. Thus, the overall runtime is O(mn² + m²n).

B Logarithm Trick

In this section, we give details for a more numerically stable logarithmic version of JointExp. First, we compute and store ln(φ) instead of φ. Next, recall that we defined α(1, i, 1) = φ(0, i, 1) and α(1, i, k) = 0 for k > 1. The former becomes ln(α(1, i, 1)) = ln(φ(0, i, 1)), and for k > 1 we set ln(α(1, i, k)) = −∞, e.g. using -numpy.inf in Python.

We now turn to ln(α(j, ·, ·)) for j = 2, ..., m. We set

ln(ˆα(j − 1, i)) = ln ∑_{k < j} exp(ln(α(j − 1, i, k))),

which can be evaluated stably with the standard logsumexp operation [23]; the sums defining ln(α(j, i, 1)) are handled analogously via numerically stable log-space matrix products [20].

C Racing Sampling

The "racing" method is originally due to Ilya Mironov. To the best of our knowledge, full exposition and proofs first appeared in the work of Medina and Gillenwater [18]. We recap their exposition here. The main tool is the following result:

Lemma 11 (Proposition 5 [18]). Let U_1, ..., U_N ∼ U(0, 1) be uniform random samples from [0, 1] and define random variable

R = argmin_{k ∈ [N]} [ln(ln(1/U_k)) − ln(p_k)].

Then P_R[k] = p_k / ∑_{j=1}^N p_j.

Lemma 11 enables us to sample from distributions that depend on small probabilities p_k by instead using their more tractable logarithms. In combination with the logarithm trick from Appendix B, we can avoid dealing with exponentiated terms entirely. A sketch of both tricks appears below.
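A sketch of both numerical tricks (ours; names are illustrative): logsumexp [23] for the log-space ˆα sums, and the racing sampler of Lemma 11.

import numpy as np
from scipy.special import logsumexp

def log_alpha_hat(log_alpha_prev):
    # ln(alpha_hat(j-1, i)) = logsumexp over the repetition count k of ln(alpha(j-1, i, k)).
    return logsumexp(log_alpha_prev, axis=1)

def racing_sample(log_weights, rng):
    # Lemma 11: with U_k ~ U(0,1), argmin_k [ln(ln(1/U_k)) - ln(p_k)] has law p_k / sum_j p_j.
    u = rng.uniform(size=len(log_weights))
    return int(np.argmin(np.log(np.log(1.0 / u)) - log_weights))

rng = np.random.default_rng(0)
log_w = np.array([-1000.0, -1001.0, -1000.5])   # weights far too small to exponentiate directly
print(racing_sample(log_w, rng))
print(log_alpha_hat(np.array([[-np.inf, -3.0], [-2.0, -np.inf]])))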
D Discussion of Other Quantile Algorithms

In this section, we discuss the private quantile estimation algorithms of Dwork and Lei [8] and Tzamos, Vlatakis-Gkaragkounis, and Zadik [28] and explain why we do not include them in our experiments. Both of these are single-quantile algorithms, so, as with the other algorithms that we consider, they would require m compositions in order to estimate m quantiles.

Dwork and Lei [8] define a "propose-test-release" algorithm. Briefly, it discretizes the space into bins of equal width, then computes how many points in the dataset must change to move the true quantile out of its current bin. If this number is too small (specifically, if it is no larger than ln²(n) + 2, the "test"), then the algorithm does not produce an answer. Otherwise, the answer is the true quantile plus Laplace noise whose scale is six times the bin width ("release"). We can ballpark the accuracy of this method on the uniform data distribution used in our experiments, i.e. n = 1000 samples from U(−5, 5), where ln²(n) + 2 ≈ 50. If we choose a bin width such that the bin with the true median contains 100 points, then it takes at most 50 swaps to move the median out of that bin. We must therefore choose, at a minimum, a bin size such that the bin containing the median contains at least 100 points. Even if we successfully make this choice, then the resulting output will still be far less accurate than that of all the other methods tested in the experiments. This is because a successful choice requires a bin width ≥ 1, so the algorithm releases the true median value of ≈ 0 plus Laplace noise scaled to several times the bin width on the [−5, 5] range, yielding far more misclassified points than the methods we test achieve for median estimation on uniform data.

We now turn to the private quantile estimation algorithm given by Tzamos et al. [28]. This algorithm is also based on adding (a variant of) Laplace noise to the true median. The first drawback of this method is its time complexity (see the footnote accompanying their definition of "TypicalHamming"), which makes it impractical for datasets with more than a few hundred data points. The second drawback is the need to select several hyperparameters (R, r, L, C) to determine the specific Laplace noise distribution. While this hyperparameter selection does not affect the privacy guarantee, it does affect the utility. Their utility guarantees assume that the algorithm operator knows these distributional parameters a priori, but this assumption may be hard to satisfy in practice. In contrast, JointExp and the other comparison algorithms only require the user to provide endpoints.

E Details For Comparison Algorithms

E.1 AppIndExp

In this section we provide details for AppIndExp. The technique that differentiates AppIndExp from IndExp is the variant of advanced composition for nonadaptive bounded-range mechanisms given by Dong et al. [7]. For simplicity, we give a less general (but not weaker) version of their result.

Lemma 12 (Theorem 3 [7]). Let mechanism A consist of m ε-DP applications of the exponential mechanism. Define

t*_ℓ = [(ε_g + (ℓ + 1)ε) / (m + 1)]_ε  and  p_{t*_ℓ} = (e^{−t*_ℓ} − e^{−ε}) / (1 − e^{−ε}),

where [x]_ε denotes the value of x clipped to the interval [0, ε]. Then A is (ε_g, δ)-DP for

δ = max_{0 ≤ ℓ ≤ m} ∑_{i=0}^m [ C(m, i) · p_{t*_ℓ}^{m−i} · (1 − p_{t*_ℓ})^i · max(e^{m·t*_ℓ − iε} − e^{ε_g}, 0) ].

To apply Lemma 12 with a fixed δ, we use it to compute the largest ε, at a granularity of 0.01, that achieves (ε_g, δ)-DP with some δ ≤ 10⁻⁶, and we use this value for our experiments. As this is independent of the actual mechanism in question, the time required for this computation is not included in the runtime values reported for AppIndExp.

E.2 Smooth and CSmooth

We now move on to the smooth sensitivity algorithms, starting with the precise statement of the t-smooth sensitivity of computing a quantile:

Lemma 13 (Proposition 3.4 [19]). Let a and b be client-provided left and right data endpoints. Let X be a database of values x_1 ≤ ... ≤ x_n in [a, b], and for notational convenience define x_i = a for i < 1 and x_i = b for i > n. Let x_{j*} be the true value for quantile q on X. Then the t-smooth sensitivity of computing q on X is

S^t_q(X) = max_{m = 0, ..., n} ( e^{−tm} · max_{k = 0, ..., m+1} (x_{j*+k} − x_{j*+k−m−1}) ).

Looking at the two max operations, we can compute S^t_q(X) in time O(n²). Nissim et al. [19] also provide a slightly more involved method for computing S^t_q(X) in time O(n log(n)). We omit its details here but note that our implementation uses this O(n log(n)) speedup for the fairest time comparison. A direct sketch of the double maximum above appears below.
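A direct O(n²) sketch of Lemma 13's double maximum (ours; the rank convention for j* is an assumption, and the O(n log n) method is omitted as in the text):

import numpy as np

def smooth_sensitivity_quantile(sorted_x, a, b, q, t):
    # t-smooth sensitivity of quantile q via the double maximum of Lemma 13.
    n = len(sorted_x)
    pad = n + 2
    x = np.concatenate((np.full(pad, float(a)), sorted_x, np.full(pad, float(b))))
    def xi(i):
        # x_i in the lemma's 1-based indexing, with x_i = a for i < 1 and x_i = b for i > n.
        return x[i - 1 + pad]
    j_star = max(1, int(np.ceil(q * n)))    # assumed rank convention for quantile q
    best = 0.0
    for mm in range(n + 1):
        local = max(xi(j_star + k) - xi(j_star + k - mm - 1) for k in range(mm + 2))
        best = max(best, np.exp(-t * mm) * local)
    return best

rng = np.random.default_rng(0)
x = np.sort(rng.normal(size=1000))
print(smooth_sensitivity_quantile(x, -100.0, 100.0, q=0.5, t=0.05))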
Next, we specify the exact noise distribution used to generate additive noise in Smooth:

Lemma 14 (Lemma 2.6, Lemma 2.9 [19]). Let f be a real-valued function with t-smooth sensitivity S^t_f(X) on database X. Then sampling Z ∼ Lap(1) from the standard Laplace distribution and releasing

f(X) + S^{ε/[2 ln(2/δ)]}_f(X) · 2Z/ε

satisfies (ε, δ)-DP.

The analogous result for CSmooth is similar:

Lemma 15 (Proposition 3 [4]). Define the Laplace Log-Normal distribution with shape parameter σ > 0, LLN(σ), as the distribution of the random variable Z = X · e^{σY} where X ∼ Lap(1) and Y ∼ N(0, 1). Let f be a real-valued function and let s, t > 0. Then releasing f(X) + S^t_f(X) · Z/s satisfies ε-CDP for ε = t/σ + e^{1.5σ}s.

Once we fix the desired CDP privacy parameter ε, to apply Lemma 15 we must still select t, s, σ > 0. We follow the selection method given in Sections 3.1.1 and 7.1 of Bun and Steinke [4], omitting most of the details. First, for each of a sequence of values for t, we set s = e^{−1.5σ}(ε − t/σ) and numerically solve for σ as a root of a polynomial in σ determined by ε and t. This provides a collection of (t, s, σ) triples without touching the database X. Given these triples (t, s, σ), we finally select one to minimize the variance of the resulting noise S^t_f(X) · Z/s.

We pause to note that this last minimization of variance repeatedly touches X to compute S^t_f(X) for different t and so is not differentially private. We therefore carried out this non-private selection process once using data drawn from the standard Gaussian N(0, 1) and used the resulting values for CSmooth experiments on our datasets. In practice, after starting from a wide range of 150 logarithmically spaced values for t, we found that the values selected for t clustered in a narrow subinterval across both data drawn from N(0, 1) and data drawn from our other experiment distributions. We therefore view the selection of t as contributing relatively little to the final error of CSmooth.

In more detail, the actual t selection process in our experiments is to use the variance-minimizing selection process described in Section 5.1 for each quantile in sets of quantiles ranging from m = 1 to m = 29, for ε = 0.1 and ε = 1. The range for t is 50 logarithmically spaced values between 0.001 and 0.1 (ε = 0.1) or 50 logarithmically spaced values between 0.01 and 1 (ε = 1). Each trial used 1000 samples drawn from N(0, 1) with data lower bound −100 and data upper bound 100. Below, we record the t selected for each quantile and quantile range, averaged across 5 trials. Each color represents a different set of quantiles, and each point for each color represents the t selected for a single quantile. We then used these selected values of t for each (number of quantiles, quantile) pair in our experiments.

Figure 5: Tuned t used for CSmooth for ε = 0.1 and ε = 1 across different quantile ranges. For example, we used t ≈ 0.13 for ε = 1, m = 3, and q = 0.5 (see the ε = 1 plot).

F Additional ε = 0.1 Error Plots

Figure 6: Error vs. number of quantiles m for ε = 0.1 on all four datasets. Note the logarithmic y-axis: for m = 3, JointExp produces 5x fewer errors than any other algorithm.