Differentially Private Quantiles
Jennifer Gillenwater (Google New York, [email protected]), Matthew Joseph (Google New York, [email protected]), Alex Kulesza (Google New York, [email protected])

February 17, 2021
Abstract
Quantiles are often used for summarizing and understanding data. If that data is sensitive, it may be necessary to compute quantiles in a way that is differentially private, providing theoretical guarantees that the result does not reveal private information. However, in the common case where multiple quantiles are needed, existing differentially private algorithms scale poorly: they compute each quantile individually, splitting their privacy budget and thus decreasing accuracy. In this work we propose an instance of the exponential mechanism that simultaneously estimates m quantiles from n data points while guaranteeing differential privacy. The utility function is carefully structured to allow for an efficient implementation that avoids exponential dependence on m and returns estimates of all m quantiles in time O(mn² + m²n). Experiments show that our method significantly outperforms the current state of the art on both real and synthetic data while remaining efficient enough to be practical.

Introduction

Quantiles are a widespread method for understanding real-world data, with example applications ranging from income [24] to birth weight [6] to standardized test scores [12]. At the same time, the individuals contributing data may require that these quantiles not reveal too much information about individual contributions. As a toy example, suppose that an individual joins a company that has exactly two salaries, and half of current employees have one salary and half have the other. In this case, publishing the exact median company salary will reveal the new employee's salary.
Differential privacy [11] offers a solution to this problem. Informally, the distribution over a differentially private algorithm's outputs must be relatively insensitive to the input of any single data contributor. Returning to the salary example, a differentially private method for computing the median company salary would have similar-looking output distributions regardless of which salary the new employee receives. The resulting uncertainty about any single contributor's data makes the algorithm "private".

In this work, we study differentially private estimation of user-specified quantiles q_1, ..., q_m ∈ [0, 1] for a one-dimensional dataset X of size n. The output quantile estimates consist of m values, which we denote o_1, ..., o_m. Ideally, the o_j are as close to the dataset's actual quantiles as possible. For example, if q_j = 0.5, then our goal is to have o_j be as close to the median of X as possible.

Several algorithms for differentially private quantiles exist [19, 8, 26, 28]. These algorithms focus on estimating a single quantile. In one sense, this is without loss of generality because of differential privacy's composition guarantee. Basic composition says that, if we estimate each of m quantiles via an (ε/m)-differentially private algorithm, then we will obtain ε-differential privacy overall for the set of m quantiles (see Section 2 for details). However, the cost of this generality is the smaller and more restrictive privacy budget ε/m (or roughly ε/√m for "advanced" composition [15] and other variants tailored to the exponential mechanism [7]). As a result, estimating m quantiles via repeated application of a single-quantile algorithm yields significantly more inaccurate outcomes as m grows. This is unfortunate, as many applications rely on multiple quantiles: returning to the opening paragraph, the income statistics use m = 4 (quintiles), the birth weight statistics use m = 9 (deciles), and the test score statistics use m > 1.
Our contributions are as follows:

1. We give an instantiation of the exponential mechanism [17], JointExp, that produces an ε-differentially private collection of m quantile estimates in a single invocation (Section 3.1). This mechanism uses a utility function that has a fixed sensitivity of 2 no matter how many quantiles are requested, and does not need to divide ε based on the number of quantiles.

2. We provide a dynamic program, related to algorithms used for inference in graphical models, to implement JointExp in time O(mn² + m²n) (Section 3.2, Section 3.3). This significantly improves on naive sampling, which requires time O(n^m).

3. We experimentally evaluate JointExp against current algorithms on synthetic and real datasets. We find that JointExp is efficient enough to be practical for moderate dataset sizes and obtains much better accuracy than the existing state of the art (Section 5).
Related Work

Several algorithms for estimating a single quantile have appeared in the differential privacy literature [19, 8, 26, 28]. Details for previous approaches appear in our experiments (Section 5.1). The main difference is that our algorithm focuses on estimating multiple quantiles at once and has sensitivity independent of the number of quantiles m. Hence, as m grows, our algorithm does not need to split ε into ever-smaller pieces as other algorithms do.

As noted in the list of contributions above, our algorithm is an instantiation of the exponential mechanism. Blocki, Datta, and Bonneau [2] studied how to release the counts (but not identities) of items in a dataset by constructing a relaxation of the exponential mechanism and sampling from it using dynamic programming. However, it is not clear how this method can be applied to quantiles. In particular, our utility function requires accounting for interactions between indices of the sampled vector, while their utility function neatly decomposes into individual terms. Blocki et al. [2] closed their paper by asking if dynamic programming could produce efficient applications of the exponential mechanism in other settings; our work is in part an affirmative answer to this question.

Finally, our dynamic program for sampling from JointExp's exponential mechanism is related to inference algorithms for graphical models. Several papers have studied differential privacy with graphical models. However, this has typically meant studying private versions of graphical modeling tasks [30, 1] or using graphical models as a step in private algorithms [16]. Our paper departs from that past work in that its dynamic program, while related to e.g. the forward-backward algorithm, does not have any conceptual dependence on graphical models themselves.
Preliminaries
We view databases X, X′ as multisets of elements from some data domain 𝒳 where each individual contributes at most one element to the database. To reason about databases that are "close", differential privacy uses neighbors.

Definition 1. Databases X, X′ ∈ 𝒳ⁿ are neighbors, denoted X ∼ X′, if they differ in at most one element.

Note that we use the swap definition of differential privacy; in contrast, the add-remove definition allows the addition or removal (rather than exchange) of one element between neighboring databases. We do this for consistent evaluation against the smooth-sensitivity framework (see Section 5.1.2), which also uses swap differential privacy. However, we emphasize that our algorithm JointExp easily adapts to the add-remove framework (in fact, its sensitivity is lower under add-remove privacy).

With the notion of neighboring databases in hand, we can now define differential privacy.
Definition 2 (Dwork, McSherry, Nissim, and Smith [11]). A randomized algorithm A : 𝒳* → 𝒴 is (ε, δ)-differentially private if, for every pair of neighboring databases X, X′ and every output subset Y ⊆ 𝒴,

P_A[A(X) ∈ Y] ≤ e^ε · P_A[A(X′) ∈ Y] + δ.

When δ > 0, we say A satisfies approximate differential privacy. If δ = 0, we say A satisfies pure differential privacy, and shorthand this as ε-differential privacy (or ε-DP). Our algorithm
JointExp will be ε-DP.

A key benefit of differential privacy is composability: an algorithm that relies on differentially private subroutines inherits an overall privacy guarantee by simply adding up the privacy guarantees of its components.

Lemma 1 (Dwork et al. [11]). Let A_1, ..., A_k be k algorithms that respectively satisfy (ε_1, δ_1)-, ..., (ε_k, δ_k)-differential privacy. Then running A_1, ..., A_k satisfies (∑_{i=1}^k ε_i, ∑_{i=1}^k δ_i)-differential privacy.

We will use composability (or its "advanced" variants) when evaluating baseline methods that estimate a set of m quantiles by estimating each quantile individually. By Lemma 1, to achieve overall ε-DP, it suffices to estimate each quantile under (ε/m)-DP, as in the sketch below. However, since our algorithm JointExp estimates all quantiles in one invocation, it does not use composition.
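To make the budget arithmetic concrete, the following sketch (ours, not part of the paper's released code; the function names are illustrative) computes the per-quantile budget under basic composition, along with the rough ε/√m scaling under a generic advanced composition bound rather than the tighter exponential-mechanism-specific analysis of [7].

import math

def per_quantile_budget_basic(eps_total: float, m: int) -> float:
    # Basic composition (Lemma 1): m calls at eps_i each compose to (m * eps_i)-DP,
    # so an even split gives each single-quantile call eps_total / m.
    return eps_total / m

def per_quantile_budget_advanced(eps_total: float, m: int, delta: float) -> float:
    # Rough inversion of generic advanced composition: for small per-call budgets,
    # the overall epsilon is about eps_i * sqrt(2 m ln(1/delta)), so
    # eps_i ~ eps_total / sqrt(2 m ln(1/delta)), i.e., roughly eps_total / sqrt(m).
    return eps_total / math.sqrt(2 * m * math.log(1 / delta))

print(per_quantile_budget_basic(1.0, 9))            # 0.111...
print(per_quantile_budget_advanced(1.0, 9, 1e-6))   # ~0.063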
We will also rely on the exponential mechanism, a common building block for differentially private algorithms.

Definition 3 (McSherry and Talwar [17], Dwork and Roth [9]). Given utility function u : 𝒳* × O → ℝ mapping (database, output) pairs to real-valued scores with sensitivity

∆u = max_{X ∼ X′, o ∈ O} |u(X, o) − u(X′, o)|,

the exponential mechanism M has output distribution

P_M[M(X) = o] ∝ exp(εu(X, o) / (2∆u)),

where ∝ elides the normalization factor.

Lemma 2 (McSherry and Talwar [17]). The mechanism described in Definition 3 is ε-DP.
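As a concrete illustration, here is a minimal sketch of the exponential mechanism over a finite output set. The helper name and the use of numpy are our own choices, not part of the paper's released code.

import numpy as np

def exponential_mechanism(utilities, sensitivity, eps, rng):
    # Sample index o with probability proportional to exp(eps * u(X, o) / (2 * sensitivity)).
    logits = eps * np.asarray(utilities, dtype=float) / (2.0 * sensitivity)
    logits -= logits.max()          # shift for numerical stability before exponentiating
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

rng = np.random.default_rng(0)
# Four candidate outputs with precomputed utilities u(X, o) and sensitivity 1.
print(exponential_mechanism([-3.0, -1.0, 0.0, -2.0], sensitivity=1.0, eps=1.0, rng=rng))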
The above material suffices to understand the bulk of our algorithm, JointExp. The algorithms used for our experimental comparisons will also require some understanding of smooth sensitivity and concentrated differential privacy, but since these concepts will be relevant only as points of experimental comparison, we discuss them in Section 5.

JointExp
This section provides an exposition of our quantiles algorithm, JointExp. Recall that our goal is to take as input quantiles q_1 < q_2 < ... < q_m ∈ [0, 1] and database X and output quantile estimates o_1, ..., o_m such that, for each j ∈ [m], P_{x ∼ U(X)}[x ≤ o_j] ≈ q_j.

In Section 3.1, we start with an instance of the exponential mechanism whose continuous output space makes sampling impractical. In Section 3.2, we construct a mechanism with the same output distribution (and, importantly, the same privacy guarantees) and a bounded but inefficient sampling procedure. Finally, in Section 3.3 we modify our sampling procedure once more to produce an equivalent and polynomial-time method, which we call JointExp.

We start by formulating an instance of the exponential mechanism for our quantiles setting. First, we will require the algorithm user to input a lower bound a and upper bound b for the data domain. (Lower and upper bounds are also necessary for the private quantile algorithms that we compare to in our experiments. We find that choosing loose bounds a and b does not greatly affect utility; see the experiments in Section 5.) We assume that all x ∈ X are in [a, b]; if this is not the case initially, then we clamp any outside points to [a, b]. The output space is O↗ = {(o_1, ..., o_m) | a ≤ o_1 ≤ ... ≤ o_m ≤ b}, the set of sequences of m nondecreasing values from [a, b]. For a given o = (o_1, ..., o_m), the utility function will compare the number of points in each proposed quantile interval [o_{j−1}, o_j) to the expected number of points in the correct quantile interval. We denote the number of data points between adjacent quantiles q_{j−1} and q_j by n_j = (q_j − q_{j−1})n. We fix q_0 = 0 and q_{m+1} = 1, so that n_1 = q_1 n and n_{m+1} = (1 − q_m)n. We also denote the number of data points from X between any two values u and v by n(u, v) = |{x ∈ X | u ≤ x < v}|. We can now define our utility function

u_Q(X, o) = −∑_{j ∈ [m+1]} |n(o_{j−1}, o_j) − n_j|,

where we fix o_0 = a and o_{m+1} = b + 1 (setting o_{m+1} to a value larger than b ensures that points equal to b are counted in the final term of the sum). u_Q thus places the highest probability on the true quantile values and lower probability on estimates that are far from the true quantile values. A sketch of this computation appears below.

Lemma 3. u_Q has sensitivity ∆u_Q = 2.

Proof. Fix an output o. Let X and X′ be neighboring databases. Since we use swap differential privacy, |X| = |X′|, so u_Q(X, o) and u_Q(X′, o) only differ in their counts n(·, ·) (respectively denoted n′(·, ·) for X′). Since X and X′ are neighbors, there are at most two intervals [o_{j−1}, o_j) and [o_{j′−1}, o_{j′}) on which n and n′ differ, each by at most one. Thus |u_Q(X, o) − u_Q(X′, o)| ≤ 2.

Under the add-remove definition of differential privacy, the sensitivity is slightly lower: ∆u_Q = 2[1 − min_{j ∈ [m+1]}(q_j − q_{j−1})]. A full proof of this and other results appears in Appendix A.

The corresponding mechanism M_Q has output density

f_Q(o) ∝ exp(ε · u_Q(X, o) / (2∆u_Q)). (1)
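The following sketch (ours; names are illustrative) computes u_Q on sorted data, using the fact that each count n(o_{j−1}, o_j) is a difference of sorted ranks.

import numpy as np

def joint_utility(sorted_x, quantiles, o, a, b):
    # u_Q(X, o) = -sum_{j in [m+1]} |n(o_{j-1}, o_j) - n_j|, with o_0 = a, o_{m+1} = b + 1.
    n = len(sorted_x)
    qs = np.concatenate(([0.0], np.asarray(quantiles, dtype=float), [1.0]))
    target = np.diff(qs) * n                                 # n_j = (q_j - q_{j-1}) * n
    edges = np.concatenate(([a], np.asarray(o, dtype=float), [b + 1]))
    ranks = np.searchsorted(sorted_x, edges, side="left")    # |{x < edge}| for each edge
    counts = np.diff(ranks)                                  # n(o_{j-1}, o_j) = |{o_{j-1} <= x < o_j}|
    return -np.sum(np.abs(counts - target))

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-5.0, 5.0, size=1000))
print(joint_utility(x, [0.25, 0.5, 0.75], o=[-2.4, 0.1, 2.6], a=-5.0, b=5.0))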
Since this is an instantiation of the exponential mechanism, we can apply Lemma 2 to get:

Lemma 4. The mechanism M_Q defined by output density f_Q satisfies ε-differential privacy.

However, as is typically a drawback of the exponential mechanism, it is not clear how to efficiently sample from this distribution, which is defined over a continuous m-dimensional space. The following sections address this issue. Since the output distribution itself remains fixed through these sampling procedure changes, these improvements will preserve the privacy guarantee of Lemma 4. The remaining proofs will therefore focus on verifying that subsequent sampling procedures still sample according to Eq. 1.

In this section, we describe how to sample from the continuous distribution defined by M_Q by first sampling from an intermediate discrete distribution. This is similar to the single-quantile sampling technique given by Smith [26] (see their Algorithm 2). The basic idea is that we split the sampling process into three steps:

1. Sample m intervals from the set of intervals between data points.
2. Take a uniform random sample from each of the m sampled intervals.
3. Output the samples in increasing order.

This will require some notation setup. Denote the elements of X in nondecreasing order by x_1 ≤ ... ≤ x_n, fix x_0 = a and x_{n+1} = b, and let I = {0, ..., n}, where we associate i ∈ I with the interval between points x_i and x_{i+1}. Define S↗ to be the set of nondecreasing sequences of m intervals,

S↗ = {(i_1, ..., i_m) | i_1, ..., i_m ∈ I, i_1 ≤ ... ≤ i_m}.

S↗ will be the discrete output space for the first sampling step above. We can define a utility function u_Q′ on s = (i_1, ..., i_m) ∈ S↗ by slightly modifying u_Q:

u_Q′(X, s) = −∑_{j ∈ [m+1]} |(i_j − i_{j−1}) − n_j|,

where we fix i_0 = 0 and i_{m+1} = n.

In order to reproduce M_Q from Section 3.1, our sequence sampler will also need to weight each sequence s by the total measure of the outputs o ∈ O↗ that can be sampled from s in the second step. This is nontrivial due to the ordering constraint on o: if an interval appears twice in s, the measure of corresponding outputs must be halved to account for the fact that the two corresponding samples in the second step can appear in either order, but will be mapped to a fixed increasing order in the third step. In general, if an interval appears k times, the measure must be scaled by a factor of 1/k!, the volume of the standard k-simplex. We account for this by dividing by the scale function

γ(s) = ∏_{i ∈ I} count_s(i)!,

where count_s(i) is the number of times i appears in s and we take 0! = 1. A sketch of u_Q′ and γ appears below.
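A minimal sketch of u_Q′ and γ on a candidate interval sequence s (ours; names are illustrative):

import math
from collections import Counter

def discrete_utility(s, targets, n):
    # u_Q'(X, s) = -sum_{j in [m+1]} |(i_j - i_{j-1}) - n_j|, with i_0 = 0 and i_{m+1} = n.
    edges = [0] + list(s) + [n]
    return -sum(abs((edges[j] - edges[j - 1]) - targets[j - 1])
                for j in range(1, len(edges)))

def gamma(s):
    # gamma(s) = prod_i count_s(i)!, the correction for repeated intervals.
    return math.prod(math.factorial(c) for c in Counter(s).values())

# m = 3 quantiles on n = 1000 points with targets n_1, ..., n_4 = 250 each.
print(discrete_utility((240, 510, 770), targets=[250, 250, 250, 250], n=1000))  # -60
print(gamma((240, 510, 510)))  # 2, since interval 510 repeats twice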
The mechanism M_Q′ is now defined as follows:

1. Draw s = (i_1, ..., i_m) according to

P_{M_Q′}[s] ∝ exp(ε · u_Q′(X, s) / (2∆u_Q)) · ∏_{j=1}^m (x_{i_j+1} − x_{i_j}) / γ(s).

2. For j ∈ [m], draw o_j uniformly at random from [x_{i_j}, x_{i_j+1}).

3. Output o_1, ..., o_m in increasing order.

It remains to verify that M_Q′ actually matches M_Q.

Lemma 5. M_Q′ has the same output distribution as M_Q.

Proof Sketch (see Appendix A for full proof). Given potential outputs o and o′, if the corresponding quantile estimates fall into the same intervals between data points in X, then the counts n(·, ·) are unchanged and u_Q(X, o) = u_Q(X, o′). Since u_Q is constant over intervals between data points, it is equivalent to sample those intervals and then sample points uniformly at random from the chosen intervals. The only complication is accounting for the γ scaling introduced by repeated intervals.

The benefit of M_Q′ over M_Q is that the first step samples from a finite space, and the second sampling step is simply uniform sampling. However, the size of the space for the first step is still O(n^m), which remains impractical for all but the smallest datasets. In the next section, we develop a dynamic programming algorithm that allows us to sample from P_{M_Q′} in time O(mn² + m²n).

Notice that the bulk of our probability distribution over sequences s = (i_1, ..., i_m) can be decomposed as a product of scores, where each score depends only on adjacent intervals i_{j−1} and i_j. In particular,

P_{M_Q′}[s] ∝ (1/γ(s)) · ∏_{j ∈ [m+1]} φ(i_{j−1}, i_j, j),

with φ defined for i ≤ i′ as follows. For j ∈ [m]:

φ(i, i′, j) = exp(−ε|(i′ − i) − n_j| / (2∆u_Q)) · (x_{i′+1} − x_{i′}).

For j = m + 1:

φ(i, n, m + 1) = exp(−ε|(n − i) − n_{m+1}| / (2∆u_Q)).

Fig. 1 illustrates this structure graphically, suggesting a dynamic programming algorithm similar to the "forward-backward" algorithm from the graphical models literature (see e.g. Chapter 15 of Russell and Norvig [22]).

Figure 1: Illustration of pairwise dependencies for interval sequence s = (i_1, ..., i_m), with i_0 = 0, i_{m+1} = n, and an edge φ(i_{j−1}, i_j, j) between consecutive intervals.

Unfortunately, γ(s) does not factor in the same way. However, it has its own special structure: since s is required to be nondecreasing, γ(s) decomposes over contiguous constant subsequences of s. We will use this to design an efficient dynamic programming algorithm for sampling from P_{M_Q′}.

Define the function α : [m] × I × [m] → ℝ so that α(j, i, k) is the total unnormalized probability mass for prefix sequences of length j that end with exactly k copies of the interval i. For all i ∈ I, let α(1, i, 1) = φ(0, i, 1) and α(1, i, k) = 0 for k > 1. Now, for j = 2, ..., m, we have the following recursion for all i ∈ I:

α(j, i, 1) = ∑_{i′ < i} φ(i′, i, j) ∑_{k < j} α(j − 1, i′, k)

α(j, i, k > 1) = φ(i, i, j) · α(j − 1, i, k − 1)/k.

Intuitively, if the sequence ends with a single i, we need to sum over all possible preceding intervals i′, which could have been repeated up to k < j times. On the other hand, if the sequence ends with more than one i, we know that the preceding interval was also i, and we simply divide by k to account for the scale function γ.

We can now sample in the reverse direction as follows. First, draw a pair

(i, k) ∝ α(m, i, k) · φ(i, n, m + 1)

(the φ term accounts for the final edge in the graph; see Appendix A for details). This determines that the last k sampled intervals are equal to i. We can then draw another pair

(i′ < i, k′) ∝ α(m − k, i′, k′) · φ(i′, i, m − k + 1),

which determines that the last k′ remaining intervals in the sequence are i′, and so on until we have a complete sample.

This dynamic program can be implemented to run in time O(mn² + m²n). Algorithm 1 provides pseudocode for the overall sampling process, including computational optimizations, and Theorem 1 verifies that Algorithm 1 samples efficiently from the correct distribution. The full proof appears in Appendix A, but we note here that the privacy proof mainly involves unrolling the products and sums to show that the algorithm correctly implements M_Q′, and the time and space proofs are straightforward from the pseudocode.
Theorem 1. JointExp satisfies ε-differential privacy, takes time O(mn² + m²n), and uses space O(mn² + m²n).
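The following is a direct, illustrative translation of Algorithm 1 into Python (ours, not the released dp_multiq implementation). It works in ordinary rather than log space, so it is only suitable for small n and moderate ε; the numerical improvements discussed next and in Appendices B and C address this.

import numpy as np

def joint_exp_sketch(data, a, b, quantiles, eps, rng):
    # Forward-backward sampler for the JointExp distribution over interval sequences.
    x = np.clip(np.sort(np.asarray(data, dtype=float)), a, b)
    n, m = len(x), len(quantiles)
    xs = np.concatenate(([a], x, [b]))                  # xs[i], i = 0..n+1
    widths = xs[1:] - xs[:-1]                           # width of interval i = 0..n
    qs = np.concatenate(([0.0], np.asarray(quantiles), [1.0]))
    targets = np.diff(qs) * n                           # n_1, ..., n_{m+1}
    sens = 2.0                                          # Delta u_Q

    def phi(i, i2, j):                                  # j = 1..m
        gap = np.abs((i2 - i) - targets[j - 1])
        return np.exp(-eps * gap / (2 * sens)) * widths[i2]

    def phi_end(i):                                     # j = m + 1, fixed i_{m+1} = n
        return np.exp(-eps * np.abs((n - i) - targets[m]) / (2 * sens))

    ii = np.arange(n + 1)
    alpha = np.zeros((m + 1, n + 1, m + 1))
    alpha[1, :, 1] = phi(0, ii, 1)
    for j in range(2, m + 1):
        hat = alpha[j - 1].sum(axis=1)                  # sum over repetition count k
        for i in range(n + 1):                          # new interval: sum over i' < i
            alpha[j, i, 1] = np.dot(phi(ii[:i], i, j), hat[:i])
        for k in range(2, j + 1):                       # repeat: divide by k for gamma
            alpha[j, :, k] = phi(ii, ii, j) * alpha[j - 1, :, k - 1] / k

    def sample_pair(w):                                 # w[i, k] unnormalized weights
        flat = w.ravel() / w.sum()
        idx = rng.choice(len(flat), p=flat)
        return idx // w.shape[1], idx % w.shape[1]

    intervals = np.empty(m, dtype=int)
    i_cur, k = sample_pair(alpha[m] * phi_end(ii)[:, None])
    intervals[m - k:] = i_cur
    j = m - k
    while j > 0:
        w = alpha[j] * phi(ii, i_cur, j + 1)[:, None]
        w[i_cur:, :] = 0.0                              # next interval is strictly smaller
        i_cur, k = sample_pair(w)
        intervals[j - k: j] = i_cur
        j -= k
    return np.sort(rng.uniform(xs[intervals], xs[intervals + 1]))

rng = np.random.default_rng(0)
print(joint_exp_sketch(rng.normal(0.0, 1.0, size=200), -10.0, 10.0,
                       [0.25, 0.5, 0.75], eps=1.0, rng=rng))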
Numerical improvements. Note that the quantities involved in computing φ and α may be quite small, so we implement JointExp using logarithmic quantities to avoid underflow errors in our experiments. This is a common trick and is a numerical rather than algorithmic change, but for completeness we include its details in Appendix B. After computing these quantities, to avoid underflow in our final sampling steps, we use a "racing" sampling method that was previously developed for single-quantile exponential mechanisms. Since this is again a numerical improvement, details appear in Appendix C.
Algorithm 1 Pseudocode for JointExp

Input: sorted X = (x_1 ≤ ... ≤ x_n) clamped to data range [a, b], quantiles q_1 < ... < q_m, privacy parameter ε
Set x_0 = a, x_{n+1} = b, and ∆u_Q = 2
Set I = {0, ..., n}, i_0 = 0, and i_{m+1} = n
for i ∈ I, i′ ≥ i, j ∈ [m] do
  Set φ(i, i′, j) = exp(−ε|(i′ − i) − n_j| / (2∆u_Q)) · (x_{i′+1} − x_{i′})
end for
for i ∈ I do
  Set φ(i, n, m + 1) = exp(−ε|(n − i) − n_{m+1}| / (2∆u_Q))
end for
for i ∈ I do
  Set α(1, i, 1) = φ(0, i, 1)
end for
for j = 2, ..., m do
  for i ∈ I do
    Set ˆα(j − 1, i) = ∑_{k < j} α(j − 1, i, k)
    Set α(j, i, 1) = ∑_{i′ < i} φ(i′, i, j) · ˆα(j − 1, i′)
    for k = 2, ..., j do
      Set α(j, i, k) = φ(i, i, j) · α(j − 1, i, k − 1)/k
    end for
  end for
end for
Sample (i, k) ∝ α(m, i, k) · φ(i, n, m + 1); set i_{m−k+1}, ..., i_m = i and j = m − k
while j > 0 do
  Sample (i < i_{j+1}, k) ∝ α(j, i, k) · φ(i, i_{j+1}, j + 1)
  Set i_{j−k+1}, ..., i_j = i, and j = j − k
end while
Output uniform samples {o_j ∼ U[x_{i_j}, x_{i_j+1})}_{j=1}^m in increasing order

Connection to graphical models. As mentioned above, the dynamic program in Algorithm 1 is similar to the forward-backward algorithm from the graphical models literature, modulo accounting for γ(s). In graphical models, it is frequently necessary to compute the probability of a sequence of hidden states. This requires normalizing by a sum of probabilities of sequences, and, naively, this sum has an exponential number of terms, like Z_Q′. However, when probabilities decompose into products of score functions of adjacent states, the forward-backward algorithm makes the process efficient. The extra γ(s) term makes our sampling process more complex in a way that is similar to semi-Markov models [31]. In graphical model terms, γ can be thought of as a prior that discourages repeats: p_prior(s) ∝ 1/γ(s). This prior can also be written as a product of n Poisson distributions, each with parameter λ = 1.

Accuracy Intuition

JointExp applies the exponential mechanism once to output m quantiles. The closest competitor algorithms also apply the exponential mechanism but use m invocations to produce m quantiles. To build intuition for why the former approach achieves better utility, we recall the standard accuracy guarantee for the exponential mechanism:

Lemma 6 (McSherry and Talwar [17]). Let M be an ε-DP instance of the exponential mechanism having score function u with sensitivity ∆u and output space 𝒴. Then for database X, with probability at least 1 − β, M produces output y such that

u(X, y) ≥ max_{y* ∈ 𝒴} u(X, y*) − (2∆u/ε) · log(|𝒴|/β).

For simplicity, suppose we have uniform data where all interval widths x_{i+1} − x_i are identical. As shown by the experiments in the next section, this is not necessary for JointExp to obtain good utility, but we assume it for easier intuition. Then (modulo the minor term γ(s) that accounts for rare repeated intervals in the output), P_{M_Q′}[s] ∝ exp(ε · u_Q′(X, s) / (2∆u_Q)). This means that JointExp's process of sampling intervals draws from a distribution whose shape is identical to an exponential mechanism with utility function u_Q′, but mismatched sensitivity term ∆u_Q = 2. Since the proof of Lemma 6 does not rely on the utility function matching the sensitivity term, we can still apply it to determine the accuracy of this interval sampling procedure. The output space 𝒴 for JointExp's interval sampling has size |S↗| ≤ n^m, so we expect to sample intervals yielding quantiles that in total misclassify O(m log(n)/ε) points.

In contrast, m invocations of a single-quantile exponential mechanism require each invocation to satisfy ε_i = ε/m-DP (standard composition) or roughly ε_i = ε/√m-DP (advanced composition). Because each invocation uses an output space of size O(n), the total error guarantee via Lemma 6 scales like O(m log(n)/ε_i). Since m/ε_i = ω(m)/ε for even the best known composition bounds for the exponential mechanism [7], these approaches incur error with a superlinear dependence on m. This contrasts with JointExp's error, which has only a linear dependence on m.
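For concreteness, the three scalings discussed above can be summarized as follows (a restatement of the bounds above with constants suppressed):

\[
\underbrace{O\!\left(\frac{m \log n}{\varepsilon}\right)}_{\textsc{JointExp}}
\quad \text{vs.} \quad
\underbrace{O\!\left(\frac{m \log n}{\varepsilon/m}\right) = O\!\left(\frac{m^2 \log n}{\varepsilon}\right)}_{\text{basic composition}}
\quad \text{vs.} \quad
\underbrace{O\!\left(\frac{m \log n}{\varepsilon/\sqrt{m}}\right) = O\!\left(\frac{m^{3/2} \log n}{\varepsilon}\right)}_{\text{advanced composition}}.
\]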
Experiments

We now empirically evaluate our algorithm JointExp against four alternative algorithms: IndExp [26], AppIndExp, Smooth [19], and CSmooth [19, 4]. (We discuss omitted private quantile algorithms in Appendix D; omitted algorithms rely on additional knowledge about the data, have large error, and/or are extremely inefficient.) Explanations of these algorithms and the reasons for comparing to them appear in Section 5.1. Dataset descriptions, accuracy experiments, and time experiments respectively appear in Section 5.2, Section 5.3, and Section 5.4. All experiment code may be found on Github [13].

IndExp and AppIndExp

Our first comparison algorithm IndExp uses independent applications of the exponential mechanism. Smith [26] introduced the basic IndExp algorithm for estimating one quantile, and it has since been incorporated into the SmartNoise [25] and IBM [14] differential privacy libraries. IndExp thus gives us a meaningful baseline for a real-world approach. IndExp uses the exponential mechanism to estimate a single quantile q via the utility function u(X, o) = −| |{x ∈ X | x ≤ o}| − qn |.

Lemma 7. u defined above has sensitivity ∆u = 1.

Proof. Consider swapping x ∈ X for x′, and fix some o. If x, x′ ≤ o or x, x′ > o, then u(X, o) = u(X′, o). If exactly one of x or x′ is ≤ o, then |u(X, o) − u(X′, o)| = 1.

IndExp takes user-provided data bounds a and b and runs on X⁺ = X ∪ {a, b}. After sorting X⁺ into intervals of adjacent data points I_1, ..., I_{n+1}, IndExp selects an interval I_j with probability proportional to

score(X, I_j) = exp(−ε|j − qn|/2) · |I_j|

and randomly samples the final quantile estimate from I_j. A sketch of this sampler appears below.

To estimate m quantiles with IndExp, we call it m times with privacy parameter ε/m and output the resulting quantile estimates in increasing order. These m invocations are a straightforward application of differential privacy's basic composition guarantee (Lemma 1).

We also consider AppIndExp, an (ε, δ)-DP variant of IndExp. The only difference between AppIndExp and IndExp is that AppIndExp uses the exponential mechanism's nonadaptive advanced composition guarantee given by Dong, Durfee, and Rogers [7]. This enables AppIndExp to spend more privacy budget estimating each of the m quantiles. Details appear in Appendix E.1, but we note here that, to the best of our knowledge, this is the tightest known composition analysis for the exponential mechanism. Since our experiments use n = 1000 data points, we always use δ = 10⁻⁶ in accordance with the common recommendation that δ ≪ 1/n (see e.g. the discussions around the definition of differential privacy from Dwork and Roth [9] and Vadhan [29]).
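A sketch of the single-quantile IndExp sampler and its m-fold composition (ours; names are illustrative, and the SmartNoise [25] and IBM [14] implementations differ in details):

import numpy as np

def ind_exp_single(data, a, b, q, eps, rng):
    # One eps-DP quantile: pick interval I_j with probability proportional to
    # exp(-eps * |j - q n| / 2) * |I_j|, then sample uniformly from it.
    n = len(data)
    x = np.sort(np.concatenate((np.clip(data, a, b), [a, b])))   # X+ = X ∪ {a, b}
    widths = np.diff(x)                                          # |I_j|, j = 1..n+1
    j = np.arange(1, n + 2)
    scores = np.exp(-eps * np.abs(j - q * n) / 2.0) * widths
    probs = scores / scores.sum()
    idx = rng.choice(len(probs), p=probs)
    return rng.uniform(x[idx], x[idx + 1])

def ind_exp(data, a, b, quantiles, eps, rng):
    # Basic composition (Lemma 1): each of the m calls gets budget eps / m.
    eps_i = eps / len(quantiles)
    return np.sort([ind_exp_single(data, a, b, q, eps_i, rng) for q in quantiles])

rng = np.random.default_rng(0)
print(ind_exp(rng.uniform(-5, 5, 1000), -5.0, 5.0, [0.25, 0.5, 0.75], eps=1.0, rng=rng))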
Smooth

The third comparison algorithm is Smooth, which relies on the smooth sensitivity framework introduced by Nissim, Raskhodnikova, and Smith [19]. The basic idea of this framework is to circumvent global sensitivity by instead using a smooth analogue of local sensitivity. This is useful for problems where the global sensitivity is large only for "bad" datasets.

Definition 4. For function f : 𝒳ⁿ → ℝ, the local sensitivity ∆f(X) of f for dataset X is

∆f(X) = max_{X′ : X ∼ X′} |f(X) − f(X′)|.

Recall that global sensitivity is defined over all possible pairs of datasets. In contrast, local sensitivity is also parameterized by a fixed dataset X and defined only over neighbors of X. It is therefore possible that ∆f(X) ≪ ∆f. For example, the local sensitivity of the median is small on a dataset whose central elements are tightly clustered, even when the data range, and hence the global sensitivity of the median, is large. However, this also shows that local sensitivity itself reveals information about the dataset. The insight of Nissim et al. [19] is that it is possible to achieve differential privacy and take advantage of lower local sensitivity by adding noise calibrated to a "smooth" approximation of ∆f(X).

Definition 5 (Nissim et al. [19]). For t > 0, the t-smooth sensitivity of f on database X of n points is

S^t_f(X) = max_{X′ ∈ 𝒳ⁿ} e^{−t · d(X, X′)} · ∆f(X′),

where d(X, X′) is the number of elements on which X and X′ differ.

For Smooth, we use the Laplace noise variant of smooth sensitivity given by Nissim et al. [19]. This only satisfies (ε, δ)-DP, but it offers the lightest (and thus most competitive) noise for computing a one-dimensional statistic. Details for computing the median's smooth sensitivity and the precise noise added appear in Appendix E.2.

CSmooth

The fourth and final comparison algorithm is CSmooth. CSmooth has not explicitly appeared in the literature, but it is a straightforward combination of the smooth sensitivity framework and concentrated differential privacy (CDP) [10, 3]. CDP is a variant of differential privacy that offers comparable privacy guarantees with often tighter privacy analyses. Bun and Steinke [4] showed how to combine CDP with the smooth sensitivity framework. Since the relevant CDP noise distributions can have even lighter tails than the Laplace noise used by Smooth, we include CSmooth as a plausibly stronger comparison. Our experiments use the Laplace Log-Normal noise distribution, which achieved the strongest accuracy results in the experiments of Bun and Steinke [4].

One complication of CSmooth is the need to select several parameters to specify the noise distribution. We attempted to select these parameters to give CSmooth the strongest utility possible. Details for this process (and CSmooth in general) appear in Appendix E.2.

To compare JointExp's pure DP guarantee to CSmooth's CDP guarantee, we use the following lemma:

Lemma 8 (Proposition 1.4 [3]). If an algorithm is ε-DP, then it is also ½ε²-CDP.

We thus evaluate our ε-DP algorithm JointExp against a ½ε²-CDP CSmooth. This comparison favors CSmooth: recalling our requirement that approximate DP algorithms have δ ≤ 10⁻⁶, the best known generic conversion from CDP to approximate DP only says that a ½-CDP algorithm is (ε, 10⁻⁶)-DP for ε ≥ 5.76 (Proposition 1.3, [3]). A more detailed discussion of DP and CDP appears in Section 4 of the work of Canonne, Kamath, and Steinke [5].

As with IndExp, to estimate m quantiles with CSmooth, we call it m times with an appropriately reduced privacy parameter. This time, we use CDP's composition guarantee:

Lemma 9 (Proposition 1.7 [3]). The composition of k ρ-CDP algorithms is kρ-CDP.

From Lemma 8 the overall desired privacy guarantee is ½ε²-CDP, so we use ε′ = ε/√m in each call.

Datasets

We evaluate all five methods JointExp, IndExp, AppIndExp, Smooth, and CSmooth on four types of data: synthetic uniform data drawn from U(−5, 5), synthetic Gaussian data, and two real-world datasets derived from the Goodreads books dataset [27].

Our error metric is the number of "missed points": for each desired quantile q_j, we take the true quantile estimate o_j and the private estimate ô_j, compute the number of data points between o_j and ô_j, and sum these counts across all m quantiles. For each dataset, we compare the number of missed points for all five algorithms as m grows. A sketch of this metric appears below.
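A sketch of the missed-points metric (ours; names are illustrative):

import numpy as np

def missed_points(sorted_x, true_estimates, private_estimates):
    # Sum, over quantiles, of the data points lying between each true and private estimate.
    total = 0
    for o, o_hat in zip(true_estimates, private_estimates):
        lo, hi = min(o, o_hat), max(o, o_hat)
        total += np.searchsorted(sorted_x, hi) - np.searchsorted(sorted_x, lo)
    return int(total)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-5, 5, 1000))
true_qs = np.quantile(x, [0.25, 0.5, 0.75])
print(missed_points(x, true_qs, private_estimates=[-2.7, 0.2, 2.4]))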
Accuracy Experiments

In each case, the requested quantiles are evenly spaced: m = 1 is median estimation, m = 2 requires estimating the 33rd and 66th percentiles, and so on. We average scores across 20 trials of 1000 random samples. For every experiment, we take [−100, 100] as the data range. Results for ε = 1 appear in Figure 3.

Figure 3: Error vs. number of quantiles m for ε = 1 on all four datasets. Note the logarithmic y-axis: at m = 15, JointExp misclassifies 10x fewer points than any of its competitors.

Across all four datasets, a clear effect appears: for a wide range of m, JointExp dominates all other algorithms. First, JointExp beats or matches its competitors even for m = 1. At m = 6 JointExp makes 2x fewer errors than its closest competitor, and at m = 15 the gap grows to 10x. JointExp's absolute accuracy is also strong: when estimating ≤ 20 quantiles, JointExp on average estimates each quantile with ≤ 10 misclassified data points. For comparison, the other algorithms cannot maintain this average misclassification rate for nearly as many quantiles. Finally, JointExp obtains the best privacy in addition to the best accuracy. This is because JointExp is ε-DP while AppIndExp and Smooth are (ε, δ)-DP, and CSmooth is ½ε²-CDP. The only other ε-DP algorithm, IndExp, is 10x worse than JointExp at m = 9 and 20x worse at m = 19.

We also evaluate a high-privacy regime where we use ε = 0.1 instead of ε = 1. We defer these plots to Appendix F but note here that a similar (but smaller) separation holds. This is because the smaller privacy budget leads all methods to converge to trivial error at moderate m.

Time Experiments

We conclude by evaluating the methods by runtime. The number of data points and quantiles are the main determinants of time, so we only include time experiments using Gaussian data. All experiments were run on a machine with two CPU cores and 100GB RAM. As seen in Fig. 4, JointExp is the slowest method, followed by Smooth and CSmooth, and finally IndExp and AppIndExp. This is a relative weakness for JointExp. However, even at its largest setting of m = 29, JointExp's runtime remains well within practical ranges.

Figure 4: Time vs. m, ε = 1, averaged across 20 trials of 1,000 samples.

Conclusion

In this work we constructed a low-sensitivity exponential mechanism for differentially private quantile estimation and designed a dynamic program to sample from it efficiently. The result is a practical algorithm that achieves much better accuracy than existing methods. A possible direction for future work is exploring other applications of the exponential mechanism where the utility function is low sensitivity and can be decomposed into "local" score functions, as in the pairwise interval terms of φ. More precisely, by analogy to the graphical models techniques generally known as belief propagation [21], any utility function whose outputs have a chain or tree dependency structure should be tractable to sample.

Acknowledgments

We thank Thomas Steinke for helpful discussion of concentrated differential privacy. We also thank Andrés Muñoz Medina for helpful comments on an early draft of this paper.

References

[1] Garrett Bernstein, Ryan McKenna, Tao Sun, Daniel Sheldon, Michael Hay, and Gerome Miklau. Differentially private learning of undirected graphical models using collective graphical models. In International Conference on Machine Learning (ICML), 2017.

[2] Jeremiah Blocki, Anupam Datta, and Joseph Bonneau. Differentially private password frequency lists. In Network and Distributed System Security (NDSS), 2016.

[3] Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography Conference (TCC), 2016.

[4] Mark Bun and Thomas Steinke.
Average-case averages: Private algorithms for smooth sensitivity and mean estimation. In Neural Information Processing Systems (NeurIPS), 2019.

[5] Clément Canonne, Gautam Kamath, and Thomas Steinke. The discrete Gaussian for differential privacy. In Neural Information Processing Systems (NeurIPS), 2020.

[6] CDC. Data table of infant weight-for-age charts, 2001. Accessed: 2021-01-02.

[7] Jinshuo Dong, David Durfee, and Ryan Rogers. Optimal differential privacy composition for exponential mechanisms and the cost of adaptivity. In International Conference on Machine Learning (ICML), 2020.

[8] Cynthia Dwork and Jing Lei. Differential privacy and robust statistics. In Symposium on the Theory of Computing (STOC), 2009.

[9] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 2014.

[10] Cynthia Dwork and Guy N. Rothblum. Concentrated differential privacy. arXiv preprint arXiv:1603.01887, 2016.

[11] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference (TCC), 2006.

[12] ETS. GRE guide to the use of scores, 2020. Accessed: 2021-01-29.

[13] Google. Differentially private quantiles code. https://github.com/google-research/google-research/tree/master/dp_multiq, 2021. Accessed: 2021-02-17.

[14] IBM. IBM differential privacy library. https://github.com/IBM/differential-privacy-library/blob/main/diffprivlib/tools/quantiles.py, 2019. Accessed: 2021-01-05.

[15] Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The composition theorem for differential privacy. In International Conference on Machine Learning (ICML), 2015.

[16] Ryan McKenna, Daniel Sheldon, and Gerome Miklau. Graphical-model based estimation and inference for differential privacy. In International Conference on Machine Learning (ICML), 2019.

[17] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In Foundations of Computer Science (FOCS), 2007.

[18] Andrés Muñoz Medina and Jenny Gillenwater. Duff: A dataset-distance-based utility function family for the exponential mechanism. arXiv preprint arXiv:2010.04235, 2020.

[19] Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Smooth sensitivity and sampling in private data analysis. In Symposium on the Theory of Computing (STOC), 2007.

[20] Stack Overflow. Numerically stable way to multiply log probability matrices in numpy. https://stackoverflow.com/questions/23630277/numerically-stable-way-to-multiply-log-probability-matrices-in-numpy, 2014. Accessed: 2021-02-08.

[21] Judea Pearl. Reverend Bayes on inference engines: A distributed hierarchical approach. In Conference on Artificial Intelligence (AAAI), 1982.

[22] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Pearson Education/Prentice-Hall, third edition, 2010.

[23] scipy. scipy.special.logsumexp. https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.logsumexp.html, 2020. Accessed: 2021-02-08.

[24] Jessica Semega, Melissa Kollar, John Creamer, and Abinash Mohanty. Income and poverty in the United States: 2018, 2020. Accessed: 2021-01-02.

[25] SmartNoise. The exponential mechanism for medians. https://github.com/opendifferentialprivacy/smartnoise-core/blob/develop/whitepapers/mechanisms/exponential_median/ExponentialMechForMedian.pdf, 2020. Accessed: 2021-01-05.

[26] Adam Smith. Privacy-preserving statistical estimation with optimal convergence rates.
In Symposium on the Theory of Computing (STOC), 2011.

[27] Soumik. Goodreads-books dataset, 2019. Accessed: 2020-12-27.

[28] Christos Tzamos, Emmanouil-Vasileios Vlatakis-Gkaragkounis, and Ilias Zadik. Optimal private median estimation under minimal distributional assumptions. In Neural Information Processing Systems (NeurIPS), 2020.

[29] Salil Vadhan. The complexity of differential privacy. In Tutorials on the Foundations of Cryptography. Springer, 2017.

[30] Oliver Williams and Frank McSherry. Probabilistic inference and differential privacy. In Neural Information Processing Systems (NIPS), 2010.

[31] Shun-Zheng Yu. Hidden semi-Markov models. Artificial Intelligence, 2010.

A Full Proofs

We start with the add-remove version of the sensitivity analysis for ∆u_Q. We proved the swap version as Lemma 3 in the main body, and this was the focus of the paper. The (slightly more favorable) add-remove version appears below for completeness.

Lemma 10. In the add-remove model, ∆u_Q = 2[1 − min_{j ∈ [m+1]}(q_j − q_{j−1})].

Proof. Consider neighboring databases X′ = X ∪ {x′} where |X′| = n′ = n + 1 and |X| = n. Let n′(·, ·) denote an interval count using X′, and let n(·, ·) denote an interval count using X. All data points are clipped to [a, b], o_0 = a, and o_{m+1} = b + 1, so there exists some [o_{j*−1}, o_{j*}) containing x′. o is nondecreasing and these intervals are half-open, so these intervals do not intersect. Thus, there is exactly one [o_{j*−1}, o_{j*}) containing x′. Then for j ≠ j*, n′(o_{j−1}, o_j) = n(o_{j−1}, o_j) and (q_j − q_{j−1})n′ − (q_j − q_{j−1})n = q_j − q_{j−1}. Thus

u_Q(X, o) = −|n(o_{j*−1}, o_{j*}) − n_{j*}| − ∑_{j ≠ j*} |n(o_{j−1}, o_j) − n_j|

and

u_Q(X′, o) = −|n(o_{j*−1}, o_{j*}) + 1 − (q_{j*} − q_{j*−1})(n + 1)| − ∑_{j ≠ j*} |n(o_{j−1}, o_j) − (q_j − q_{j−1})(n + 1)|.

The distance between u_Q(X, o) and u_Q(X′, o) contributed by the first term is

| |n(o_{j*−1}, o_{j*}) − (q_{j*} − q_{j*−1})n| − |n(o_{j*−1}, o_{j*}) + 1 − (q_{j*} − q_{j*−1})(n + 1)| | = 1 − (q_{j*} − q_{j*−1}),

and the distance contributed by the second term is

∑_{j ≠ j*} | |n(o_{j−1}, o_j) − (q_j − q_{j−1})n| − |n(o_{j−1}, o_j) − (q_j − q_{j−1})(n + 1)| | ≤ ∑_{j ≠ j*} (q_j − q_{j−1}).

Thus

|u_Q(X, o) − u_Q(X′, o)| ≤ 1 − (q_{j*} − q_{j*−1}) + ∑_{j ≠ j*} (q_j − q_{j−1}) = 2[1 − (q_{j*} − q_{j*−1})].

The last equality follows from the fact that the sum over all quantile gaps is 1, so the sum over all but the q_{j*} − q_{j*−1} gap is 1 − (q_{j*} − q_{j*−1}). The quantity 2[1 − (q_{j*} − q_{j*−1})] is maximized by minimizing q_{j*} − q_{j*−1}, which gives the final sensitivity bound.

Next, we verify that the finite sampling improvement from Section 3.2 still samples from the correct distribution.

Lemma 5. M_Q′ has the same output distribution as M_Q.

Proof. Recall that the output space for M_Q was O↗ = {o = (o_1, ..., o_m) | a ≤ o_1 ≤ ... ≤ o_m ≤ b}. Define function h on [a, b] by h(y) = |{x ∈ X | x < y}|. Then n(u, v) = h(v) − h(u), h(o_0) = h(a) = 0 = i_0, and h(o_{m+1}) = h(b + 1) = n = i_{m+1}. Thus

u_Q(X, o) = −∑_{j ∈ [m+1]} |n(o_{j−1}, o_j) − n_j| = −∑_{j ∈ [m+1]} |h(o_j) − h(o_{j−1}) − n_j| = u_Q′(X, (h(o_1), ..., h(o_m))).
Writing Z_Q for the normalizer of M_Q, we then have

Z_Q = ∫_{O↗} exp(ε · u_Q(X, o) / (2∆u_Q)) do = ∫_{O↗} exp(ε · u_Q′(X, (h(o_1), ..., h(o_m))) / (2∆u_Q)) do. (2)

Note that each o = (o_1, ..., o_m) ∈ O↗ has o_1 ∈ [x_{i_1}, x_{i_1+1}), ..., o_m ∈ [x_{i_m}, x_{i_m+1}) for exactly one s = (i_1, ..., i_m) ∈ S↗. Shorthand this by o ∈ s, and let g_O(s) = {o ∈ O↗ | o ∈ s}. Then

(2) = ∑_{s ∈ S↗} ∫_{g_O(s)} exp(ε · u_Q′(X, (h(o_1), ..., h(o_m))) / (2∆u_Q)) do
    = ∑_{s ∈ S↗} ∫_{g_O(s)} exp(ε · u_Q′(X, s) / (2∆u_Q)) do
    = ∑_{s ∈ S↗} exp(ε · u_Q′(X, s) / (2∆u_Q)) · ∫_{g_O(s)} do. (3)

We focus on the ∫_{g_O(s)} do term. If h(o_1), ..., h(o_m) are all distinct, i.e. o_1, ..., o_m come from distinct intervals between data points, then

∫_{g_O(s)} do = ∏_{j=1}^m (x_{i_j+1} − x_{i_j}).

The remaining (and more complex) case is when h(o_1), ..., h(o_m) are not distinct. Suppose h(o_1), ..., h(o_k) are not distinct but the remaining h(o_{k+1}), ..., h(o_m) are distinct and different from h(o_1). Note that the non-distinct elements are consecutive since o_1 ≤ ... ≤ o_m. Then there is some i ∈ I such that o_1, ..., o_k ∈ [x_i, x_{i+1}). Thus the set of valid o_1, ..., o_k is exactly {(o_1, ..., o_k) | x_i ≤ o_1 ≤ ... ≤ o_k < x_{i+1}}.

We need to determine the volume of this set. First, note that the collection consisting of all sets of k values from interval i has volume (x_{i+1} − x_i)^k. Then, note that the probability that k values selected at random from an interval will be perfectly sorted is 1/k!; this is the volume of the standard k-simplex, which is the set {(x_1, ..., x_k) | 0 ≤ x_1 ≤ ... ≤ x_k ≤ 1}. Hence, for the set that we are interested in, {(o_1, ..., o_k) | x_i ≤ o_1 ≤ ... ≤ o_k < x_{i+1}}, the volume is (x_{i+1} − x_i)^k / k!.

More generally, this leads us to define the scaling factor γ in Section 3.2: γ(s) = ∏_{i ∈ I} count_s(i)!, where count_s(i) is the number of times i appears in s, and we take 0! = 1. γ thus repeats the above scaling process for each interval according to its number of repetitions in h(o_1), ..., h(o_m). It follows that for any s ∈ S↗,

∫_{g_O(s)} do = ∏_{j=1}^m (x_{i_j+1} − x_{i_j}) / γ(s).

Returning to our original chain of equalities, we get

(3) = ∑_{s ∈ S↗} exp(ε · u_Q′(X, s) / (2∆u_Q)) · ∏_{j=1}^m (x_{i_j+1} − x_{i_j}) / γ(s) = Z_Q′.

Recalling the output density f_Q for M_Q, by the above

f_Q(o) = (1/Z_Q′) · exp(ε · u_Q(X, o) / (2∆u_Q)).

For any s ∈ S↗ and any o, o′ ∈ g_O(s) we have f_Q(o) = f_Q(o′) and

P_{M_Q}[M_Q(X) ∈ g_O(s)] = (1/Z_Q′) · ∫_{g_O(s)} exp(ε · u_Q′(X, (h(o_1), ..., h(o_m))) / (2∆u_Q)) do
    = (1/Z_Q′) · exp(ε · u_Q′(X, s) / (2∆u_Q)) · ∏_{j=1}^m (x_{i_j+1} − x_{i_j}) / γ(s)
    = P_{M_Q′}[M_Q′(X) ∈ g_O(s)].

u_Q is constant over o ∈ g_O(s) for any s, so conditioned on selecting a given s = (i_1, ..., i_m) ∈ S↗, M_Q has a uniform output distribution over increasing sequences from g_O(s), i.e.
f_Q(o) = f_Q(o | o ∈ g_O(s)) · P_{M_Q}[M_Q(X) ∈ g_O(s)]
    = [γ(s) / ∏_{j ∈ [m]} (x_{i_j+1} − x_{i_j})] · P_{M_Q}[M_Q(X) ∈ g_O(s)]
    = [γ(s) / ∏_{j ∈ [m]} (x_{i_j+1} − x_{i_j})] · P_{M_Q′}[M_Q′(X) ∈ g_O(s)]
    = f_Q′(o),

where the second and fourth equalities use ∫_{g_O(s)} do = ∏_{j=1}^m (x_{i_j+1} − x_{i_j}) / γ(s). Thus M_Q and M_Q′ have identical output distributions.

We now repeat this process for the efficient sampling improvement from Section 3.3.

Theorem 1. JointExp satisfies ε-differential privacy, takes time O(mn² + m²n), and uses space O(mn² + m²n).

Proof. We first verify that JointExp samples from M_Q′, which implies differential privacy. Since the uniform sampling step is unchanged, it suffices to show that the distribution over sampled sequences of intervals is correct.

Let S↗(j, i, k) = {(i_1, ..., i_j) | i_1 ≤ ... ≤ i_{j−k} < i_{j−k+1} = ... = i_j = i} denote the set of nondecreasing sequences of length j where exactly the last k intervals are equal to i. We will first show that, for all j ∈ [m], all i ∈ I, and all k ∈ [j],

α(j, i, k) = ∑_{s = (i_1, ..., i_j) ∈ S↗(j, i, k)} (1/γ(s)) ∏_{j′ ∈ [j]} φ(i_{j′−1}, i_{j′}, j′). (4)

The base case of j = 1 holds by definition, since we let α(1, i, 1) = φ(0, i, 1) and γ(s) = 1 when s is a sequence of length one. Inductively, assuming equality holds for j′ < j, we have

α(j, i, 1) = ∑_{i′ < i} φ(i′, i, j) ∑_{k < j} α(j − 1, i′, k)
    = ∑_{i′ < i} ∑_{k < j} φ(i′, i, j) ∑_{s′ ∈ S↗(j−1, i′, k)} (1/γ(s′)) ∏_{j′ ∈ [j−1]} φ(i_{j′−1}, i_{j′}, j′)
    = ∑_{s ∈ S↗(j, i, 1)} (1/γ(s)) ∏_{j′ ∈ [j]} φ(i_{j′−1}, i_{j′}, j′),
since every sequence in S↗(j, i, 1) consists of a sequence in S↗(j − 1, i′, k) for some i′ < i and k < j with an i appended to the end. Note that the appending of only a single i means that γ(s) = γ(s′). Similarly,

α(j, i, k > 1) = φ(i, i, j) · α(j − 1, i, k − 1)/k
    = (1/k) · φ(i, i, j) ∑_{s′ = (i_1, ..., i_{j−1}) ∈ S↗(j−1, i, k−1)} (1/γ(s′)) ∏_{j′ ∈ [j−1]} φ(i_{j′−1}, i_{j′}, j′)
    = ∑_{s = (i_1, ..., i_j) ∈ S↗(j, i, k)} (1/γ(s)) ∏_{j′ ∈ [j]} φ(i_{j′−1}, i_{j′}, j′),

since every sequence in S↗(j, i, k > 1) consists of a sequence in S↗(j − 1, i, k − 1) with an i appended to the end and, when i appears k − 1 times in s′ and s is equal to s′ with an i appended to the end, γ(s) = kγ(s′). Thus Eq. 4 holds, and we have the "forward" step: α(j, i, k) is the (unnormalized) mass of nondecreasing length-j sequences ending in k repetitions of i.

Now consider the backward sampling process in JointExp. In the first step, we sample a pair

(i, k) ∝ α(m, i, k) · φ(i, n, m + 1)
    = ∑_{s = (i_1, ..., i_m) ∈ S↗(m, i, k)} (1/γ(s)) ∏_{j′ ∈ [m]} φ(i_{j′−1}, i_{j′}, j′) · φ(i, n, m + 1)
    = ∑_{s = (i_1, ..., i_m) ∈ S↗(m, i, k)} (1/γ(s)) ∏_{j′ ∈ [m+1]} φ(i_{j′−1}, i_{j′}, j′)
    ∝ ∑_{s ∈ S↗(m, i, k)} P_{M_Q′}[s],

where the second equality uses the fact that we fix i_{m+1} = n. Since {S↗(m, i, k)}_{i,k} is a partition of S↗ (that is, every sequence in S↗ appears in S↗(m, i, k) for exactly one value of the pair (i, k)), we conclude that (i, k) is sampled according to the marginal probability that a sequence sampled from P_{M_Q′} ends in exactly k copies of i.

Continuing the backward recursion, if the values of s_suffix = (i_{j+1}, ..., i_m) have already been sampled, then at the next step we sample a pair

(i < i_{j+1}, k) ∝ α(j, i, k) · φ(i, i_{j+1}, j + 1)
    = ∑_{s_prefix = (i_1, ..., i_j) ∈ S↗(j, i, k)} (1/γ(s_prefix)) ∏_{j′ ∈ [j]} φ(i_{j′−1}, i_{j′}, j′) · φ(i, i_{j+1}, j + 1)
    ∝ ∑_{s_prefix ∈ S↗(j, i, k)} (1/γ(s_prefix)) ∏_{j′ ∈ [j]} φ(i_{j′−1}, i_{j′}, j′) · φ(i, i_{j+1}, j + 1) · (1/γ(s_suffix)) ∏_{j′ = j+2}^{m+1} φ(i_{j′−1}, i_{j′}, j′)
    = ∑_{s_prefix ∈ S↗(j, i, k)} (1/(γ(s_prefix)γ(s_suffix))) ∏_{j′ ∈ [m+1]} φ(i_{j′−1}, i_{j′}, j′)
    ∝ ∑_{s_prefix ∈ S↗(j, i, k)} P_{M_Q′}[s_prefix + s_suffix],

where + denotes sequence concatenation, and we use the fact that, since s_prefix and s_suffix are nondecreasing and i_j < i_{j+1}, γ(s_prefix + s_suffix) = γ(s_prefix)γ(s_suffix). Again, the set of nondecreasing sequences of length j can be partitioned into disjoint subsets {S↗(j, i, k)}_{i,k}, thus the pair (i, k) is sampled according to the marginal probability that a sequence sampled from P_{M_Q′}, conditional on having the suffix s_suffix, has a j-length prefix ending in exactly k copies of i.

Inductively, then, JointExp samples a sequence s according to P_{M_Q′}. It remains to show that Algorithm 1 uses O(mn² + m²n) space and time.
Space analysis. φ occupies O(mn²) space, ˆα occupies O(mn) space, and α occupies O(m²n) space. All other variables in the algorithm occupy a constant amount of space. Hence, the overall space usage is O(mn² + m²n).

Time analysis. The first two for-loops in Algorithm 1 compute φ. Each entry of φ can be computed in constant time, so these loops run in O(mn²) time. The next loop computes α(1, i, 1) for i ∈ I. This just copies entries of φ into α, so it runs in O(n) time. The next computation is the nested for-loop for ˆα. The equation for ˆα simply sums over at most O(m) previously-computed α entries, so computing all O(mn) entries of ˆα takes O(m²n) time. The subsequent nested for-loop computes α(j, i, 1). Each α(j, i, 1) simply sums over at most O(n) previously-computed φ and ˆα terms, so computing it for all (i, j) pairs takes time O(mn²). Turning to the nested for-loop for α(j, i, k > 1), each of these remaining α terms can be computed in constant time, since it is just a product of one previously-computed α term with one previously-computed φ term. Hence, these remaining α take O(m²n) time to compute. Finally, the "Sample (i, k)" step takes O(mn) time, and is repeated at most m times in the for-loop. Thus, the overall runtime is O(mn² + m²n).

B Logarithm Trick

In this section, we give details for a more numerically stable logarithmic version of JointExp. First, we compute and store ln(φ) instead of φ. Next, recall that we defined α(1, i, 1) = φ(0, i, 1) and α(1, i, k) = 0 for k > 1. The former becomes ln(α(1, i, 1)) = ln(φ(0, i, 1)), and for k > 1 we set ln(α(1, i, k)) = −∞, e.g. using -numpy.inf in Python.

We now turn to ln(α(j, ·, ·)) for j = 2, ..., m. We set

ln(ˆα(j − 1, i)) = ln ∑_{k < j} exp(ln(α(j − 1, i, k))),

which can be evaluated stably with the standard logsumexp operation [23]; the sums defining ln(α(j, i, 1)) are handled analogously via numerically stable log-space matrix products [20].

C Racing Sampling

The "racing" method is originally due to Ilya Mironov. To the best of our knowledge, full exposition and proofs first appeared in the work of Medina and Gillenwater [18]. We recap their exposition here. The main tool is the following result:

Lemma 11 (Proposition 5 [18]). Let U_1, ..., U_N ∼ U(0, 1) be uniform random samples from [0, 1] and define random variable

R = argmin_{k ∈ [N]} [ln(ln(1/U_k)) − ln(p_k)].

Then P_R[k] = p_k / ∑_{j=1}^N p_j.

Lemma 11 enables us to sample from distributions that depend on small probabilities p_k by instead using their more tractable logarithms. In combination with the logarithm trick from Appendix B, we can avoid dealing with exponentiated terms entirely. A sketch of both tricks appears below.
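A sketch of both numerical tricks (ours; names are illustrative): logsumexp [23] for the log-space ˆα sums, and the racing sampler of Lemma 11.

import numpy as np
from scipy.special import logsumexp

def log_alpha_hat(log_alpha_prev):
    # ln(alpha_hat(j-1, i)) = logsumexp over the repetition count k of ln(alpha(j-1, i, k)).
    return logsumexp(log_alpha_prev, axis=1)

def racing_sample(log_weights, rng):
    # Lemma 11: with U_k ~ U(0,1), argmin_k [ln(ln(1/U_k)) - ln(p_k)] has law p_k / sum_j p_j.
    u = rng.uniform(size=len(log_weights))
    return int(np.argmin(np.log(np.log(1.0 / u)) - log_weights))

rng = np.random.default_rng(0)
log_w = np.array([-1000.0, -1001.0, -1000.5])   # weights far too small to exponentiate directly
print(racing_sample(log_w, rng))
print(log_alpha_hat(np.array([[-np.inf, -3.0], [-2.0, -np.inf]])))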
D Discussion of Other Quantile Algorithms

In this section, we discuss the private quantile estimation algorithms of Dwork and Lei [8] and Tzamos, Vlatakis-Gkaragkounis, and Zadik [28] and explain why we do not include them in our experiments. Both of these are single-quantile algorithms, so, as with the other algorithms that we consider, they would require m compositions in order to estimate m quantiles.

Dwork and Lei [8] define a "propose-test-release" algorithm. Briefly, it discretizes the space into bins of equal width, then computes how many points in the dataset must change to move the true quantile out of its current bin. If this number is too small (specifically, if it is no larger than ln²(n) + 2, the "test"), then the algorithm does not produce an answer. Otherwise, the answer is the true quantile plus Laplace noise whose scale is six times the bin width ("release"). We can ballpark the accuracy of this method on the uniform data distribution used in our experiments, i.e. n = 1000 samples from U(−5, 5), where ln²(n) + 2 ≈ 50. If we choose a bin width such that the bin with the true median contains 100 points, then it takes at most 50 swaps to move the median out of that bin. We must therefore choose, at a minimum, a bin size such that the bin containing the median contains at least 100 points. Even if we successfully make this choice, then the resulting output will still be far less accurate than that of all the other methods tested in the experiments. This is because a successful choice requires a bin width ≥ 1, so the algorithm releases the true median value of ≈ 0 plus Laplace noise scaled to several times the bin width on the [−5, 5] range, yielding far more misclassified points than the methods we test achieve for median estimation on uniform data.

We now turn to the private quantile estimation algorithm given by Tzamos et al. [28]. This algorithm is also based on adding (a variant of) Laplace noise to the true median. The first drawback of this method is its time complexity (see the footnote accompanying their definition of "TypicalHamming"), which makes it impractical for datasets with more than a few hundred data points. The second drawback is the need to select several hyperparameters (R, r, L, C) to determine the specific Laplace noise distribution. While this hyperparameter selection does not affect the privacy guarantee, it does affect the utility. Their utility guarantees assume that the algorithm operator knows these distributional parameters a priori, but this assumption may be hard to satisfy in practice. In contrast, JointExp and the other comparison algorithms only require the user to provide endpoints.

E Details For Comparison Algorithms

E.1 AppIndExp

In this section we provide details for AppIndExp. The technique that differentiates AppIndExp from IndExp is the variant of advanced composition for nonadaptive bounded-range mechanisms given by Dong et al. [7]. For simplicity, we give a less general (but not weaker) version of their result.

Lemma 12 (Theorem 3 [7]). Let mechanism A consist of m ε-DP applications of the exponential mechanism. Define

t*_ℓ = [(ε_g + (ℓ + 1)ε) / (m + 1)]_ε  and  p_{t*_ℓ} = (e^{−t*_ℓ} − e^{−ε}) / (1 − e^{−ε}),

where [x]_ε denotes the value of x clipped to the interval [0, ε]. Then A is (ε_g, δ)-DP for

δ = max_{0 ≤ ℓ ≤ m} ∑_{i=0}^m [ C(m, i) · p_{t*_ℓ}^{m−i} · (1 − p_{t*_ℓ})^i · max(e^{m·t*_ℓ − iε} − e^{ε_g}, 0) ].

To apply Lemma 12 with a fixed δ, we use it to compute the largest ε, at a granularity of 0.01, that achieves (ε_g, δ)-DP with some δ ≤ 10⁻⁶, and we use this value for our experiments. As this is independent of the actual mechanism in question, the time required for this computation is not included in the runtime values reported for AppIndExp.

E.2 Smooth and CSmooth

We now move on to the smooth sensitivity algorithms, starting with the precise statement of the t-smooth sensitivity of computing a quantile:

Lemma 13 (Proposition 3.4 [19]). Let a and b be client-provided left and right data endpoints. Let X be a database of values x_1 ≤ ... ≤ x_n in [a, b], and for notational convenience define x_i = a for i < 1 and x_i = b for i > n. Let x_{j*} be the true value for quantile q on X. Then the t-smooth sensitivity of computing q on X is

S^t_q(X) = max_{m = 0, ..., n} ( e^{−tm} · max_{k = 0, ..., m+1} (x_{j*+k} − x_{j*+k−m−1}) ).

Looking at the two max operations, we can compute S^t_q(X) in time O(n²). Nissim et al. [19] also provide a slightly more involved method for computing S^t_q(X) in time O(n log(n)). We omit its details here but note that our implementation uses this O(n log(n)) speedup for the fairest time comparison. A direct sketch of the double maximum above appears below.
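A direct O(n²) sketch of Lemma 13's double maximum (ours; the rank convention for j* is an assumption, and the O(n log n) method is omitted as in the text):

import numpy as np

def smooth_sensitivity_quantile(sorted_x, a, b, q, t):
    # t-smooth sensitivity of quantile q via the double maximum of Lemma 13.
    n = len(sorted_x)
    pad = n + 2
    x = np.concatenate((np.full(pad, float(a)), sorted_x, np.full(pad, float(b))))
    def xi(i):
        # x_i in the lemma's 1-based indexing, with x_i = a for i < 1 and x_i = b for i > n.
        return x[i - 1 + pad]
    j_star = max(1, int(np.ceil(q * n)))    # assumed rank convention for quantile q
    best = 0.0
    for mm in range(n + 1):
        local = max(xi(j_star + k) - xi(j_star + k - mm - 1) for k in range(mm + 2))
        best = max(best, np.exp(-t * mm) * local)
    return best

rng = np.random.default_rng(0)
x = np.sort(rng.normal(size=1000))
print(smooth_sensitivity_quantile(x, -100.0, 100.0, q=0.5, t=0.05))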
Next, we specify the exact noise distribution used to generate additive noise in Smooth:

Lemma 14 (Lemma 2.6, Lemma 2.9 [19]). Let f be a real-valued function with t-smooth sensitivity S^t_f(X) on database X. Then sampling Z ∼ Lap(1) from the standard Laplace distribution and releasing

f(X) + S^{ε/[2 ln(2/δ)]}_f(X) · 2Z/ε

satisfies (ε, δ)-DP.

The analogous result for CSmooth is similar:

Lemma 15 (Proposition 3 [4]). Define the Laplace Log-Normal distribution with shape parameter σ > 0, LLN(σ), as the distribution of the random variable Z = X · e^{σY} where X ∼ Lap(1) and Y ∼ N(0, 1). Let f be a real-valued function and let s, t > 0. Then releasing f(X) + S^t_f(X) · Z/s satisfies ε-CDP for ε = t/σ + e^{1.5σ}s.

Once we fix the desired CDP privacy parameter ε, to apply Lemma 15 we must still select t, s, σ > 0. We follow the selection method given in Sections 3.1.1 and 7.1 of Bun and Steinke [4], omitting most of the details. First, for each of a sequence of values for t, we set s = e^{−1.5σ}(ε − t/σ) and numerically solve for σ as a root of a polynomial in σ determined by ε and t. This provides a collection of (t, s, σ) triples without touching the database X. Given these triples (t, s, σ), we finally select one to minimize the variance of the resulting noise S^t_f(X) · Z/s.

We pause to note that this last minimization of variance repeatedly touches X to compute S^t_f(X) for different t and so is not differentially private. We therefore carried out this non-private selection process once using data drawn from the standard Gaussian N(0, 1) and used the resulting values for CSmooth experiments on our datasets. In practice, after starting from a wide range of 150 logarithmically spaced values for t, we found that the values selected for t clustered in a narrow subinterval across both data drawn from N(0, 1) and data drawn from our other experiment distributions. We therefore view the selection of t as contributing relatively little to the final error of CSmooth.

In more detail, the actual t selection process in our experiments is to use the variance-minimizing selection process described in Section 5.1 for each quantile in sets of quantiles ranging from m = 1 to m = 29, for ε = 0.1 and ε = 1. The range for t is 50 logarithmically spaced values between 0.001 and 0.1 (ε = 0.1) or 50 logarithmically spaced values between 0.01 and 1 (ε = 1). Each trial used 1000 samples drawn from N(0, 1) with data lower bound −100 and data upper bound 100. Below, we record the t selected for each quantile and quantile range, averaged across 5 trials. Each color represents a different set of quantiles, and each point for each color represents the t selected for a single quantile. We then used these selected values of t for each (number of quantiles, quantile) pair in our experiments.

Figure 5: Tuned t used for CSmooth for ε = 0.1 and ε = 1 across different quantile ranges. For example, we used t ≈ 0.13 for ε = 1, m = 3, and q = 0.5 (see the ε = 1 plot).

F Additional ε = 0.1 Error Plots

Figure 6: Error vs. number of quantiles m for ε = 0.1 on all four datasets. Note the logarithmic y-axis: for m = 3, JointExp produces 5x fewer errors than any other algorithm.