Computing Differential Privacy Guarantees for Heterogeneous Compositions Using FFT
Antti Koskela and Antti Honkela
Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Finland
Abstract
The recently proposed Fast Fourier Transform (FFT)-based accountant for evaluating $(\varepsilon,\delta)$-differential privacy guarantees using the privacy loss distribution formalism has been shown to give tighter bounds than commonly used methods such as Rényi accountants when applied to compositions of homogeneous mechanisms. This approach is also applicable to certain discrete mechanisms that cannot be analysed with Rényi accountants. In this paper, we extend this approach to compositions of heterogeneous mechanisms. We carry out a full error analysis that allows choosing the parameters of the algorithm such that a desired accuracy is obtained. Using our analysis, we also give a bound for the computational complexity in terms of the error which is analogous to and slightly tightens the one given by Murtagh and Vadhan (2018). We also show how to speed up the evaluation of tight privacy guarantees using the Plancherel theorem at the cost of increased pre-computation and memory usage.

Introduction

Differential privacy (DP) (Dwork et al., 2006) has become the standard approach for privacy-preserving machine learning. When using DP, one challenge is to accurately bound the compound privacy loss of increasingly complex DP algorithms. An important example is given by differentially private stochastic gradient descent (DP-SGD). The moments accountant (Abadi et al., 2016) gave a major improvement in bounding the complete $(\varepsilon,\delta)$-profile of the DP-SGD algorithm, and this analysis has been refined through the general development of Rényi differential privacy (RDP) (Mironov, 2017) as well as tighter RDP bounds for subsampled mechanisms (Balle et al., 2018; Wang et al., 2019; Zhu and Wang, 2019; Mironov et al., 2019). RDP enables tight analysis for compositions of Gaussian mechanisms, but this may be difficult for other mechanisms. For example, there is no existing RDP analysis for the binomial mechanism, which is used in federated learning (e.g. Agarwal et al., 2018).
Moreover, conversion of RDP guarantees back to the more commonly used $(\varepsilon,\delta)$-guarantees is lossy. In this work we use the privacy loss distribution (PLD) formalism introduced by Sommer et al. (2019) to numerically evaluate tight privacy bounds for compositions. This work extends the recent Fast Fourier Transform (FFT) accountant by Koskela et al. (2020a,b) to heterogeneous compositions of discrete mechanisms. We also illustrate how to apply this accountant to continuous mechanisms and discrete-continuous hybrids. Experimental comparisons to the Tensorflow moments accountant show that the FFT-based method allows approximately 1.5 times as many compositions for equal privacy guarantees.

We provide a rigorous error analysis for the proposed method in terms of the truncation and discretisation parameters $L$ and $n$. This analysis not only leads to strict upper $(\varepsilon,\delta)$-bounds, but also gives means for automatically tuning these parameters before the computation to meet a specific accuracy target.

The need to consider discrete mechanisms for rigorous DP was first pointed out by Mironov (2012). Agarwal et al. (2018) implement a communication-efficient binomial mechanism, cpSGD, for which the DP guarantees of compositions however cannot be tightly bounded. Recently, the discrete Gaussian mechanism has also been proposed as a simple discrete counterpart for the Gaussian mechanism (Canonne et al., 2020). Moreover, Agarwal et al. (2018) and Kairouz et al. (2019) note the need for a privacy accountant for the binomial mechanism as an important open problem. Discrete mechanisms are commonly used in private federated learning (e.g. Girgis et al., 2020), and these applications would benefit significantly from the more accurate accounting enabled by our work.

Our Contribution.
The main contributions of this work are:

• We extend the recently proposed FFT-based privacy accountant (Koskela et al., 2020a,b) to compute tight privacy bounds for heterogeneous compositions.

• We give a full error analysis for the method in terms of the pre-defined parameters of the algorithm. This also leads to strict upper $\delta(\varepsilon)$-bounds. We show that these bounds are accurate in the sense that they allow choosing appropriate parameter values such that a desired accuracy is obtained.

• Using the error analysis, we also bound the computational complexity of the algorithm in terms of the number of compositions $k$ and the tolerated error $\eta$. Our bound is slightly better than the existing bound given by Murtagh and Vadhan (2018).

• We show how to speed up the evaluation of the privacy parameters for varying numbers of compositions using the Plancherel theorem.

We first recall some basic definitions of DP (Dwork et al., 2006). We use the following notation. An input data set containing $N$ data points is denoted as $X = (x_1, \ldots, x_N) \in \mathcal{X}^N$, where $x_i \in \mathcal{X}$, $1 \le i \le N$.

Definition 1.
We say two data sets $X$ and $Y$ are neighbours in the remove/add relation if we get one by removing/adding an element from/to the other, and denote this by $\sim_R$. We say $X$ and $Y$ are neighbours in the substitute relation if we get one by substituting one element in the other. We denote this by $\sim_S$.

Definition 2. Let $\varepsilon > 0$ and $\delta \in [0,1]$. Let $\sim$ define a neighbouring relation. A mechanism $\mathcal{M} \colon \mathcal{X}^N \to \mathcal{R}$ is $(\varepsilon,\delta,\sim)$-DP if for every $X \sim Y$ and every measurable set $E \subset \mathcal{R}$ we have
$$\Pr(\mathcal{M}(X) \in E) \le \mathrm{e}^{\varepsilon} \Pr(\mathcal{M}(Y) \in E) + \delta.$$
When the relation is clear from context or irrelevant, we will abbreviate it as $(\varepsilon,\delta)$-DP. We call $\mathcal{M}$ tightly $(\varepsilon,\delta,\sim)$-DP if there does not exist $\delta' < \delta$ such that $\mathcal{M}$ is $(\varepsilon,\delta',\sim)$-DP.

We first introduce the basic tool for obtaining tight privacy bounds: the privacy loss distribution (PLD). The results in Subsection 3.1 are reformulations of the results given by Meiser and Mohammadi (2018) and Sommer et al. (2019).
We consider discrete-valued one-dimensional mechanisms $\mathcal{M}$, which can be seen as mappings from $\mathcal{X}^N$ to the set of discrete-valued random variables. The generalised probability density functions of $\mathcal{M}(X)$ and $\mathcal{M}(Y)$, denoted $f_X(t)$ and $f_Y(t)$, respectively, are given by
$$f_X(t) = \sum_i a_{X,i} \cdot \delta_{t_{X,i}}(t), \qquad f_Y(t) = \sum_i a_{Y,i} \cdot \delta_{t_{Y,i}}(t), \qquad (3.1)$$
where $\delta_t(\cdot)$, $t \in \mathbb{R}$, denotes the Dirac delta function centred at $t$, and $t_{X,i}, t_{Y,i} \in \mathbb{R}$ and $a_{X,i}, a_{Y,i} \ge 0$. We refer to Koskela et al. (2020b) for more details of the notation. The privacy loss distribution is defined as follows.

Definition 3.
Let $\mathcal{M} \colon \mathcal{X}^N \to \mathcal{R}$, $\mathcal{R} \subset \mathbb{R}$, be a discrete-valued randomised mechanism and let $f_X(t)$ and $f_Y(t)$ be probability density functions of the form (3.1). We define the generalised privacy loss distribution (PLD) $\omega_{X/Y}$ as
$$\omega_{X/Y}(s) = \sum_{t_{X,i} = t_{Y,j}} a_{X,i} \cdot \delta_{s_i}(s), \qquad (3.2)$$
where $s_i = \log\bigl(\tfrac{a_{X,i}}{a_{Y,j}}\bigr)$.

$(\varepsilon,\delta)$-Bounds for Compositions

Let the generalised probability density functions $f_X$ and $f_Y$ be of the form (3.1). We define the convolution $f_X * f_Y$ as
$$(f_X * f_Y)(t) = \sum_{i,j} a_{X,i}\, a_{Y,j} \cdot \delta_{t_{X,i} + t_{Y,j}}(t).$$
We consider non-adaptive compositions of the form
$$\mathcal{M}(X) = \bigl(\mathcal{M}_1(X), \ldots, \mathcal{M}_k(X)\bigr),$$
and we denote by $f_{X,i}(t)$ the density function of $\mathcal{M}_i(X)$ for each $i$, and by $f_{Y,i}(t)$ that of $\mathcal{M}_i(Y)$. For each $i$, $1 \le i \le k$, we denote the PLD determined by Def. 3 and the densities $f_{X,i}(t)$ and $f_{Y,i}(t)$ by $\omega_{X/Y,i}$.

The following theorem shows that the tight $(\varepsilon,\delta)$-bounds for compositions of non-adaptive heterogeneous mechanisms are obtained using convolutions of the PLDs (see also Thm. 1 by Sommer et al. (2019)). A proof is given in the Appendix.

Theorem 4.
Consider a non-adaptive composition of $k$ independent mechanisms $\mathcal{M}_1, \ldots, \mathcal{M}_k$ and neighbouring data sets $X$ and $Y$. The composition is tightly $(\varepsilon,\delta)$-DP for $\delta(\varepsilon)$ given by $\delta(\varepsilon) = \max\{\delta_{X/Y}(\varepsilon), \delta_{Y/X}(\varepsilon)\}$, where
$$\delta_{X/Y}(\varepsilon) = 1 - \prod_{\ell=1}^{k} \bigl(1 - \delta_{X/Y,\ell}(\infty)\bigr) + \int_{\varepsilon}^{\infty} (1 - \mathrm{e}^{\varepsilon - s}) \bigl(\omega_{X/Y,1} * \cdots * \omega_{X/Y,k}\bigr)(s)\, \mathrm{d}s,$$
$$\delta_{X/Y,\ell}(\infty) = \sum_{\{t_i \,:\, \mathbb{P}(\mathcal{M}_\ell(X) = t_i) > 0,\ \mathbb{P}(\mathcal{M}_\ell(Y) = t_i) = 0\}} \mathbb{P}(\mathcal{M}_\ell(X) = t_i), \qquad (3.3)$$
and $\omega_{X/Y,1} * \cdots * \omega_{X/Y,k}$ denotes the convolution of the density functions $\omega_{X/Y,\ell}$, $1 \le \ell \le k$. $\delta_{Y/X}(\varepsilon)$ can be obtained analogously using the PLDs $\omega_{Y/X,1}, \ldots, \omega_{Y/X,k}$.

We remark that finding the outputs $\mathcal{M}_i(X)$ and $\mathcal{M}_i(Y)$, $1 \le i \le k$, that give the maximal $\delta(\varepsilon)$ is application-specific and has to be carried out individually for each case, similarly as, e.g., in the case of RDP (Mironov, 2017). In the experiments of Section 6 it will be clear how to determine the worst-case distributions $f_{X,i}$ and $f_{Y,i}$.

We next describe the numerical method for computing tight DP guarantees for heterogeneous compositions of discrete-valued mechanisms. The method is closely related to the homogeneous case described in (Koskela et al., 2020b). However, the error analysis is tailored to the heterogeneous case, and here we also consider the error induced by the grid approximation.
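Before introducing the FFT machinery, Theorem 4 can be illustrated with a brute-force computation that convolves discrete PLDs atom by atom. The following is a minimal sketch under our own naming, using a randomised-response mechanism as a stand-in example (it is not the paper's reference code):

```python
import numpy as np
from itertools import product

def rr_pld(p):
    # PLD of randomised response: mass p at log(p/(1-p)), mass 1-p at the negation
    c = np.log(p / (1 - p))
    return [(c, p), (-c, 1 - p)]

def convolve_plds(pld1, pld2):
    # direct convolution of two discrete PLDs given as (s, mass) lists
    out = {}
    for (s1, a1), (s2, a2) in product(pld1, pld2):
        out[s1 + s2] = out.get(s1 + s2, 0.0) + a1 * a2
    return list(out.items())

def tight_delta(pld, eps):
    # integral term of Theorem 4: sum over s > eps of (1 - e^{eps - s}) * mass;
    # here delta_l(inf) = 0, since both outcomes have positive probability
    return sum((1 - np.exp(eps - s)) * a for s, a in pld if s > eps)

pld = rr_pld(0.6)
for _ in range(9):            # a 10-fold homogeneous composition
    pld = convolve_plds(pld, rr_pld(0.6))
delta = tight_delta(pld, 1.0)
```

The direct convolution above has cost growing exponentially in the number of distinct mechanisms, which is exactly what the grid-and-FFT approach of the next section avoids.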
Let $x = [x_0, \ldots, x_{n-1}]^T$, $w = [w_0, \ldots, w_{n-1}]^T \in \mathbb{R}^n$. The discrete Fourier transform $\mathcal{F}$ and its inverse $\mathcal{F}^{-1}$ are defined as (Stoer and Bulirsch, 2013)
$$(\mathcal{F}x)_k = \sum_{j=0}^{n-1} x_j\, \mathrm{e}^{-\mathrm{i} 2\pi k j / n}, \qquad (\mathcal{F}^{-1}w)_k = \frac{1}{n} \sum_{j=0}^{n-1} w_j\, \mathrm{e}^{\mathrm{i} 2\pi k j / n}, \qquad (4.1)$$
where $\mathrm{i} = \sqrt{-1}$. Using the Fast Fourier Transform (FFT) (Cooley and Tukey, 1965) reduces the running time complexity of these transforms to $O(n \log n)$. The FFT also enables evaluating discrete convolutions efficiently. The convolution theorem (Stockham Jr, 1966) states that
$$\sum_{i=0}^{n-1} v_i w_{k-i} = \mathcal{F}^{-1}(\mathcal{F}v \odot \mathcal{F}w),$$
where $\odot$ denotes the element-wise product and the summation indices are taken modulo $n$. Similarly as in (Koskela et al., 2020b), we place the PLD on a grid
$$X_n = \{x_0, \ldots, x_{n-1}\}, \quad n \in \mathbb{Z}^+, \qquad (4.2)$$
where $x_i = -L + i \Delta x$ and $\Delta x = 2L/n$.
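The convolution theorem just stated is easy to verify numerically; a small sketch using numpy's FFT (names and data are ours, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
v, w = rng.random(n), rng.random(n)

# direct circular convolution: (v * w)_k = sum_i v_i w_{(k - i) mod n}
direct = np.array([sum(v[i] * w[(k - i) % n] for i in range(n)) for k in range(n)])

# via the convolution theorem: F^{-1}(F v  (*)  F w), element-wise product
via_fft = np.real(np.fft.ifft(np.fft.fft(v) * np.fft.fft(w)))

err = np.max(np.abs(direct - via_fft))   # agreement to machine precision
```

The $O(n \log n)$ cost of the FFT route is what makes the accountant practical for large grids.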
Suppose the distribution $\omega$ of the PLD is of the form
$$\omega(s) = \sum_i a_i \cdot \delta_{s_i}(s), \qquad (4.3)$$
where $a_i \ge 0$ and $-L \le s_i \le L - \Delta x$ for all $i$. We define the grid approximations
$$\omega^L(s) := \sum_i a_i \cdot \delta_{s_i^L}(s), \qquad \omega^R(s) := \sum_i a_i \cdot \delta_{s_i^R}(s), \qquad (4.4)$$
where
$$s_i^L = \max\{x \in X_n : x \le s_i\}, \qquad s_i^R = \min\{x \in X_n : x \ge s_i\}.$$
We note that the $s_i$'s correspond to the logarithmic ratios of probabilities of individual events. Thus, often a moderate $L$ is sufficient for the condition $-L \le s_i \le L - \Delta x$ to hold for all $i$. We also provide analysis in the Appendix for the case where this assumption does not hold. This is the case, for example, for the discrete Gaussian distribution (Canonne et al., 2020). From (4.4) we directly get the following result:

Lemma 5.
Let $\delta(\varepsilon)$ be given by the integral formula of Theorem 4 for PLDs $\omega_1, \ldots, \omega_k$ of the form (4.3). Let $\delta^L(\varepsilon)$ and $\delta^R(\varepsilon)$ correspondingly be determined by the left and right approximations $\omega_1^L, \ldots, \omega_k^L$ and $\omega_1^R, \ldots, \omega_k^R$, as defined in (4.4). Then for all $\varepsilon > 0$:
$$\delta^L(\varepsilon) \le \delta(\varepsilon) \le \delta^R(\varepsilon).$$

Proof. Recall the integral formula of Theorem 4:
$$\delta(\varepsilon) = 1 - \prod_{\ell=1}^{k} (1 - \delta_\ell(\infty)) + \int_{\varepsilon}^{\infty} (1 - \mathrm{e}^{\varepsilon - s}) (\omega_1 * \cdots * \omega_k)(s)\, \mathrm{d}s.$$
As the probabilities $\delta_\ell(\infty)$ are not affected by the grid approximation, we need only consider bounds for the integral
$$\int_{\varepsilon}^{\infty} (1 - \mathrm{e}^{\varepsilon - s}) (\omega_1 * \cdots * \omega_k)(s)\, \mathrm{d}s.$$
By the definition of the discrete convolution (eq. (4.8)),
$$(\omega_1 * \cdots * \omega_k)(s) = \sum_{i_1, \ldots, i_k = 0}^{n-1} a_{1 i_1} \cdots a_{k i_k} \cdot \delta_{s_{i_1} + \ldots + s_{i_k}}(s) \qquad (4.5)$$
and
$$(\omega_1^L * \cdots * \omega_k^L)(s) = \sum_{i_1, \ldots, i_k = 0}^{n-1} a_{1 i_1} \cdots a_{k i_k} \cdot \delta_{s_{i_1}^L + \ldots + s_{i_k}^L}(s). \qquad (4.6)$$
Since $(1 - \mathrm{e}^{\varepsilon - s})$ is a monotonically increasing function of $s$ for $s \ge \varepsilon$, and since $s_{i_1} + \ldots + s_{i_k} \ge s_{i_1}^L + \ldots + s_{i_k}^L$ for all $(i_1, \ldots, i_k)$, we see directly from (4.5) and (4.6) that $\delta^L(\varepsilon) \le \delta(\varepsilon)$. For the right grid approximation, $s_{i_1} + \ldots + s_{i_k} \le s_{i_1}^R + \ldots + s_{i_k}^R$ for all $(i_1, \ldots, i_k)$, and we similarly see that $\delta^R(\varepsilon) \ge \delta(\varepsilon)$.

By truncating the convolutions and periodising the PLD distributions, we arrive at periodic sums to which the FFT is directly applicable. These operations are analogous to the homogeneous case described in (Koskela et al., 2020b). We describe them next briefly. Suppose $\omega_1$ and $\omega_2$ are defined such that
$$\omega_1(s) = \sum_i a_i \cdot \delta_{s_i}(s), \qquad \omega_2(s) = \sum_i b_i \cdot \delta_{s_i}(s), \qquad (4.7)$$
where for all $i$: $a_i, b_i \ge 0$ and $s_i = i \Delta x$. The convolution $\omega_1 * \omega_2$ can then be written as
$$(\omega_1 * \omega_2)(s) = \sum_{i,j} a_i b_j \cdot \delta_{s_i + s_j}(s) = \sum_i \Bigl(\sum_j a_j b_{i-j}\Bigr) \cdot \delta_{s_i}(s). \qquad (4.8)$$
Let $L > 0$.
We truncate the convolutions to the interval $[-L, L]$:
$$(\omega_1 * \omega_2)(s) \approx \sum_i \Bigl(\sum_{-L \le s_j \le L} a_j b_{i-j}\Bigr) \cdot \delta_{s_i}(s), \qquad (4.9)$$
and denote by $\widetilde{\omega}_1$ and $\widetilde{\omega}_2$ the $2L$-periodic continuations of $\omega_1$ and $\omega_2$, and by $\circledast$ the corresponding circular convolution.

Lemma 6. Let $\omega_1$ and $\omega_2$ be of the form (4.7), such that $s_i = -L + i \Delta x$, $0 \le i \le n-1$, where $L > 0$, $n$ is even and $\Delta x = 2L/n$. Define $a = [a_0 \ \ldots \ a_{n-1}]^T$, $b = [b_0 \ \ldots \ b_{n-1}]^T$ and
$$D = \begin{bmatrix} 0 & I_{n/2} \\ I_{n/2} & 0 \end{bmatrix} \in \mathbb{R}^{n \times n}.$$
Then,
$$(\widetilde{\omega}_1 \circledast \widetilde{\omega}_2)(s) = \sum_{i=0}^{n-1} c_i \cdot \delta_{s_i}(s), \quad \text{where} \quad c_i = \bigl[D\, \mathcal{F}^{-1}\bigl(\mathcal{F}(D a) \odot \mathcal{F}(D b)\bigr)\bigr]_i,$$
and $\odot$ denotes the element-wise product of vectors.

Since the coefficients of $\widetilde{\omega}_1 \circledast \widetilde{\omega}_2$ are exactly given by the discrete Fourier transform, we are able to analyse the error induced by the FFT approximation by only considering the error of the approximation (4.9).

Strict Upper Bounds for $\delta(\varepsilon)$

Given a discrete-valued PLD distribution $\omega$, we get a strict upper $\delta(\varepsilon)$-DP bound as follows. Using parameter values $L > 0$ and $n \in \mathbb{Z}^+$, we form a grid $X_n$ as defined in (4.2) and place each PLD $\omega_i$, $1 \le i \le k$, on $X_n$ to obtain $\omega_i^R$ as defined in (4.4). We then approximate $\delta^R(\varepsilon)$ using Algorithm 1. We estimate the error incurred by the truncation of the convolutions and the periodisation of the PLDs using Thm. 7 (or Thm. 8 in case the support of $\omega_i^R$ is not included in the interval $[-L, L]$). By adding this error to the approximation given by Algorithm 1, we obtain a strict upper bound for $\delta(\varepsilon)$. The error for the truncation of the convolutions and the periodisation of the PLDs is given only in terms of the parameter $L$. The parameter $n$ can then be increased in case the discretisation error bound given by Thm. 9 is too large.

Algorithm 1 Fourier Accountant Algorithm for Heterogeneous Discrete-Valued Mechanisms

Input: distributions $\omega_1, \ldots, \omega_m$ of the form
$$\omega_j(s) = \sum_i a_{ji} \cdot \delta_{s_i}(s), \quad 1 \le j \le m,$$
such that $s_i = -L + i \Delta x$, $0 \le i \le n-1$, where $n$ is even and $\Delta x = 2L/n$; numbers of compositions for each mechanism, $k_1, \ldots, k_m$.

1. Set $a_j = [a_{j0} \ \ldots \ a_{j,n-1}]^T$, $1 \le j \le m$.
2. Evaluate the convolutions using Lemma 6 and the FFT. For each $j$, $1 \le j \le m$, evaluate the FFT: $\widetilde{a}_j = \mathcal{F}(D a_j)$.
3. Compute the element-wise products and apply $\mathcal{F}^{-1}$:
$$b = \bigl[D\, \mathcal{F}^{-1}\bigl((\widetilde{a}_1)^{\odot k_1} \odot \cdots \odot (\widetilde{a}_m)^{\odot k_m}\bigr)\bigr].$$
4. Approximate $\delta(\varepsilon)$:
$$\delta(\varepsilon) \approx 1 - \prod_{\ell=1}^{m} (1 - \delta_{X/Y,\ell}(\infty))^{k_\ell} + \sum_{\{\ell \,:\, -L + \ell \Delta x > \varepsilon\}} \bigl(1 - \mathrm{e}^{\varepsilon - (-L + \ell \Delta x)}\bigr) b_\ell,$$
where $\delta_{X/Y,\ell}(\infty)$ is defined in Theorem 4.

We next bound the error induced by the grid approximation and Algorithm 1. The total error consists of (see the Appendix for more details):

1. The tail integral $\int_L^\infty (1 - \mathrm{e}^{\varepsilon - s})(\omega_1 * \cdots * \omega_k)(s)\, \mathrm{d}s$.

2. The error arising from the periodisation of the PLDs and the truncation of the convolutions (affected by the parameter $L$):
$$\int_\varepsilon^L (1 - \mathrm{e}^{\varepsilon - s})(\omega_1 * \cdots * \omega_k)(s)\, \mathrm{d}s - \int_\varepsilon^L (1 - \mathrm{e}^{\varepsilon - s})(\widetilde{\omega}_1 \circledast \cdots \circledast \widetilde{\omega}_k)(s)\, \mathrm{d}s.$$

3. The discretisation error arising from the grid approximations (affected by both parameters $L$ and $n$):
$$\int_\varepsilon^L (1 - \mathrm{e}^{\varepsilon - s})(\omega_1 * \cdots * \omega_k)(s)\, \mathrm{d}s - \int_\varepsilon^L (1 - \mathrm{e}^{\varepsilon - s})(\omega_1^R * \cdots * \omega_k^R)(s)\, \mathrm{d}s.$$

We obtain the error bounds essentially using the Chernoff bound (Wainwright, 2019)
$$\mathbb{P}[X \ge t] \le \frac{\mathbb{E}[\mathrm{e}^{\lambda X}]}{\mathrm{e}^{\lambda t}},$$
which holds for any random variable $X$ and all $\lambda > 0$. Suppose $\omega_{X/Y}$ is of the form
$$\omega_{X/Y}(s) = \sum_i a_{X,i} \cdot \delta_{s_i}(s), \qquad (5.1)$$
where $s_i = \log\bigl(\tfrac{a_{X,i}}{a_{Y,i}}\bigr)$ and $a_{X,i}, a_{Y,i} > 0$. Then, the moment generating function of $\omega_{X/Y}$ is given by
$$\mathbb{E}[\mathrm{e}^{\lambda \omega_{X/Y}}] = \sum_i \mathrm{e}^{\lambda s_i} \cdot a_{X,i} = \sum_i \Bigl(\frac{a_{X,i}}{a_{Y,i}}\Bigr)^{\lambda} a_{X,i}. \qquad (5.2)$$
In our analysis, we repeatedly use the Chernoff bound to bound tails of PLD distributions in terms of pre-computable moment generating functions. Denote $S_k := \sum_{i=1}^{k} \omega_i$, where $\omega_i$ denotes the PLD random variable of the $i$th mechanism.
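The moment generating function (5.2) and the resulting Chernoff tail bound are cheap to evaluate numerically. A minimal sketch, using a randomised-response PLD and independent copies as a stand-in example (parameter values are ours):

```python
import numpy as np

def log_mgf(s, a, lam):
    # alpha(lambda) = log E[e^{lambda * omega}] for a PLD with atoms (s_i, a_i)
    return np.log(np.sum(a * np.exp(lam * s)))

p = 0.75
s = np.array([np.log(p / (1 - p)), np.log((1 - p) / p)])   # privacy-loss values
a = np.array([p, 1 - p])                                   # their probabilities

# Chernoff bound on P[S_k >= L] for the k-fold sum of independent copies:
# exp(k * alpha(lambda) - lambda * L) for any lambda > 0
k, L, lam = 20, 25.0, 1.0
alpha = log_mgf(s, a, lam)
tail_bound = np.exp(k * alpha - lam * L)
```

In practice one would minimise the bound over a grid of $\lambda$ values, as is done for the error bounds below.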
If the $\omega_i$'s are independent, we have that $\mathbb{E}[\mathrm{e}^{\lambda S_k}] = \prod_{i=1}^{k} \mathbb{E}[\mathrm{e}^{\lambda \omega_i}]$. Then, the Chernoff bound shows that for any $\lambda > 0$,
$$\int_L^\infty (\omega_1 * \cdots * \omega_k)(s)\, \mathrm{d}s = \mathbb{P}[S_k \ge L] \le \prod_{i=1}^{k} \mathbb{E}[\mathrm{e}^{\lambda \omega_i}]\, \mathrm{e}^{-\lambda L} \le \mathrm{e}^{\sum_{i=1}^{k} \alpha_i(\lambda)}\, \mathrm{e}^{-\lambda L}, \qquad (5.3)$$
where $\alpha_i(\lambda) = \log(\mathbb{E}[\mathrm{e}^{\lambda \omega_i}])$. Denote the logarithms of the moment generating functions of the PLDs as
$$\alpha_i^+(\lambda) = \log\bigl(\mathbb{E}[\mathrm{e}^{\lambda \omega_i}]\bigr), \qquad \alpha_i^-(\lambda) = \log\bigl(\mathbb{E}[\mathrm{e}^{-\lambda \omega_i}]\bigr),$$
where $1 \le i \le k$. Furthermore, denote
$$\alpha^+(\lambda) = \sum_i \alpha_i^+(\lambda), \qquad \alpha^-(\lambda) = \sum_i \alpha_i^-(\lambda). \qquad (5.4)$$
To obtain $\alpha^+(\lambda)$ and $\alpha^-(\lambda)$, we evaluate the moment generating functions using the finite sum (F.4). Using the analysis given in the Appendix, we bound the errors arising from the periodisation of the distributions and the truncation of the convolutions. Combined with the Chernoff bound (5.3), this yields the following two bounds for the total error incurred by Algorithm 1.

Theorem 7. Let the $\omega_i$'s be defined on the grid $X_n$ as described above (i.e., $s_j \in [-L, L - \Delta x]$ for all $j$). Let $\delta(\varepsilon)$ give the tight $(\varepsilon,\delta)$-bound for the PLDs $\omega_1, \ldots, \omega_k$ and let $\widetilde{\delta}(\varepsilon)$ be the result of Algorithm 1. Then, for all $\lambda > 0$,
$$\bigl|\delta(\varepsilon) - \widetilde{\delta}(\varepsilon)\bigr| \le \bigl(\mathrm{e}^{\alpha^+(\lambda)} + \mathrm{e}^{\alpha^-(\lambda)}\bigr)\, \frac{\mathrm{e}^{-L\lambda}}{1 - \mathrm{e}^{-2L\lambda}}.$$

As the $s_i$'s correspond to the logarithmic ratios of probabilities of individual events, often a moderate $L$ is sufficient for $-L \le s_i \le L - \Delta x$ to hold for all $i$. In the Appendix, we prove the following bound, which holds also in case the $s_i$'s are not inside the interval $[-L, L)$. This happens, for example, in the case of the discrete Gaussian mechanism (Canonne et al., 2020).

Theorem 8. Let the PLDs $\omega_\ell$, $1 \le \ell \le k$, take values at the equidistant points $s_i = i \Delta x$. Let $\delta(\varepsilon)$ give the tight $(\varepsilon,\delta)$-bound for the PLDs $\omega_1, \ldots, \omega_k$ and let $\widetilde{\delta}(\varepsilon)$ be the result of Algorithm 1.
Then, for all $\lambda > 0$,
$$\bigl|\delta(\varepsilon) - \widetilde{\delta}(\varepsilon)\bigr| \le \Biggl(\frac{\mathrm{e}^{(k+1)\max_i \alpha_i^+(\lambda)} - \mathrm{e}^{\max_i \alpha_i^+(\lambda)}}{\mathrm{e}^{\max_i \alpha_i^+(\lambda)} - 1} + \frac{\mathrm{e}^{(k+1)\max_i \alpha_i^-(\lambda)} - \mathrm{e}^{\max_i \alpha_i^-(\lambda)}}{\mathrm{e}^{\max_i \alpha_i^-(\lambda)} - 1}\Biggr)\, \frac{\mathrm{e}^{-L\lambda}}{1 - \mathrm{e}^{-2L\lambda}}. \qquad (5.5)$$

Let $\omega_1, \ldots, \omega_k$ be PLD distributions of the form (4.3), i.e., the $s_i$'s are not necessarily on the equidistant grid. Denote
$$\omega_\ell(s) = \sum_i a_{\ell i} \cdot \delta_{s_{\ell i}}(s).$$
Denote the right grid approximations (as defined in (4.4)) by
$$\omega_\ell^R(s) = \sum_i a_{\ell i} \cdot \delta_{s_{\ell i}^R}(s)$$
and the tight $(\varepsilon,\delta)$-bound corresponding to the PLD $\omega_1^R * \cdots * \omega_k^R$ by $\delta^R(\varepsilon)$. We have the following bound for the error arising from the grid approximation.

Theorem 9. Let $\delta(\varepsilon)$ denote the tight $(\varepsilon,\delta)$-bound for the convolution PLD $\omega_1 * \cdots * \omega_k$. The discretisation error $\delta^R(\varepsilon) - \delta(\varepsilon)$ can be bounded as
$$\delta^R(\varepsilon) - \delta(\varepsilon) \le k \Delta x \bigl(\mathbb{P}(\omega_1 + \cdots + \omega_k \ge \varepsilon) - \delta(\varepsilon)\bigr). \qquad (5.6)$$
Proof. See the Appendix.

Remark 10. Theorem 9 directly gives the bound
$$\delta^R(\varepsilon) - \delta(\varepsilon) \le k \Delta x \bigl(1 - \delta(\varepsilon)\bigr) \le k \Delta x. \qquad (5.7)$$
On the other hand, the bound (5.6) and the Chernoff bound (5.3) give
$$\delta^R(\varepsilon) - \delta(\varepsilon) \le k \Delta x\, \mathbb{P}(\omega_1 + \cdots + \omega_k \ge \varepsilon) \le k \Delta x\, \mathrm{e}^{\sum_i \alpha_i(\lambda)}\, \mathrm{e}^{-\lambda \varepsilon}, \qquad (5.8)$$
which holds for any $\lambda > 0$. By choosing $\lambda$ appropriately, this leads to a considerably tighter a priori bound than (5.7). Furthermore, after obtaining $\delta^R(\varepsilon)$ using Algorithm 1, the bound (5.6) can be used as an a posteriori bound, as it is equivalent to the bound
$$\delta^R(\varepsilon) - \delta(\varepsilon) \le \frac{k \Delta x \bigl(\mathbb{P}(\omega_1 + \cdots + \omega_k \ge \varepsilon) - \delta^R(\varepsilon)\bigr)}{1 - k \Delta x}.$$

Experimental Illustration. Tables 1 and 2 illustrate the periodisation error bound (5.5) and the discretisation error bound (5.8). We consider the one-dimensional binomial mechanism (Agarwal et al., 2018), where binomially distributed noise $Z$ with parameters $N \in \mathbb{N}$ and $0 < p < 1$ is added to the output of a query $f$.
Denoting the sensitivity of $f$ by $\Delta$, tight $(\varepsilon,\delta)$-bounds are obtained by considering the PLD $\omega_{X/Y}$ given by the distributions $f_X$ and $f_Y$, where $f_X \sim \Delta + \mathrm{Bin}(N, p)$ and $f_Y \sim \mathrm{Bin}(N, p)$. We set $N = 1000$, $p = 0.$, $\Delta = 1$ and $L = 5.$. We compute logarithmic probabilities using the digamma function and use those to evaluate the values of $\alpha^+(\lambda)$ and $\alpha^-(\lambda)$ required by the error bounds. The error bound (5.5) is evaluated using $\lambda = L$, and for the upper bound (5.8) we take the minimum of the bounds computed with several values of $\lambda$ proportional to $L$.

Table 1: The discretisation error bound (5.8) for different values of $n$ when $\varepsilon = 1.$ and $k = 20$, and the corresponding $\delta(\varepsilon)$-upper bound. The table indicates that the magnitude of the error bound (5.8) is not far from the magnitude of the actual error.

Table 2: Illustration of the discretisation error bound (5.8) for different values of $\varepsilon$ when $n = 10$ and $k = 20$, and the corresponding tight $\delta(\varepsilon)$-upper bound. We see that the bound (5.8) stays small in relation to $\delta(\varepsilon)$ as $\delta$ decreases.

5.4 Upper Bound for the Computational Complexity

The results by Murtagh and Vadhan (2018) state that there is no algorithm for computing tight $(\varepsilon,\delta)$-bounds that has polynomial complexity in $k$. However, Theorem 1.7 by Murtagh and Vadhan (2018) states that, allowing a small error in the output, the bounds can be evaluated efficiently. More precisely, given a non-adaptive composition of the mechanisms $\mathcal{M}_1, \ldots,$
$\mathcal{M}_k$, each mechanism $\mathcal{M}_i$ being tightly $(\varepsilon_i, \delta_i)$-DP, the result states that there exists an algorithm that outputs $\widetilde{\varepsilon}(\delta)$ such that
$$\varepsilon(\delta) \le \widetilde{\varepsilon}(\delta) \le \varepsilon(\mathrm{e}^{-\eta/2} \cdot \delta) + \eta,$$
where $\varepsilon(\delta)$ gives a tight bound for the composition, and the algorithm runs in time
$$O\Bigl(\frac{k \cdot \varepsilon \cdot (1 + \varepsilon)}{\eta} \log \frac{k \cdot \varepsilon \cdot (1 + \varepsilon)}{\eta}\Bigr), \qquad (5.9)$$
where $\varepsilon = \frac{1}{k} \sum_i \varepsilon_i$.

Assuming there are $m < k$ distinct mechanisms in the composition, using the error analysis of Sections 5.2 and 5.3 we obtain a slightly tighter complexity bound for the evaluation of a tight $\delta$ as a function of $\varepsilon$.

Lemma 11. Consider a non-adaptive composition of the mechanisms $\mathcal{M}_1, \ldots, \mathcal{M}_k$ with corresponding worst-case pairs of distributions $f_{X,i}$ and $f_{Y,i}$, $1 \le i \le k$. Suppose the sequence $\mathcal{M}_1, \ldots, \mathcal{M}_k$ consists of $m$ distinct mechanisms. Then, it is possible to obtain an approximation of $\delta(\varepsilon)$ with error less than $\eta$ with a number of operations
$$O\Bigl(\frac{m \cdot k^2 \cdot C_k}{\eta} \log \frac{k^2 \cdot C_k}{\eta}\Bigr),$$
where
$$C_k = \max\Bigl\{\frac{1}{k}\sum_{i=1}^k D_\infty(f_{X,i} \,\|\, f_{Y,i}),\ \frac{1}{k}\sum_{i=1}^k D_\infty(f_{Y,i} \,\|\, f_{X,i})\Bigr\}, \qquad D_\infty(f_X \,\|\, f_Y) = \sup_{a_{Y,i} \ne 0} \log \frac{a_{X,i}}{a_{Y,i}},$$
and the additional factor in the leading constant is the leading constant of the FFT algorithm.

Proof. We first determine a lower bound for the truncation parameter $L$ in terms of $k$. Consider the right-hand side of the error bound in Theorem 7. Suppose $L \ge 1$ and $\lambda \ge 1$. Then, we have that
$$\bigl(\mathrm{e}^{\alpha^+(\lambda)} + \mathrm{e}^{\alpha^-(\lambda)}\bigr)\, \frac{\mathrm{e}^{-L\lambda}}{1 - \mathrm{e}^{-2L\lambda}} \le \bigl(\mathrm{e}^{\alpha^+(\lambda)} + \mathrm{e}^{\alpha^-(\lambda)}\bigr) \cdot \frac{\mathrm{e}^2}{\mathrm{e}^2 - 1} \cdot \mathrm{e}^{-L\lambda} \le 2 \cdot \mathrm{e}^{\max\{\alpha^-(\lambda),\, \alpha^+(\lambda)\}} \cdot \frac{\mathrm{e}^2}{\mathrm{e}^2 - 1} \cdot \mathrm{e}^{-L\lambda} \le \mathrm{e}^{\max\{\alpha^-(\lambda),\, \alpha^+(\lambda)\} + 1}\, \mathrm{e}^{-L\lambda}, \qquad (5.10)$$
where $\alpha^-(\lambda)$ and $\alpha^+(\lambda)$ are as given in eq. (5.4).
For each $i$, the logarithm of the moment generating function of the PLD can be expressed in terms of the Rényi divergence (Mironov, 2017):
$$\log\bigl(\mathbb{E}[\mathrm{e}^{\lambda \omega_{X/Y,i}}]\bigr) = \log \sum_i \Bigl(\frac{a_{X,i}}{a_{Y,i}}\Bigr)^{\lambda} a_{X,i} = \log \sum_i \Bigl(\frac{a_{X,i}}{a_{Y,i}}\Bigr)^{\lambda+1} a_{Y,i} = \lambda \cdot D_{\lambda+1}(f_X \,\|\, f_Y),$$
where $D_\lambda$ denotes the Rényi divergence of order $\lambda$. From the monotonicity of the Rényi divergence (see Proposition 9, Mironov, 2017) it follows that
$$\alpha^+(\lambda) = \lambda \cdot \sum_i D_{\lambda+1}(f_{X,i} \,\|\, f_{Y,i}) \le \lambda \cdot \sum_i D_\infty(f_{X,i} \,\|\, f_{Y,i}),$$
where $D_\infty(f_X \,\|\, f_Y) = \sup_{a_{Y,i} \ne 0} \log \frac{a_{X,i}}{a_{Y,i}}$. With a similar calculation, we find that
$$\alpha^-(\lambda) \le (\lambda - 1) \cdot \sum_i D_\infty(f_{Y,i} \,\|\, f_{X,i}).$$
Thus,
$$\max\{\alpha^-(\lambda), \alpha^+(\lambda)\} \le k\lambda \cdot \max\Bigl\{\frac{1}{k}\sum_{i=1}^k D_\infty(f_{X,i} \,\|\, f_{Y,i}),\ \frac{1}{k}\sum_{i=1}^k D_\infty(f_{Y,i} \,\|\, f_{X,i})\Bigr\} = k\lambda \cdot C_k.$$
Now we can further bound (5.10) from above as
$$\mathrm{e}^{\max\{\alpha^-(\lambda),\, \alpha^+(\lambda)\} + 1}\, \mathrm{e}^{-L\lambda} \le \mathrm{e}^{k\lambda \cdot C_k + 1}\, \mathrm{e}^{-L\lambda}.$$
Requiring this upper bound to be smaller than a prescribed $\eta > 0$, and setting $\lambda = 1$, we arrive at the condition
$$L \ge k \cdot C_k + 1 + \log \tfrac{1}{\eta}. \qquad (5.11)$$
Next, we bound the computational complexity using a bound for the discretisation error. From Remark 10 it follows that the discretisation error is bounded as
$$\delta^R(\varepsilon) - \delta(\varepsilon) \le k \Delta x = \frac{2Lk}{n}.$$
Requiring this discretisation error to be less than $\eta$, choosing $L$ according to (5.11) and assuming $k \ge \log \tfrac{1}{\eta}$, we see that choosing
$$n = O\Bigl(\frac{k^2 C_k}{\eta}\Bigr)$$
is sufficient for the sum of the error sources to be less than $\eta$. As we need to compute the FFT for $m$ different PLDs, and since the FFT has complexity $O(n \log n)$, we see that with
$$O\Bigl(\frac{m k^2 C_k}{\eta} \log \frac{k^2 C_k}{\eta}\Bigr)$$
operations it is possible to obtain an approximation of $\delta(\varepsilon)$ with error less than $\eta$, and that the additional factor in the leading constant is given by the leading constant in the complexity of the FFT.
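The key inequality of the proof, the bound of the log moment generating function by the max divergence, is easy to check numerically. A sketch with randomised-response mechanisms as a stand-in example (parameter values are ours):

```python
import numpy as np

def alpha_plus(p, lam):
    # log E[e^{lam * omega}] for a randomised-response PLD with parameter p
    c = np.log(p / (1 - p))
    return np.log(p * np.exp(lam * c) + (1 - p) * np.exp(-lam * c))

def d_inf(p):
    # D_inf(f_X || f_Y) = max_i log(a_X,i / a_Y,i) = log(p / (1 - p)) for p > 1/2
    return np.log(p / (1 - p))

ps = [0.6, 0.7, 0.8]      # three heterogeneous mechanisms
lam = 2.5
lhs = sum(alpha_plus(p, lam) for p in ps)   # alpha^+(lambda)
rhs = lam * sum(d_inf(p) for p in ps)       # lambda * sum_i D_inf(f_X,i || f_Y,i)
```

Here `lhs` is strictly below `rhs`, reflecting $\alpha^+(\lambda) = \lambda \sum_i D_{\lambda+1} \le \lambda \sum_i D_\infty$.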
Remark 12. We see from the proof that the periodisation error is less than $\eta$ for
$$L \ge \frac{\log \tfrac{1}{\eta} + \max\{\alpha^-(\lambda), \alpha^+(\lambda)\} + 1}{\lambda}.$$
As this is true for all $\lambda \ge 1$, a minimal sufficient value of $L$ can be found via an optimisation problem w.r.t. $\lambda$. Notice also that since $\alpha^-(\lambda)$ and $\alpha^+(\lambda)$ correspond to cumulant generating functions (CGFs) (Abadi et al., 2016) of the compositions, and since the minimisation
$$\min_{\lambda} \frac{\log \tfrac{1}{\delta} + \alpha(\lambda)}{\lambda}$$
corresponds to the conversion of CGF values to $(\varepsilon(\delta), \delta)$-DP values (Abadi et al., 2016), we see that approximately (assuming $\tfrac{1}{\lambda}$ is small) $L$ has to be chosen such that $L \ge \varepsilon(\eta)$, where $\varepsilon(\eta)$ gives the $(\varepsilon,\delta)$-DP of the composition $(\mathcal{M}_1, \ldots, \mathcal{M}_k)$ at $\delta = \eta$.

Remark 13. For simplicity, we have assumed above that the compositions consist of $(\varepsilon, 0)$-DP mechanisms and that the parameter $L$ is chosen sufficiently large so that for all $i$: $|s_i| \le L$, where $s_i = \log \tfrac{a_{X,i}}{a_{Y,i}}$. Then, we can bound the periodisation error using Theorem 7. In case this does not hold, finding a priori conditions for the parameters $n$ and $L$ could also be carried out using Theorem 8.

5.5 Fast Evaluation Using the Plancherel Theorem

When using Algorithm 1 to approximate $\delta(\varepsilon)$, we need to evaluate an expression of the form
$$b^k = D\, \mathcal{F}^{-1}\bigl(\mathcal{F}(D a_1)^{\odot k_1} \odot \cdots \odot \mathcal{F}(D a_m)^{\odot k_m}\bigr) \qquad (5.12)$$
and the sum
$$\widetilde{\delta}(\varepsilon) = \sum_{-L + \ell \Delta x > \varepsilon} \bigl(1 - \mathrm{e}^{\varepsilon - (-L + \ell \Delta x)}\bigr)\, b^k_\ell. \qquad (5.13)$$
When evaluating $\widetilde{\delta}(\varepsilon)$ for different numbers of compositions, the inverse transform $\mathcal{F}^{-1}$ is the most expensive part, since the vectors $\mathcal{F}(D a_i)$ can be precomputed and the vector $b^k$ can be updated in linear time. The following lemma shows that the updates of $\widetilde{\delta}(\varepsilon)$ can actually be performed in linear time, without using the inverse transform $\mathcal{F}^{-1}$.

Lemma 14. Let $\widetilde{\delta}(\varepsilon)$ be given by (5.13) and let $b^k$ be defined as in (5.12).
Denote $w_\varepsilon \in \mathbb{R}^n$ such that $(w_\varepsilon)_\ell = \max\{1 - \mathrm{e}^{\varepsilon - (-L + \ell \Delta x)},\, 0\}$. Then, we have that
$$\widetilde{\delta}(\varepsilon) = \frac{1}{n} \bigl\langle \mathcal{F}(D w_\varepsilon),\ \mathcal{F}(D a_1)^{\odot k_1} \odot \cdots \odot \mathcal{F}(D a_m)^{\odot k_m} \bigr\rangle. \qquad (5.14)$$

Proof. We see that the sum (5.13) is given by the inner product $\widetilde{\delta}(\varepsilon) = \langle w_\varepsilon, b^k \rangle$. The Plancherel theorem states that the DFT (as defined in (4.1)) preserves inner products: for $x, y \in \mathbb{R}^n$,
$$\langle x, y \rangle = \frac{1}{n} \langle \mathcal{F}x, \mathcal{F}y \rangle. \qquad (5.15)$$
Using (5.15) and the fact that $D$ is symmetric, we see that
$$\widetilde{\delta}(\varepsilon) = \langle w_\varepsilon, b^k \rangle = \bigl\langle w_\varepsilon,\ D\, \mathcal{F}^{-1}\bigl(\mathcal{F}(D a_1)^{\odot k_1} \odot \cdots \odot \mathcal{F}(D a_m)^{\odot k_m}\bigr) \bigr\rangle = \bigl\langle D w_\varepsilon,\ \mathcal{F}^{-1}\bigl(\mathcal{F}(D a_1)^{\odot k_1} \odot \cdots \odot \mathcal{F}(D a_m)^{\odot k_m}\bigr) \bigr\rangle = \frac{1}{n} \bigl\langle \mathcal{F}(D w_\varepsilon),\ \mathcal{F}(D a_1)^{\odot k_1} \odot \cdots \odot \mathcal{F}(D a_m)^{\odot k_m} \bigr\rangle.$$

We see immediately that if the vectors $\mathcal{F}(D w_\varepsilon)$ and $\mathcal{F}(D a_i)$, $1 \le i \le m$, are precomputed, $\widetilde{\delta}(\varepsilon)$ can be updated in $O(n)$ time. We believe this approach can be used for designing efficient online $(\varepsilon,\delta)$-accountants that also give tight guarantees.

Experimental Illustration. Consider computing the tight $\delta(\varepsilon)$-bound for the subsampled Gaussian mechanism (see Section 6.2), for $q = 0.$ and $\sigma = 2.$. We evaluate $\delta(\varepsilon)$ after $k = 100, \ldots$ compositions at $\varepsilon = 1.$. Table 3 illustrates the compute time for each update of $\delta(\varepsilon)$, using (a) a pre-computed vector $\mathcal{F}(D a)$, the inverse transform $\mathcal{F}^{-1}$ and the summation (5.13), and (b) pre-computed vectors $\mathcal{F}(D a)^{\odot k}$ and $\mathcal{F}(D w_\varepsilon)$ and the inner product (5.14).
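Lemma 14 can be sketched numerically: the weighted sum (5.13) computed through the inverse FFT agrees with the Plancherel inner product (5.14). For brevity the sketch below uses a single mechanism ($m = 1$) with a synthetic mass vector; all names are ours:

```python
import numpy as np

n, L, eps, k = 1024, 10.0, 1.0, 16
dx = 2 * L / n
x = -L + dx * np.arange(n)
half = n // 2
D = lambda v: np.concatenate([v[half:], v[:half]])   # the half-swap matrix D

rng = np.random.default_rng(1)
a = rng.random(n); a /= a.sum()              # a PLD mass vector on the grid
Fa = np.fft.fft(D(a))                        # precomputable
w = np.maximum(1 - np.exp(eps - x), 0.0)     # the weight vector w_eps
Fw = np.fft.fft(D(w))                        # precomputable

# route (5.13): inverse FFT, then weighted sum
b = D(np.real(np.fft.ifft(Fa ** k)))
delta_ifft = np.sum(w * b)
# route (5.14): O(n) inner product in the frequency domain, no inverse FFT
delta_ip = np.real(np.vdot(Fw, Fa ** k)) / n
```

Once `Fw` and `Fa` are cached, updating `k` only requires re-evaluating the element-wise power and the inner product, which is the linear-time update discussed above.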
Table 3: Compute times (in milliseconds) for an update of the $\delta(\varepsilon)$-bound for different values of $n$, using the summation (5.13) and the inner product (5.14), and the $\delta(\varepsilon)$-upper bound after $k = 500$ compositions. We see that using Lemma 14 we can speed up the update more than 20-fold, and that an accurate update of $\delta(\varepsilon)$ is possible in less than one millisecond. The alternatives (5.13) and (5.14) give equal results up to machine precision.

6 Experiments

We compare the proposed accountant experimentally to the Tensorflow moments accountant (Abadi et al., 2016), which is based on Rényi differential privacy (Mironov, 2017) and naturally allows evaluation of privacy guarantees for heterogeneous mechanisms.

We consider a non-adaptive composition of the form
$$\mathcal{M}(X) = \bigl(\mathcal{M}_1(X), \widetilde{\mathcal{M}}_2(X), \ldots, \mathcal{M}_{k-1}(X), \widetilde{\mathcal{M}}_k(X)\bigr),$$
where each $\mathcal{M}_i$ is a Gaussian mechanism with sensitivity 1, and each $\widetilde{\mathcal{M}}_i$ is a randomised response mechanism with probability of a correct answer $p$, $0 < p < 1$. We know that for the randomised response the PLD leading to the worst-case bound is given by (Koskela et al., 2020b)
$$\omega_R(s) = p \cdot \delta_{c_p}(s) + (1 - p) \cdot \delta_{-c_p}(s),$$
where $c_p = \log \tfrac{p}{1-p}$. Also, for the PLD $\omega_G$ of the Gaussian mechanism we know that (Sommer et al., 2019)
$$\omega_G \sim \mathcal{N}\Bigl(\frac{1}{2\sigma^2},\, \frac{1}{\sigma^2}\Bigr).$$
Let the $\Delta x$-grid be defined as above, i.e., let $L > 0$, $n \in \mathbb{Z}^+$, $\Delta x = 2L/n$ and $s_i = -L + i \Delta x$ for all $i \in \mathbb{Z}$. Define
$$\omega_{G,\max}(s) = \sum_{i=0}^{n-1} c_i^+ \cdot \delta_{s_i}(s), \qquad (6.1)$$
where $c_i^+ = \Delta x \cdot \max_{s \in [s_{i-1}, s_i]} \omega_G(s)$. Using a bound for the moment generating function of the infinitely extending counterpart of $\omega_{G,\max}$ and by using Alg. 1 (we refer to the Appendix for more details), we obtain a numerical value $\delta_{\max}(\varepsilon)$ (depending on $n$ and $L$) for which we have that $\delta(\varepsilon) \le \delta_{\max}(\varepsilon)$, where $\delta(\varepsilon)$ gives a tight bound for the composition $\mathcal{M}(X)$.
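The majorant (6.1) and the FFT composition can be sanity-checked on the plain Gaussian mechanism, for which the tight $\delta(\varepsilon)$ of a $k$-fold composition is known in closed form (the composed PLD is $\mathcal{N}(k\mu, k/\sigma^2)$ with $\mu = 1/(2\sigma^2)$). The sketch below uses our own names and parameter values and is not the paper's reference implementation:

```python
import numpy as np
from math import erf, sqrt, exp

sigma, k, eps = 2.0, 10, 1.0
L, n = 12.0, 2 ** 17
dx = 2 * L / n
x = -L + dx * np.arange(n)
half = n // 2
D = lambda v: np.concatenate([v[half:], v[:half]])

mu, var = 1 / (2 * sigma ** 2), 1 / sigma ** 2      # omega_G ~ N(mu, var)
pdf = lambda s: np.exp(-(s - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# majorant masses c_i^+ = dx * max over the cell [x_i - dx, x_i]
c = dx * np.maximum(pdf(x - dx), pdf(x))
c[(x - dx <= mu) & (mu <= x)] = dx * pdf(mu)        # cell containing the mode

# Algorithm 1 with m = 1: k-fold FFT convolution, then the delta(eps) sum
b = D(np.real(np.fft.ifft(np.fft.fft(D(c)) ** k)))
delta_fft = np.sum(np.where(x > eps, (1 - np.exp(eps - x)) * b, 0.0))

# closed-form tight delta(eps) of the k-fold Gaussian composition
Phi = lambda t: 0.5 * (1 + erf(t / sqrt(2)))
s_tot = sqrt(k * var)
delta_exact = Phi((k * mu - eps) / s_tot) - exp(eps) * Phi((-k * mu - eps) / s_tot)
```

With these parameters `delta_fft` upper-bounds `delta_exact` and the gap stays within the discretisation error of order $k\,\Delta x$, consistent with Theorem 9.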
As a comparison, in Figure 1 we also show the guarantees given by the Tensorflow moments accountant. We know that for $\alpha > 1$, the $\alpha$-RDP of the randomised response is given by
$$\frac{1}{\alpha - 1} \log\bigl(p^\alpha (1-p)^{1-\alpha} + (1-p)^\alpha p^{1-\alpha}\bigr)$$
and correspondingly for the Gaussian mechanism by $\alpha / (2\sigma^2)$ (Mironov, 2017). As is commonly done, we evaluate the RDPs for integer orders and sum them up along the compositions. Then, using the moments accountant method, the corresponding $(\varepsilon,\delta)$-bounds are obtained (Abadi et al., 2016).

Figure 1: Bounds for $\delta(\varepsilon)$ computed using Algorithm 1 (FA) and the Tensorflow moments accountant (TF MA), when $\sigma = 5.$ and $p = 0.$, for $\varepsilon = 2.0$ and $\varepsilon = 4.0$. We see that in the plotted range of $\delta$, FA allows approximately 1.5 times as many compositions as TF MA for the same $\varepsilon$. We use here $L = 10$; a moderate number of discretisation points $n$ already gives accurate results.

6.2 Heterogeneous Subsampled Gaussian Mechanism

We next show how to compute upper bounds for heterogeneous compositions of the subsampled Gaussian mechanism. We consider Poisson subsampling and the $\sim_R$-neighbouring relation. For a subsampling ratio $q$ and noise level $\sigma$, the continuous PLD is given by (Koskela et al., 2020a)
$$\omega(s) = \begin{cases} f(g(s))\, g'(s), & \text{if } s > \log(1 - q), \\ 0, & \text{otherwise}, \end{cases}$$
where
$$f(t) = \frac{1}{\sqrt{2\pi}\,\sigma} \Bigl[ q\, \mathrm{e}^{-\frac{(t-1)^2}{2\sigma^2}} + (1 - q)\, \mathrm{e}^{-\frac{t^2}{2\sigma^2}} \Bigr] \quad \text{and} \quad g(s) = \sigma^2 \log\Bigl(\frac{\mathrm{e}^s - (1 - q)}{q}\Bigr) + \frac{1}{2}.$$
Again, by deriving a bound for the moment generating function of the infinitely extending counterpart of $\omega_{\max}$ (see (6.1), details in the Appendix) and by using Alg. 1 and Thm. 8, we obtain a numerical value $\delta_{\max}(\varepsilon)$ such that after $k$ compositions $\delta(\varepsilon) \le \delta_{\max}(\varepsilon)$, where $\delta(\varepsilon)$ gives a tight bound for the $k$-fold composition of heterogeneous subsampled Gaussian mechanisms. Figure 2 illustrates the upper bound $\delta_{\max}(\varepsilon)$ as $k$ grows, when $L = 10$.
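The subsampled-Gaussian PLD density above is straightforward to evaluate numerically; a minimal sanity check (parameter values are ours, chosen for illustration) is that it integrates to essentially 1 over $s > \log(1-q)$:

```python
import numpy as np

q, sigma = 0.02, 2.0

def f(t):
    # worst-case output density: mixture of N(1, sigma^2) and N(0, sigma^2)
    return (q * np.exp(-(t - 1) ** 2 / (2 * sigma ** 2))
            + (1 - q) * np.exp(-t ** 2 / (2 * sigma ** 2))) / np.sqrt(2 * np.pi * sigma ** 2)

def g(s):
    # inverse of the privacy-loss function
    return sigma ** 2 * np.log((np.exp(s) - (1 - q)) / q) + 0.5

def g_prime(s):
    return sigma ** 2 * np.exp(s) / (np.exp(s) - (1 - q))

s = np.linspace(np.log(1 - q) + 1e-6, 4.0, 1_000_000)
omega = f(g(s)) * g_prime(s)                 # the PLD density on s > log(1 - q)
ds = s[1] - s[0]
mass = ds * (omega.sum() - 0.5 * (omega[0] + omega[-1]))   # trapezoidal rule
```

The same evaluation, combined with the cell-wise maxima as in (6.1), yields the discrete majorant that Algorithm 1 composes.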
For comparison, we also show the numerical values given by the Tensorflow moments accountant (Abadi et al., 2016). Table 4 illustrates compute times for different values of n.

Table 4: Compute times for computing the δ(ε)-upper bound using FA for different values of n after k = 5500 compositions. TF MA evaluates its upper bound for δ(ε) in 0.035 seconds. Here ε = 1.0, with q and σ as in the first option of Figure 2.

Figure 2: Bounds for δ(ε) computed using Algorithm 1 (FA) and the Tensorflow moments accountant (TF MA). In the first option ε = 1.0, q = 0.02 and σ decreases linearly from 3.0 to 2.0. In the second option ε = 1.0, q = 0.01 and σ decreases linearly from 3.0 to 2.5. For each value of σ, 500 compositions are evaluated. We see that for small values of δ, FA allows noticeably more compositions than TF MA for the same guarantees.

7 Conclusions

We have extended the Fast Fourier Transform-based approach for computing tight privacy bounds for discrete-valued mechanisms to heterogeneous compositions. We have completed the error analysis of the method so that, using the derived bounds, it is possible to determine appropriate values for all the parameters of the algorithm, allowing more black-box-like usage. The error analysis also led to a bound for the computational complexity of the algorithm that is slightly better than the existing theoretical complexity bound for obtaining (ε, δ)-bounds for non-adaptive compositions within a given error tolerance. Using the Plancherel theorem, we have shown how to further speed up the evaluation of DP bounds. We believe this gives tools to implement tight privacy accountants in services that require minimal delays.
We emphasise that due to the construction of the algorithm and to the rigorous error analysis, the reported (ε, δ)-bounds are strict upper privacy bounds.

Acknowledgements

This work has been supported by the Academy of Finland [Finnish Center for Artificial Intelligence FCAI and grant 325573] and by the Strategic Research Council at the Academy of Finland [grant 336032].

Bibliography

Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. (2016). Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308–318.

Agarwal, N., Suresh, A. T., Yu, F. X. X., Kumar, S., and McMahan, B. (2018). cpSGD: Communication-efficient and differentially-private distributed SGD. In Advances in Neural Information Processing Systems, pages 7564–7575.

Balle, B., Barthe, G., and Gaboardi, M. (2018). Privacy amplification by subsampling: Tight analyses via couplings and divergences. In Advances in Neural Information Processing Systems, pages 6277–6287.

Canonne, C., Kamath, G., and Steinke, T. (2020). The discrete Gaussian for differential privacy. In Advances in Neural Information Processing Systems.

Cooley, J. W. and Tukey, J. W. (1965). An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19(90):297–301.

Dwork, C., McSherry, F., Nissim, K., and Smith, A. (2006). Calibrating noise to sensitivity in private data analysis. In Proc. TCC 2006, pages 265–284.

Girgis, A. M., Data, D., Diggavi, S., Kairouz, P., and Suresh, A. T. (2020). Shuffled model of federated learning: Privacy, communication and accuracy trade-offs. arXiv preprint arXiv:2008.07180.

Kairouz, P., McMahan, H. B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A. N., Bonawitz, K., Charles, Z., Cormode, G., Cummings, R., et al. (2019). Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977.

Koskela, A., Jälkö, J., and Honkela, A. (2020a).
Computing tight differential privacy guarantees using FFT. In The 23rd International Conference on Artificial Intelligence and Statistics.

Koskela, A., Jälkö, J., Prediger, L., and Honkela, A. (2020b). Tight approximate differential privacy for discrete-valued mechanisms using FFT. arXiv preprint arXiv:2006.07134.

Meiser, S. and Mohammadi, E. (2018). Tight on budget?: Tight bounds for r-fold approximate differential privacy. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 247–264. ACM.

Mironov, I. (2012). On significance of the least significant bits for differential privacy. In Proceedings of the 2012 ACM Conference on Computer and Communications Security, pages 650–661.

Mironov, I. (2017). Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pages 263–275.

Mironov, I., Talwar, K., and Zhang, L. (2019). Rényi differential privacy of the sampled Gaussian mechanism. arXiv preprint arXiv:1908.10530.

Murtagh, J. and Vadhan, S. (2018). The complexity of computing the optimal composition of differential privacy. Theory of Computing, 14(8):1–35.

Sommer, D. M., Meiser, S., and Mohammadi, E. (2019). Privacy loss classes: The central limit theorem in differential privacy. Proceedings on Privacy Enhancing Technologies, 2019(2):245–269.

Stockham Jr, T. G. (1966). High-speed convolution and correlation. In Proceedings of the April 26-28, 1966, Spring Joint Computer Conference, pages 229–233. ACM.

Stoer, J. and Bulirsch, R. (2013). Introduction to Numerical Analysis, volume 12. Springer Science & Business Media.

Wainwright, M. J. (2019). High-Dimensional Statistics: A Non-Asymptotic Viewpoint, volume 48. Cambridge University Press.

Wang, Y.-X., Balle, B., and Kasiviswanathan, S. P. (2019). Subsampled Rényi differential privacy and analytical moments accountant. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1226–1235.

Zhu, Y. and Wang, Y.-X. (2019). Poisson subsampled Rényi differential privacy.
In International Conference on Machine Learning, pages 7634–7642.

A Proof for Theorem 4

Theorem 4 shows that the tight (ε, δ)-bounds for compositions of non-adaptive mechanisms are obtained using convolutions of PLDs (see also Thm. 1 by Sommer et al. (2019)). Using our formalism, we can prove the theorem using the proof of Theorem A.4 of (Koskela et al., 2020b), which considers a composition of two mechanisms. We include the proof here for completeness.

Theorem A.1. Consider a non-adaptive composition of k independent mechanisms M_1, ..., M_k and neighbouring data sets X and Y. The composition is tightly (ε, δ)-DP for δ(ε) given by

δ(ε) = max{ δ_{X/Y}(ε), δ_{Y/X}(ε) },

where

δ_{X/Y}(ε) = 1 − Π_{ℓ=1}^{k} (1 − δ_{X/Y,ℓ}(∞)) + ∫_ε^∞ (1 − e^{ε−s}) (ω_{X/Y,1} ∗ ··· ∗ ω_{X/Y,k})(s) ds,

δ_{X/Y,ℓ}(∞) = Σ_{ {t_i : P(M_ℓ(X)=t_i) > 0, P(M_ℓ(Y)=t_i) = 0} } P(M_ℓ(X) = t_i)   (A.1)

and ω_{X/Y,1} ∗ ··· ∗ ω_{X/Y,k} denotes the convolution of the density functions ω_{X/Y,ℓ}, 1 ≤ ℓ ≤ k. An analogous expression holds for δ_{Y/X}(ε).

Proof. We show the proof first for a composition of two mechanisms; it will be clear from the proof how to generalise to a non-adaptive composition of k mechanisms. We start from Lemma 4 of (Koskela et al., 2020b), which gives an expression for the tight (ε, δ)-DP bound for a single mechanism. By the definition of the privacy loss distribution, the PLD ω̃ of the non-adaptive composition of mechanisms M_1 and M_2 is given by

ω̃_{X/Y}(s) = Σ_{(t_i, t_i′) = (t_j, t_j′)} P( (M_1(X), M_2(X)) = (t_i, t_i′) ) · δ_{s̃_i}(s),

s̃_i = log( P( (M_1(X), M_2(X)) = (t_i, t_i′) ) / P( (M_1(Y), M_2(Y)) = (t_j, t_j′) ) ).
Due to the independence of M_1 and M_2,

P( M_1(X) = t_i, M_2(X) = t_i′ ) = P( M_1(X) = t_i ) P( M_2(X) = t_i′ ),
P( M_1(Y) = t_j, M_2(Y) = t_j′ ) = P( M_1(Y) = t_j ) P( M_2(Y) = t_j′ ).   (A.2)

Therefore,

log( P(M_1(X) = t_i, M_2(X) = t_i′) / P(M_1(Y) = t_j, M_2(Y) = t_j′) ) = log( P(M_1(X) = t_i) / P(M_1(Y) = t_j) ) + log( P(M_2(X) = t_i′) / P(M_2(Y) = t_j′) )

and

ω̃_{X/Y}(s) = Σ_{(t_i, t_i′) = (t_j, t_j′)} P(M_1(X) = t_i) P(M_2(X) = t_i′) · δ_{s_i + s_i′}(s),   (A.3)

where

s_i = log( P(M_1(X) = t_i) / P(M_1(Y) = t_j) ),  s_i′ = log( P(M_2(X) = t_i′) / P(M_2(Y) = t_j′) ).

We see from (A.3) that ω̃_{X/Y} = ω_{X/Y} ∗ ω′_{X/Y}, where ω_{X/Y} and ω′_{X/Y} are the PLDs of M_1 and M_2 and ∗ is the discrete convolution as defined in eq. (4.8). The expression for δ̃_{X/Y}(∞) follows directly from its definition in Lemma 4 of (Koskela et al., 2020b) and from the independence of the mechanisms (A.2). We see from this proof and from the definition of the discrete convolution (eq. (4.8)) that the result directly generalises to a non-adaptive composition of k mechanisms.

B Proof for Lemma 6

Lemma B.1. Let ω_1 and ω_2 be of the form (4.7), such that s_i = −L + iΔx, 0 ≤ i ≤ n − 1, where L > 0, n is even and Δx = 2L/n. Define

a = [a_0 ... a_{n−1}]^T,  b = [b_0 ... b_{n−1}]^T,  D = [ [0, I_{n/2}], [I_{n/2}, 0] ] ∈ R^{n×n}.
Then,

(ω̃_1 ⊛ ω̃_2)(s) = Σ_{i=0}^{n−1} c_i · δ_{s_i}(s),

where

c_i = [ D F^{−1}( F(Da) ⊙ F(Db) ) ]_i

and ⊙ denotes the element-wise product of vectors.

Proof. Assume n is even and s_i = −L + iΔx, 0 ≤ i ≤ n − 1. From the truncation and periodisation it follows that ω̃_1 ⊛ ω̃_2 is of the form

(ω̃_1 ⊛ ω̃_2)(s) = Σ_{i=0}^{n−1} c_i · δ_{s_i}(s),  c_i = Σ_{j=−n/2}^{n/2−1} a_j b_{i−j}  (indices modulo n).   (B.1)

Denoting ã = Da and b̃ = Db, we see that the coefficients c_i in (B.1) are given by the expression

c_{i+n/2} = Σ_{j=0}^{n−1} ã_j b̃_{i−j}  (indices modulo n),

to which we can apply the DFT and the convolution theorem (Stockham Jr, 1966). Thus, when 0 ≤ i ≤ n − 1,

c_{i+n/2} = [ F^{−1}( F(ã) ⊙ F(b̃) ) ]_i = [ F^{−1}( F(Da) ⊙ F(Db) ) ]_i  (indices modulo n),   (B.2)

where ⊙ denotes the element-wise product of vectors. From (B.2) we find that

c_i = [ D F^{−1}( F(Da) ⊙ F(Db) ) ]_i  (indices modulo n).

C Decomposition of the Total Error

When carrying out the approximations for δ(ε), we

1. First replace the PLDs ω_1, ..., ω_k by the right grid approximations ω_1^R, ..., ω_k^R. Using the notation given in the main text, this corresponds to the approximation δ(ε) ≈ δ^R(ε), i.e. to the approximation

∫_ε^∞ (1 − e^{ε−s}) (ω_1 ∗ ··· ∗ ω_k)(s) ds ≈ ∫_ε^∞ (1 − e^{ε−s}) (ω_1^R ∗ ··· ∗ ω_k^R)(s) ds.

2.
Then, Algorithm 1 is used to approximate δ^R(ε) ≈ δ̃^R(ε), which in exact arithmetic corresponds to the approximation

∫_ε^∞ (1 − e^{ε−s}) (ω_1^R ∗ ··· ∗ ω_k^R)(s) ds ≈ ∫_ε^∞ (1 − e^{ε−s}) (ω̃_1^R ⊛ ··· ⊛ ω̃_k^R)(s) ds,

where the ω̃_i's denote the periodised PLDs and ⊛ denotes the truncated convolution (described in the main text).

We separately consider the errors arising from the periodisation and truncation of the convolutions and from the grid approximation. This means that we bound the total error as

| δ(ε) − δ̃^R(ε) | = | δ(ε) − δ^R(ε) + δ^R(ε) − δ̃^R(ε) | ≤ | δ(ε) − δ^R(ε) | + | δ^R(ε) − δ̃^R(ε) |.

Theorem 9 of the main text gives a bound for the term | δ(ε) − δ^R(ε) |, and Theorems 7 and 8 of the main text give bounds for the term | δ(ε) − δ̃(ε) | in terms of the moment generating functions (MGFs) of ω_1, ..., ω_k and −ω_1, ..., −ω_k. The bounds for the error | δ(ε) − δ̃(ε) | can be directly used to bound the error | δ^R(ε) − δ̃^R(ε) |, either by numerically evaluating the MGFs of the PLDs ω_1^R, ..., ω_k^R, or by using the MGFs of the PLDs ω_1, ..., ω_k and Lemma 7 of (Koskela et al., 2020b), which states that when 0 < λ < (Δx)^{−1},

E[e^{−λω^R}] ≤ E[e^{−λω}]  and  E[e^{λω^R}] ≤ (1 / (1 − λΔx)) E[e^{λω}],

where Δx = 2L/n.

D Proof for Theorem 7

We next give a proof for Theorem 7 of the main text.
Recall: denote the logarithms of the moment generating functions of the PLDs as

α_i^+(λ) = log( E[e^{λω_i}] ),  α_i^−(λ) = log( E[e^{−λω_i}] ),

where 1 ≤ i ≤ k. Furthermore, denote

α^+(λ) = Σ_i α_i^+(λ),  α^−(λ) = Σ_i α_i^−(λ).

Using the Chernoff bound, we obtain the required bound using α^+(λ) and α^−(λ).

Theorem D.1. For all λ > 0, we have the following bound for the periodisation error:

∫_ε^L | (ω_1 ⊛ ··· ⊛ ω_k − ω̃_1 ⊛ ··· ⊛ ω̃_k)(s) | ds ≤ ( e^{α^+(λ)} + e^{α^−(λ)} ) e^{−Lλ} / (1 − e^{−2Lλ}).

Proof. Let the ω_i's and the corresponding 2L-periodic continuations ω̃_i be of the form

ω_i(s) = Σ_j a_j^i · δ_{s_j}(s)  and  ω̃_i(s) = Σ_j ã_j^i · δ_{s_j}(s),

where s_j = jΔx and a_j^i, ã_j^i ≥ 0. By the definition of the truncated convolution ⊛,

(ω̃_1 ⊛ ··· ⊛ ω̃_k)(s) = Σ_{−L ≤ s_j < L} ( Σ_{j_1 + ··· + j_k = j} ã_{j_1}^1 ··· ã_{j_k}^k ) · δ_{s_j}(s).

Remark D.2. Notice that in case s_j ∈ [−L, L] for all j, 0 ≤ j ≤ n − 1, then

(ω_1 ⊛ ··· ⊛ ω_k)(s) = (ω_1 ∗ ··· ∗ ω_k)(s),

i.e., the error arising from the truncation of the discrete convolutions vanishes, and Theorem D.1 gives the total error arising from the periodisation and truncation operations when s_j ∈ [−L, L] for all j.

E Proof for Theorem 8

We next give a proof for Theorem 8 of the main text. We first bound the truncation error

∫_ε^L | (ω_1 ∗ ··· ∗ ω_k − ω_1 ⊛ ··· ⊛ ω_k)(s) | ds

in terms of the moment generating functions of the ω_i's. By adding the periodisation error of Theorem 7 to the resulting bound, we then arrive at the claim of Theorem 8. We emphasise that the following result applies to the case where the supports of the PLDs ω_i extend outside of the interval [−L, L].

Theorem E.1. Let the ω_i's, α_i^+(λ)'s and α_i^−(λ)'s be defined as above.
For all λ > 0, we have that

I(L) = ∫_ε^L (ω_1 ∗ ··· ∗ ω_k − ω_1 ⊛ ··· ⊛ ω_k)(s) ds ≤ ( (e^{k max_i α_i^+(λ)} − e^{max_i α_i^+(λ)}) / (e^{max_i α_i^+(λ)} − 1) + (e^{k max_i α_i^−(λ)} − e^{max_i α_i^−(λ)}) / (e^{max_i α_i^−(λ)} − 1) ) e^{−Lλ}.

Proof. By adding and subtracting (ω_1 ∗ ··· ∗ ω_{k−1}) ⊛ ω_k, we may write

ω_1 ∗ ··· ∗ ω_k − ω_1 ⊛ ··· ⊛ ω_k = (ω_1 ∗ ··· ∗ ω_{k−1}) ∗ ω_k − (ω_1 ∗ ··· ∗ ω_{k−1}) ⊛ ω_k + (ω_1 ∗ ··· ∗ ω_{k−1} − ω_1 ⊛ ··· ⊛ ω_{k−1}) ⊛ ω_k.   (E.1)

Let ℓ ∈ Z_+. Let ω_ℓ be of the form

ω_ℓ(s) = Σ_j a_j^ℓ · δ_{s_j}(s)

and let the convolution ω_1 ∗ ··· ∗ ω_{ℓ−1} be of the form

(ω_1 ∗ ··· ∗ ω_{ℓ−1})(s) = Σ_j c_j · δ_{s_j}(s)

for some a_j^ℓ, c_j ≥ 0, s_j = jΔx. From the definitions of the operators ∗ and ⊛ it follows that the difference ((ω_1 ∗ ··· ∗ ω_{ℓ−1}) ∗ ω_ℓ − (ω_1 ∗ ··· ∗ ω_{ℓ−1}) ⊛ ω_ℓ)(s) consists of the probability mass of (ω_1 ∗ ··· ∗ ω_{ℓ−1}) ∗ ω_ℓ lying outside of the interval [−L, L). Bounding this mass with the Chernoff bound gives

∫_R ( (ω_1 ∗ ··· ∗ ω_{ℓ−1}) ∗ ω_ℓ − (ω_1 ∗ ··· ∗ ω_{ℓ−1}) ⊛ ω_ℓ )(s) ds ≤ e^{(ℓ−1) max_i α_i^+(λ)} e^{−Lλ} + e^{(ℓ−1) max_i α_i^−(λ)} e^{−Lλ}   (E.2)

for all λ > 0. Similarly, suppose ω_1 ∗ ··· ∗ ω_{ℓ−1} − ω_1 ⊛ ··· ⊛ ω_{ℓ−1} is of the form

(ω_1 ∗ ··· ∗ ω_{ℓ−1} − ω_1 ⊛ ··· ⊛ ω_{ℓ−1})(s) = Σ_i c̃_i · δ_{s_i}(s)

for some coefficients c̃_i ≥ 0, where s_i = iΔx. Then, since ω_ℓ has total mass at most one and ⊛ only discards mass,

∫_R ( (ω_1 ∗ ··· ∗ ω_{ℓ−1} − ω_1 ⊛ ··· ⊛ ω_{ℓ−1}) ⊛ ω_ℓ )(s) ds ≤ ∫_R (ω_1 ∗ ··· ∗ ω_{ℓ−1} − ω_1 ⊛ ··· ⊛ ω_{ℓ−1})(s) ds.   (E.3)

Applying (E.2) with ℓ = k and (E.3) to (E.1) gives

∫_ε^L (ω_1 ∗ ··· ∗ ω_k − ω_1 ⊛ ··· ⊛ ω_k)(s) ds ≤ e^{(k−1) max_i α_i^+(λ)} e^{−Lλ} + e^{(k−1) max_i α_i^−(λ)} e^{−Lλ} + ∫_R (ω_1 ∗ ··· ∗ ω_{k−1} − ω_1 ⊛ ··· ⊛ ω_{k−1})(s) ds.
(E.4)

Using (E.4) recursively, we see that for all λ > 0 we have

∫_ε^L (ω_1 ∗ ··· ∗ ω_k − ω_1 ⊛ ··· ⊛ ω_k)(s) ds ≤ Σ_{ℓ=1}^{k−1} e^{ℓ max_i α_i^+(λ)} e^{−Lλ} + Σ_{ℓ=1}^{k−1} e^{ℓ max_i α_i^−(λ)} e^{−Lλ} = ( (e^{k max_i α_i^+(λ)} − e^{max_i α_i^+(λ)}) / (e^{max_i α_i^+(λ)} − 1) + (e^{k max_i α_i^−(λ)} − e^{max_i α_i^−(λ)}) / (e^{max_i α_i^−(λ)} − 1) ) e^{−Lλ}.

E.1 Proof for Theorem 9

We next give a proof for Theorem 9, which gives a bound for the grid approximation error.

Theorem E.2. The discretisation error δ^R(ε) − δ(ε) can be bounded as

δ^R(ε) − δ(ε) ≤ kΔx ( P(ω_1 + ··· + ω_k ≥ ε) − δ(ε) ).

Proof. From the definition of the discrete convolution we see that

(ω_1 ∗ ··· ∗ ω_k)(s) = Σ_{i_1,...,i_k} a_{i_1}^1 ··· a_{i_k}^k · δ_{s_{i_1}^1 + ··· + s_{i_k}^k}(s)

and that

δ(ε) = ∫_ε^∞ (1 − e^{ε−s}) (ω_1 ∗ ··· ∗ ω_k)(s) ds = Σ_{ {i_1,...,i_k : s_{i_1}^1 + ··· + s_{i_k}^k ≥ ε} } a_{i_1}^1 ··· a_{i_k}^k · (1 − e^{ε − s_{i_1}^1 − ··· − s_{i_k}^k}).

Then, since for a ≤ b: e^b − e^a ≤ e^b (b − a), we have that

δ^R(ε) − δ(ε) = Σ_{ {i_1,...,i_k : s_{i_1}^1 + ··· + s_{i_k}^k ≥ ε} } a_{i_1}^1 ··· a_{i_k}^k · ( e^{ε − s_{i_1}^1 − ··· − s_{i_k}^k} − e^{ε − s_{i_1}^{R,1} − ··· − s_{i_k}^{R,k}} ) ≤ Σ_{ {i_1,...,i_k : s_{i_1}^1 + ··· + s_{i_k}^k ≥ ε} } a_{i_1}^1 ··· a_{i_k}^k · ( (s_{i_1}^{R,1} − s_{i_1}^1) + ··· + (s_{i_k}^{R,k} − s_{i_k}^k) ) · e^{ε − s_{i_1}^1 − ··· − s_{i_k}^k}.

Since s_i^{R,ℓ} − s_i^ℓ ≤ Δx for all i and ℓ, we have that

δ^R(ε) − δ(ε) ≤ kΔx Σ_{ {i_1,...,i_k : s_{i_1}^1 + ··· + s_{i_k}^k ≥ ε} } a_{i_1}^1 ··· a_{i_k}^k · e^{ε − s_{i_1}^1 − ··· − s_{i_k}^k} = kΔx ( Σ_{ {i_1,...,i_k : s_{i_1}^1 + ··· + s_{i_k}^k ≥ ε} } a_{i_1}^1 ··· a_{i_k}^k − Σ_{ {i_1,...,i_k : s_{i_1}^1 + ··· + s_{i_k}^k ≥ ε} } a_{i_1}^1 ··· a_{i_k}^k · (1 − e^{ε − s_{i_1}^1 − ··· − s_{i_k}^k}) ) = kΔx ( P(ω_1 ∗ ··· ∗ ω_k ≥ ε) − δ(ε) ).
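For the randomised response mechanism, the k-fold convolution of the two-atom PLD ω_R is available in closed form, which makes it a convenient test case for the tight bound of Theorem A.1 (for the randomised response, δ_{X/Y,ℓ}(∞) = 0). The following sketch is our own illustration (the function name is hypothetical) of evaluating the resulting tight δ(ε) exactly:

```python
import math

def rr_tight_delta(eps, p, k):
    # Tight delta(eps) for the k-fold composition of the randomised response
    # mechanism (probability of a correct answer p, 1/2 < p < 1) via the
    # formula of Theorem A.1. Here delta_{X/Y,l}(inf) = 0, and the k-fold
    # convolution of omega_R places mass C(k, j) p^j (1 - p)^(k - j) at the
    # privacy loss value (2j - k) * c_p, with c_p = log(p / (1 - p)).
    c_p = math.log(p / (1.0 - p))
    delta = 0.0
    for j in range(k + 1):
        s = (2 * j - k) * c_p
        if s > eps:     # the integrand (1 - e^(eps - s)) vanishes at s = eps
            delta += (math.comb(k, j) * p**j * (1.0 - p)**(k - j)
                      * (1.0 - math.exp(eps - s)))
    return delta
```

For a single mechanism at ε = 0 this recovers the total variation distance p − (1 − p), and the same values can be used to check a grid-based implementation against the bound of Theorem E.2.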
F Details for the Experiments of Section 6.1

For the PLD ω_G of the Gaussian mechanism we know that (Sommer et al., 2019)

ω_G ∼ N( 1/(2σ²), 1/σ² ).

In order to carry out an error analysis for the approximations given in Section 6.1, we define the infinitely extending grid approximation of ω_max. Let L > 0, n ∈ Z_+, Δx = 2L/n and let the grid X_n be defined as in (4.2). Define

ω_max(s) = Σ_{i=0}^{n−1} c_i^+ · δ_{s_i}(s), where s_i = −L + iΔx and c_i^+ = Δx · max_{s ∈ [s_{i−1}, s_i]} ω_G(s),   (F.1)

and define

ω_max^∞(s) = Σ_{i ∈ Z} c_i^+ · δ_{s_i}(s).   (F.2)

To obtain the bounds of Theorem 8 for the compositions (and subsequently a strict upper bound for δ(ε)), the error analysis has to be carried out for the distribution ω_max^∞. To this end, we need bounds for the moment generating functions of ω_max^∞ and −ω_max^∞.

To show that ω_max^∞ indeed leads to an upper bound for δ(ε), we refer to (Koskela et al., 2020b), where this is shown for compositions of the subsampled Gaussian mechanism. The proof here goes analogously, and we have that for all ε > 0,

δ(ε) ≤ δ_max^∞(ε),

where δ_max^∞(ε) is the tight bound for the composition involving ω_max^∞.

To evaluate α^+(λ) and α^−(λ) for the upper bound of Theorem 8, we need the moment generating functions of ω_max^∞ and −ω_max^∞. We have the following bound for ω_max^∞. We note that E[e^{λω_max}] can be evaluated numerically.

Lemma F.1. Let 0 < λ ≤ L and assume σ ≥ 1 and Δx ≤ c · L, 0 < c < 1. The moment generating function of ω_max^∞ can be bounded as

E[e^{λ ω_max^∞}] ≤ E[e^{λ ω_max}] + err(λ, L, σ),

where

err(λ, L, σ) = exp( (λ² + λ) / (2σ²) ) ( ∫_{−∞}^{−L} ω̃(s) ds + ∫_{L−Δx}^{∞} ω̃(s) ds ),  ω̃ ∼ N( (2λ + 1) / (2σ²), 1/σ² ).   (F.3)

Proof.
The moment generating function of ω_max^∞ is given by

E[e^{λ ω_max^∞}] = ∫_{−L}^{L} e^{λs} ω_max^∞(s) ds + ∫_{−∞}^{−L} e^{λs} ω_max^∞(s) ds + ∫_{L}^{∞} e^{λs} ω_max^∞(s) ds ≤ E[e^{λ ω_max}] + ∫_{−∞}^{−L} e^{λs} ω_G(s) ds + ∫_{L−Δx}^{∞} e^{λs} ω_G(s) ds.   (F.4)

We arrive at the claim by observing that for ω_G ∼ N( 1/(2σ²), 1/σ² ),

∫_{−∞}^{−L} e^{λs} ω_G(s) ds = exp( (λ² + λ) / (2σ²) ) ∫_{−∞}^{−L} ω̃(s) ds,

where ω̃ ∼ N( (2λ + 1) / (2σ²), 1/σ² ), and similarly for the second term in (F.3).

Using reasoning analogous to the proof of Lemma F.1, we get the following. We note that E[e^{−λ ω_max}] can be evaluated numerically.

Corollary F.2. The moment generating function of −ω_max^∞ can be bounded as

E[e^{−λ ω_max^∞}] ≤ E[e^{−λ ω_max}] + err(λ, L, σ),

where err(λ, L, σ) is defined as in (F.3).

Remark F.3. In the experiments, the error term err(λ, L, σ) was found to be negligible.
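The tail term err(λ, L, σ) of (F.3) is cheap to evaluate with the standard normal CDF. A small sketch (our own helper, not from the paper; it assumes the form of (F.3) as stated above) confirms numerically that the term is negligible for parameter ranges such as those used in Section 6.1:

```python
import math

def err_term(lam, L, sigma, dx):
    # Tail term err(lambda, L, sigma) of (F.3): the Gaussian tail masses of
    # omega_tilde ~ N((2*lam + 1) / (2*sigma^2), 1/sigma^2) outside of
    # [-L, L - dx], scaled by the MGF factor exp((lam^2 + lam) / (2*sigma^2)).
    mean = (2.0 * lam + 1.0) / (2.0 * sigma**2)
    std = 1.0 / sigma

    def cdf(x):  # P(omega_tilde <= x)
        return 0.5 * math.erfc((mean - x) / (std * math.sqrt(2.0)))

    mgf_factor = math.exp((lam**2 + lam) / (2.0 * sigma**2))
    return mgf_factor * (cdf(-L) + 1.0 - cdf(L - dx))
```

For example, with λ = 2, L = 10 and σ = 5 the tails lie dozens of standard deviations from the mean of ω̃, so err(λ, L, σ) vanishes to machine precision, consistently with Remark F.3.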