Differential Privacy for Binary Functions via Randomized Graph Colorings
arXiv preprint [cs.IT], Feb.
Rafael G. L. D’Oliveira∗, Muriel Médard∗ and Parastoo Sadeghi†
∗ RLE, Massachusetts Institute of Technology, USA
† SEIT, University of New South Wales, Canberra, Australia
Emails: {rafaeld, medard}@mit.edu, [email protected]
Abstract
We present a framework for designing differentially private (DP) mechanisms for binary functions via a graph representation of datasets. Datasets are nodes in the graph and any two neighboring datasets are connected by an edge. The true binary function we want to approximate assigns a value (or true color) to a dataset. Randomized DP mechanisms are then equivalent to randomized colorings of the graph. A key notion we use is that of the boundary of the graph. Any two neighboring datasets assigned a different true color belong to the boundary.

Under this framework, we show that fixing the mechanism behavior at the boundary induces a unique optimal mechanism. Moreover, if the mechanism is to have a homogeneous behavior at the boundary, we present a closed expression for the optimal mechanism, which is obtained by means of a pullback operation on the optimal mechanism of a line graph. For balanced mechanisms, not favoring one binary value over another, the optimal $(\epsilon, \delta)$-DP mechanism takes a particularly simple form, depending only on the minimum distance to the boundary, on $\epsilon$, and on $\delta$.

I. INTRODUCTION
Since its inception, differential privacy (DP) [1], [2] has become an important privacy-preserving tool in sharing information from datasets that contain sensitive information about individuals. A notable application of differential privacy was in the 2020 US Census privatization [3], impacting hundreds of millions of people.

The definition of differential privacy hinges upon the principle of neighboring datasets: those that differ in a single entry corresponding to one individual or sensitive feature. Roughly speaking, an $(\epsilon, \delta)$-DP mechanism aims to give the same randomized answer to a query from any two neighboring datasets with probabilities that are within an $e^{\epsilon}$ multiplicative factor of each other (modulo a small additive constant $\delta$). Such a definition of DP is information-theoretic in the sense that it aims to limit the amount of information leakage about an individual in a dataset to an adversary with unbounded computational power [4].

Footnote: The work of P. Sadeghi was supported by the Australian Research Council Future Fellowship, FT190100429.

The relationships between information-theoretic DP, local DP (LDP) [5], and other notions of information-theoretic privacy have been studied. These include conditional mutual information [6] and maximal leakage [7], which are under worst-case source distribution, as well as mutual information [8] and $\epsilon$-log-lift (also known as $\epsilon$-information-privacy or information density) [9]–[11], which assume a given source distribution.

It can intuitively be understood that explicit DP conditions on neighboring datasets weave topological privacy-preserving conditions into the fabric of the family of datasets of interest. In this paper, we propose to represent such topological $(\epsilon, \delta)$-DP conditions on discrete randomized mechanisms using graphs, where the vertices represent datasets and edges connect neighboring datasets. In this framework, a DP mechanism is a randomized coloring of the graph, subject to $(\epsilon, \delta)$-DP conditions.
Crucially, we also consider utility via a true coloring of the graph, where colors represent true values of the query function performed on a dataset. Any two neighboring datasets assigned a different true color belong to the graph boundary. To the best of our knowledge, a graph-based study of the tension between privacy and utility in the DP framework, and of the corresponding optimal design of DP mechanisms, is new.

As a first step towards a graph-based understanding of this problem, we focus on binary functions. Applications include majority queries about voting or survey results, protecting participation of individuals in surveys, or simply crude quantized queries on whether a parameter of interest in a dataset is below or above a certain threshold. For a survey of applications of DP mechanisms for binary-valued functions see [12].

To illustrate, consider a case where three voters privately voted YES or NO on a sensitive matter. Considering all voting outcomes by three unique voters, Fig. 1a shows the true majority function, where blue means the majority voted YES and red means the majority voted NO. However, ignoring unique voters, these eight datasets can be compactly represented by (or collapsed onto) a line graph comprising four nodes, as in Fig. 1c, where node $d_1$ means all three voted YES and node $d_2$ means any two people voted YES while the third voted NO; inversely for nodes $d_4$ and $d_3$.

From a mechanism design perspective, the line graph model for the majority function is much simpler to deal with. For $2n+1$ individuals, it reduces the complexity from $2^{2n+1}$ unique datasets to $2(n+1)$ datasets. But one might ask: is there any loss of optimality in doing so? More broadly: is there a systematic and optimal way for importing or exporting DP mechanisms across different families of datasets?
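The collapse of the three-voter example onto a four-node line can be sketched in a few lines of Python. This is only an illustration under stated assumptions: votes are encoded as 1 for YES and 2 for NO, and the helper names `neighbors`, `majority`, and `collapse` are hypothetical, not taken from the paper.

```python
from itertools import product

# Datasets: every assignment of a vote (1 = YES, 2 = NO) to three voters.
datasets = list(product((1, 2), repeat=3))

def neighbors(d, e):
    """Two datasets are neighbors if exactly one voter's entry differs."""
    return sum(a != b for a, b in zip(d, e)) == 1

# Undirected edges of the dataset graph (each pair listed once).
edges = [(d, e) for i, d in enumerate(datasets)
         for e in datasets[i + 1:] if neighbors(d, e)]

def majority(d):
    """True color: 1 (blue) if most votes are YES, else 2 (red)."""
    return 1 if d.count(1) > d.count(2) else 2

def collapse(d):
    """Hypothetical helper: line-graph node index, 1 + number of NO votes."""
    return 1 + d.count(2)

line_nodes = sorted({collapse(d) for d in datasets})

# Neighboring datasets land on adjacent line nodes, and each line node
# collects datasets of a single true color.
assert all(abs(collapse(d) - collapse(e)) == 1 for d, e in edges)
colors = {k: {majority(d) for d in datasets if collapse(d) == k}
          for k in line_nodes}
```

Here the eight datasets and twelve edges of the cube reduce to four line nodes, two blue and two red, which is the $n = 1$ instance of the $2^{2n+1}$ to $2(n+1)$ reduction.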
A. Summary of Results
We illustrate our main results referring to Fig. 2. We are interested in designing an optimal mechanism for the family of datasets represented by the graph in Fig. 2(a). Here, optimal means the DP mechanism dominates other mechanisms in terms of probability of truthful response (which we reasonably assume maximizes some utility function).

Footnote: In contrast to information-theoretic DP, computational DP [4] relaxes the requirement of an unbounded adversary and limits the information leakage to an adversary with finite computational power.

[Figure 1 panels: (a) Hide vote value; (b) Hide if voted or not; (c) Line graph for (a) or (b).]
Fig. 1: Different types of neighborhood relations. (a) is explained in the main text. (b) shows an example where nodes represent which of three individuals {1}, {2}, or {3} voted (voluntarily), whereas colors represent the majority outcome assuming voters {1} and {2} always vote blue and {3} always votes red (ties go in favor of red). (c) is what we call a (2,2)-line graph in Definition 6. In Theorem 1, we show that DP mechanisms in (c) can be transformed into DP mechanisms for both (a) and (b).

• We prove in Theorem 2 that if we fix the probability of giving the truthful response for each dataset in $\{h, \ell, q, n, i\}$ at the boundary, then there exists at most one optimal DP mechanism that satisfies these boundary conditions.
• In a boundary homogeneous DP mechanism, only two parameters, $m_B$ and $m_R$, specify the probability of truthful response at blue and red boundary datasets, respectively. Under this setting, we show through Definitions 3, 5, 6, 7 and Theorems 1, 3 that one can apply a color- and boundary-preserving morphism to obtain the line graph in Fig. 2(b) with only two nodes in its boundary, solve for the optimal $(\epsilon, \delta)$-DP mechanism over it, and pull it back to apply to Fig. 2(a), while preserving optimality.
• In Theorem 4, we give a closed expression for the optimal $(\epsilon, \delta)$-DP mechanism for the line graph. Thus, we also obtain a closed expression for the optimal boundary homogeneous DP mechanism, via Theorem 3.
• A mechanism is balanced if it is boundary homogeneous and $m_B = m_R$. The optimal balanced $(\epsilon, \delta)$-DP mechanism takes a very simple form. For any dataset $d$, the probability $P_e$ of giving the incorrect response (opposite to its true color) only depends on the distance $\Delta$ from $d$ to the boundary of its own color (so that $\Delta + 1$ is the length of a shortest path to the nearest dataset of opposite color) and on the privacy parameters $\epsilon$ and $\delta$:
$$P_e(d) = \max\left\{ \frac{e^{\epsilon} - 1 - \delta\left(e^{\epsilon(\Delta+1)} + e^{\epsilon\Delta} - 2\right)}{e^{\epsilon\Delta}\,(e^{\epsilon} + 1)(e^{\epsilon} - 1)},\; 0 \right\}.$$

II. SETTING
We denote by $\mathcal{D}$ the family of datasets. We consider a symmetric neighborhood relationship on $\mathcal{D}$, where $d, d' \in \mathcal{D}$ are said to be neighbors if $d \sim d'$. We also consider a finite output space $\mathcal{V}$, the space in which the outputs of the queries lie. In this paper, we consider the case where $|\mathcal{V}| = 2$ and, without loss of generality, $\mathcal{V} = \{1, 2\}$.

A randomized mechanism, which we refer to as just a mechanism, is a random function $M : \mathcal{D} \to \mathcal{V}$. We denote the set of all mechanisms of interest by $\mathcal{M}$. In this paper, $\mathcal{M}$ is the set of all $(\epsilon, \delta)$-DP mechanisms.

Definition 1.
Let $\epsilon, \delta \in \mathbb{R}$ be such that $\epsilon \geq 0$ and $0 \leq \delta < 1$. Then, a mechanism $M : \mathcal{D} \to \mathcal{V}$ is $(\epsilon, \delta)$-differentially private if for any $d \sim d'$ and $S \subseteq \mathcal{V}$, we have
$$\Pr[M(d) \in S] \leq e^{\epsilon} \Pr[M(d') \in S] + \delta.$$
For $|\mathcal{V}| = 2$, $(\epsilon, \delta)$-differential privacy is equivalent to
$$\Pr[M(d) = v] \leq e^{\epsilon} \Pr[M(d') = v] + \delta, \quad \forall v \in \mathcal{V}. \tag{1}$$

We consider a function $f : \mathcal{D} \to \mathcal{V}$ which we refer to as the true function. The goal is to approximate the true function $f$ by an $(\epsilon, \delta)$-differentially private mechanism $M$. To measure the performance of the mechanism, i.e., how good the approximation is, a utility function $U : \mathcal{M} \to \mathbb{R}$ must be defined, where $U[M] \geq U[M']$ means that the mechanism $M$ performs at least as well as $M'$. In this work, we do not consider a specific utility function, but a general family of them.

Definition 2.
A utility function $U : \mathcal{M} \to \mathbb{R}$ is reasonable if $\Pr[M(d) = f(d)] \geq \Pr[M'(d) = f(d)]$ for every $d \in \mathcal{D}$ implies $U[M] \geq U[M']$. When the premise holds, i.e., when $M$ is at least as truthful as $M'$ on every dataset, we say that the mechanism $M$ dominates $M'$.

Remark 1.
This notion of reasonable utility is relaxed enough not to impose unnecessary conditions on the application, but strong enough to capture some of the utility functions already proposed in the DP literature. The authors in [13] considered a more restrictive notion of utility (the negative of a loss function). A loss function $\ell : \mathcal{V} \times \mathcal{V} \to \mathbb{R}$ was called legal in [13] if, for every true function value $i \in \mathcal{V}$ and mechanism response $j \in \mathcal{V}$, $\ell(i, j)$ depends only on $i$ and $|i - j|$ and is non-decreasing in $|i - j|$. This loss function can be used in numerical queries to measure the mean absolute error, where $\ell(i, j) = |i - j|$, or the mean square error, where $\ell(i, j) = |i - j|^2$. For categorical queries, by setting $\ell(i, j) = 0$ for $i = j$ and $\ell(i, j) = 1$ otherwise, one can measure the average binary loss function or Hamming distortion. Finding the optimal $(\epsilon, 0)$-LDP mechanism satisfying an upper bound on the expected Hamming distortion was studied in [14].

[Figure 2: a colored graph on vertices $a$ through $r$ (left) mapped by $g_\partial$ onto a line graph (right) whose nodes collect $\{a, b\}$, $\{c, g, k\}$, $\{h, \ell\}$, $\{i, n, q\}$, $\{d, o, m\}$, $\{j, p, r\}$, $\{e, f\}$.]
Fig. 2: A graph morphism, which preserves neighboring relations, minimum distance (shortest path) to the boundary, and color.

Indeed, for a given true function $f$, simultaneously maximizing the probability of truthful response across all datasets minimizes the expected Hamming distortion
$$L[M] \triangleq \sum_{d \in \mathcal{D}} \Pr(d)\,\bigl(1 - \Pr(M(d) = f(d))\bigr)$$
regardless of the distribution $\Pr(d)$ on datasets. Therefore $U[M] = 1 - L[M]$ is a reasonable utility function.

The notion of domination in Definition 2 induces a partial order on the set $\mathcal{M}$ of all mechanisms. If a mechanism $M$ dominates another mechanism $M'$, then the first one performs at least as well as the second for any reasonable utility function. It is not always the case that two mechanisms can be compared, even when restricted to reasonable utility functions. We give an example below.

Example 1.
Consider the dataset $\mathcal{D} = \{1, 2\}$ where $1 \sim 2$ and the true function $f : \mathcal{D} \to \mathcal{V}$ is such that $f(1) = 1$ and $f(2) = 2$. Let $M_1$ and $M_2$ be the $(\log(2), 0.\ldots)$-DP mechanisms defined such that $\Pr[M_1(1) = 1] = 0.\ldots$, $\Pr[M_1(2) = 2] = 0.\ldots$, $\Pr[M_2(1) = 1] = 0.\ldots$, and $\Pr[M_2(2) = 2] = 0.\ldots$. Then, neither mechanism dominates the other. The reason for this is that there are reasonable utility functions which, for a mechanism $M \in \mathcal{M}$, might value $\Pr[M(1) = 1]$ more than $\Pr[M(2) = 2]$, or vice versa. Extreme cases of this are the reasonable utility functions $U[M] = \Pr[M(1) = 1]$ and $U'[M] = \Pr[M(2) = 2]$, which disagree on which of $M_1$ or $M_2$ is better.

III. DIFFERENTIAL PRIVACY AS RANDOMIZED GRAPH COLORINGS
In this section, we interpret differential privacy as a randomized graph coloring problem. The vertices of the graph are the datasets $d \in \mathcal{D}$. The edges of the graph are given by the neighboring relation on the datasets, i.e., two vertices $d, d' \in \mathcal{D}$ have an edge between them if $d \sim d'$. The graph is then a tuple $(\mathcal{D}, \sim)$, which we often identify with the set $\mathcal{D}$ itself.

Footnote: In this paper, by log we mean the natural logarithm.

The following transformation allows us to transport differentially private mechanisms from one setting to another.
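Before introducing that transformation, the graph view itself can be made concrete. The sketch below stores the graph as an adjacency dictionary and a binary mechanism as the map $d \mapsto \Pr[M(d) = 1]$, then checks condition (1) on every edge. The function name `is_dp`, the tolerance, and the numeric values are illustrative assumptions, not part of the paper.

```python
import math

def is_dp(adj, blue, eps, delta, tol=1e-12):
    """Check condition (1) for a binary mechanism on a dataset graph.

    adj  -- dict: dataset -> iterable of neighboring datasets (symmetric)
    blue -- dict: dataset -> Pr[M(d) = 1]; Pr[M(d) = 2] is 1 - blue[d]
    """
    for d, nbrs in adj.items():
        for e in nbrs:
            # Both output values v = 1 and v = 2 must satisfy the bound.
            for p_d, p_e in ((blue[d], blue[e]), (1 - blue[d], 1 - blue[e])):
                if p_d > math.exp(eps) * p_e + delta + tol:
                    return False
    return True

# Illustrative two-dataset graph 1 ~ 2 (assumed parameter values).
adj = {1: [2], 2: [1]}
eps, delta = math.log(2), 0.1
ok = is_dp(adj, {1: 0.65, 2: 0.35}, eps, delta)   # a randomized coloring
bad = is_dp(adj, {1: 1.0, 2: 0.0}, eps, delta)    # the deterministic true coloring
```

Because the adjacency dictionary is symmetric, each edge is visited in both directions, so the ordered condition (1) is checked for both $(d, d')$ and $(d', d)$.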
Definition 3.
A morphism from a family of datasets $\mathcal{D}_1$ to another family $\mathcal{D}_2$ is a function $g : \mathcal{D}_1 \to \mathcal{D}_2$ such that $d \sim d'$ implies either $g(d) \sim g(d')$ or $g(d) = g(d')$, for every $d, d' \in \mathcal{D}_1$.

This notion is weaker than the classic graph homomorphism, which maps adjacent vertices to adjacent vertices; i.e., every graph homomorphism is a morphism, but not every morphism is a graph homomorphism. For example, the mapping from a graph with at least one edge to the graph with a single vertex is a morphism, but cannot be a graph homomorphism.

Morphisms allow us to transport mechanisms from the codomain to the domain via a pullback operation.

Theorem 1.
Let $g : \mathcal{D}_1 \to \mathcal{D}_2$ be a morphism between two families of datasets and $M_2 : \mathcal{D}_2 \to \mathcal{V}$ be an $(\epsilon, \delta)$-DP mechanism on $\mathcal{D}_2$. Then, the mechanism $M_1 : \mathcal{D}_1 \to \mathcal{V}$ given by the pullback operation $M_1 = M_2 \circ g$ is $(\epsilon, \delta)$-DP on $\mathcal{D}_1$.

Proof. Let $d, d' \in \mathcal{D}_1$ be such that $d \sim d'$. Then,
$$\Pr[M_1(d) = v] = \Pr[M_2(g(d)) = v] \leq e^{\epsilon} \Pr[M_2(g(d')) = v] + \delta = e^{\epsilon} \Pr[M_1(d') = v] + \delta,$$
where the inequality follows from either $g(d) \sim g(d')$, in which case it is the DP condition on $M_2$, or $g(d) = g(d')$, in which case the two probabilities coincide and the bound holds because $e^{\epsilon} \geq 1$ and $\delta \geq 0$.

In Fig. 2, we show a morphism $g_\partial$ between a general graph and a line graph. In Theorem 3, we use this same kind of morphism to obtain optimal $(\epsilon, \delta)$-DP mechanisms for a general class of graphs by pulling them back from optimal $(\epsilon, \delta)$-DP mechanisms on line graphs.

We now incorporate the true function we want to approximate into the graph. The true function $f : \mathcal{D} \to \mathcal{V}$ is equivalent to a coloring of the graph $\mathcal{D}$. We call the triple $(\mathcal{D}, \sim, f)$ a colored graph, and often identify it with $\mathcal{D}$. We call a morphism $g : \mathcal{D}_1 \to \mathcal{D}_2$ such that $f_1 = f_2 \circ g$ a color-preserving morphism. An $(\epsilon, \delta)$-DP mechanism is then a randomized coloring of the graph satisfying constraints related to the edges of the graph.

Example 2.
Consider the dataset $\mathcal{D} = \{1, 2\}^3$ where vertices are neighbors if they differ in only one entry, and the true function $\mathrm{Maj} : \mathcal{D} \to \{1, 2\}$ given by the majority function. If we assign colors to values such that $\mathrm{Maj}(d) = 1$ is blue and $\mathrm{Maj}(d) = 2$ is red, we obtain the graph in Fig. 1a. The function from Fig. 1a to Fig. 1c such that $111 \mapsto d_1$, $\{112, 121, 211\} \mapsto d_2$, $\{122, 212, 221\} \mapsto d_3$, and $222 \mapsto d_4$ is a color-preserving morphism.

We define the following topological notions on our graphs.

Definition 4.
The blue set is $B = \{d \in \mathcal{D} : f(d) = 1\}$, corresponding to the color blue in our figures. The interior of $B$ is the set $B^o = \{d \in B : d \sim d' \Rightarrow d' \in B\}$ and its boundary is the set $\partial B = B \setminus B^o$. Replacing $B$ by $R$ above, we obtain the analogous red versions of the definitions. When referring to a single mechanism we denote the probabilities on the output by $B_d = \Pr[M(d) = 1]$ and $R_d = \Pr[M(d) = 2]$.

The distance between two points $d, d' \in \mathcal{D}$ is the number of edges in a shortest path connecting them, which we denote by $\operatorname{dist}(d, d')$. The distance from a point $d \in \mathcal{D}$ to a subset $A \subseteq \mathcal{D}$ is defined as $\operatorname{dist}(d, A) = \min_{d' \in A} \operatorname{dist}(d, d')$.

Thus, if we consider the colored graph on the left of Fig. 2, the blue set is given by $B = \{a, b, c, g, h, k, \ell\}$, its interior by $B^o = \{a, b, c, g, k\}$, and its boundary by $\partial B = \{h, \ell\}$.

Remark 2.
In this paper, we characterize mechanisms by how they behave on the blue set. Their behavior on the red set can then be derived by analogous arguments. In general, our statements for the blue set imply dual versions, obtained by replacing $B$ with $R$ and vice versa. The dual of a colored graph is the graph with the colors red and blue swapped.

IV. OPTIMAL MECHANISMS
In this section, we focus on finding optimal $(\epsilon, \delta)$-DP mechanisms for binary values. In Theorem 2, we characterize the optimal mechanism in terms of its values at the boundary. Later, in Theorem 5, we present a closed form for the optimal mechanism when the values at the boundary satisfy a homogeneity condition.

Theorem 2.
Let $(\mathcal{D}, \sim, f)$ be a colored graph and $m_d \in [0, 1]$ be a fixed value for every $d \in \partial B$. Then, there exists at most one optimal $(\epsilon, \delta)$-DP mechanism $M : \mathcal{D} \to \mathcal{V}$ such that $B_d = m_d$, for every $d \in \partial B$.

Proof. We assume the subgraphs $B$ and $R$ are connected. If not, the following argument holds for each connected component of $B$ and $R$. We also assume that there exists at least one mechanism which satisfies the $(\epsilon, \delta)$-DP constraints; otherwise our result trivially follows since there are no maximal $(\epsilon, \delta)$-DP mechanisms.

Let $d, d' \in B$ be such that $d \sim d'$. Then, the $(\epsilon, \delta)$-DP conditions are given by
$$B_d \leq e^{\epsilon} B_{d'} + \delta, \tag{2}$$
$$B_{d'} \leq e^{\epsilon} B_d + \delta, \tag{3}$$
$$1 - B_d \leq e^{\epsilon} (1 - B_{d'}) + \delta, \tag{4}$$
$$1 - B_{d'} \leq e^{\epsilon} (1 - B_d) + \delta. \tag{5}$$

Footnote: In what follows, and in order to avoid cumbersome notation with max and min functions, every time a probability is less than zero we interpret it to be zero, and every time it is more than one we interpret it to be one.
Assume, without loss of generality, that $B_d < B_{d'}$. Then, (2) and (5) are trivially satisfied. The remaining bounds, (3) and (4), are both upper bounds on $B_{d'}$. Indeed, (4) is equivalent to
$$B_{d'} \leq \frac{1}{e^{\epsilon}} B_d + \frac{e^{\epsilon} + \delta - 1}{e^{\epsilon}}.$$
Both upper bounds are increasing in $B_d$. Thus, $B_d$ and $B_{d'}$ are maximized together. Since $B$ is connected, this implies that all the $B_d$, for $d \in B^o$, are maximized together. Define $M$ to be the mechanism which maximizes all the $B_d$ simultaneously, for $d \in B^o$, subject to the constraint that $\Pr[M(d) = 1] = m_d$, for every $d \in \partial B$.

We now consider the datasets $d \in R$. We note that, since the values at the boundary $d \in \partial B$ are already set, the maximization of the points in $B^o$ does not affect the constraints on $R_d = \Pr[M(d) = 2]$. Thus, an argument analogous to the one above holds for the set $R$, i.e., all the $R_d$, for $d \in R$, can be maximized together. Thus, as above, we define $M$ to be the mechanism which maximizes all the $R_d$ simultaneously, for $d \in R$. The mechanism $M$ is then optimal.

Thus, for every fixed values of $B_h$ and $B_\ell$ in the colored graph on the left of Fig. 2, there is either no $(\epsilon, \delta)$-DP mechanism or there is a unique maximal one. Moreover, the optimal mechanism can be found by simultaneously maximizing all the values $B_x$ and $R_y$ for $x \in B^o$ and $y \in R$. For example, if $B_h = 0.\ldots$ and $B_\ell = 0.\ldots$, then the optimal $(\log(2), 0.\ldots)$-DP mechanism is such that $B_a = B_b = 1$, $B_g = B_k = 0.\ldots$, $B_c = 0.\ldots$, $R_q = 0.\ldots$, $R_i = R_n = 0.\ldots$, $R_m = 0.\ldots$, $R_d = R_o = 0.\ldots$, and $R_e = R_f = R_j = R_p = R_r = 1$. This can be checked by direct calculation of (1) for all $d \sim d'$, showing that the $(\log(2), 0.\ldots)$-DP constraints are tightly satisfied. Another direct calculation shows that there is, however, no $(\log(1.\ldots), \ldots)$-DP mechanism for the same boundary conditions.

When the mechanism satisfies a homogeneity condition, we are able to find a closed expression, in Theorem 5, for the optimal $(\epsilon, \delta)$-DP mechanism.
This condition, which we call boundary homogeneity, imposes the same probability of giving the truthful response at every boundary dataset of the same color.

Definition 5.
A mechanism $M : \mathcal{D} \to \mathcal{V}$ is boundary homogeneous if, for every $d, d' \in \partial B$, it holds that $B_d = B_{d'}$.

Thus, a mechanism is boundary homogeneous if it acts the same across the boundary. For the voting example shown in Fig. 1a, a boundary homogeneous mechanism is agnostic to the uniqueness of individuals, treating the three boundary datasets that share the same vote counts the same. In Theorem 3, we show that the optimal boundary homogeneous $(\epsilon, \delta)$-DP mechanism of any colored graph can be obtained via a pullback of the optimal mechanism on a particular line graph.

Definition 6.
Let $n_B, n_R \in \mathbb{N}$. The $(n_B, n_R)$-line is the colored graph $(\mathcal{D}, \sim, f)$ with datasets $\mathcal{D} = [1, n_B + n_R]$, neighboring relation $i \sim j$ if $|i - j| = 1$, and true function such that $f([1, n_B]) = 1$ and $f([n_B + 1, n_B + n_R]) = 2$.

Examples include the (2,2)-line in Fig. 1c, the (3,4)-line on the right of Fig. 2, and the $(n_B, n_R)$-line in Fig. 3. We are particularly interested in the following $(n_B, n_R)$-line.

Definition 7.
Let $(\mathcal{D}, \sim, f)$ be a colored graph and set $n_B = \max_{d \in B} \operatorname{dist}(d, \partial B) + 1$ and $n_R = \max_{d \in R} \operatorname{dist}(d, \partial R) + 1$. Then, the boundary graph of $\mathcal{D}$ is the $(n_B, n_R)$-line, denoted by $(\mathcal{D}_\partial, \overset{\partial}{\sim}, f_\partial)$. The boundary morphism is the color-preserving morphism $g_\partial : \mathcal{D} \to \mathcal{D}_\partial$ which maps $d \in B$ to $g_\partial(d) = n_B - \operatorname{dist}(d, \partial B)$ and $d \in R$ to $g_\partial(d) = n_B + 1 + \operatorname{dist}(d, \partial R)$.

Fig. 2 shows a colored graph on the left and its boundary graph on the right, with the explicit boundary morphism. Both colored graphs in Figs. 1a and 1b have the (2,2)-line in Fig. 1c as their boundary graph. The morphism in Example 2 is a boundary morphism.

Our next result shows that the optimal boundary homogeneous $(\epsilon, \delta)$-DP mechanism of any colored graph can be obtained via a pullback of the optimal mechanism on its boundary graph.

Theorem 3.
Let $(\mathcal{D}, \sim, f)$ be a colored graph and denote by $M_\partial : \mathcal{D}_\partial \to \mathcal{V}$ the optimal $(\epsilon, \delta)$-DP mechanism on its boundary graph. Then, the pullback $M = M_\partial \circ g_\partial$ is the optimal boundary homogeneous $(\epsilon, \delta)$-DP mechanism on $\mathcal{D}$.

Proof. Let $n_B$ and $n_R$ be the parameters of the boundary graph, i.e., $\mathcal{D}_\partial$ is the $(n_B, n_R)$-line. By Theorem 2, for each fixed $B_{n_B}$ (the probability of truthful response at the blue boundary dataset) there exists a unique maximal $(\epsilon, \delta)$-DP mechanism $M_\partial$ on $\mathcal{D}_\partial$. By Theorem 1, the morphism $g_\partial : \mathcal{D} \to \mathcal{D}_\partial$ induces an $(\epsilon, \delta)$-DP mechanism on $\mathcal{D}$ defined by $M_\partial \circ g_\partial$. This mechanism is boundary homogeneous, since $g_\partial$ maps every dataset of $\partial B$ to the single node $n_B$. It follows from Theorem 2 that there is a unique optimal boundary homogeneous $(\epsilon, \delta)$-DP mechanism on $\mathcal{D}$. Let $M$ be this mechanism. We show that $M = M_\partial \circ g_\partial$.

Let $d \in B^o$. Then, since $M$ is optimal on $\mathcal{D}$, it holds that $\Pr[M(d) = 1] \geq \Pr[M_\partial(g_\partial(d)) = 1]$.

Let $d_0$ be the closest dataset to $d$ belonging to $\partial B$. Let $G = \{d, d_{\operatorname{dist}(d, \partial B) - 1}, \ldots, d_0\}$ be a set of datasets which form a shortest path from $d$ to $d_0$. Note that $g_\partial|_G$ is injective and thus has a left inverse, which we denote by $h : g_\partial(G) \to G$. Note that $h$ is a morphism and, therefore, by Theorem 1, $M \circ h$ is an $(\epsilon, \delta)$-DP mechanism on $g_\partial(G) \subseteq \mathcal{D}_\partial$. It follows from Theorem 4 that, since $M_\partial$ is the optimal mechanism on $\mathcal{D}_\partial$, then $M_\partial|_{g_\partial(G)}$ is the optimal mechanism on $g_\partial(G)$. Thus, $\Pr[M_\partial(g_\partial(d)) = 1] \geq \Pr[M(d) = 1]$, and, therefore, $\Pr[M(d) = 1] = \Pr[M_\partial(g_\partial(d)) = 1]$.

An analogous argument holds for the red set $R$.

Thus, finding the optimal boundary homogeneous $(\epsilon, \delta)$-DP mechanisms for general colored graphs is equivalent to finding them for the $(n_B, n_R)$-line. In Theorem 4 we present a closed expression for the optimal $(\epsilon, \delta)$-DP mechanism on the $(n_B, n_R)$-line.
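The reduction in Theorem 3 can be sketched computationally: compute distances to each boundary with a breadth-first search, build the boundary morphism of Definition 7, and pull a line mechanism back through it. The sketch assumes a connected graph in which both colors occur, and the function names and the line mechanism values are hypothetical illustrations rather than the paper's own code.

```python
from collections import deque
from itertools import product

def dist_to_set(adj, sources):
    """Multi-source BFS: edges from each vertex to the nearest source."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def boundary_morphism(adj, f):
    """Return (g, n_B, n_R) as in Definition 7; colors are 1 (blue), 2 (red)."""
    # A dataset is on the boundary of its color if some neighbor has the other color.
    boundary = {c: [d for d in adj if f[d] == c and any(f[e] != c for e in adj[d])]
                for c in (1, 2)}
    d_blue = dist_to_set(adj, boundary[1])
    d_red = dist_to_set(adj, boundary[2])
    n_B = max(d_blue[d] for d in adj if f[d] == 1) + 1
    n_R = max(d_red[d] for d in adj if f[d] == 2) + 1
    g = {d: (n_B - d_blue[d]) if f[d] == 1 else (n_B + 1 + d_red[d]) for d in adj}
    return g, n_B, n_R

def pullback(line_mechanism, g):
    """M = M_line o g, with line_mechanism mapping a line node to Pr[output = 1]."""
    return {d: line_mechanism[g[d]] for d in g}

# Voting example of Fig. 1a: three votes in {1, 2}, neighbors differ in one entry.
datasets = list(product((1, 2), repeat=3))
adj = {d: [e for e in datasets if sum(a != b for a, b in zip(d, e)) == 1]
       for d in datasets}
f = {d: 1 if d.count(1) >= 2 else 2 for d in datasets}
g, n_B, n_R = boundary_morphism(adj, f)

# Assumed (not optimal) mechanism on the (2,2)-line, pulled back to the cube.
M = pullback({1: 0.9, 2: 0.7, 3: 0.3, 4: 0.1}, g)
```

On this example the computed morphism agrees with the map of Example 2, and every pulled-back mechanism is constant on each fiber of $g_\partial$, hence boundary homogeneous.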
We represent the optimal mechanism on the line in terms of the probability of the points in the blue set $B$ being red as a function of the distance to the boundary $\partial B$, denoted by $R_{n_B - i}$. We show that the mechanism is characterized by two possible behaviors, depending on a transition parameter, defined as follows.

[Figure 3: the $(n_B, n_R)$-line with nodes $1, \ldots, n_B - \tau - 2, n_B - \tau - 1, n_B - \tau, \ldots, n_B - 1, n_B, n_B + 1, \ldots, n_B + n_R$; the blue nodes nearest the boundary are labeled "Initial Recurrence: $R_{\mathrm{Ini}}(n, i)$" and the remaining blue nodes "Terminal Recurrence: $R_{\mathrm{Ter}}(n_B - \tau - 1, i - \tau - 1)$".]
Fig. 3: The $(n_B, n_R)$-line. In Theorem 4, we show that depending on the probability of being red $R_{n_B}$ at the blue boundary node $n_B$, there may be an initial recurrence phase for computing $R_{n_B - i}$ for its first $\tau + 1$ adjacent nodes (Definitions 8 and 9). After this possible initial phase, the terminal recurrence in Definition 10 determines $R_{n_B - i}$.

Definition 8.
Let $R_{n_B} \in [0, 1]$ and $\epsilon, \delta \in \mathbb{R}_{\geq 0}$. Then, the transition parameter is defined as
$$\tau = \left\lceil \frac{1}{\epsilon} \log\left( \frac{e^{\epsilon} + 2\delta - 1}{(1 - R_{n_B})(e^{3\epsilon} - e^{\epsilon}) + \delta (e^{2\epsilon} + e^{\epsilon})} \right) \right\rceil,$$
if $\epsilon > 0$, and $\tau = -1$ if $\epsilon = 0$.

The initial behavior occurs when $i \leq \tau + 1$. In this case, the probability of the mechanism outputting the color red is given by the following function.

Definition 9.
The initial recurrence is given by
$$R_{\mathrm{Ini}}(n, i) = 1 - e^{\epsilon i} (1 - R_n) - \frac{\delta (e^{i\epsilon} - 1)}{e^{\epsilon} - 1}.$$

The terminal behavior occurs when $i > \tau + 1$ and is given by the following function.
Definition 10.
The terminal recurrence is given by
$$R_{\mathrm{Ter}}(n, i) = \frac{R_n}{e^{\epsilon i}} - \frac{\delta (e^{\epsilon i} - 1)}{e^{\epsilon i} (e^{\epsilon} - 1)}$$
if $\epsilon > 0$, and $R_{\mathrm{Ter}}(n, i) = R_n - i\delta$ if $\epsilon = 0$.

These functions are obtained by solving the recurrences in the proof of Theorem 4, our next theorem. In this theorem, we present a closed form for the optimal $(\epsilon, \delta)$-DP mechanism on the $(n_B, n_R)$-line.

Theorem 4.
The unique optimal $(\epsilon, \delta)$-DP mechanism on the $(n_B, n_R)$-line with $\Pr[M(n_B) = 2] = R_{n_B}$ is such that
$$R_{n_B - i} = \begin{cases} R_{\mathrm{Ini}}(n_B, i) & \text{if } i \leq \tau + 1, \\ R_{\mathrm{Ter}}(n_B - \tau - 1,\; i - \tau - 1) & \text{if } \tau + 1 < i, \end{cases}$$
for every $i \in [1, n_B - 1]$.

Proof.
Consider the $(\epsilon, \delta)$-DP conditions in (2)-(5) with the substitution $R_i = 1 - B_i$. We are interested in minimizing the probability of giving the erroneous answer, $R_i$. Therefore, we consider the two lower bounds on $R_i$ given by (2) and (5), namely,
$$R_i \geq 1 - e^{\epsilon} + e^{\epsilon} R_{i+1} - \delta, \tag{6}$$
and
$$R_i \geq \frac{R_{i+1} - \delta}{e^{\epsilon}}. \tag{7}$$
For each $i$, the largest of these bounds is the optimal choice for $R_i$. If $\epsilon = 0$, then both bounds are the same and it is easy to check that the statement of the theorem holds. Thus, we assume $\epsilon > 0$ for the rest of this proof.

In Lemma 1, we show that
$$1 - e^{\epsilon} + e^{\epsilon} R_{i+1} - \delta > \frac{R_{i+1} - \delta}{e^{\epsilon}}, \tag{8}$$
if and only if
$$0 \leq \delta < e^{\epsilon} R_{i+1} + R_{i+1} - e^{\epsilon}. \tag{9}$$
Thus, every time (9) holds, the optimal $R_i$ is such that $R_i = 1 - e^{\epsilon} + e^{\epsilon} R_{i+1} - \delta$, i.e., making (6) an equality. If we were to choose (6) every time we would have the recurrence in Lemma 2, with solution
$$R_{n_B - i} = 1 - e^{i\epsilon} (1 - R_{n_B}) - \frac{\delta (1 - e^{i\epsilon})}{1 - e^{\epsilon}}. \tag{10}$$

We find the first $i \in \mathbb{N}$ for which (9) does not occur. This happens when $e^{\epsilon} R_{n_B - i - 1} + R_{n_B - i - 1} - e^{\epsilon} \leq \delta$. Substituting $R_{n_B - i - 1} = 1 - e^{\epsilon} + e^{\epsilon} R_{n_B - i} - \delta$ and rearranging, we obtain
$$R_{n_B - i} \leq \frac{e^{2\epsilon} + \delta e^{\epsilon} + 2\delta + e^{\epsilon} - 1}{e^{2\epsilon} + e^{\epsilon}}. \tag{11}$$
Thus, whenever $R_{n_B - i}$ satisfies (11), then $R_{n_B - i - 1}$ will not satisfy (9), so that the optimal choice for $R_{n_B - i - 2}$ is equating it to (7). To find the first value such that this happens we substitute $R_{n_B - i}$ in (11) with its value in (10) to obtain
$$1 - e^{i\epsilon} (1 - R_{n_B}) - \frac{\delta (1 - e^{i\epsilon})}{1 - e^{\epsilon}} \leq \frac{e^{2\epsilon} + \delta e^{\epsilon} + 2\delta + e^{\epsilon} - 1}{e^{2\epsilon} + e^{\epsilon}}.$$
Solving this for $i$ we obtain
$$i \geq \frac{1}{\epsilon} \log\left( \frac{1 - e^{\epsilon} - 2\delta}{(1 - R_{n_B})(e^{\epsilon} - e^{3\epsilon}) - \delta (e^{2\epsilon} + e^{\epsilon})} \right).$$
Thus, the smallest $i$ for which this occurs is $i = \tau$ as per Definition 8 (after multiplying both numerator and denominator by $-1$).

To recap our argument, $R_{n_B - \tau}$ satisfies (11), which means that $R_{n_B - \tau - 1}$ does not satisfy (9). Thus, the initial recurrence applies up to $R_{n_B - \tau - 1}$.
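In other words, $R_{n_B - i} = R_{\mathrm{Ini}}(n_B, i)$ for $i \leq \tau + 1$.

We now prove that for $i > \tau + 1$, the optimal choice is always (7). We do this by showing that if $R_{i+1}$ does not satisfy (9), then $R_i$ does not either. Indeed, if $e^{\epsilon} R_{i+1} + R_{i+1} - e^{\epsilon} \leq \delta$, then, since $R_{i+1}$ does not satisfy (9), $R_i = \frac{R_{i+1} - \delta}{e^{\epsilon}}$. Thus,
$$e^{\epsilon} R_i + R_i - e^{\epsilon} = R_{i+1} - \delta + \frac{R_{i+1} - \delta}{e^{\epsilon}} - e^{\epsilon} = \frac{e^{\epsilon} R_{i+1} - e^{\epsilon} \delta + R_{i+1} - \delta - e^{2\epsilon}}{e^{\epsilon}} \leq \frac{-e^{\epsilon} \delta}{e^{\epsilon}} \leq \delta,$$
where the first inequality uses $e^{\epsilon} R_{i+1} + R_{i+1} \leq \delta + e^{\epsilon}$ together with $e^{\epsilon} \leq e^{2\epsilon}$. Therefore, for $i > \tau + 1$, the optimal $R_{n_B - i}$ is obtained by equating it to (7). This gives us a recurrence which, by Lemma 3, has solution $R_{n_B - i} = R_{\mathrm{Ter}}(n_B - \tau - 1, i - \tau - 1)$ for $i > \tau + 1$.

In the following example, we compute the optimal scheme for the (4,3)-line satisfying the boundary condition $R_4 = 0.\ldots$.

Example 3.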
Consider the (4,3)-line with the boundary satisfying $R_4 = 0.\ldots$, and with privacy parameters $\epsilon = \log(1.\ldots)$ and $\delta = 0.\ldots$. Then, $\tau = 1$, which means that $R_3$ and $R_2$ are calculated via the initial recurrence and $R_1$ via the terminal one. Performing this calculation we obtain $R_3 = 0.\ldots$, $R_2 = 0.\ldots$, and $R_1 = 0.\ldots$.

We deal with the red set as noted in Remark 2. The dual of the (4,3)-line with the boundary satisfying $R_4 = 0.\ldots$ is the (3,4)-line with boundary $R_4 = 0.\ldots$. Then, $\tau = -\ldots$, which means that $R_3$, $R_2$, and $R_1$ are calculated via the terminal recurrence. Performing this calculation we obtain $R_3 = 1/\ldots$, and $R_2 = R_1 = 0$. Thus, in the original (4,3)-line, the optimal mechanism satisfies $B_5 = 1/\ldots$, and $B_6 = B_7 = 0$.

Combining Theorems 3 and 4 we present a closed form for the optimal boundary homogeneous $(\epsilon, \delta)$-DP mechanism.

Theorem 5.
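Let $(\mathcal{D}, \sim)$ be a set of datasets with a neighboring relation and $n_B = \max_{d \in B} \operatorname{dist}(d, \partial B) + 1$. Then, the optimal boundary homogeneous $(\epsilon, \delta)$-DP mechanism, $M : \mathcal{D} \to \mathcal{V}$, is such that, for every $d \in B^o$,
$$R_d = \begin{cases} R_{\mathrm{Ini}}(n_B, \operatorname{dist}(d, \partial B)) & \text{if } \operatorname{dist}(d, \partial B) \leq \tau + 1, \\ R_{\mathrm{Ter}}(n_B - \tau - 1,\; \operatorname{dist}(d, \partial B) - \tau - 1) & \text{otherwise.} \end{cases}$$

Proof.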
Follows directly from combining Theorems 3 and 4.

Thus, if we consider the colored graph on the left-hand side of Fig. 2 subject to $R_h = R_\ell = 1/\ldots$, then the optimal mechanism is such that $R_a = R_b = 0$, $R_c = R_g = R_k = 0$, $R_i = R_n = R_q = 0.\ldots$, $R_d = R_o = R_m = 0.\ldots$, $R_j = R_p = R_r = 0.\ldots$, and $R_e = R_f = 0.\ldots$. We note that this mechanism can be obtained by pulling back the optimal mechanism for the (3,4)-line in Example 3.

We now show that when the probability at the boundary of the blue set is such that the output blue is more likely, the optimal mechanism depends only on the terminal recurrence.
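Before that, a numerical cross-check of the recurrences behind Theorems 4 and 5 may help. The sketch below does not use the closed forms: it iterates the two lower bounds (6) and (7) directly, taking the larger at each step and clamping to $[0, 1]$ as in the footnote convention, and also implements the terminal recurrence of Definition 10 for comparison. Function names and all numeric values are illustrative assumptions.

```python
import math

def clamp(p):
    """Footnote convention: probabilities below 0 read as 0, above 1 as 1."""
    return min(1.0, max(0.0, p))

def blue_error_by_distance(r_boundary, max_dist, eps, delta):
    """Error probability R at distance 0..max_dist from the blue boundary.

    Each step takes the larger of the lower bounds (6) and (7).
    """
    y = math.exp(eps)
    errors = [r_boundary]
    for _ in range(max_dist):
        nxt = errors[-1]                      # value one step closer to the boundary
        bound6 = 1 - y + y * nxt - delta      # bound (6)
        bound7 = (nxt - delta) / y            # bound (7)
        errors.append(clamp(max(bound6, bound7)))
    return errors

def r_ter(r_n, i, eps, delta):
    """Terminal recurrence of Definition 10, for eps > 0."""
    y = math.exp(eps)
    return r_n / y**i - delta * (y**i - 1) / (y**i * (y - 1))

eps, delta = math.log(2), 0.1                 # assumed privacy parameters
low = blue_error_by_distance(0.3, 2, eps, delta)    # boundary error at most 1/2
high = blue_error_by_distance(0.9, 4, eps, delta)   # boundary error above 1/2
```

With a boundary error of 0.3, only bound (7) is ever active and the iterates agree with `r_ter`; with a boundary error of 0.9, bound (6) is active for the first steps, which is the initial phase described in Fig. 3.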
Corollary 1.
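Consider the setting in Theorem 5. If the boundary probability $R_{n_B} \leq \frac{1}{2}$, then $R_d = R_{\mathrm{Ter}}(n_B, \operatorname{dist}(d, \partial B))$, i.e., for every $d \in B^o$,
$$R_d = \frac{R_{n_B}}{e^{\epsilon \operatorname{dist}(d, \partial B)}} - \frac{\delta \left(e^{\epsilon \operatorname{dist}(d, \partial B)} - 1\right)}{e^{\epsilon \operatorname{dist}(d, \partial B)} (e^{\epsilon} - 1)}.$$

Proof.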
This follows from Lemma 4 by substituting $x = R_n$, $y = e^{\epsilon}$, $z = \delta$, and noting that the inequality in the lemma implies that the optimal bound at each step is given by (7). Alternatively, one can verify this through the transition parameter $\tau$ of Definition 8.

A particular case of boundary homogeneity is when the mechanism gives no preference for blue or red at the boundary.

Definition 11.
A mechanism $M : \mathcal{D} \to \mathcal{V}$ is balanced (or fair) if $B_d = R_{d'}$ for every $d \in \partial B$ and $d' \in \partial R$.

In the case of balanced mechanisms, the optimal mechanism takes the following simple form.

Corollary 2.
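The optimal balanced $(\epsilon, \delta)$-DP mechanism is such that, for every $d \in B^o$,
$$R_d = \frac{e^{\epsilon} - 1 - \delta \left(e^{\epsilon(\operatorname{dist}(d, \partial B) + 1)} + e^{\epsilon \operatorname{dist}(d, \partial B)} - 2\right)}{e^{\epsilon \operatorname{dist}(d, \partial B)} (e^{\epsilon} + 1)(e^{\epsilon} - 1)}.$$

Proof.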
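Let $x = B_d = R_{d'}$ for every $d \in \partial B$ and $d' \in \partial R$. Then, the $(\epsilon, \delta)$-DP conditions are equivalent to $x \leq e^{\epsilon}(1 - x) + \delta$ and $(1 - x) \leq e^{\epsilon} x + \delta$, of which only the first inequality gives an upper bound on $x$. Maximizing $x$, we obtain $B_d = x = \frac{e^{\epsilon} + \delta}{1 + e^{\epsilon}}$, which implies the boundary value $R_d = \frac{1 - \delta}{1 + e^{\epsilon}}$, for every $d \in \partial B$. Since $R_d \leq \frac{1}{2}$, the result follows from Corollary 1 by substituting this boundary value into the terminal recurrence.

Thus, if we consider the voting example in Fig. 1a, the optimal balanced $(\log(2), 0.\ldots)$-DP mechanism is such that $R_{111} = 0.\ldots$, $R_{112} = R_{121} = R_{211} = 0.\ldots$, $R_{122} = R_{212} = R_{221} = 0.\ldots$, and $R_{222} = 0.\ldots$.

APPENDIX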
In this Appendix, we prove Lemmas 1 through 4 used in the results of the main text. To apply them to the main results we generally substitute the variable $x$ by the probability $R_n$, the variable $y$ by $e^{\epsilon}$, where $\epsilon$ is the privacy parameter, and the variable $z$ by the privacy parameter $\delta$.

The first lemma we present shows the conditions under which (8) is true in Theorem 4.

Lemma 1.
Let $x, y, z \in \mathbb{R}$ be such that $0 \leq x \leq 1$, $y \geq 1$, and $0 \leq z < 1$. Then,
$$1 - y + xy - z > \frac{x - z}{y} \tag{12}$$
if and only if $x = 1$, $y > 1$, or $z < xy + x - y$.

Proof. If $x = 1$, then (12) takes the form $1 - z > \frac{1 - z}{y}$. Since $y \geq 1$, this is equivalent to $y(1 - z) > 1 - z$. But this occurs if and only if $y > 1$ and $0 \leq z < 1$.

If $x \neq 1$, then (12) is equivalent to $y - y^2 + xy^2 - zy > x - z$. Putting all terms on one side and dividing by $(x - 1) < 0$ we obtain
$$\frac{y - y^2 + xy^2 - zy - x + z}{x - 1} < 0,$$
which can be factored into
$$(y - 1)\left(y - \frac{x - z}{1 - x}\right) < 0.$$
Since $y > 1$, this is equivalent to $y < \frac{x - z}{1 - x}$, which is equivalent to $z < xy + x - y$.

The next lemma solves the recurrence in Theorem 4 used to define the initial recurrence, $R_{\mathrm{Ini}}$, in Definition 9.
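Lemma 1 is easy to spot-check numerically before moving on. The sketch below samples random triples in the regime used in the proof of Theorem 4 (strictly $x < 1$ and $y > 1$, which are assumptions of this check rather than the full hypothesis of the lemma) and compares inequality (12) with the condition $z < xy + x - y$. Samples that fall within a small margin of the threshold are skipped to avoid floating-point ties; the margin and sample count are arbitrary choices.

```python
import random

def inequality_12(x, y, z):
    """Inequality (12): 1 - y + x*y - z > (x - z) / y."""
    return 1 - y + x * y - z > (x - z) / y

random.seed(0)
checked = 0
mismatches = 0
for _ in range(100_000):
    x = random.uniform(0.0, 0.999)    # 0 <= x < 1
    y = random.uniform(1.001, 5.0)    # y > 1
    z = random.uniform(0.0, 0.999)    # 0 <= z < 1
    threshold = x * y + x - y
    if abs(z - threshold) < 1e-6:     # skip near-ties
        continue
    checked += 1
    if inequality_12(x, y, z) != (z < threshold):
        mismatches += 1
```

The check relies on the identity that the difference of the two sides of (12) equals $(y - 1)(xy + x - y - z)/y$, so away from the threshold the sign is determined robustly.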
Lemma 2.
Let $n \in \mathbb{N}$ and let $x_1, \ldots, x_n, y, z \in \mathbb{R}$ such that $y > 1$ satisfy
$$x_i = 1 + y x_{i+1} - y - z \quad \text{for every } i < n. \tag{13}$$
Then, for every $i < n$,
$$x_i = 1 + y^{n-i}(x_n - 1) - z\left(1 + y + \ldots + y^{n-i-1}\right) = 1 + y^{n-i}(x_n - 1) - \frac{z\left(1 - y^{n-i}\right)}{1 - y}. \tag{14}$$

Proof.
Since (13) is a linear recursion, it has a unique solution, which can be verified by substituting (14) in (13).

The following lemma solves the recurrence in Theorem 4 used to define the terminal recurrence, $R_{\mathrm{Ter}}$, in Definition 10.
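The substitution argument for Lemma 2 can also be confirmed numerically. The sketch below runs recurrence (13) backwards from $x_n$ and compares each iterate with the closed form (14); the function names and the sample values of $x_n$, $n$, $y$, and $z$ are illustrative assumptions.

```python
def iterate_13(x_n, n, y, z):
    """Run recurrence (13) backwards from x_n; returns {i: x_i} for i = 1..n."""
    xs = {n: x_n}
    for i in range(n - 1, 0, -1):
        xs[i] = 1 + y * xs[i + 1] - y - z
    return xs

def closed_form_14(x_n, n, i, y, z):
    """Closed form (14), valid for y != 1."""
    k = n - i
    return 1 + y**k * (x_n - 1) - z * (1 - y**k) / (1 - y)

x_n, n, y, z = 0.9, 6, 1.5, 0.05    # assumed sample values
xs = iterate_13(x_n, n, y, z)
max_gap = max(abs(xs[i] - closed_form_14(x_n, n, i, y, z)) for i in range(1, n))
```

No clamping to $[0, 1]$ is applied here, since the lemma is a statement about the raw recurrence rather than about probabilities.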
Lemma 3.
Let $n \in \mathbb{N}$ and let $x_1, \ldots, x_n, y, z \in \mathbb{R}$ such that $y > 1$ satisfy
$$x_i = \frac{x_{i+1} - z}{y} \quad \text{for every } i < n. \tag{15}$$
Then, for every $i < n$,
$$x_i = \frac{x_n - z\left(1 + y + \ldots + y^{n-i-1}\right)}{y^{n-i}} = \frac{x_n}{y^{n-i}} - \frac{z\left(1 - y^{n-i}\right)}{(1 - y)\, y^{n-i}}. \tag{16}$$

Proof.
Since (15) is a linear recursion, it has a unique solution, which can be verified by substituting (16) in (15).

Our final lemma is used in the proof of Corollary 1.

Lemma 4.
Let $x, y, z \in \mathbb{R}$ be such that $x \leq \frac{1}{2}$, $y \geq 1$, and $z \geq 0$. Then, it holds that
$$1 - y + xy - z \leq \frac{x - z}{y}. \tag{17}$$

Proof.
Since $x \leq 1/2$ and $y \geq 1$, it follows that $xy + x \leq \frac{y + 1}{2}$. But $y \geq 1$ implies $\frac{y + 1}{2} \leq y$. Thus, $xy + x \leq y$. We rewrite this as $0 \leq \left(y - \frac{x}{1 - x}\right)$. Since $y \geq 1$, it follows that $0 \leq \left(y - \frac{x}{1 - x}\right)(y - 1)$. Multiplying by $1 - x > 0$ and expanding, we obtain $y + xy^2 \leq x + y^2$. Now, since $y \geq 1$ and $z \geq 0$, it follows that $z \leq zy$. Thus, $y + xy^2 + z \leq x + y^2 + zy$. Rearranging this inequality and dividing by $y$, we obtain (17).

REFERENCES
[1] C. Dwork, “Differential privacy,” in Automata, Languages and Programming, 2006.
[2] C. Dwork, A. Roth et al., “The algorithmic foundations of differential privacy,” Foundations and Trends® in Theoretical Computer Science, vol. 9, no. 3–4, pp. 211–407, 2014.
[3] Disclosure Avoidance and the 2020 Census
[4] Advances in Cryptology (CRYPTO), 2009, pp. 43–54.
[5] S. Kasiviswanathan, H. Lee, K. Nissim, S. Raskhodnikova, and A. Smith, “What can we learn privately?” SIAM Journal on Computing, vol. 40, no. 3, pp. 793–826, 2011.
[6] P. Cuff and L. Yu, “Differential privacy as a mutual information constraint,” in CCS, 2016, pp. 43–54.
[7] I. Issa, A. B. Wagner, and S. Kamath, “An operational approach to information leakage,” vol. 66, no. 3, pp. 1625–1657, Mar. 2020.
[8] W. Wang, L. Ying, and J. Zhang, “On the relation between identifiability, differential privacy, and mutual-information privacy,” IEEE Transactions on Information Theory, vol. 62, no. 9, pp. 5018–5029, 2016.
[9] H. Hsu, S. Asoodeh, and F. P. Calmon, “Information-theoretic privacy watchdogs,” Paris, France, 2019, pp. 552–556.
[10] P. Sadeghi, N. Ding, and T. Rakotoarivelo, “On properties and optimization of information-theoretic privacy watchdog,” 2020. [Online]. Available: https://arxiv.org/abs/2010.09367
[11] F. du Pin Calmon and N. Fawaz, “Privacy against statistical inference,” Monticello, IL, 2012, pp. 1401–1408.
[12] N. Holohan, D. J. Leith, and O. Mason, “Optimal differentially private mechanisms for randomised response,” IEEE Transactions on Information Forensics and Security, vol. 12, no. 11, pp. 2726–2735, 2017.
[13] A. Ghosh, T. Roughgarden, and M. Sundararajan, “Universally utility-maximizing privacy mechanisms,” SIAM Journal on Computing, vol. 41, no. 6, pp. 1673–1693, 2012.
[14] K. Kalantari, L. Sankar, and A. D. Sarwate, “Robust privacy-utility tradeoffs under differential privacy and hamming distortion,”