A Linear Reduction Method for Local Differential Privacy and Log-lift
Ni Ding$^*$, Yucheng Liu$^\dagger$ and Farhad Farokhi$^*$
$^*$The University of Melbourne (email: {ni.ding, farhad.farokhi}@unimelb.edu.au). $^\dagger$The University of Newcastle (email: yucheng.liu@newcastle.edu.au).

Abstract—This paper considers the problem of publishing data $X$ while protecting the correlated sensitive information $S$. We propose a linear method to generate the sanitized data $Y$ with the same alphabet $\mathcal{Y} = \mathcal{X}$ that attains local differential privacy (LDP) and log-lift at the same time. It is revealed that both LDP and log-lift are inversely proportional to the statistical distance between the conditional probability $P_{Y|S}(x|s)$ and the marginal probability $P_Y(x)$: the closer the two probabilities are, the more private $Y$ is. Specifying $P_{Y|S}(x|s)$ to linearly reduce this distance, $|P_{Y|S}(x|s) - P_Y(x)| = (1-\alpha)|P_{X|S}(x|s) - P_X(x)|, \forall s,x$ for some $\alpha \in (0,1]$, we study the problem of how to generate $Y$ from the original data $S$ and $X$. The Markov randomization/sanitization scheme $P_{Y|X}(x|x') = P_{Y|S,X}(x|s,x')$ is obtained by solving linear equations. The optimal non-Markov sanitization, the transition probability $P_{Y|S,X}(x|s,x')$ that depends on $S$, can be determined by maximizing the data utility subject to linear equality constraints on data privacy. We compute the solution for two linear utility functions: the expected distance and the total variance distance. It is shown that the non-Markov randomization significantly improves data utility, and that the marginal probability $P_X(x)$ remains the same after the linear sanitization method: $P_Y(x) = P_X(x), \forall x \in \mathcal{X}$. A full version of this paper is accessible at: https://arxiv.org/abs/2101.09689

I. INTRODUCTION
The privacy-preserving problem can be described as follows. A data curator wants to publish data $X$ that is correlated with the sensitive attribute $S$. To protect privacy, it privatizes $X$ by producing and releasing the sanitized data $Y$. The problem is how to design the sanitization scheme, or the randomization mechanism, to attain a certain level of data privacy. We consider two main metrics for measuring privacy: local differential privacy and log-lift.

Local Differential Privacy: Let $S$ and $X$ be random variables on finite alphabets $\mathcal{S}$ and $\mathcal{X}$, respectively. Assume that the sanitized data has the same alphabet as $X$, i.e., $\mathcal{Y} = \mathcal{X}$. (We use $\mathcal{X}$ to denote the alphabet of both $X$ and $Y$, and $x$, $x'$ or $\tilde{x}$ to denote an instance of either $X$ or $Y$.) Differential privacy (DP) [1] measures the statistical distinguishability of $S$. Any sanitization results in a conditional probability of the output data $P_{Y|S}(x|s)$. The statistical distance of the released data $Y$ conditioned on two adjacent sensitive instances $s, s'$ can be measured by
$$L_{\text{DP}}(S \to Y) = \max_{x,s,s'\colon s \sim s'} \log \frac{P_{Y|S}(x|s)}{P_{Y|S}(x|s')},$$
where $s \sim s'$ denotes that $s$ and $s'$ are neighbors, as defined by the Hamming distance constraint $d_H(s,s') \le 1$. A sanitization mechanism is called $\epsilon$-DP if it generates output $Y$ such that $L_{\text{DP}}(S \to Y) \le \epsilon$. A small value of $\epsilon$ implies indistinguishability of the sensitive data $S$ when observing the released data $Y$. Local differential privacy (LDP) [2], [3] relaxes the neighborhood constraint in DP:
$$L_{\text{LDP}}(S \to Y) = \max_{x,s,s'} \log \frac{P_{Y|S}(x|s)}{P_{Y|S}(x|s')}. \quad (1)$$
This is a more general data privacy measure, and a stronger notion of privacy: an $\epsilon$-LDP mechanism is always $\epsilon$-DP, but not vice versa.

Log-lift: Consider the following statistical inference setting. An adversary wants to infer $S$ from $Y$.
The multiplicative difference between the posterior belief $P_{S|Y}(s|x)$ and the prior belief $P_S(s)$ denotes the adversary's knowledge gain on the sensitive data $S$ and therefore indicates the privacy of $Y$. For a guessing adversary, the mutual information $I(S;Y) = \mathbb{E}\big[\log \frac{P_{S|Y}(s|x)}{P_S(s)}\big]$ and $\log \frac{\mathbb{E}[\max_s P_{S|Y}(s|x)]}{\max_s P_S(s)}$ are used to quantify the average and maximal private information leakage in [4], [5] and [6]–[8], respectively. They correspond to two extreme cases, $\alpha = 1$ and $\alpha \to \infty$, of the $\alpha$-leakage proposed in [9] based on the Arimoto mutual information $I_\alpha^A(S;Y)$. In fact, all these privacy measures can be guaranteed by the log-lift [10]:
$$L_{\text{LL}}(S \to Y) = \max_{x,s} \left| \log \frac{P_{S|Y}(s|x)}{P_S(s)} \right|. \quad (2)$$
If $L_{\text{LL}}(S \to Y) \le \epsilon$, then $I_\alpha^A(S;Y) \le \frac{\alpha}{\alpha - 1}\epsilon$ for all $\alpha \ge 1$ [10, Proposition 1].

While most existing studies adopt only one data privacy measure, we propose a linear sanitization scheme that attains LDP and log-lift at the same time. We first reveal that both LDP and log-lift are inversely proportional to the statistical distance between the conditional probability $P_{Y|S}(x|s)$ and the marginal probability $P_Y(x)$: the closer these two probabilities are, the more private $Y$ is. Based on the fact that $P_Y(x)$ is the expected value of $P_{X|S}(x|s)$, we require that for all $s,x$ the conditional probability $P_{Y|S}(x|s)$ reduces $P_{X|S}(x|s)$ (in the original dataset) by $\alpha(P_{X|S}(x|s) - P_X(x))$. (LDP also applies to a non-metric space $\mathcal{S}$, where there is no distance function for the definition of a neighborhood, e.g., a categorical dataset. DP is studied mainly in computer science, where $X = f(S)$ for some deterministic function $f$ and privatization usually refers to a noise-adding mechanism.)
(LDP was originally proposed in [2] for multi-party privacy, where minimax techniques apply to derive fundamental limits on statistical risk assessment and information-theoretic measures. The mutual information, maximal leakage and log-lift are often used in information theory, where $S$ and $X$ are any correlated random variables and the sanitization usually refers to an encoding function.) Here $\alpha \in (0,1]$. This ensures a linear decrease $|P_{Y|S}(x|s) - P_Y(x)| = (1-\alpha)|P_{X|S}(x|s) - P_X(x)|, \forall s,x$, which indicates a reduction of approximately a factor of $(1-\alpha)$ in both LDP and log-lift, but keeps the marginal probability unchanged: $P_Y(x) = P_X(x), \forall x \in \mathcal{X}$. We then determine the randomized scheme that generates such a $Y$. We show that the Markov sanitization scheme $P_{Y|X}(x|x') = P_{Y|S,X}(x|s,x')$ can be obtained by solving linear equations. The optimal non-Markov sanitization scheme, the $P_{Y|S,X}(x|s,x')$ that depends on $S$, can be determined by maximizing the data utility subject to linear equality constraints on data privacy. We compute the optimal non-Markov sanitization scheme for two linear utility functions: the expected distance and the total variance distance. The latter is a linear approximation of the mutual information $I(X;Y)$.

II. LINEAR REDUCTION METHOD FOR DATA PRIVACY
We rewrite the maximand in the log-lift (2) as $\big|\log \frac{P_{Y|S}(x|s)}{P_Y(x)}\big|$ and the LDP in (1) as
$$L_{\text{LDP}}(S \to Y) = \max_{x,s,s'} \left\{ \log \frac{P_{Y|S}(x|s)}{P_Y(x)} + \log \frac{P_Y(x)}{P_{Y|S}(x|s')} \right\} = \max_{x,s,s'} \left\{ \log \frac{P_{Y|S}(x|s)}{P_Y(x)} - \log \frac{P_{Y|S}(x|s')}{P_Y(x)} \right\}.$$
Now, both LDP and log-lift are expressed in terms of the conditional probability $P_{Y|S}(x|s)$ and the marginal probability $P_Y(x)$, the statistical distance between which is measured by $\log \frac{P_{Y|S}(x|s)}{P_Y(x)}$ if $P_{Y|S}(x|s) \ge P_Y(x)$ and by $-\log \frac{P_{Y|S}(x|s)}{P_Y(x)}$ if $P_{Y|S}(x|s) < P_Y(x)$.
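As a sanity check, both measures can be computed directly from a joint distribution. The sketch below uses a made-up $2 \times 3$ joint probability table (an assumption for illustration, not the dataset of Example 1) and evaluates (1) together with the Bayes-rule form $|\log(P_{X|S}(x|s)/P_X(x))|$ of (2) for the unsanitized release $Y = X$:

```python
import math

# Hypothetical toy joint distribution P_{S,X}; rows s in {0,1}, columns x in {0,1,2}.
P_SX = [[0.20, 0.15, 0.15],
        [0.10, 0.25, 0.15]]
P_S = [sum(row) for row in P_SX]                                     # marginal of S
P_X = [sum(P_SX[s][x] for s in range(2)) for x in range(3)]          # marginal of X
P_XgS = [[P_SX[s][x] / P_S[s] for x in range(3)] for s in range(2)]  # P_{X|S}(x|s)

# LDP in (1): max over x, s, s' of log( P_{X|S}(x|s) / P_{X|S}(x|s') )
L_LDP = max(math.log(P_XgS[s][x] / P_XgS[t][x])
            for x in range(3) for s in range(2) for t in range(2))

# Log-lift in (2): by Bayes' rule, P_{S|X}(s|x) / P_S(s) = P_{X|S}(x|s) / P_X(x)
L_LL = max(abs(math.log(P_XgS[s][x] / P_X[x]))
           for x in range(3) for s in range(2))

print(L_LDP, L_LL)   # log 2 ~ 0.693 and log 1.5 ~ 0.405 for this table
```

These are the baseline values $L_{\text{LDP}}(S \to X)$ and $L_{\text{LL}}(S \to X)$ attained when $X$ is released without any randomization.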
We set the alphabet of the published data $Y$ the same as that of $X$: $\mathcal{Y} = \mathcal{X}$. The method generates $Y$ according to the conditional probability
$$P_{Y|S}(x|s) = P_{X|S}(x|s) - \alpha(P_{X|S}(x|s) - P_X(x)) = (1-\alpha)P_{X|S}(x|s) + \alpha P_X(x), \quad (3)$$
where $\alpha \in (0,1]$. (For the correlation in the original dataset, denoted by the joint probability $P_{S,X}(s,x)$, we have LDP $L_{\text{LDP}}(S \to X)$ and log-lift $L_{\text{LL}}(S \to X)$. They measure the data privacy when $X$ is released without any randomization; this case attains perfect fidelity for the released data with the worst privacy. It is easy to verify that $0 \le P_{Y|S}(x|s) \le 1, \forall s,x$ and $\sum_{x \in \mathcal{X}} P_{Y|S}(x|s) = 1, \forall s$, i.e., $P_{Y|S}(x|s)$ in (3) is a probability measure. Here, $P_{Y|S}(x|s) = P_{X|S}(x|s)$ if $\alpha = 0$; we consider a strict reduction in LDP and log-lift in this paper and therefore set $\alpha > 0$.)

Fig. 1: For the dataset in Example 1, the reduction of the LDP $L_{\text{LDP}}(S \to Y)$ and log-lift $L_{\text{LL}}(S \to Y)$ as $\alpha$ increases, and their approximations $(1-\alpha)L_{\text{LDP}}(S \to X)$ and $(1-\alpha)L_{\text{LL}}(S \to X)$, respectively, in (6).

Here, (3) is a line search method with $-(P_{X|S}(x|s) - P_X(x))$ being the descent direction of the $\ell_1$ distance $|P_{X|S}(x|s) - P_X(x)|$ at $P_{X|S}(x|s)$. This can also be interpreted as a variance reduction method (see Appendix A). It is clear that as $\alpha$ increases, $Y$ becomes more private. For $\alpha = 1$, $Y$ is independent of $S$: $P_{Y|S}(x|s) = P_X(x)$ for all $s$ and $x$, where perfect privacy is attained: $L_{\text{LDP}}(S \to Y) = 0$ and $L_{\text{LL}}(S \to Y) = 0$.

Eq.
(3) results in a shift in the joint probability $P_{S,Y}(s,x) = P_{Y|S}(x|s)P_S(s) = (1-\alpha)P_{S,X}(s,x) + \alpha P_S(s)P_X(x)$, but the marginal probability of the released data $Y$ remains the same:
$$P_Y(x) = \sum_s P_{S,Y}(s,x) = (1-\alpha)P_X(x) + \alpha P_X(x) = P_X(x), \quad \forall x. \quad (4)$$
That is, the statistics of the public data $X$ do not change after randomization: the released data $Y$ provides the correct answer to any query on the statistical aggregation of $X$.
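A minimal numerical check of (3) and (4), again with an assumed toy joint distribution rather than the paper's Example 1:

```python
alpha = 0.4                                   # privacy level in (0,1]
P_SX = [[0.20, 0.15, 0.15],                   # hypothetical joint P_{S,X}
        [0.10, 0.25, 0.15]]
P_S = [sum(row) for row in P_SX]
P_X = [sum(P_SX[s][x] for s in range(2)) for x in range(3)]
P_XgS = [[P_SX[s][x] / P_S[s] for x in range(3)] for s in range(2)]

# Linear reduction (3): P_{Y|S}(x|s) = (1 - alpha) P_{X|S}(x|s) + alpha P_X(x)
P_YgS = [[(1 - alpha) * P_XgS[s][x] + alpha * P_X[x] for x in range(3)]
         for s in range(2)]

# Marginal preservation (4): P_Y(x) = sum_s P_{Y|S}(x|s) P_S(s) = P_X(x)
P_Y = [sum(P_YgS[s][x] * P_S[s] for s in range(2)) for x in range(3)]
assert all(abs(P_Y[x] - P_X[x]) < 1e-12 for x in range(3))

# Linear distance reduction: |P_{Y|S} - P_Y| = (1 - alpha) |P_{X|S} - P_X|
for s in range(2):
    for x in range(3):
        lhs = abs(P_YgS[s][x] - P_Y[x])
        rhs = (1 - alpha) * abs(P_XgS[s][x] - P_X[x])
        assert abs(lhs - rhs) < 1e-12
```

The distance-reduction identity checked in the last loop is exactly the property that the next subsection translates into a reduction in LDP and log-lift.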
1) Reduction in LDP and Log-lift:
Eq. (3) reduces the $\ell_1$-distance by a factor of $1-\alpha$: for each $x$,
$$|P_{Y|S}(x|s) - P_Y(x)| = (1-\alpha)|P_{X|S}(x|s) - P_X(x)|, \quad \forall s,$$
and
$$\left| \frac{P_{Y|S}(x|s) - P_{Y|S}(x|s')}{P_Y(x)} \right| = (1-\alpha)\left| \frac{P_{X|S}(x|s) - P_{X|S}(x|s')}{P_X(x)} \right|, \quad \forall s,s'. \quad (5)$$
This can be translated to a linear reduction in LDP and log-lift by the first-order Taylor approximation $\log(1+x) \approx x$:
$$L_{\text{LDP}}(S \to Y) \approx (1-\alpha)L_{\text{LDP}}(S \to X), \quad (6a)$$
$$L_{\text{LL}}(S \to Y) \approx (1-\alpha)L_{\text{LL}}(S \to X). \quad (6b)$$
See Fig. 1. The approximations in (6) are good when $\big| \frac{P_{X|S}(x|s)}{P_X(x)} - 1 \big|$ is small for all $s,x$.

III. OPTIMAL PRIVACY-PRESERVING SCHEME
As explained in Section II-A, one can choose an $\alpha \in (0,1]$ in (3) to denote a specific privacy level, which results in approximately a reduction by a factor of $1-\alpha$ in both LDP and log-lift (6). (See Appendix D for the derivation of the approximations in (6).) The remaining problem is how to determine a randomized mechanism $P_{Y|X}(x|x')$ that generates $Y$ satisfying the private transition probability (3). If such a mechanism is not unique, we should choose the one that optimizes the data utility. Denote by $U(X;Y)$ the utility function that measures the usefulness of the released data $Y$. We consider two types of linear $U(X;Y)$ in this paper: the expected distortion $\mathbb{E}[d(X,Y)]$, where $d(X=x', Y=x) \ge 0$ and $d(X=x', Y=x) = 0$ for $x = x'$; and the total variance distance $D_{\text{TV}}(X,Y) = 1 - \sum_x P_X(x)P_{Y|X}(x|x)$, which measures the expected $\ell_1$ distance between a randomization scheme $P_{Y|X}(x|x')$ and the optimal $P^*_{Y|X}(x|x')$ that maximizes the mutual information $I(X;Y)$. Here, $D_{\text{TV}}(X,Y)$ can be considered as a linear approximation of $I(X;Y)$. See Appendix B.

The randomized mechanism $P_{Y|X}(x|x')$ can be designed in two ways. For $S$ being a nested private attribute of $X$, e.g., $S = f(X)$ for some randomized function $f$ as assumed in [6], [9], the randomization is conditioned only on the observable data $X$. In this case, the Markov chain $S - X - Y$ forms and the randomized mechanism refers to the Markov transition probability $P_{Y|X}(x|x') = P_{Y|S,X}(x|s,x'), \forall s$, e.g., as in [11], [12]. If both $S$ and $X$ are observable, e.g., they denote attribute columns in a tabular dataset, we can search for the optimal randomization over all non-Markov transition probabilities $P_{Y|S,X}(x|s,x')$ [9, Fig. 1(b)], where $P_{Y|X}(x|x') = P_{Y|S,X}(x|s,x')$ does not necessarily hold for all $s$.

A. Markov Transition Probability
The lemma below characterizes the Markov randomizationsolution.
Lemma 1.
The Markov transition probability that satisfies the equality (3) is
$$P_{Y|X}(x|x') = \begin{cases} 1 - \alpha(1 - P_X(x)) & x' = x \\ \alpha P_X(x) & x' \ne x. \end{cases} \quad (7)$$

Proof:
Lemma 1 holds because
$$P_{Y|S}(x|s) = \sum_{x'} P_{Y|X}(x|x')P_{X|S}(x'|s) = \big(1 - \alpha(1 - P_X(x))\big)P_{X|S}(x|s) + \alpha P_X(x)\sum_{x' \ne x} P_{X|S}(x'|s) = (1-\alpha)P_{X|S}(x|s) + \alpha P_X(x), \quad \forall s,x. \quad (8)$$
The full proof is presented in Appendix E by solving linear equations.

The Markov transition probability in Lemma 1 incurs the expected distortion $\mathbb{E}[d(X,Y)] = \alpha \sum_{x,x'\colon x' \ne x} P_X(x)P_X(x')d(X=x', Y=x)$ and the total variance distance $D_{\text{TV}}(X,Y) = \alpha\big(1 - \sum_x P_X(x)^2\big)$.

B. Non-Markov Transition Probability
The non-Markov transition probability $P_{Y|S,X}(x|s,x')$ determines a randomized mechanism
$$P_{Y|X}(x|x') = \sum_s P_{Y|S,X}(x|s,x')P_{S|X}(s|x'), \quad (9)$$
which is linear in $P_{Y|S,X}(x|s,x')$. Consider all $P_{Y|S,X}(x|s,x')$ that satisfy (3). They are the transition probabilities that attain the same level of privacy (specified by $\alpha$). The problem of searching for an optimal $P^*_{Y|S,X}(x|s,x')$ that maximizes the data utility can be formulated as follows. For $\alpha \in (0,1]$,
$$\max_{P_{Y|S,X}(x|s,x')} U(X;Y) \quad (10a)$$
$$\text{s.t.} \quad \sum_{x'} P_{Y|S,X}(x|s,x')P_{X|S}(x'|s) = P_{X|S}(x|s) - \alpha\big(P_{X|S}(x|s) - P_X(x)\big), \quad \forall s,x. \quad (10b)$$
It is clear from (9) that the Markov solution is a special case of the non-Markov transition probability. Therefore, the maximizer $P^*_{Y|S,X}(x|s,x')$ of (10) attains a data utility no worse than the Markov solution in Lemma 1 in general. See Example 1. Since the constraints in (10b) are linear, problem (10) is a concave maximization if $U(X;Y)$ is concave in $P_{Y|X}(x|x')$. For a linear function $U(X;Y)$, (10) can be formulated as a linear program (LP).

Below, we compute the solutions for the utility functions $U(X;Y) = -D_{\text{TV}}(X;Y)$ and $U(X;Y) = -\mathbb{E}[d(X,Y)]$. The proofs of Propositions 1 and 2 are in Appendix C.

Proposition 1.
For $U(X;Y) = -D_{\text{TV}}(X;Y)$, the solution to problem (10) is any transition probability $P^*_{Y|S,X}(x|s,x')$ satisfying the following for each $s$:
$$P^*_{Y|S,X}(x|s,x) = \min\left\{ 1 - \alpha\Big(1 - \frac{P_X(x)}{P_{X|S}(x|s)}\Big),\ 1 \right\}, \quad (11a)$$
$$P^*_{Y|S,X}(x|s,x') = 0, \quad \forall x \in \mathcal{X}^-(s),\ x' \in \mathcal{X}\colon x' \ne x, \quad (11b)$$
$$P^*_{Y|S,X}(x|s,x') = 0, \quad \forall x' \in \mathcal{X}^+(s),\ x \in \mathcal{X}\colon x \ne x', \quad (11c)$$
$$\sum_{x' \in \mathcal{X}^-(s)} P^*_{Y|S,X}(x|s,x')P_{X|S}(x'|s) = -\alpha\big(P_{X|S}(x|s) - P_X(x)\big), \quad \forall x \in \mathcal{X}^+(s), \quad (11d)$$
$$\sum_{x \in \mathcal{X}^+(s)} P^*_{Y|S,X}(x|s,x') = \alpha\Big(1 - \frac{P_X(x')}{P_{X|S}(x'|s)}\Big), \quad \forall x' \in \mathcal{X}^-(s), \quad (11e)$$
where $\mathcal{X}^+(s) = \{x \in \mathcal{X}\colon P_X(x) \ge P_{X|S}(x|s)\}$ and $\mathcal{X}^-(s) = \{x \in \mathcal{X}\colon P_X(x) < P_{X|S}(x|s)\}$.

We can directly determine the optimal $P^*_{Y|S,X}(x|s,x')$ by Proposition 1: for each $s$, make the assignments in (11a)-(11c); then determine $P_{Y|S,X}(x|s,x')$ for all $x' \in \mathcal{X}^-(s)$ and $x \in \mathcal{X}^+(s)$ by solving the linear equations formed by (11d) and (11e). Here, (11a) in fact saturates the diagonal entry $P^*_{Y|S,X}(x|s,x)$ for each $x \in \mathcal{X}$ in the constrained set, and the solution to the linear equations (11d) and (11e) is not unique.

(The concavity does not hold for general $U(X;Y)$; for example, the mutual information $I(X;Y)$ is convex in $P_{Y|S,X}(x|s,x')$. The saturation can be seen from the proof of Proposition 1: the diagonal entry $P^*_{Y|S,X}(x|s,x)$ cannot be increased any further without breaching the constraint (10b). The non-uniqueness is because the dimension of the null space formed by (11d) and (11e) is no less than 1; see the explanation in Appendix F.)
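One concrete member of the (non-unique) solution set of (11d)-(11e) can be built without an LP solver: saturate the diagonals per (11a), then route the excess mass of each $x' \in \mathcal{X}^-(s)$ to the deficit states $x \in \mathcal{X}^+(s)$ in proportion to their deficits. The product-form rule below is our own illustrative choice (not the specific solution the paper computes), and the toy joint distribution is assumed; it also assumes $P_{X|S}(\cdot|s) \ne P_X(\cdot)$ so that the total deficit is positive:

```python
alpha = 0.4
P_SX = [[0.20, 0.15, 0.15],    # hypothetical joint P_{S,X}
        [0.10, 0.25, 0.15]]
P_S = [sum(row) for row in P_SX]
P_X = [sum(P_SX[s][x] for s in range(2)) for x in range(3)]
P_XgS = [[P_SX[s][x] / P_S[s] for x in range(3)] for s in range(2)]

# P_YgSX[s][xp][x] = P*_{Y|S,X}(x | s, x')
P_YgSX = [[[0.0] * 3 for _ in range(3)] for _ in range(2)]
for s in range(2):
    Xp = [x for x in range(3) if P_X[x] >= P_XgS[s][x]]   # X+(s): deficit states
    Xm = [x for x in range(3) if P_X[x] < P_XgS[s][x]]    # X-(s): excess states
    T = sum(P_X[x] - P_XgS[s][x] for x in Xp)             # total deficit (> 0 assumed)
    for x in Xp:
        P_YgSX[s][x][x] = 1.0                             # (11a) saturates at 1
    for xp in Xm:
        P_YgSX[s][xp][xp] = 1 - alpha * (1 - P_X[xp] / P_XgS[s][xp])   # (11a)
        for x in Xp:   # product form: split the excess in proportion to the deficits
            a = P_X[x] - P_XgS[s][x]
            b = P_XgS[s][xp] - P_X[xp]
            P_YgSX[s][xp][x] = alpha * a * b / (T * P_XgS[s][xp])

# check: each row is a distribution and the privacy constraint (10b) holds
for s in range(2):
    for xp in range(3):
        assert abs(sum(P_YgSX[s][xp]) - 1) < 1e-12
    for x in range(3):
        lhs = sum(P_YgSX[s][xp][x] * P_XgS[s][xp] for xp in range(3))
        rhs = (1 - alpha) * P_XgS[s][x] + alpha * P_X[x]
        assert abs(lhs - rhs) < 1e-12

# utility comparison: D_TV of this non-Markov scheme vs the Markov scheme (7)
P_SgX = [[P_SX[s][x] / P_X[x] for x in range(3)] for s in range(2)]
diag = [sum(P_YgSX[s][x][x] * P_SgX[s][x] for s in range(2)) for x in range(3)]
D_TV_nonmarkov = 1 - sum(P_X[x] * diag[x] for x in range(3))
D_TV_markov = alpha * (1 - sum(p * p for p in P_X))
print(D_TV_nonmarkov, D_TV_markov)   # the non-Markov scheme loses far less utility
```

On this toy table the non-Markov $D_{\text{TV}}$ comes out to $0.04$ against $0.264$ for the Markov scheme, illustrating the utility gap discussed in Example 1.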
Fig. 2: The privacy-utility tradeoff obtained from the dataset in Example 1 by enumerating $\alpha \in (0,1]$: for each value of $\alpha$, the conditional probability $P_{Y|S}(x|s)$ in (3) is determined, from which we get the privacy measure $L_{\text{LDP}}(S \to Y)$ and obtain the randomization schemes in Lemma 1 and Proposition 1, respectively. We plot the resulting total variance distance $D_{\text{TV}}(X;Y)$ and the utility loss in terms of mutual information $H(X) - I(X;Y)$ vs. $L_{\text{LDP}}(S \to Y)$. The non-Markov solution outperforms the Markov solution.

Proposition 2 below shows that when the expected distance is used as the utility measure, the problem can be reduced to an LP with a reduced dimension of decision variables, constrained by (11d) and (11e).

Proposition 2.
For $U(X;Y) = -\mathbb{E}[d(X,Y)]$, the solution to problem (10) is the transition probability $P^*_{Y|S,X}(x|s,x')$ that satisfies (11), where the minimizer of
$$\min \sum_{x \in \mathcal{X}^+(s),\ x' \in \mathcal{X}^-(s)} P_{Y|S,X}(x|s,x')P_{S,X}(s,x')d(x',x), \quad \text{s.t. (11d) and (11e)}, \quad (12)$$
determines $P^*_{Y|S,X}(x|s,x')$ for all $x \in \mathcal{X}^+(s)$ and $x' \in \mathcal{X}^-(s)$ for each $s$.

In problem (10), the data privacy constraint (10b) is strengthened by increasing $\alpha$, while the maximal utility decreases. Therefore, the privacy-utility tradeoff (PUT) can be obtained by varying $\alpha \in (0,1]$.

Example 1.
Consider a database with the joint probability $P_{X|S}(x|s)$ given by the table below, with rows $S = 1, 2$ and columns $X = a, b, c, d$:

$$\begin{array}{c|cccc} & X=a & X=b & X=c & X=d \\ \hline S=1 & & & & \\ S=2 & & & & \end{array}$$

The marginal probabilities are $P_S(1) = 0.$, $P_S(2) = 0.$, $P_X(a) = 0.$, $P_X(b) = 0.$, $P_X(c) = 0.$ and $P_X(d) = 0.$. For $\alpha = 0.$, we show how to obtain the optimal transition probability $P^*_{Y|S,X}(x|s,x')$ in Proposition 1. For $S = 1$, $\mathcal{X}^+(1) = \{a,b\}$ and $\mathcal{X}^-(1) = \{c,d\}$. By (11a), we set $P^*_{Y|S,X}(a|1,a) = P^*_{Y|S,X}(b|1,b) = 1$, $P^*_{Y|S,X}(c|1,c) = 0.$ and $P^*_{Y|S,X}(d|1,d) = 0.$. We obtain one solution to the linear equations (11d) and (11e): $P^*_{Y|S,X}(a|1,c) = 0.$, $P^*_{Y|S,X}(b|1,c) = 0.$, $P^*_{Y|S,X}(a|1,d) = 0$ and $P^*_{Y|S,X}(b|1,d) = 0.$. All other entries of $P^*_{Y|S,X}(x|1,x')$ are set to $0$. The transition probability $P^*_{Y|S,X}(x|2,x')$ for all $x,x'$ can be determined in the same way. Apply $P^*_{Y|X}(x|x') = P^*_{Y|S,X}(x|1,x')P_{S|X}(1|x') + P^*_{Y|S,X}(x|2,x')P_{S|X}(2|x')$ by (9). The resulting mutual information is $I(X;Y) = 1.$ and $D_{\text{TV}}(X;Y) = 0.$. They
Fig. 3: The privacy-utility tradeoff in terms of the expecteddistance E [ d ( X ; Y )] vs. the log-lift L LL ( S → Y ) obtainedfrom the dataset in Example 1. can be compared to the Markov solution in Lemma 1, wherewe get I ( X ; Y ) = 0 . and D TV ( X ; Y ) = 0 . . This meansthat when a certain level of data privacy is guaranteed,adopting non-Markov randomization can significantly improvethe utility. This can also be seen in Fig. 2 and 3.We then obtain the solution in Proposition 2 to the problem (10) for U ( X ; Y ) = − E [ d ( X, Y )] . The procedure is the sameas above, except that P ∗ Y | S,X ( x | s, x ′ ) for all x ∈ { a, b } and x ′ ∈ { c, d } is determined by solving the minimization (12) forall s . The resulting privacy-utility tradeoff is shown in Fig. 3. IV. C
ONCLUSION
Noting that $P_X(x)$ is the expected value of the conditional probability $P_{X|S}(x|s)$ w.r.t. the marginal probability of $S$, we proposed a privacy-preserving method that generates the sanitized data $Y$ with a $P_{Y|S}(x|s)$ that linearly reduces the variance of $P_{X|S}(x|s)$ in the original data $X$. This randomization method maintains the marginal probability: $P_Y(x) = P_X(x), \forall x$. We showed that $L_{\text{LDP}}(S \to Y)$ and $L_{\text{LL}}(S \to Y)$ can be expressed in terms of $\big|\log \frac{P_{Y|S}(x|s)}{P_Y(x)}\big|$ and therefore the proposed method reduces both LDP and log-lift. Specifically, $L_{\text{LDP}}(S \to Y) \approx (1-\alpha)L_{\text{LDP}}(S \to X)$ and $L_{\text{LL}}(S \to Y) \approx (1-\alpha)L_{\text{LL}}(S \to X)$, where $\alpha \in (0,1]$ can be considered as the privacy level. We considered Markov and non-Markov sanitization schemes to generate $Y$. While the Markov scheme was obtained by solving linear equations, we formulated an LP to compute the optimal non-Markov scheme for two linear utility functions. The experimental results showed that the non-Markov scheme significantly improves data utility.

There are two aspects that can be further explored. While the proposed linear method reduces the variance of $P_{X|S}(x|s)$ for each instance $x$, it suffices to apply (3) only to $s \in \arg\max_s P_{Y|S}(x|s) \cup \arg\min_s P_{Y|S}(x|s)$ for each $x$. This will also result in a reduction of $(1-\alpha)$ in LDP and log-lift, but the design of the randomization scheme and the improvement in data utility need to be studied. In [13], the local information geometry technique is used to approximate the data utility. This paper suggests that it can also be applied to data privacy. In the local proximity regime where $\big|\frac{P_{X|S}(x|s)}{P_X(x)} - 1\big|$ is small for all $s,x$, the approximation of LDP in (6a) can be replaced by the linear equality (5). This treatment is similar to [14], where the approximation is based on a second-order Taylor expansion. The linear algebra techniques in [14] are worth investigating in data privacy.
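The variance-reduction view summarized above (and detailed in Appendix A) is easy to confirm numerically: treating $P_{X|S}(x|s)$ as a random variable over $S$ with mean $P_X(x)$, the sanitized conditional $P_{Y|S}(x|s)$ keeps the same mean while its variance shrinks by exactly $(1-\alpha)^2$. A sketch with an assumed toy distribution:

```python
alpha = 0.4
P_SX = [[0.20, 0.15, 0.15],    # hypothetical joint P_{S,X}
        [0.10, 0.25, 0.15]]
P_S = [sum(row) for row in P_SX]
P_X = [sum(P_SX[s][x] for s in range(2)) for x in range(3)]
P_XgS = [[P_SX[s][x] / P_S[s] for x in range(3)] for s in range(2)]
P_YgS = [[(1 - alpha) * P_XgS[s][x] + alpha * P_X[x] for x in range(3)]
         for s in range(2)]

# Variance over S of P_{X|S}(x|S) and P_{Y|S}(x|S); both have mean P_X(x)
var_X = [sum(P_S[s] * (P_XgS[s][x] - P_X[x]) ** 2 for s in range(2))
         for x in range(3)]
var_Y = [sum(P_S[s] * (P_YgS[s][x] - P_X[x]) ** 2 for s in range(2))
         for x in range(3)]
for x in range(3):
    # Var[P_{Y|S}] = (1 - alpha)^2 Var[P_{X|S}], as in Appendix A
    assert abs(var_Y[x] - (1 - alpha) ** 2 * var_X[x]) < 1e-12
```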
APPENDIX A
INTERPRETATION OF CONTROL VARIATE METHOD
The linear reduction method in (3) coincides with the control variate method originally proposed for finding an unbiased estimator in [15]. It generates a new random variable $P_{Y|S}(x|s)$ with the same sample space size as $P_{X|S}(x|s)$, but a strictly smaller variance: for each $x$,
$$\text{Var}[P_{Y|S}(x|s)] = \mathbb{E}_S\big[(P_{Y|S}(x|s) - P_X(x))^2\big] = (1-\alpha)^2\,\text{Var}[P_{X|S}(x|s)] < \text{Var}[P_{X|S}(x|s)], \quad \forall \alpha \in (0,1].$$

APPENDIX B
TOTAL VARIANCE DISTANCE AS UTILITY LOSS
For $Y$ such that $|\mathcal{Y}| = |\mathcal{X}|$, the following transition probability maximizes the mutual information $I(X;Y)$:
$$P^*_{Y|X}(x|x') = \begin{cases} 1 & x = x' \\ 0 & x \ne x'. \end{cases} \quad (13)$$
Consider the total variance distance
$$D_{\text{TV}}(X,Y) = \frac{1}{2}\sum_x \sum_{x'} P_X(x')\big|P_{Y|X}(x|x') - P^*_{Y|X}(x|x')\big| = 1 - \sum_x P_X(x)P_{Y|X}(x|x).$$
(The total variance distance is the $f$-divergence $D_f(p\|q) = \sum_x q(x)f\big(\frac{p(x)}{q(x)}\big)$ for $f(t) = \frac{1}{2}|t-1|$. Here, $D_{\text{TV}}(X,Y)$ is between any $P_{X,Y}(x',x) = P_{Y|X}(x|x')P_X(x')$ and the optimizer $P^*_{X,Y}(x',x) = P^*_{Y|X}(x|x')P_X(x')$.) It can be seen from Fig. 4 that $D_{\text{TV}}(X,Y)$ is almost order reversing, i.e., if $I(X;Y) \ge I(X;Y')$, then $D_{\text{TV}}(X;Y) \le D_{\text{TV}}(X;Y')$. Therefore, for $I(X;Y)$ being a utility measure, $D_{\text{TV}}(X;Y)$ denotes the utility loss.

Fig. 4: The utility losses $H(X) - I(X;Y)$ and $D_{\text{TV}}(X;Y)$ are increasing with $\alpha$. The mutual information $I(X;Y)$ and the total variance distance $D_{\text{TV}}(X;Y)$ are determined by the Markov solution (17) and the non-Markov solution (11) for the dataset in Example 1.

APPENDIX C
PROOF OF PROPOSITIONS 1 AND 2

For $U(X;Y) = -D_{\text{TV}}(X;Y)$, problem (10) is equivalent to $\max \sum_{s,x} P_{Y|S,X}(x|s,x)P_{S,X}(s,x)$ subject to (10b). This LP is separable in $s$ [16], and its maximizer can be determined by solving
$$\max \sum_x P_{Y|S,X}(x|s,x)P_{S,X}(s,x), \quad \text{s.t. (10b)} \quad (14)$$
for each $s$. We show below that this problem is also separable in $x$. For each $s$, rewrite (10b) as
$$P_{Y|S,X}(x|s,x) = 1 - \alpha\Big(1 - \frac{P_X(x)}{P_{X|S}(x|s)}\Big) - \sum_{x'\colon x' \ne x} P_{Y|S,X}(x|s,x')\frac{P_{X|S}(x'|s)}{P_{X|S}(x|s)}, \quad \forall x.$$
(15)
Since $P_{Y|S,X}(x|s,x) \le 1$, we rewrite (15) as the inequality
$$\sum_{x'\colon x' \ne x} P_{Y|S,X}(x|s,x')\frac{P_{X|S}(x'|s)}{P_{X|S}(x|s)} \ge -\alpha\Big(1 - \frac{P_X(x)}{P_{X|S}(x|s)}\Big), \quad \forall x.$$
On the other hand, because $P_{Y|S,X}(x|s,x') \ge 0, \forall x' \ne x$, we have
$$\sum_{x'\colon x' \ne x} P_{Y|S,X}(x|s,x')\frac{P_{X|S}(x'|s)}{P_{X|S}(x|s)} \ge \max\left\{ -\alpha\Big(1 - \frac{P_X(x)}{P_{X|S}(x|s)}\Big),\ 0 \right\}, \quad \forall x.$$
Applying this inequality to (15) converts the constraint (10b), $\forall s$, in (14) to
$$P_{Y|S,X}(x|s,x) \le \min\left\{ 1 - \alpha\Big(1 - \frac{P_X(x)}{P_{X|S}(x|s)}\Big),\ 1 \right\}, \quad \forall x. \quad (16)$$
Then, problem (14) is decomposable in $x$. For each $s$ and $x$, the solution to $\max P_{Y|S,X}(x|s,x)P_{S,X}(s,x)$, s.t. (16), is $P^*_{Y|S,X}(x|s,x) = \min\big\{ 1 - \alpha\big(1 - \frac{P_X(x)}{P_{X|S}(x|s)}\big),\ 1 \big\}$, where, by the constraint (10b) and $\sum_x P_{Y|S,X}(x|s,x') = 1$, we have (11b) and (11c), respectively. From (10b), (11a) and (11b), we have (11d); from (11c) and $\sum_{x \in \mathcal{X}} P^*_{Y|S,X}(x|s,x') = 1$, we have (11e).

For $U(X;Y) = -\mathbb{E}[d(X,Y)]$, problem (10) is also separable in $s$. From (16), we have the constraint
$$\sum_{x'\colon x' \ne x} P_{Y|S,X}(x|s,x')\frac{P_{X|S}(x'|s)}{P_{X|S}(x|s)} \ge \max\left\{ 0,\ -\alpha\Big(1 - \frac{P_X(x)}{P_{X|S}(x|s)}\Big) \right\},$$
where the objective function $\sum_{x,x'\colon x' \ne x} P_{Y|S,X}(x|s,x')P_{S,X}(s,x')d(x',x)$ is minimized when this constraint holds with equality, which is achieved by the optimizer in (11), where the value of $P_{Y|S,X}(x|s,x')$ for all $x \in \mathcal{X}^+(s)$ and $x' \in \mathcal{X}^-(s)$ is determined by (12).

REFERENCES

[1] C. Dwork, F. McSherry, K. Nissim, and A. Smith, "Calibrating noise to sensitivity in private data analysis," in
Theory of Cryptography. Berlin, Heidelberg: Springer, 2006, pp. 265–284.
[2] J. C. Duchi, M. I. Jordan, and M. J. Wainwright, "Local privacy and statistical minimax rates," in Proc. IEEE Annu. Symp. Found. Comput. Sci., 2013, pp. 429–438.
[3] A. D. Sarwate and L. Sankar, "A rate-distortion perspective on local differential privacy," in Proc. Annu. Allerton Conf. Commun., Control, and Comput., 2014, pp. 903–908.
[4] F. du Pin Calmon and N. Fawaz, "Privacy against statistical inference," in Proc. Annu. Allerton Conf. Commun., Control, and Comput., Monticello, IL, 2012, pp. 1401–1408.
[5] S. Salamatian, A. Zhang, F. du Pin Calmon, S. Bhamidipati, N. Fawaz, B. Kveton, P. Oliveira, and N. Taft, "Managing your private and public data: Bringing down inference attacks against your privacy," IEEE J. Sel. Topics Signal Process., vol. 9, no. 7, pp. 1240–1255, 2015.
[6] I. Issa, S. Kamath, and A. B. Wagner, "An operational measure of information leakage," in Proc. Annu. Conf. Inf. Sci. Syst. (CISS), Princeton, NJ, 2016, pp. 234–239.
[7] Y. Liu, N. Ding, P. Sadeghi, and T. Rakotoarivelo, "Privacy-utility tradeoff in a guessing framework inspired by index coding," in Proc. IEEE Int. Symp. Inf. Theory, Los Angeles, CA, 2020, pp. 926–931.
[8] J. Liao, L. Sankar, F. P. Calmon, and V. Y. F. Tan, "Hypothesis testing under maximal leakage privacy constraints," in Proc. IEEE Int. Symp. Inf. Theory, Aachen, Germany, 2017, pp. 779–783.
[9] J. Liao, O. Kosut, L. Sankar, and F. d. P. Calmon, "Tunable measures for information leakage and applications to privacy-utility tradeoffs," IEEE Trans. Inf. Theory, vol. 65, no. 12, pp. 8043–8066, 2019.
[10] H. Hsu, S. Asoodeh, and F. P. Calmon, "Information-theoretic privacy watchdogs," in Proc. IEEE Int. Symp. Inf. Theory, Paris, France, 2019, pp. 552–556.
[11] N. Ding and P. Sadeghi, "A submodularity-based clustering algorithm for the information bottleneck and privacy funnel," in Proc. IEEE Inf. Theory Workshop, Visby, Sweden, 2019, pp. 1–5.
[12] A. Makhdoumi, S. Salamatian, N. Fawaz, and M. Médard, "From the information bottleneck to the privacy funnel," in Proc. IEEE Inf. Theory Workshop, Hobart, TAS, 2014, pp. 501–505.
[13] B. Razeghi, F. Calmon, D. Gunduz, S. Voloshynovskiy et al., "On perfect obfuscation: Local information geometry analysis," arXiv preprint arXiv:2009.04157, 2020.
[14] S. Huang, A. Makur, L. Zheng, and G. W. Wornell, "An information-theoretic approach to universal feature selection in high-dimensional inference," in Proc. IEEE Int. Symp. Inf. Theory, Aachen, Germany, 2017, pp. 1336–1340.
[15] C. Lemieux, "Control Variates," in Wiley StatsRef: Statistics Reference Online. Wiley, 2017, pp. 1–8. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/9781118445112.stat07947
[16] S. Boyd, S. P. Boyd, and L. Vandenberghe,
Convex Optimization. Cambridge University Press, 2004.

APPENDIX D
APPROXIMATION IN (6)

Using $\log(1+x) \approx x$, we have for the local differential privacy
$$L_{\text{LDP}}(S \to X) = \max_{x,s,s'} \log \frac{P_{X|S}(x|s)}{P_{X|S}(x|s')} = \max_{x,s,s'}\left\{ \log \frac{P_{X|S}(x|s)}{P_X(x)} - \log \frac{P_{X|S}(x|s')}{P_X(x)} \right\} \approx \max_{x,s,s'} \frac{P_{X|S}(x|s) - P_{X|S}(x|s')}{P_X(x)}.$$
Similarly,
$$L_{\text{LDP}}(S \to Y) \approx \max_{x,s,s'} \frac{P_{Y|S}(x|s) - P_{Y|S}(x|s')}{P_Y(x)} = \max_{x,s,s'} \frac{(1-\alpha)\big(P_{X|S}(x|s) - P_{X|S}(x|s')\big)}{P_X(x)} = (1-\alpha)\max_{x,s,s'} \frac{P_{X|S}(x|s) - P_{X|S}(x|s')}{P_X(x)} \approx (1-\alpha)L_{\text{LDP}}(S \to X).$$
For the log-lift, we have
$$L_{\text{LL}}(S \to X) = \max_{x,s}\left|\log \frac{P_{X|S}(x|s)}{P_X(x)}\right| \approx \max_{x,s}\left|\frac{P_{X|S}(x|s)}{P_X(x)} - 1\right|,$$
so that
$$L_{\text{LL}}(S \to Y) = \max_{x,s}\left|\log \frac{(1-\alpha)P_{X|S}(x|s) + \alpha P_X(x)}{P_Y(x)}\right| = \max_{x,s}\left|\log\Big((1-\alpha)\frac{P_{X|S}(x|s)}{P_X(x)} + \alpha\Big)\right| \approx \max_{x,s}\left|(1-\alpha)\frac{P_{X|S}(x|s)}{P_X(x)} + \alpha - 1\right| = (1-\alpha)\max_{x,s}\left|\frac{P_{X|S}(x|s)}{P_X(x)} - 1\right| \approx (1-\alpha)L_{\text{LL}}(S \to X).$$

APPENDIX E
PROOF OF LEMMA 1

For $P_{Y|X}(x|x') = P_{Y|S,X}(x|s,x')$ for all $s$, we have $P_{Y|S}(x|s) = \sum_{x'} P_{Y|X}(x|x')P_{X|S}(x'|s)$ for all $s$ and $x$. Let $\mathbf{P}_{Y|X}$ denote the $|\mathcal{X}| \times |\mathcal{Y}|$ transition probability matrix $\mathbf{P}_{Y|X} = [\mathbf{P}_{Y=x_1|X}\ \mathbf{P}_{Y=x_2|X}\ \ldots]$, where $\mathbf{P}_{Y=x|X} = [P_{Y|X}(x|x')\colon x' \in \mathcal{X}]^\intercal$ is a column vector. Let $\mathbf{P}_{X|S}$ be the $|\mathcal{S}| \times |\mathcal{X}|$ matrix $\mathbf{P}_{X|S} = [\mathbf{P}_{X=x_1|S}\ \mathbf{P}_{X=x_2|S}\ \ldots]$, where $\mathbf{P}_{X=x|S} = [P_{X|S}(x|s)\colon s \in \mathcal{S}]^\intercal$.
We define $\mathbf{P}_{Y|S}$ and $\mathbf{P}_{Y=x|S}$ in the same way. Rewrite (3) in column vector form:
$$\mathbf{P}_{Y=x|S} = \mathbf{P}_{X|S}\mathbf{P}_{Y=x|X} = (1-\alpha)\mathbf{P}_{X=x|S} + \alpha P_X(x)\mathbf{1}, \quad (17)$$
where $\mathbf{1} = [1,\ldots,1]^\intercal$ denotes the all-one column vector. For $|\mathcal{S}| \ge |\mathcal{X}|$, let $\mathbf{A}$ be a left inverse matrix of $\mathbf{P}_{X|S}$. (The underlying assumption here is that $\mathbf{P}_{X|S}$ is full rank. However, (8) ensures that the transition probability in Lemma 1 is the Markov solution for any $\mathbf{P}_{X|S}$.) Then,
$$\mathbf{P}_{Y=x|X} = \mathbf{A}\mathbf{P}_{Y=x|S} = (1-\alpha)\mathbf{A}\mathbf{P}_{X=x|S} + \alpha P_X(x)\mathbf{A}\mathbf{1}. \quad (18)$$
Let $a_{x,s}$ be the entry in the $x$th row and $s$th column of $\mathbf{A}$, and let $\mathbf{A}_x = [a_{x,s_1}\ a_{x,s_2}\ \ldots]$ be the $x$th row vector of $\mathbf{A}$. Denote by $\mathbf{I}_m$ the identity matrix of dimension $m$. From $\mathbf{A}\mathbf{P}_{X|S} = \mathbf{I}_{|\mathcal{X}|}$, we have
$$\mathbf{A}_x\mathbf{P}_{X=x'|S} = \sum_s a_{x,s}P_{X|S}(x'|s) = \begin{cases} 1 & x' = x \\ 0 & x' \ne x. \end{cases}$$
Since each row of $\mathbf{P}_{X|S}$ sums to one, $\mathbf{P}_{X|S}\mathbf{1} = \mathbf{1}$, and from $\mathbf{A}\mathbf{P}_{X|S} = \mathbf{I}_{|\mathcal{X}|}$ we have
$$\mathbf{A}_x\mathbf{P}_{X|S}\mathbf{1} = \sum_{s \in \mathcal{S}} a_{x,s}\Big(\sum_{x' \in \mathcal{X}} P_{X|S}(x'|s)\Big) = \mathbf{A}_x\mathbf{1} = 1, \quad \forall x;$$
that is, $\mathbf{A}\mathbf{1} = \mathbf{1}$. We rewrite (18) as
$$\mathbf{P}_{Y=x|X} = (1-\alpha)\mathbf{e}_x + \alpha P_X(x)\mathbf{1}, \quad \forall x,$$
where $\mathbf{e}_x$ is the unit vector whose $x$th entry is 1 and all other entries are zero. It is shown in the proof of Lemma 1 that this is in fact the Markov solution for any $\mathbf{P}_{X|S}$.

APPENDIX F
RANK DEFICIENCY OF (11d)
AND (11e)

From (11e), we have
$$P^*_{Y|S,X}(x|s,x') = -\sum_{\tilde{x} \in \mathcal{X}^+(s)\colon \tilde{x} \ne x} P^*_{Y|S,X}(\tilde{x}|s,x') + \alpha\Big(1 - \frac{P_X(x')}{P_{X|S}(x'|s)}\Big)$$
for each $x \in \mathcal{X}^+(s)$ and $x' \in \mathcal{X}^-(s)$. Substituting into (11d), we get
$$\sum_{x' \in \mathcal{X}^-(s)} \left( -\sum_{\tilde{x} \in \mathcal{X}^+(s)\colon \tilde{x} \ne x} P^*_{Y|S,X}(\tilde{x}|s,x')P_{X|S}(x'|s) + \alpha\big(P_{X|S}(x'|s) - P_X(x')\big) \right) = -\alpha\big(P_{X|S}(x|s) - P_X(x)\big)$$
for each $x \in \mathcal{X}^+(s)$. Reorganize this equality as
$$\sum_{x' \in \mathcal{X}^-(s)} \sum_{\tilde{x} \in \mathcal{X}^+(s)\colon \tilde{x} \ne x} P^*_{Y|S,X}(\tilde{x}|s,x')P_{X|S}(x'|s) = \alpha\left[ \sum_{x' \in \mathcal{X}^-(s)} \big(P_{X|S}(x'|s) - P_X(x')\big) + \big(P_{X|S}(x|s) - P_X(x)\big) \right]$$
$$= \alpha\left[ \sum_{x' \in \mathcal{X}^-(s)} \big(P_{X|S}(x'|s) - P_X(x')\big) + \sum_{x'' \in \mathcal{X}^+(s)} \big(P_{X|S}(x''|s) - P_X(x'')\big) - \sum_{\tilde{x} \in \mathcal{X}^+(s)\colon \tilde{x} \ne x} \big(P_{X|S}(\tilde{x}|s) - P_X(\tilde{x})\big) \right]$$
$$= -\alpha \sum_{\tilde{x} \in \mathcal{X}^+(s)\colon \tilde{x} \ne x} \big(P_{X|S}(\tilde{x}|s) - P_X(\tilde{x})\big). \quad (19)$$
This is exactly the equality that results from summing both sides of (11d) over all $\tilde{x} \in \mathcal{X}^+(s)$ such that $\tilde{x} \ne x$.
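The dependency established by (19) can be spot-checked numerically: for any $s$, summing the right-hand sides of (11d) over $x \in \mathcal{X}^+(s)$ reproduces exactly the total mass that (11e) distributes over $x' \in \mathcal{X}^-(s)$ (weighted by $P_{X|S}(x'|s)$), so the combined linear system contains at least one redundant equation. A sketch with an assumed toy distribution:

```python
alpha = 0.4
P_SX = [[0.20, 0.15, 0.15],    # hypothetical joint P_{S,X}
        [0.10, 0.25, 0.15]]
P_S = [sum(row) for row in P_SX]
P_X = [sum(P_SX[s][x] for s in range(2)) for x in range(3)]
P_XgS = [[P_SX[s][x] / P_S[s] for x in range(3)] for s in range(2)]

for s in range(2):
    Xp = [x for x in range(3) if P_X[x] >= P_XgS[s][x]]   # X+(s)
    Xm = [x for x in range(3) if P_X[x] < P_XgS[s][x]]    # X-(s)
    # total right-hand side of (11d) over x in X+(s)
    total_11d = sum(-alpha * (P_XgS[s][x] - P_X[x]) for x in Xp)
    # total mass of (11e) over x' in X-(s), weighted by P_{X|S}(x'|s)
    total_11e = sum(alpha * (1 - P_X[xp] / P_XgS[s][xp]) * P_XgS[s][xp]
                    for xp in Xm)
    assert abs(total_11d - total_11e) < 1e-12   # one equation is redundant
```

The underlying identity is simply that $\sum_{x \in \mathcal{X}^+(s)} (P_X(x) - P_{X|S}(x|s)) = \sum_{x' \in \mathcal{X}^-(s)} (P_{X|S}(x'|s) - P_X(x'))$, since the two sums together exhaust $\mathcal{X}$ and total to zero.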