A Privacy-Preserving Distributed Control of Optimal Power Flow
Minseok Ryu and Kibaek Kim
Abstract—We consider a distributed optimal power flow formulated as an optimization problem that maximizes a nondifferentiable concave function. Solving such a problem by the existing distributed algorithms can lead to data privacy issues because the solution information exchanged within the algorithms can be utilized by an adversary to infer the data. To preserve data privacy, in this paper we propose a differentially private projected subgradient (DP-PS) algorithm that includes a solution encryption step. We show that a sequence generated by DP-PS converges in expectation, in probability, and with probability 1. Moreover, we show that the rate of convergence in expectation is affected by a target privacy level of DP-PS chosen by the user. We conduct numerical experiments that demonstrate the convergence and data privacy preservation of DP-PS.

Index Terms—Differential privacy, projected subgradient algorithm, optimal power flow, dual decomposition.
I. INTRODUCTION
Optimal power flow (OPF) is an important problem in reliably and economically operating electric grids. Currently, the problem is solved by independent system operators in a centralized manner. Recently, however, distributed OPF has been spotlighted as a result of the introduction of microgrids with energy storage [1] and increasing penetrations of distributed energy resources [2]. Distributed OPF consists of (i) a set of OPF subproblems defined for each zone of the power grid and (ii) consensus constraints that link the subproblems. Distributed OPF can be solved by the existing distributed algorithms (e.g., [3], [4], [5], [6]), which do not require sharing private data information (e.g., demand data from each zone) but send the local solutions to the central machine. Unfortunately, an adversary may be able to estimate the data based on the solutions (e.g., reverse engineering [7]), thus motivating the need for solution encryption.

Differential privacy (DP) is a randomization technique that guarantees the existence of multiple datasets with similar probabilities of resulting in the encrypted solution, thus preserving data privacy [8]. A differentially private algorithm is an algorithm that incorporates
differential privacy for preserving data privacy during the algorithmic process [9]. Several DP algorithms have been proposed to solve various distributed optimization problems. For example, (i) a DP alternating direction method of multipliers (ADMM) was proposed for solving a distributed empirical risk minimization problem [10], [11], [12] and a distributed DC OPF [13], and (ii) a DP stochastic gradient descent (SGD) method was proposed for solving a classification problem [14], a resource allocation problem [15], and deep neural networks [16].

We define the target privacy level (TPL) as a user parameter for DP algorithms to control data privacy. While guaranteeing stronger privacy, increasing the TPL of a DP algorithm may affect its convergence. For example, DP-ADMM with higher TPL is shown to find suboptimal solutions, implying the need for a trade-off between TPL and solution quality [11], [13]. On the other hand, the numerical results in [14] show that the solution accuracy of DP-SGD may be close to that of non-private SGD. Also, Huang et al. [12] report numerical experiments showing that DP-SGD has good noise resilience compared with DP-ADMM, but it converges slowly.

M. Ryu and K. Kim are with the Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL, USA (Contact: [email protected]). This material is based upon work supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357.
While numerical evidence has been demonstrated for the trade-off between the convergence of DP algorithms and TPL, only a few studies (e.g., DP-ADMM in [12]) develop the theoretical links.

In this paper we present a DP projected subgradient (PS) algorithm for solving a distributed OPF while preserving data privacy, and we study how TPL affects the convergence of DP-PS theoretically and numerically. We first formulate the distributed second-order conic (SOC) and alternating current (AC) OPF (see, e.g., [17], [18], [19]) based on dual decomposition by taking the Lagrangian relaxation with respect to the consensus constraints, where supergradients are computed by solving the OPF subproblems in parallel. Moreover, in order to guarantee data privacy, the supergradients exchanged within the algorithm are systematically randomized by adding random noise extracted from a Laplace distribution. Under three rules of specifying the search direction and step size, we show that a sequence generated by DP-PS converges in expectation, in probability, and with probability 1. In particular, we show that the convergence complexity is affected by a constant factor only as TPL increases.

Our contributions are summarized as follows:
• Application of differential privacy to distributed control of SOC and AC OPF models
• Development of the convergence results of DP-PS
• Numerical experiments to support our findings on the convergence and privacy preservation of DP-PS

The remainder of the paper is organized as follows. In Section II, we present a distributed OPF problem. In Section III, we describe a differentially private control with the proposed DP-PS. In Section IV, we study the convergence of DP-PS. We conduct case studies in Section V and summarize our conclusions in Section VI. We denote by N a set of natural numbers. For A ∈ N, we define [A] := {1, . . . , A}. We use ⟨·, ·⟩ and ‖·‖ to denote the scalar product and the Euclidean norm.

II. DISTRIBUTED OPTIMAL POWER FLOW
We depict a power network by a graph (N, L), where N is a set of buses and L is a set of lines. For every line ℓ_ij ∈ L, where i is a from bus and j is a to bus of line ℓ, we are given line parameters, including bounds [θ_ij, θ̄_ij] on voltage angle difference, thermal limit s_ℓ, resistance r_ℓ, reactance x_ℓ, impedance z_ℓ := r_ℓ + i x_ℓ, line charging susceptance b^c_ℓ, tap ratio τ_ℓ, phase shift angle θ^s_ℓ, and admittance matrix Y_ℓ, namely,

Y_ℓ := \begin{bmatrix} Y^{ff}_ℓ & Y^{ft}_ℓ \\ Y^{tf}_ℓ & Y^{tt}_ℓ \end{bmatrix} = \begin{bmatrix} (z_ℓ^{-1} + i b^c_ℓ/2)/τ_ℓ^2 & -z_ℓ^{-1}/(τ_ℓ e^{-i θ^s_ℓ}) \\ -z_ℓ^{-1}/(τ_ℓ e^{i θ^s_ℓ}) & z_ℓ^{-1} + i b^c_ℓ/2 \end{bmatrix},

G^{cf}_ℓ := ℜ(Y^{ff}_ℓ), B^{cf}_ℓ := ℑ(Y^{ff}_ℓ), G^f_ℓ := ℜ(Y^{ft}_ℓ), B^f_ℓ := ℑ(Y^{ft}_ℓ), G^{ct}_ℓ := ℜ(Y^{tt}_ℓ), B^{ct}_ℓ := ℑ(Y^{tt}_ℓ), G^t_ℓ := ℜ(Y^{tf}_ℓ), and B^t_ℓ := ℑ(Y^{tf}_ℓ).

For every bus i ∈ N, we are given bus parameters, including bounds [v_i, v̄_i] on voltage magnitude, active (resp., reactive) power demand p^d_i (resp., q^d_i), shunt conductance g^s_i, and shunt susceptance b^s_i. Furthermore, for every i ∈ N, we define subsets L^F_i := {ℓ_ij : j ∈ N, ℓ_ij ∈ L} and L^T_i := {ℓ_ji : j ∈ N, ℓ_ji ∈ L} of L and a set of generators G_i. For every generator g ∈ G_i, we are given generator parameters, including bounds [p^G_g, p̄^G_g] (resp., [q^G_g, q̄^G_g]) on the amounts of active (resp., reactive) power generation and coefficients (c_{1,g}, c_{2,g}) of the quadratic generation cost function.

Next we present decision variables. For every line ℓ_ij ∈ L, we denote active (resp., reactive) power flow along line ℓ by p^F_ℓ, p^T_ℓ (resp., q^F_ℓ, q^T_ℓ). For every i ∈ N, we denote the complex voltage by V_i = v^R_i + i v^I_i, and we introduce the following auxiliary variables:

w^{RR}_{ij} = v^R_i v^R_j, w^{II}_{ij} = v^I_i v^I_j, w^{RI}_{ij} = v^R_i v^I_j, ∀j ∈ N. (1)
For every generator g ∈ G_i, we denote the amounts of active (resp., reactive) power generation by p^G_g (resp., q^G_g). In the following, we present a SOC OPF formulation:

min Σ_{i∈N} Σ_{g∈G_i} ( c_{1,g} p^G_g + c_{2,g} (p^G_g)^2 ) (2a)

subject to, ∀ℓ_ij ∈ L:

p^F_ℓ = G^f_ℓ (w^{RR}_{ij} + w^{II}_{ij}) + B^f_ℓ (w^{RI}_{ji} − w^{RI}_{ij}) + G^{cf}_ℓ (w^{RR}_{ii} + w^{II}_{ii}), (2b)
q^F_ℓ = G^f_ℓ (w^{RI}_{ji} − w^{RI}_{ij}) − B^f_ℓ (w^{RR}_{ij} + w^{II}_{ij}) − B^{cf}_ℓ (w^{RR}_{ii} + w^{II}_{ii}), (2c)
p^T_ℓ = G^t_ℓ (w^{RR}_{ji} + w^{II}_{ji}) + B^t_ℓ (w^{RI}_{ij} − w^{RI}_{ji}) + G^{ct}_ℓ (w^{RR}_{jj} + w^{II}_{jj}), (2d)
q^T_ℓ = G^t_ℓ (w^{RI}_{ij} − w^{RI}_{ji}) − B^t_ℓ (w^{RR}_{ji} + w^{II}_{ji}) − B^{ct}_ℓ (w^{RR}_{jj} + w^{II}_{jj}), (2e)
(p^F_ℓ)^2 + (q^F_ℓ)^2 ≤ (s_ℓ)^2, (p^T_ℓ)^2 + (q^T_ℓ)^2 ≤ (s_ℓ)^2, (2f)
w^{RI}_{ji} − w^{RI}_{ij} ∈ [tan(θ_ij)(w^{RR}_{ij} + w^{II}_{ij}), tan(θ̄_ij)(w^{RR}_{ij} + w^{II}_{ij})], (2g)

∀i ∈ N:

Σ_{ℓ∈L^F_i} p^F_ℓ + Σ_{ℓ∈L^T_i} p^T_ℓ = Σ_{g∈G_i} p^G_g − p^d_i − g^s_i (w^{RR}_{ii} + w^{II}_{ii}), (2h)
Σ_{ℓ∈L^F_i} q^F_ℓ + Σ_{ℓ∈L^T_i} q^T_ℓ = Σ_{g∈G_i} q^G_g − q^d_i + b^s_i (w^{RR}_{ii} + w^{II}_{ii}), (2i)
w^{RR}_{ii} + w^{II}_{ii} ∈ [v_i^2, v̄_i^2], (2j)

∀i ∈ N, ∀g ∈ G_i:

p^G_g ∈ [p^G_g, p̄^G_g], q^G_g ∈ [q^G_g, q̄^G_g], (2k)

∀i ∈ N, ∀j ∈ N:

(w^{RR}_{ij} + w^{II}_{ij})^2 + (w^{RI}_{ji} − w^{RI}_{ij})^2 + ( (w^{RR}_{ii} + w^{II}_{ii} − w^{RR}_{jj} − w^{II}_{jj}) / 2 )^2 ≤ ( (w^{RR}_{ii} + w^{II}_{ii} + w^{RR}_{jj} + w^{II}_{jj}) / 2 )^2, (2l)

where (2a) is to minimize the generation cost, (2b)–(2e) represent power flow, (2f) represent line thermal limits, (2g) represent bounds on voltage angle differences, (2h)–(2i) represent power balance, (2j) represent bounds on voltage magnitudes, (2k) represent bounds on power generation, and (2l) represent SOC constraints that ensure linking between auxiliary variables.

We decompose the network into several zones indexed by Z := {1, . . . , Z}. Specifically, we split a set N of buses into subsets {N_z}_{z∈Z} such that N = ∪_{z∈Z} N_z and N_z ∩ N_{z'} = ∅ for z, z' ∈ Z : z ≠ z'.
For each zone z ∈ Z we define a line set L_z := ∪_{i∈N_z} (L^F_i ∪ L^T_i); an extended node set V_z := ∪_{i∈N_z} A_i, where A_i is a set of adjacent buses of i; and a set of cuts C_z := ∪_{z'∈Z\{z}} (L_z ∩ L_{z'}). Note that {N_z}_{z∈Z} is a collection of disjoint sets, while {L_z}_{z∈Z} and {V_z}_{z∈Z} are not. Using these notations, we rewrite problem (2) as

min Σ_{z∈Z} f_z(x_z) (3a)
s.t. (x_z, y_z) ∈ F_z(D̄_z), ∀z ∈ Z, (3b)
w_i = y_{zi}, ∀z ∈ Z, ∀i ∈ C(z), (3c)
w_i ∈ R, ∀i ∈ C, (3d)

where

x_z ← {p^F_{zℓ}, q^F_{zℓ}, p^T_{zℓ}, q^T_{zℓ}, w^{RR}_{zij}, w^{II}_{zij}, w^{RI}_{zij}, w^{RI}_{zji}}_{ℓ_ij ∈ L_z\C_z} ∪ {v^R_{zi}, v^I_{zi}}_{i∈V_z} ∪ {p^G_g, q^G_g}_{i∈N_z, g∈G_i},
y_z ← {p^F_{zℓ}, q^F_{zℓ}, p^T_{zℓ}, q^T_{zℓ}, w^{RR}_{zij}, w^{II}_{zij}, w^{RI}_{zij}, w^{RI}_{zji}}_{ℓ_ij ∈ C_z},
w ← ∪_{z∈Z} {p^F_ℓ, q^F_ℓ, p^T_ℓ, q^T_ℓ, w^{RR}_{ij}, w^{II}_{ij}, w^{RI}_{ij}, w^{RI}_{ji}}_{ℓ_ij ∈ C_z},

C(z) is an index set that indicates each element of y_z, C := ∪_{z∈Z} C(z) is an index set of the consensus variable w, f_z(x_z) := Σ_{i∈N_z} Σ_{g∈G_i} (c_{1,g} p^G_g + c_{2,g}(p^G_g)^2), D̄_z := {p^d_l}_{l∈N_z} is a given demand vector, F_z(D̄_z) := {(x_z, y_z) : (2b)–(2g), ∀ℓ_ij ∈ L_z; (2h), (2i), ∀i ∈ N_z; (2j), ∀i ∈ V_z; (2k), ∀i ∈ N_z, ∀g ∈ G_i; (2l), ∀i ∈ V_z, ∀j ∈ V_z} is a convex feasible region defined for each zone, and (3c) represents the consensus constraints:

∀z ∈ Z, ℓ_ij ∈ C_z: p^F_ℓ = p^F_{zℓ}, p^T_ℓ = p^T_{zℓ}, q^F_ℓ = q^F_{zℓ}, q^T_ℓ = q^T_{zℓ}, w^{RR}_{ij} = w^{RR}_{zij}, w^{II}_{ij} = w^{II}_{zij}, w^{RI}_{ij} = w^{RI}_{zij}, w^{RI}_{ji} = w^{RI}_{zji}.

Note that the consensus constraints with respect to w^{RR}_{ij}, w^{II}_{ij}, w^{RI}_{ij}, w^{RI}_{ji} are redundant but numerically beneficial.
By introducing a dual vector λ := {λ_{zi}}_{z∈Z, i∈C(z)} associated with constraints (3c), one can construct a Lagrangian dual problem:

max_{λ∈Λ} { H(λ) := Σ_{z∈Z} h_z(λ_z) }, (4a)

where Λ := {λ : Σ_{z∈F(i)} λ_{zi} = 0, ∀i ∈ C}, F(i) := {z ∈ Z : i ∈ C(z)} is a set of zones for every i ∈ C, and h_z(λ_z) is the optimal value of the subproblem:

min_{(x_z, y_z) ∈ F_z(D̄_z)} f_z(x_z) + Σ_{i∈C(z)} λ_{zi} y_{zi}. (4b)

Let λ* be a maximizer of the nondifferentiable concave function H(λ). Then H(λ*) is the optimal value of (3) by the strong duality from the convexity of F_z(D̄_z).

Remark 1. If (2l) are replaced with (1), then (2) is a rectangular formulation of AC OPF. In this case, (4) may not provide a solution that satisfies (3c) because of the nonconvexity of F_z(D̄_z).

Remark 2.
There exist y^L_i, y^U_i ∈ R such that y_{zi} ∈ [y^L_i, y^U_i], ∀i ∈ C, ∀z ∈ F(i).

III. DIFFERENTIALLY PRIVATE CONTROL
The Lagrangian dual problem (4) can be solved by any nonsmooth convex optimization algorithm. In this paper we consider the PS algorithm,

λ^{k+1} = Proj_Λ(λ^k + α_k y^k), (5)

where Proj_Λ(·) represents the orthogonal projection onto Λ, α_k is a step size, and y^k is a search direction.
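Because Λ in (4a) is defined by the linear constraints Σ_{z∈F(i)} λ_{zi} = 0 for each linking index i, the orthogonal projection in (5) decomposes by index: for each i, it subtracts the mean of the duals across the zones sharing i. A minimal sketch of one PS update follows; the dictionary-of-arrays layout and function names are illustrative, not from the paper:

```python
import numpy as np

def project_onto_consensus(lam):
    # Orthogonal projection onto {lambda : sum_{z in F(i)} lambda_zi = 0 for each i}:
    # subtracting the mean is the exact projection onto the hyperplane {v : sum(v) = 0},
    # so each linking index i can be projected independently.
    return {i: v - v.mean() for i, v in lam.items()}

def ps_step(lam, y, alpha):
    # One update of (5): lambda^{k+1} = Proj_Lambda(lambda^k + alpha_k * y^k).
    moved = {i: lam[i] + alpha * y[i] for i in lam}
    return project_onto_consensus(moved)
```

After each step, the duals of every linking index sum to zero across zones, so the Lagrangian bound H(λ) remains valid for the consensus formulation (3).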
1) Motivating Example (Data Leakage):
The existing algorithms can be susceptible to data leakage because an adversary may be able to estimate data based on solution information exchanged within the algorithms. To see this, we suppose an extreme case in which an adversary can access the PS algorithm, more specifically, (i) all the demand information except for node l̂ in zone ẑ, namely, D̄_z except D̄_{ẑl̂}, (ii) the supergradients {ŷ^k_ẑ}_{k∈[K]} exchanged within PS, and (iii) the ẑth subproblem (4b) and its solutions {x̂^k_ẑ}_{k∈[K]}, where K is the total number of iterations. Based on this information, the adversary can construct the following empirical risk minimization problem to infer D̄_{ẑl̂}:

min_{D_{ẑl̂}, x^k_ẑ, y^k_ẑ} Σ_{k∈K} f_ẑ(x^k_ẑ) + Γ { ‖x^k_ẑ − x̂^k_ẑ‖^2 + ‖y^k_ẑ − ŷ^k_ẑ‖^2 } (6)

subject to, ∀k ∈ K:

(2b)–(2g), ∀ℓ_ij ∈ L_ẑ; (2i), ∀i ∈ N_ẑ; (2j), ∀i ∈ V_ẑ; (2k), ∀i ∈ N_ẑ, ∀g ∈ G_i; (2l), ∀i ∈ V_ẑ, ∀j ∈ V_ẑ;
Σ_{ℓ∈L^F_{l̂}} p^{Fk}_{ẑℓ} + Σ_{ℓ∈L^T_{l̂}} p^{Tk}_{ẑℓ} = Σ_{g∈G_{l̂}} p^{Gk}_g − D_{ẑl̂} − g^s_{l̂} (w^{RRk}_{ẑl̂l̂} + w^{IIk}_{ẑl̂l̂});
Σ_{ℓ∈L^F_i} p^{Fk}_{ẑℓ} + Σ_{ℓ∈L^T_i} p^{Tk}_{ẑℓ} = Σ_{g∈G_i} p^{Gk}_g − p^d_i − g^s_i (w^{RRk}_{ẑii} + w^{IIk}_{ẑii}), ∀i ∈ N_ẑ \ {l̂},

where Γ > 0 is a penalty parameter. Note that {x̂^k_ẑ} is not exchanged within the algorithm, but we consider the worst case in which {x̂^k_ẑ} is also leaked to the adversary. As the cardinality of K increases, the accuracy of the demand estimated by (6) increases while sacrificing computation. We denote by K̂ a collection of various K and by D_{ẑl̂}(K) a demand estimated by (6) with K ∈ K̂.

We demonstrate the effectiveness of adversary problem (6) by using an instance "case 14" from Matpower [20] with the decomposition into zones (see Table II). We solve the distributed OPF of (4) by using PS.
At each iteration k, an approximation error is measured as below and reported in Figure 1:

AE^k = 100 |Z* − Z^k| / Z*, ∀k ∈ [K], (7)

where Z* is the optimal objective value and Z^k is the objective value computed at the kth iteration of PS. We consider an adversary who aims to estimate the demand at node l̂ = 4 in zone ẑ = 1, namely, D̄_{ẑl̂} = 47.8 MW. For every trial
K ∈ K̂, we solve (6) and report in Figure 1 a demand estimation error:

DE(K) := 100 |D̄_{ẑl̂} − D_{ẑl̂}(K)| / D̄_{ẑl̂}, ∀K ∈ K̂, (8)

where K̂ ← {{1}, . . . , {K}} (various K̂ will be discussed in Section V). Figure 1 shows that the adversary is highly likely to estimate D̄_{ẑl̂}, and hence this situation motivates the need for solution encryption to preserve data privacy.

Fig. 1: Approximation error (left) of PS and demand estimation error (right) of the adversarial problem.
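Both (7) and (8) are relative errors expressed in percent; a one-line helper (the function name is ours, for illustration) makes the computation explicit:

```python
def relative_error_pct(true_value, estimate):
    # Common form of AE^k in (7) and DE(K) in (8): 100 * |true - estimate| / |true|.
    return 100.0 * abs(true_value - estimate) / abs(true_value)
```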
2) Differential Privacy in PS:
The motivating example suggests that PS for solving (4) might be vulnerable to data leakage. To preserve data privacy, we introduce differential privacy (see [9] for more details).
Definition 1. (ε̄-differential privacy) A randomized function R that maps data D to some random numbers gives ε̄-differential privacy if

| ln( P{R(D') ∈ S} / P{R(D'') ∈ S} ) | ≤ ε̄, ∀(D', D'') ∈ D_β, ∀S ⊆ Range(R),

where ε̄ > 0, the probability is taken over the coin tosses of R, and D_β is a collection of two datasets (D', D'') differing in one element by β ∈ R_+.

For small ε̄ ≈ ln(1 + ε̄), we have P{R(D') ∈ S} / P{R(D'') ∈ S} ∈ [1 − ε̄, 1 + ε̄], which implies that distinguishing D' from D'' based on S becomes more difficult as ε̄ decreases. To construct R(D) that ensures ε̄-differential privacy on data D, one can utilize a Laplace mechanism [8]. More specifically, a query function Q : D → R mapping data to a true answer is perturbed by adding the Laplacian noise described in Definition 2.

Definition 2. (Laplacian noise)
Laplacian noise ξ̃ ∈ R is a random variable following the Laplace distribution whose probability density function is L(ξ̃ | b) = (1/(2b)) exp(−|ξ̃|/b) for b > 0. The randomized function R(D) := Q(D) + ξ̃ provides ε̄-differential privacy if ξ̃ is drawn from the Laplace distribution with b = max_{(D', D'') ∈ D_β} |Q(D') − Q(D'')| / ε̄.

The main idea of DP-PS is to perturb y^k with the noise ξ̃^k such that

ỹ^k_{zi} ← y^k_{zi} + ξ̃^k_{zi}, ∀z ∈ Z, ∀i ∈ C(z), (9)

for every iteration k of PS. We describe the algorithmic steps of DP-PS in Algorithm 1. In line 3, we find a supergradient y^k of the concave function H at λ^k. In lines 6–8, we generate the Laplacian noise ξ̃^k and the noisy supergradient ỹ^k. In line 9, we update the dual variables based on the step size α_k and the search direction s^k(ỹ^k) determined in advance (see Section IV).

Algorithm 1 DP Projected Subgradient Algorithm
Set k ← 1 and λ^1 ← 0.
while termination criteria are not met do
  Given λ^k, find y^k by solving (4b) in parallel.
  Store H_best(λ^k) ← max_{t∈[k]} {H(λ^t)}.
  Solve (12) in parallel to find {Δ̄^k_{zi}(β̄)}_{z∈Z, i∈C(z)}.
  Extract ξ̃^k_{zi} from L(ξ̃^k_{zi} | Δ̄^k_{zi}(β̄)/ε̄) in Definition 2.
  Compute ỹ^k by (9).
  λ^{k+1} ← Proj_Λ(λ^k + α_k s^k(ỹ^k)).
  Set k ← k + 1.
end while

Now we describe how to generate the noise ξ̃^k_{zi} in (9) so that the ε̄-differential privacy of Definition 1 on D̄ is ensured. First, we define a query function as follows:

Q^k_{zi} : D_z → y^k_{zi}, ∀z ∈ Z, ∀i ∈ C(z), (10)

where y^k_{zi} is obtained by solving (4b) for given λ^k and D_z ∈ R^{|N_z|}. Second, we draw ξ̃^k_{zi} in (9) from the Laplace distribution L(ξ̃ | b) with b = Δ̄^k_{zi}(β̄)/ε̄ and

Δ̄^k_{zi}(β̄) := max_{(D'_z, D''_z) ∈ D_β̄(z)} |Q^k_{zi}(D'_z) − Q^k_{zi}(D''_z)|, (11)

where the data collection is defined as a polyhedron

D_β̄(z) := ∪_{l∈N_z} { (D'_z, D''_z) ∈ R^{|N_z|} × R^{|N_z|} : |D'_{zl} − D''_{zl}| ≤ β̄_l, D'_{zj} = D''_{zj}, ∀j ∈ N_z \ {l} },

and β̄_l > 0. Since (11) is a convex maximization problem, we need only to search the extreme points and directions of D_β̄(z) without loss of optimality. Moreover, D''_z is given as demand data D̄_z in the context of OPF. Thus, by fixing D''_z = D̄_z, the set of extreme points in D_β̄(z) is given by D̂_β̄(z) := {D̄_z + β^l_z, D̄_z − β^l_z : l ∈ N_z}, where β^l_z ∈ R^{|N_z|} is a vector defined for every l ∈ N_z such that (β^l_z)_j = 0 if j ∈ N_z \ {l} and (β^l_z)_l = β̄_l. Eventually, (11) reduces to

Δ̄^k_{zi}(β̄) = max_{D'_z ∈ D̂_β̄(z)} |Q^k_{zi}(D'_z) − Q^k_{zi}(D̄_z)|. (12)

Theorem 1.
In the kth iteration of DP-PS, we consider

R^k_{zi}(D_z) := Q^k_{zi}(D_z) + ξ̃^k_{zi}, ∀z ∈ Z, ∀i ∈ C(z), (13)

where Q^k_{zi} is defined in (10), ξ̃^k_{zi} is extracted from L(ξ̃^k_{zi} | Δ̄^k_{zi}(β̄)/ε̄) in Definition 2, and Δ̄^k_{zi}(β̄) is from (12). For all z ∈ Z, i ∈ C(z), we have

| ln( P{R^k_{zi}(D'_z) ∈ S^k} / P{R^k_{zi}(D̄_z) ∈ S^k} ) | ≤ ε̄, ∀D'_z ∈ D̂_β̄(z), ∀S^k ⊆ Range(R^k_{zi}), (14)

where R^k_{zi}(D̄_z) is equal to ỹ^k_{zi} in (9). This implies that ε̄-differential privacy on D̄ is ensured in the kth iteration of DP-PS. Moreover, in order to ensure ε̄-differential privacy on D̄ during the entire process of DP-PS, ξ̃^k_{zi} should be extracted from L(ξ̃^k_{zi} | K Δ̄^k_{zi}(β̄)/ε̄), where K is the total number of iterations.

Proof. See Appendix A.
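The mechanism of Definition 2 and the perturbation step (9) amount to adding one Laplace draw per exchanged coordinate. The sketch below (function names are illustrative) also checks the density-ratio argument behind guarantees like (14): for two query values differing by at most the sensitivity Δ̄, the Laplace log-densities at any output differ by at most Δ̄/b = ε̄.

```python
import numpy as np

def laplace_mechanism(query_value, sensitivity, epsilon, rng):
    # Definition 2: R(D) = Q(D) + xi, with xi ~ Laplace(b) and b = sensitivity / epsilon.
    return query_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

def laplace_logpdf(x, mu, b):
    # Log of the Laplace density L(x | b) centered at mu.
    return -np.log(2.0 * b) - abs(x - mu) / b
```

By the triangle inequality, |laplace_logpdf(s, q', b) − laplace_logpdf(s, q'', b)| ≤ |q' − q''|/b ≤ ε̄ whenever |q' − q''| ≤ Δ̄, which is the per-output form of the per-iteration guarantee.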
Remark 3. (Target Privacy Level)
Since Δ̄^k_{zi}(β̄) in (12) is nondecreasing as β̄ increases, the variance of ξ̃^k_{zi}, namely, 2(Δ̄^k_{zi}(β̄)/ε̄)^2, increases as β̄ increases and ε̄ decreases. For ease of exposition, we denote the TPL of DP-PS by γ̄ := ‖β̄‖/ε̄.

Since the Laplace distribution has infinite support, the Laplace mechanism can return unnecessarily extreme values [21]. For this reason, either a truncated or a bounded Laplace mechanism [21] can be utilized, where bounds on the noise are introduced. In this paper we use the ᾱ%-confidence interval to bound the noise:

∃ξ̃^U_{zi}(γ̄) ∈ R_+ : ξ̃_{zi} ∈ [−ξ̃^U_{zi}(γ̄), ξ̃^U_{zi}(γ̄)], (15)

where ξ̃^U_{zi}(γ̄) increases as TPL γ̄ increases. We emphasize that a solution provided by DP-PS is always optimal regardless of the choice of ᾱ, γ̄, and ξ̃^U_{zi}(γ̄).

IV. CONVERGENCE OF DP-PS

In this section we study how TPL γ̄ affects the convergence of DP-PS. In Table I we describe three rules for determining the step size and search direction. Note that the step size in Rule 1 is deterministic and square-summable but not summable, and that the step sizes in Rule 2 and Rule 3 are stochastic and affected by ỹ^k; Rules 2 and 3 are variants of Polyak [22] and CFM [23], respectively.

TABLE I: Three rules for DP-PS.
Rule 1: step size α_k = a/k, where a > 0; search direction s^k(ỹ^k) := ỹ^k.
Rule 2: step size α_k := (H(λ*) − H(λ^k)) / ‖s^k(ỹ^k)‖^2; search direction s^k(ỹ^k) := ỹ^k.
Rule 3: step size α_k := (H(λ*) − H(λ^k)) / ‖s^k(ỹ^k)‖^2; search direction s^k(ỹ^k) := ỹ^k + ζ_k s^{k−1}(ỹ^{k−1}), where ζ_k := max{0, −χ_k ⟨s^{k−1}(ỹ^{k−1}), ỹ^k⟩ / ‖s^{k−1}(ỹ^{k−1})‖^2}, s^0 = 0, and χ_k ∈ [0, 2].

Assumption 1. Λ is compact, and λ* ∈ Λ maximizes H.

Lemma 1.
For all k ∈ N, (i) ‖ỹ^k‖^2 ∈ [G^L, G^U(γ̄)], where G^L is a small positive number and

G^U(γ̄) := Σ_{z∈Z} Σ_{i∈C(z)} { [max{|y^L_i|, |y^U_i|}]^2 + (ξ̃^U_{zi}(γ̄))^2 + 2 ξ̃^U_{zi}(γ̄) max{|y^L_i|, |y^U_i|} }, (16)

and (ii) we have the following basic inequality:

‖λ^{k+1} − λ*‖^2 ≤ ‖λ^k − λ*‖^2 + α_k^2 ‖s^k(ỹ^k)‖^2 + 2α_k (H(λ^k) − H(λ*)) + 2α_k ⟨s^k(ỹ^k) − y^k, λ^k − λ*⟩. (17)

Proof. (i) (16) holds from Remark 2, (9), and (15). (ii) (17) holds because of the nonexpansion property of the projection and the supergradient inequality, namely, H(λ) − H(λ^k) ≤ ⟨y^k, λ − λ^k⟩ for all λ ∈ Λ.

We emphasize that G^U(γ̄) increases as γ̄ increases.
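The Polyak step size and the CFM direction of Table I, together with the norm property used later in the analysis (Lemma 2: ‖s^k(ỹ^k)‖ ≤ ‖ỹ^k‖), can be sketched as follows; the function names are illustrative, not from the paper:

```python
import numpy as np

def cfm_direction(y_tilde, s_prev, chi):
    # Rule 3 (CFM): s^k = y~^k + zeta_k * s^{k-1}, with
    # zeta_k = max{0, -chi_k * <s^{k-1}, y~^k> / ||s^{k-1}||^2} and chi_k in [0, 2].
    if s_prev is None or not np.any(s_prev):  # s^0 = 0 starts the recursion
        return np.array(y_tilde, dtype=float)
    zeta = max(0.0, -chi * float(np.dot(s_prev, y_tilde)) / float(np.dot(s_prev, s_prev)))
    return y_tilde + zeta * s_prev

def polyak_step(H_star, H_k, s_k):
    # Rule 2/3 step size: alpha_k = (H(lambda*) - H(lambda^k)) / ||s^k||^2.
    return (H_star - H_k) / float(np.dot(s_k, s_k))
```

For any χ_k ∈ [0, 2], expanding ‖ỹ^k + ζ_k s^{k−1}‖^2 shows the CFM direction is never longer than ỹ^k itself, which is why the Rule 2 analysis carries over to Rule 3.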
1) Rule 1:
Under Rule 1 it follows from (17) that

‖λ^{k+1} − λ*‖^2 ≤ ‖λ^k − λ*‖^2 + α_k^2 G^U(γ̄) + 2α_k (H(λ^k) − H(λ*)) + 2α_k ⟨ξ̃^k, λ^k − λ*⟩, ∀k ∈ N. (18)

By taking expectation on (18), we obtain

E[‖λ^{k+1} − λ*‖^2] ≤ E[‖λ^k − λ*‖^2] + α_k^2 G^U(γ̄) + 2α_k E[H(λ^k) − H(λ*)], ∀k ∈ N, (19)

where the inequality holds because E[ξ̃^k_{zi}] = 0 for all z ∈ Z and i ∈ C(z). We recursively add (19) from k = 1 to k = K to obtain

E[‖λ^{K+1} − λ*‖^2] ≤ ‖λ^1 − λ*‖^2 + G^U(γ̄) Σ_{k=1}^K α_k^2 + 2 Σ_{k=1}^K α_k ( E[H(λ^k)] − H(λ*) ). (20)

Since ∃λ^U : λ^U ≥ ‖λ^1 − λ*‖^2 by Assumption 1 and E[‖λ^{K+1} − λ*‖^2] ≥ 0, (20) can be expressed as

λ^U + G^U(γ̄) Σ_{k=1}^K α_k^2 ≥ 2 Σ_{k=1}^K α_k ( H(λ*) − E[H(λ^k)] )
 ≥ 2 ( Σ_{k=1}^K α_k )( H(λ*) − max_{k∈[K]} E[H(λ^k)] )
 ≥ 2 ( Σ_{k=1}^K α_k )( H(λ*) − E[ max_{k∈[K]} H(λ^k) ] ), (21)

where the last inequality holds due to Jensen's inequality. By substituting α_k = a/k in (21), we obtain

H(λ*) − E[H_best(λ^K)] ≤ ( λ^U + G^U(γ̄) Σ_{k=1}^∞ (a/k)^2 ) / ( 2 Σ_{k=1}^K (a/k) ), (22)

where H_best(λ^K) := max_{k∈[K]} H(λ^k).

Theorem 2.
Algorithm 1 with Rule 1 provides a sequence that converges in expectation and in probability, namely,

lim_{K→∞} E[H_best(λ^K)] = H(λ*), (23a)
lim_{K→∞} P{ H(λ*) − H_best(λ^K) ≥ ε } = 0, (23b)

for any ε > 0. Furthermore, the rate of convergence in expectation is O(G^U(γ̄)/log(K)), where G^U(γ̄) increases as TPL γ̄ increases.

Proof. See Appendix B.

To show that Algorithm 1 provides a sequence that converges with probability 1, we introduce the notion of the stochastic quasi-Fejér sequence in Definition 3.

Definition 3. (Stochastic quasi-Fejér sequence [24])
A sequence of random vectors {z^k}_{k=1}^∞ is a stochastic quasi-Fejér sequence for a set Z ⊂ R^n if E[‖z^1‖^2] < ∞ and, for any z ∈ Z,

E[ ‖z − z^{k+1}‖^2 | z^1, . . . , z^k ] ≤ ‖z − z^k‖^2 + d_k, ∀k ∈ N,
d_k ≥ 0, ∀k ∈ N, Σ_{k=1}^∞ E[d_k] < ∞.
Algorithm 1 with Rule 1 provides a sequence that converges with probability 1, namely,

P{ lim_{K→∞} H_best(λ^K) = H(λ*) } = 1. (24)

Proof.
See Appendix C.
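The right-hand side of (22) can be evaluated directly: the numerator series Σ_{k≥1}(a/k)^2 = a^2 π^2/6 converges, while the denominator Σ_{k=1}^K a/k grows like a log K, giving the O(G^U(γ̄)/log K) rate of Theorem 2 with the TPL entering only through the constant G^U(γ̄). A small sketch with illustrative parameter values:

```python
import numpy as np

def rule1_bound(lam_U, G_U, a, K):
    # RHS of (22): (lam_U + G_U * sum_{k=1}^inf (a/k)^2) / (2 * sum_{k=1}^K a/k).
    numerator = lam_U + G_U * (a ** 2) * (np.pi ** 2 / 6.0)  # sum 1/k^2 = pi^2/6
    denominator = 2.0 * a * np.sum(1.0 / np.arange(1, K + 1))  # partial harmonic sum
    return numerator / denominator
```

Doubling G^U(γ̄) (i.e., a higher TPL) scales the bound by a constant factor but leaves the 1/log K decay unchanged.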
2) Rule 2:
Under Rule 2 it follows from (17) that

‖λ^{k+1} − λ*‖^2 ≤ ‖λ^k − λ*‖^2 − ( H(λ*) − H(λ^k) )^2 / ‖ỹ^k‖^2 + 2 ( ( H(λ*) − H(λ^k) ) / ‖ỹ^k‖^2 ) ‖ξ̃^k‖ · ‖λ^k − λ*‖ (25)
 ≤ ‖λ^k − λ*‖^2 − ( H(λ*) − H(λ^k) )^2 / G^U(γ̄) + M(γ̄),

where the first inequality holds due to the Cauchy–Schwarz inequality and the last inequality holds due to the existence of M(γ̄) ∈ (0, ∞) based on (15), Lemma 1, and Assumption 1. By taking the expectation and applying Jensen's inequality, we have

E[‖λ^{k+1} − λ*‖^2] ≤ E[‖λ^k − λ*‖^2] − ( H(λ*) − E[H(λ^k)] )^2 / G^U(γ̄) + M(γ̄). (26)

Following a derivation similar to that for Rule 1, we obtain

H(λ*) − E[H_best(λ^K)] ≤ sqrt( ( λ^U + K M(γ̄) ) G^U(γ̄) / K ). (27)
Based on (27), we state the following proposition.
Proposition 1.
Algorithm 1 with Rule 2 produces a sequence that converges in expectation to a point within sqrt(M(γ̄) G^U(γ̄)) of the optimal value. Since M(γ̄) G^U(γ̄) increases as TPL γ̄ increases, this implies that there exists a trade-off between TPL and solution accuracy.

We show, however, that the trade-off vanishes under the following assumption.
Assumption 2. (Adapted from Assumption 3.1 in [25]) There exists µ > 0 such that

µ ‖λ − λ*‖ ≤ H(λ*) − H(λ), ∀λ ∈ Λ, (28a)
‖s^k(ỹ^k) − y^k‖ < µ/2, ∀k ∈ N, (28b)

where the first inequality indicates that the function H has a sharp set of maxima over a convex set Λ and the second inequality indicates that the distance between the search direction and the supergradient is bounded.

Assumption 2 is mild since the function H is polyhedral for our case with a reasonable choice of TPL.
Under Assumption 2, Algorithm 1 with Rule 2 provides a sequence that converges in expectation, in probability, and with probability 1. The rate of convergence in expectation is O(G^U(γ̄)/√K), where G^U(γ̄) increases as TPL γ̄ increases.

Proof. See Appendix D.
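The Rule 2 bound (27) makes the two regimes visible: as K grows, the bound tends to the floor sqrt(M(γ̄) G^U(γ̄)) of Proposition 1, while with M(γ̄) = 0 (no noise term, as under Assumption 2) it decays as O(G^U(γ̄)/√K), matching Theorem 4. A sketch with illustrative constants:

```python
import numpy as np

def rule2_bound(lam_U, M, G_U, K):
    # RHS of (27): sqrt((lam_U + K * M) * G_U / K).
    # For M > 0 this tends to sqrt(M * G_U) as K -> inf; for M = 0 it is O(sqrt(G_U / K)).
    return np.sqrt((lam_U + K * M) * G_U / K)
```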
3) Rule 3:
Under Rule 3 the search direction s^k(ỹ^k) is a linear combination of {ỹ^{k'}}_{k'=1}^k.

Lemma 2.
Under Rule 3 we have ‖s^k(ỹ^k)‖ = ‖ỹ^k + ζ_k s^{k−1}(ỹ^{k−1})‖ ≤ ‖ỹ^k‖, ∀k ∈ N.

Proof. If ζ_k = 0, then s^k(ỹ^k) = ỹ^k. If ζ_k > 0, then

‖ỹ^k + ζ_k s^{k−1}‖^2 − ‖ỹ^k‖^2 = ζ_k^2 ‖s^{k−1}‖^2 + 2 ζ_k ⟨s^{k−1}, ỹ^k⟩ = χ_k^2 ⟨s^{k−1}, ỹ^k⟩^2 / ‖s^{k−1}‖^2 − 2 χ_k ⟨s^{k−1}, ỹ^k⟩^2 / ‖s^{k−1}‖^2 ≤ 0,

where the last inequality holds since χ_k ∈ [0, 2] as defined in Table I.

From Lemma 2, results similar to those for Rule 2 can be derived. Under Rule 3 it follows from (17) that

‖λ^{k+1} − λ*‖^2 ≤ ‖λ^k − λ*‖^2 − ( H(λ*) − H(λ^k) )^2 / ‖s^k(ỹ^k)‖^2 + 2 ( ( H(λ*) − H(λ^k) ) / ‖s^k(ỹ^k)‖^2 ) ‖s^k(ỹ^k) − y^k‖ · ‖λ^k − λ*‖ (29)
 ≤ ‖λ^k − λ*‖^2 − ( H(λ*) − H(λ^k) )^2 / G^U(γ̄) + R(γ̄),

where the last inequality holds due to Lemma 2 and the existence of R(γ̄) ∈ (0, ∞) based on the boundedness of s^k(ỹ^k) by its construction, Lemma 1, and Assumption 1. We emphasize that (29) is similar to (25). Thus one can derive results similar to (27), Proposition 1, and Theorem 4 under Rule 3.

Remark 4.
We remark that all the results related to the convergence of DP-PS also hold when solving the AC OPF described in Remark 1. However, strong duality does not hold for AC OPF, so the consensus constraints may not be satisfied at termination.
V. NUMERICAL EXPERIMENTS
To support our findings from Sections III and IV, we showcase that increasing the TPL of DP-PS leads to higher data privacy against adversarial attempts and does not affect the solution accuracy, although it does affect computation. In all the experiments, we solve optimization models by IPOPT [26] via Julia 1.5.0 on a personal laptop with an Intel Core i9 CPU and 64 GB of RAM.
1) Experimental Settings:
For the power network instances, we consider case 14 and case 118 from Matpower [20]. The optimal objective values of SOC OPF (2) are obtained by utilizing IPOPT: Z* = 8075 for case 14 and Z* = 129341 for case 118. The networks are decomposed as described in Table II. We consider an adversary who aims to estimate D̄_{ẑl̂} = 47.8 MW for case 14 and D̄_{ẑl̂} = 39 MW for case 118, respectively, by solving the adversarial problem (6) |K̂(T)| times, where T is any integer less than the total number of iterations K of DP-PS and

K̂(T) := ∪_{t=1}^{⌊K/T⌋} { (t − 1)T + 1, . . . , tT }. (30)

Recall that K̂ in Section III-1 is K̂(1).

TABLE II: Set N_z of buses for each zone z ∈ Z.
        case 14    case 118 [27]
Zone 1  { }        { }, { }, { }
Zone 2  { }        { }, { }, { }
Zone 3  { }, { }   { }

On the other hand, we aim to protect the demand data from the adversary by using the proposed DP-PS. We consider various TPLs of DP-PS; we set ε̄ = 1 and β̄_l = D̄_{zl} · δ̄/100, ∀l ∈ N_z, ∀z ∈ Z, where δ̄ takes the percentage values used in [13] as well as larger values for higher privacy. Note that δ̄ implies TPL and that DP-PS with δ̄ = 0 is a non-private PS. We use Rule 3 in Section IV for our experiments.
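The batch construction (30) partitions the first ⌊K/T⌋·T iterations into consecutive windows of length T; a small helper (hypothetical name) makes this concrete:

```python
def iteration_batches(K, T):
    # Equation (30): K_hat(T) = union over t = 1..floor(K/T) of {(t-1)T + 1, ..., tT}.
    return [list(range((t - 1) * T + 1, t * T + 1)) for t in range(1, K // T + 1)]
```

In particular, iteration_batches(K, 1) recovers the singleton collection {{1}, . . . , {K}} used in Section III-1.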
2) Convergence of DP-PS:
We provide numerical support for Theorem 4: increasing TPL does not affect the solution accuracy of DP-PS, although it does affect computation.
Solution Accuracy:
In Figure 2 we report the optimality gap at each iteration k of DP-PS. The results show that the sequence generated by DP-PS converges regardless of the value of TPL δ̄.

Computation:
Figure 2 demonstrates that DP-PS with higher TPL requires more iterations to converge. In Figure 3 we report the total number of iterations required for DP-PS to converge to a solution within the target optimality gap. The results show increasing trends in total iterations as δ̄ increases. This implies that there exists a trade-off between TPL and computation.

Fig. 2: Optimality gap of DP-PS that solves case 14 (left) and case 118 (right) under various δ̄.
3) Data Privacy Preservation:
We numerically show that increasing TPL provides higher data privacy. First, we consider various K̂(T) when constructing the adversarial problem (6). As T increases, theoretically, the accuracy of the demand estimated by solving (6) with K ∈ K̂(T) increases. We report in Figure 4 an average demand estimation error (DEE): Σ_{K∈K̂(T)} DE(K) / |K̂(T)|, where DE(K) is defined in (8). The results show (i) increasing trends of the average DEE as δ̄ increases for fixed T and (ii) decreasing trends of the average DEE as T increases for fixed δ̄. Moreover, the average DEE for fixed δ̄ seems to converge to a point as T increases. The results imply that increasing TPL eventually provides data privacy regardless of the adversarial type, which matches Definition 1.

Fig. 3: Total iterations of DP-PS for solving case 14 (left) and case 118 (right) required to enter within the target approximation error.

Fig. 4: Average demand estimation error under various K̂(T) for case 14 (left) and case 118 (right).
4) Summary:
We report in Figure 5 the optimality gap at the termination of DP-PS and the adversary's chance of success (CoS), defined as follows:
$$\mathrm{CoS}(G) = 100 \times \sum_{T \in \mathcal{T}} \sum_{K \in \widehat{K}(T)} \mathbb{I}\big(\mathrm{DE}(K) \le G\big) \Big/ \sum_{T \in \mathcal{T}} \big|\widehat{K}(T)\big|,$$
where $G$ is a prespecified value (e.g., $G = 1\%$), $\mathcal{T}$ is a collection of various $T$, $\widehat{K}(T)$ is defined in (30), $\mathrm{DE}(K)$ is defined in (8), and $\mathbb{I}(\mathrm{DE}(K) \le G) = 1$ if $\mathrm{DE}(K) \le G$ and $\mathbb{I}(\mathrm{DE}(K) \le G) = 0$ otherwise. The results demonstrate that as TPL $\bar{\delta}$ increases, the adversary's chance of successful demand estimation decreases while the optimality gap remains the same. This implies that there is no trade-off between data privacy and solution accuracy.

Fig. 5: Summary for case 14 (left) and case 118 (right).
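The CoS metric above is a pooled success percentage: count the adversarial runs whose error is at most $G$, across all window lengths $T$, and divide by the total number of runs. A minimal sketch, with hypothetical $\mathrm{DE}(K)$ values rather than the paper's data:

```python
# Chance of success (CoS): percentage of adversarial runs, pooled over all
# window lengths T, whose estimation error DE(K) is at most the threshold G.
def chance_of_success(de_by_T, G):
    hits = sum(1 for des in de_by_T.values() for de in des if de <= G)
    total = sum(len(des) for des in de_by_T.values())
    return 100.0 * hits / total

# Hypothetical errors: T -> list of DE(K) for each K in K_hat(T).
de_by_T = {1: [0.5, 2.0, 1.5], 5: [0.8, 3.0]}
print(chance_of_success(de_by_T, G=1.0))  # 40.0  (2 of 5 runs succeed)
```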
5) AC OPF:
In this section we show that the convergence and the data privacy preservation of DP-PS are also achieved when solving AC OPF (see Remarks 1 and …).

Fig. 6: Approximation error of DP-PS that solves case 14 (left) and case 118 (right) under various $\bar{\delta}$.

Fig. 7: Average demand estimation error under various $\widehat{K}(T)$ for case 14 (left) and case 118 (right).

VI. CONCLUSION
We studied a privacy-preserving distributed OPF and proposed a differentially private projected subgradient (DP-PS) algorithm that includes a solution encryption step. In this algorithm, Laplacian noise is introduced to encrypt the solutions exchanged within the algorithm, which leads to $\bar{\epsilon}$-differential privacy on data. The target privacy level (TPL) of DP-PS is chosen by users and affects not only the data privacy but also the convergence of the algorithm. We showed that increasing TPL provides better data privacy but requires more iterations for a sequence to converge in expectation. This result indicates that a trade-off exists between data privacy and computation. Fortunately, using DP-PS, one can avoid a trade-off between data privacy and solution accuracy.

REFERENCES

[1] Y. Levron, J. M. Guerrero, and Y. Beck, "Optimal power flow in microgrids with energy storage,"
IEEE Transactions on Power Systems, vol. 28, no. 3, pp. 3226–3234, 2013.
[2] D. K. Molzahn, F. Dörfler, H. Sandberg, S. H. Low, S. Chakrabarti, R. Baldick, and J. Lavaei, "A survey of distributed optimization and control algorithms for electric power systems," IEEE Transactions on Smart Grid, vol. 8, no. 6, pp. 2941–2962, 2017.
[3] A. X. Sun, D. T. Phan, and S. Ghosh, "Fully decentralized AC optimal power flow algorithms," in . IEEE, 2013, pp. 1–5.
[4] S. Mhanna, G. Verbič, and A. C. Chapman, "Adaptive ADMM for distributed AC optimal power flow," IEEE Transactions on Power Systems, vol. 34, no. 3, pp. 2025–2035, 2018.
[5] S. Mhanna, A. C. Chapman, and G. Verbič, "Component-based dual decomposition methods for the OPF problem," Sustainable Energy, Grids and Networks, vol. 16, pp. 91–110, 2018.
[6] K. Sun and X. A. Sun, "A two-level ADMM algorithm for AC OPF with convergence guarantees," arXiv preprint arXiv:2008.12139, 2020.
[7] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, "Membership inference attacks against machine learning models," in . IEEE, 2017, pp. 3–18.
[8] C. Dwork, F. McSherry, K. Nissim, and A. Smith, "Calibrating noise to sensitivity in private data analysis," in Theory of Cryptography Conference. Springer, 2006, pp. 265–284.
[9] C. Dwork, A. Roth et al., "The algorithmic foundations of differential privacy," Foundations and Trends in Theoretical Computer Science, vol. 9, no. 3-4, pp. 211–407, 2014.
[10] T. Zhang and Q. Zhu, "Dynamic differential privacy for ADMM-based distributed classification learning," IEEE Transactions on Information Forensics and Security, vol. 12, no. 1, pp. 172–187, 2016.
[11] ——, "A dual perturbation approach for differential private ADMM-based distributed empirical risk minimization," in Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security, 2016, pp. 129–137.
[12] Z. Huang, R. Hu, Y. Guo, E. Chan-Tin, and Y. Gong, "DP-ADMM: ADMM-based distributed learning with differential privacy," IEEE Transactions on Information Forensics and Security, vol. 15, pp. 1002–1012, 2019.
[13] V. Dvorkin, P. Van Hentenryck, J. Kazempour, and P. Pinson, "Differentially private distributed optimal power flow," arXiv preprint arXiv:1910.10136, 2019.
[14] S. Song, K. Chaudhuri, and A. D. Sarwate, "Stochastic gradient descent with differentially private updates," in . IEEE, 2013, pp. 245–248.
[15] S. Han, U. Topcu, and G. J. Pappas, "Differentially private distributed constrained optimization," IEEE Transactions on Automatic Control, vol. 62, no. 1, pp. 50–64, 2016.
[16] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, "Deep learning with differential privacy," in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016, pp. 308–318.
[17] C. Coffrin, H. L. Hijazi, and P. Van Hentenryck, "The QC relaxation: A theoretical and computational study on optimal power flow," IEEE Transactions on Power Systems, vol. 31, no. 4, pp. 3008–3018, 2015.
[18] ——, "Strengthening the SDP relaxation of AC power flows with convex envelopes, bound tightening, and valid inequalities," IEEE Transactions on Power Systems, vol. 32, no. 5, pp. 3549–3558, 2016.
[19] B. Kocuk, S. S. Dey, and X. A. Sun, "Strong SOCP relaxations for the optimal power flow problem," Operations Research, vol. 64, no. 6, pp. 1177–1196, 2016.
[20] R. D. Zimmerman, C. E. Murillo-Sánchez, and R. J. Thomas, "MATPOWER: Steady-state operations, planning, and analysis tools for power systems research and education," IEEE Transactions on Power Systems, vol. 26, no. 1, pp. 12–19, 2010.
[21] N. Holohan, S. Antonatos, S. Braghin, and P. Mac Aonghusa, "The bounded Laplace mechanism in differential privacy," Journal of Privacy and Confidentiality, vol. 10, no. 1, 2020.
[22] B. T. Polyak, Introduction to Optimization. Optimization Software, Inc., New York, 1987.
[23] P. M. Camerini, L. Fratta, and F. Maffioli, "On improving relaxation methods by modified gradient techniques," in Nondifferentiable Optimization. Springer, 1975, pp. 26–34.
[24] Y. M. Ermoliev and R.-B. Wets, Numerical Techniques for Stochastic Optimization. Springer-Verlag, 1988.
[25] A. Nedić and D. P. Bertsekas, "The effect of deterministic noise in subgradient methods," Mathematical Programming, vol. 125, no. 1, pp. 75–99, 2010.
[26] A. Wächter and L. T. Biegler, "On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming," Mathematical Programming, vol. 106, no. 1, pp. 25–57, 2006.
[27] J. Guo, G. Hug, and O. K. Tonguz, "Intelligent partitioning in distributed optimization of electric power systems," IEEE Transactions on Smart Grid, vol. 7, no. 3, pp. 1249–1258, 2015.
The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory ("Argonne"). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

APPENDIX A
PROOF OF THEOREM

First, in the $k$-th iteration of DP-PS, for all $z \in \mathcal{Z}$ and $i \in \mathcal{C}(z)$, we denote by $\mathbb{P}_{\mathcal{R}^k_{zi}(D_z)}(\tilde{y}^k_{zi})$ the probability density at any $\tilde{y}^k_{zi} \in \mathcal{S}^k$, where $\mathcal{R}^k_{zi}$ is defined in (13) and $\mathcal{S}^k$ is any subset of $\mathrm{Range}(\mathcal{R}^k_{zi})$. Then we have
$$\mathbb{P}\{\mathcal{R}^k_{zi}(D_z) \in \mathcal{S}^k\} = \int_{\mathcal{S}^k} \mathbb{P}_{\mathcal{R}^k_{zi}(D_z)}(\tilde{y}^k_{zi}) \, d\tilde{y}^k_{zi}.$$
Now consider the following ratio:
\begin{align*}
\frac{\mathbb{P}_{\mathcal{R}^k_{zi}(D'_z)}(\tilde{y}^k_{zi})}{\mathbb{P}_{\mathcal{R}^k_{zi}(\bar{D}_z)}(\tilde{y}^k_{zi})}
&= \frac{\mathcal{L}\big(\tilde{y}^k_{zi} - \mathcal{Q}^k_{zi}(D'_z) \,\big|\, \bar{\Delta}^k_{zi}(\bar{\beta})/\bar{\epsilon}\big)}{\mathcal{L}\big(\tilde{y}^k_{zi} - \mathcal{Q}^k_{zi}(\bar{D}_z) \,\big|\, \bar{\Delta}^k_{zi}(\bar{\beta})/\bar{\epsilon}\big)} \\
&= \exp\Big( \big(\bar{\epsilon}/\bar{\Delta}^k_{zi}(\bar{\beta})\big)\big( |\tilde{y}^k_{zi} - \mathcal{Q}^k_{zi}(\bar{D}_z)| - |\tilde{y}^k_{zi} - \mathcal{Q}^k_{zi}(D'_z)| \big) \Big) \\
&\le \exp\Big( \big(\bar{\epsilon}/\bar{\Delta}^k_{zi}(\bar{\beta})\big) \big| \mathcal{Q}^k_{zi}(D'_z) - \mathcal{Q}^k_{zi}(\bar{D}_z) \big| \Big) \le \exp(\bar{\epsilon}), \quad \forall D'_z \in \widehat{D}_{\bar{\beta}}(z),
\end{align*}
where $\mathcal{L}$ is from Definition 2, the first inequality holds due to the reverse triangle inequality, namely, $|a| - |b| \le |a - b|$, and the last inequality holds since $\bar{\Delta}^k_{zi}(\bar{\beta}) \ge |\mathcal{Q}^k_{zi}(D'_z) - \mathcal{Q}^k_{zi}(\bar{D}_z)|$ for all $D'_z \in \widehat{D}_{\bar{\beta}}(z)$ from (12). Similarly, one can obtain a lower bound as follows:
$$\exp\Big( \big(\bar{\epsilon}/\bar{\Delta}^k_{zi}(\bar{\beta})\big)\big( |\tilde{y}^k_{zi} - \mathcal{Q}^k_{zi}(\bar{D}_z)| - |\tilde{y}^k_{zi} - \mathcal{Q}^k_{zi}(D'_z)| \big) \Big) \ge \exp\Big( -\big(\bar{\epsilon}/\bar{\Delta}^k_{zi}(\bar{\beta})\big) \big| \mathcal{Q}^k_{zi}(D'_z) - \mathcal{Q}^k_{zi}(\bar{D}_z) \big| \Big) \ge \exp(-\bar{\epsilon}), \quad \forall D'_z \in \widehat{D}_{\bar{\beta}}(z),$$
where the first inequality holds due to the reverse triangle inequality, namely, $|a| - |b| \ge -|a - b|$. Therefore, we have
$$\exp(-\bar{\epsilon}) \le \frac{\mathbb{P}_{\mathcal{R}^k_{zi}(D'_z)}(\tilde{y}^k_{zi})}{\mathbb{P}_{\mathcal{R}^k_{zi}(\bar{D}_z)}(\tilde{y}^k_{zi})} \le \exp(\bar{\epsilon}), \quad \forall D'_z \in \widehat{D}_{\bar{\beta}}(z),$$
and integrating $\tilde{y}^k_{zi}$ over $\mathcal{S}^k$ yields (14). This proves that $\bar{\epsilon}$-differential privacy on data is guaranteed for each iteration $k$ of DP-PS.
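The per-iteration density-ratio bound $\exp(-\bar{\epsilon}) \le \text{ratio} \le \exp(\bar{\epsilon})$, and the $K$-iteration composition used in the second part of the proof, can be checked numerically for the Laplace density. This is only an illustrative sketch: the query values, sensitivity bound, and budget below are arbitrary choices, not values from the paper.

```python
import math

def laplace_pdf(x, b):
    """Density at x of a zero-mean Laplace distribution with scale b."""
    return math.exp(-abs(x) / b) / (2.0 * b)

eps = 1.0                  # privacy budget (eps-bar); arbitrary
sensitivity = 2.0          # bound Delta-bar on |Q(D') - Q(D-bar)|; arbitrary
b = sensitivity / eps      # Laplace scale Delta-bar / eps-bar

# Per-iteration bound: the density ratio stays within [exp(-eps), exp(eps)]
# whenever the two query outputs differ by at most the sensitivity.
q_bar, q_prime = 5.0, 6.5  # arbitrary outputs with |q_prime - q_bar| <= sensitivity
for y in (-3.0, 0.0, 4.2, 10.0):
    ratio = laplace_pdf(y - q_prime, b) / laplace_pdf(y - q_bar, b)
    assert math.exp(-eps) <= ratio <= math.exp(eps)

# Composition over K iterations: inflating the scale to K * Delta-bar / eps
# makes each iteration (eps/K)-private, and the K per-iteration ratio bounds
# multiply back to exp(eps) overall.
K = 50
composed = math.exp(eps / K) ** K
assert abs(composed - math.exp(eps)) < 1e-9
print("per-iteration and composed ratio bounds hold")
```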
Second, for all $z \in \mathcal{Z}$ and $i \in \mathcal{C}(z)$, we denote by $\mathcal{R}_{zi}$ a randomized function that maps the dataset $D_z \in \mathbb{R}^{|\mathcal{N}_z|}$ to $\tilde{y}_{zi} := \{\tilde{y}^k_{zi}\}_{k=1}^{K}$, where $K$ is the total number of iterations consumed by DP-PS. It suffices to show that
$$\Big| \ln \Big( \frac{\mathbb{P}\{\mathcal{R}_{zi}(D'_z) \in \mathcal{S}\}}{\mathbb{P}\{\mathcal{R}_{zi}(\bar{D}_z) \in \mathcal{S}\}} \Big) \Big| \le \bar{\epsilon}, \quad \forall D'_z \in \widehat{D}_{\bar{\beta}}(z), \; \forall \mathcal{S} \subseteq \mathrm{Range}(\mathcal{R}_{zi}). \tag{31}$$
We denote by $\mathbb{P}_{\mathcal{R}_{zi}(D_z)}(\tilde{y}_{zi})$ the joint density at any $\tilde{y}_{zi} \in \mathcal{S}$. Then we have
$$\mathbb{P}\{\mathcal{R}_{zi}(D_z) \in \mathcal{S}\} = \int_{\mathcal{S}} \mathbb{P}_{\mathcal{R}_{zi}(D_z)}(\tilde{y}_{zi}) \, d\tilde{y}_{zi}.$$
The joint density function can be expressed by the conditional density functions:
\begin{align*}
\mathbb{P}_{\mathcal{R}_{zi}(D_z)}(\tilde{y}_{zi})
&= \mathbb{P}_{\mathcal{R}^1_{zi}(D_z), \ldots, \mathcal{R}^K_{zi}(D_z)}(\tilde{y}^1_{zi}, \ldots, \tilde{y}^K_{zi}) \\
&= \mathbb{P}_{\mathcal{R}^K_{zi}(D_z) \,|\, \mathcal{R}^1_{zi}(D_z), \ldots, \mathcal{R}^{K-1}_{zi}(D_z)}(\tilde{y}^K_{zi} \,|\, \tilde{y}^1_{zi}, \ldots, \tilde{y}^{K-1}_{zi}) \times \cdots \times \mathbb{P}_{\mathcal{R}^1_{zi}(D_z)}(\tilde{y}^1_{zi}) \\
&= \prod_{k=1}^{K} \mathcal{L}\big(\tilde{y}^k_{zi} - \mathcal{Q}^k_{zi}(D_z) \,\big|\, K \bar{\Delta}^k_{zi}(\bar{\beta})/\bar{\epsilon}\big) \\
&= \prod_{k=1}^{K} \frac{\bar{\epsilon}}{2 K \bar{\Delta}^k_{zi}} \exp\Big( -\big|\tilde{y}^k_{zi} - \mathcal{Q}^k_{zi}(D_z)\big| \, \frac{\bar{\epsilon}}{K \bar{\Delta}^k_{zi}} \Big).
\end{align*}
Taking steps similar to those in the first part of this proof, we obtain
$$\exp(-\bar{\epsilon}) \le \frac{\mathbb{P}_{\mathcal{R}_{zi}(D'_z)}(\tilde{y}_{zi})}{\mathbb{P}_{\mathcal{R}_{zi}(\bar{D}_z)}(\tilde{y}_{zi})} \le \exp(\bar{\epsilon}), \quad \forall D'_z \in \widehat{D}_{\bar{\beta}}(z),$$
and integrating $\tilde{y}_{zi}$ over $\mathcal{S}$ yields (31). This completes the proof.

APPENDIX B
PROOF OF THEOREM

Since $H(\lambda^\star) - \mathbb{E}[H_{\mathrm{best}}(\lambda^K)] \ge 0$ and the right-hand side of (22) goes to $0$ as $K \to \infty$, (23a) holds. Also, (23b) holds due to Markov's inequality, namely, for $\epsilon > 0$,
$$\mathbb{P}\big\{ H(\lambda^\star) - H_{\mathrm{best}}(\lambda^K) \ge \epsilon \big\} \le \mathbb{E}\big[ H(\lambda^\star) - H_{\mathrm{best}}(\lambda^K) \big] / \epsilon, \tag{32}$$
where the right-hand side of (32) goes to $0$ as $K \to \infty$. From the right-hand side of (22), the rate of convergence in expectation is $O(G_U(\bar{\gamma})/\log(K))$. This completes the proof.

APPENDIX C
PROOF OF THEOREM

We have
$$\mathbb{E}\big[ \|\lambda^{k+1} - \lambda^\star\|^2 \,\big|\, \lambda^1, \ldots, \lambda^k \big] \le \|\lambda^k - \lambda^\star\|^2 + (\alpha^k)^2 G_U^2(\bar{\gamma}),$$
where the inequality holds since $\alpha^k\big( H(\lambda^k) - H(\lambda^\star) \big) \le 0$ and $\mathbb{E}[\tilde{\xi}^k_{zi}] = 0$, $\forall z \in \mathcal{Z}$, $\forall i \in \mathcal{C}(z)$.
Since $\lambda$ is bounded by Assumption 2, $(\alpha^k)^2 G_U^2(\bar{\gamma}) \ge 0$, and $G_U^2(\bar{\gamma}) \sum_{k=1}^{\infty} (\alpha^k)^2 < \infty$, the sequence $\{\lambda^k\}$ generated by Algorithm 1 with Rule 1 is a stochastic quasi-Fejér sequence for the set $\Lambda^\star$ of maximizers. Based on Theorem 6.1 in [24] and the existence of a subsequence $\{\lambda^{k_s}\}$ such that $H_{\mathrm{best}}(\lambda^{k_s})$ converges to $H(\lambda^\star)$ with probability 1 due to (23b), one can conclude that the sequence $\{\lambda^k\}$ converges to a point in $\Lambda^\star$. For more details, we refer the reader to the proof of Theorem 6.2 in [24].

APPENDIX D
PROOF OF THEOREM

We have
\begin{align}
\|\lambda^{k+1} - \lambda^\star\|^2
&\le \|\lambda^k - \lambda^\star\|^2 - \frac{\big(H(\lambda^\star) - H(\lambda^k)\big)^2}{\|\tilde{y}^k\|^2} + 2\,\frac{H(\lambda^\star) - H(\lambda^k)}{\|\tilde{y}^k\|^2}\, \|s^k(\tilde{y}^k) - y^k\| \cdot \|\lambda^k - \lambda^\star\| \notag \\
&\le \|\lambda^k - \lambda^\star\|^2 - \Big(1 - \frac{\|s^k(\tilde{y}^k) - y^k\|}{\mu}\Big)\frac{\big(H(\lambda^\star) - H(\lambda^k)\big)^2}{\|\tilde{y}^k\|^2} \notag \\
&\le \|\lambda^k - \lambda^\star\|^2 - \big(G_L/G_U^2(\bar{\gamma})\big)\big(H(\lambda^\star) - H(\lambda^k)\big)^2, \tag{33}
\end{align}
where the first inequality holds since $\tilde{\xi}^k = s^k(\tilde{y}^k) - y^k$, the second inequality holds due to Assumption 2, and the last inequality holds since $\big(1 - \|s^k(\tilde{y}^k) - y^k\|/\mu\big) \in (0, 1]$ from Assumption 2 and Lemma 1. By taking steps similar to those in Section IV-2, we obtain
$$0 \le H(\lambda^\star) - \mathbb{E}\Big[ \max_{k \in [K]} H(\lambda^k) \Big] \le \sqrt{\frac{\lambda_U \, G_U^2(\bar{\gamma})}{G_L \, K}}. \tag{34}$$
Taking steps similar to those in the proof of Theorem 2, we conclude from (34) that the sequence produced by DP-PS with Rule 2 under Assumption 2 converges in expectation and in probability. Also, the rate of convergence in expectation is $O(G_U(\bar{\gamma})/\sqrt{K})$. It follows from (33) that $\|\lambda^{k+1} - \lambda^\star\|^2 \le \|\lambda^k - \lambda^\star\|^2$. By taking a conditional expectation, we obtain
$$\mathbb{E}\big[ \|\lambda^{k+1} - \lambda^\star\|^2 \,\big|\, \lambda^1, \ldots, \lambda^k \big] \le \|\lambda^k - \lambda^\star\|^2.$$
Thus, the sequence $\{\lambda^k\}$ generated by DP-PS with Rule 2 under Assumption 2 is a stochastic quasi-Fejér sequence for the set $\Lambda^\star$ of maximizers. As discussed in the proof of Theorem 3, this proves the convergence with probability 1.