Privacy-Preserving Probabilistic Forecasting for Temporal-spatial Correlated Wind Farms
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015
Mengshuo Jia, Student Member, IEEE, Chen Shen, Senior Member, IEEE, Zhiwen Wang, Student Member, IEEE, and Zhitong Yu
Abstract—Adopting secure scalar product and secure sum techniques, we propose a privacy-preserving method to build the joint and conditional probability distribution functions of multiple wind farms' output while accounting for their temporal-spatial correlation. The proposed method protects the raw data of wind farms (WFs) from disclosure and is mathematically equivalent to the centralized method, which needs to gather the raw data of all WFs.
Index Terms—Wind farms, privacy, temporal-spatial correlation, probabilistic forecasting, secure multi-party computation.
I. INTRODUCTION

To consider the temporal-spatial correlation of multiple wind farms' output (MWO) in probabilistic wind power forecasting, one can first construct the GMM-based joint PDF of MWO at different time periods, and then directly build the conditional PDF of the output of each wind farm (WF) in the next period with respect to the observations of MWO during the current periods [1].

The construction of the joint and conditional PDF requires complete observations, each of which gathers all the corresponding MWO data at different time periods. Since every WF can only observe its own outputs at different time periods, the complete observations are vertically partitioned among all the WFs (vertical partitioning: the attributes are divided across sites, and the sites must be joined to obtain complete information on any entity [2]). However, to protect data privacy, WFs with different stakeholders may refuse to share their raw data to compose the complete observations for constructing the PDF. To resolve this privacy issue, a privacy-preserving distributed method is a feasible alternative.

To construct the GMM-based PDF, the expectation-maximization (EM) algorithm is commonly used [3]. Nevertheless, existing research on privacy-preserving distributed EM algorithms mainly focuses on horizontally partitioned data (horizontal partitioning: each entity is represented entirely at a single site [2]). To the best of our knowledge, little literature has addressed building a GMM from vertically partitioned data. Therefore, based on the secure multi-party computation (SMC) method [4], this letter proposes a privacy-preserving method to build the GMM-based joint and conditional PDF.

II. NOTATIONS
We first define the domain Ω = {1, 2, ..., M} for M WFs, Γ = {1, 2, ..., T} for T periods (normally T = 24), and Υ = {1, 2, ..., I} for I observations. Let y_{m,t} denote the random variable of the output of the m-th WF at the t-th period, where m ∈ Ω and t ∈ Γ. We aim to construct the joint PDF of Y = {y_{m,t} | m ∈ Ω; t ∈ Γ}. The I observations of Y are represented by y^i = {y^i_{m,t} | m ∈ Ω; t ∈ Γ} (i ∈ Υ). To obtain a complete y^i, the corresponding observations of all WFs must be gathered together.

We utilize a GMM to build the joint PDF. The GMM is a parametric model represented by a convex combination of J multivariate Gaussian distribution functions. Defining the domain Λ = {1, 2, ..., J}, the parameter set of the GMM is θ = {w_j, μ_j, Σ_j | j ∈ Λ}. The GMM-based joint PDF of Y is given as follows:

    f(Y; \theta) = \sum_{j=1}^{J} w_j \mathcal{N}(Y; \mu_j, \Sigma_j)    (1)

where w_j is the weight coefficient and \mathcal{N}(Y; \mu_j, \Sigma_j) is the j-th multivariate Gaussian distribution function with mean vector μ_j and covariance matrix Σ_j. The precision matrix is defined as Φ_j = (Σ_j)^{-1}. The elements of μ_j are represented by μ_{j,m,t} (m ∈ Ω; t ∈ Γ). The diagonal elements of Σ_j and Φ_j are represented by σ_{j,(m,t),(m,t)} and φ_{j,(m,t),(m,t)} (m ∈ Ω; t ∈ Γ), and the off-diagonal elements by σ_{j,(m,t),(n,v)} and φ_{j,(m,t),(n,v)} (m, n ∈ Ω; t, v ∈ Γ).

III. CONSTRUCTION OF THE JOINT PDF

To obtain the joint PDF in (1), the key lies in estimating the θ of the GMM. We utilize the EM algorithm, which consists of an E-step and an M-step [3], to fulfill the estimation. For the k-th iteration of the j-th Gaussian component, the E-step is given in (2) and the M-step in (3):

    Q_j^{i,k+1} = \frac{w_j^k \mathcal{N}(y^i; \mu_j^k, \Sigma_j^k)}{\sum_{l=1}^{J} w_l^k \mathcal{N}(y^i; \mu_l^k, \Sigma_l^k)}, \quad i \in \Upsilon    (2)

    w_j^{k+1} = \frac{1}{I} \sum_{i=1}^{I} Q_j^{i,k+1}    (3a)

    \mu_j^{k+1} = \frac{\sum_{i=1}^{I} Q_j^{i,k+1} y^i}{\sum_{i=1}^{I} Q_j^{i,k+1}}    (3b)

    \Sigma_j^{k+1} = \frac{\sum_{i=1}^{I} Q_j^{i,k+1} (y^i - \mu_j^k)(y^i - \mu_j^k)'}{\sum_{i=1}^{I} Q_j^{i,k+1}}    (3c)

Both steps require y^i (i ∈ Υ) for calculation. To protect data privacy, we propose a privacy-preserving distributed EM (PDEM) algorithm to handle this privacy issue. The privacy preservation requirement is defined as follows: the data communicated between WFs must not divulge the raw data.
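For reference, the centralized E-step (2) and M-step (3) can be sketched in NumPy on synthetic data. The array sizes, the two-regime synthetic data, and the quantile-based initialization below are illustrative assumptions, not part of the letter; the Gaussian density is evaluated through the precision matrix, mirroring (4):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic complete observations y^i: I observations of the M*T joint outputs
# (here D stands in for M*T; two regimes play the role of J = 2 components).
I, D, J = 200, 4, 2
y = np.vstack([rng.normal(0.2, 0.05, (I // 2, D)),
               rng.normal(0.6, 0.10, (I - I // 2, D))])

def gauss(y, m, S):
    """Multivariate Gaussian density, written via the precision matrix Phi."""
    d = y - m
    Phi = np.linalg.inv(S)
    quad = np.einsum('id,de,ie->i', d, Phi, d)   # (y - mu) Phi (y - mu)'
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** D * np.linalg.det(S))

# Initial theta = {w_j, mu_j, Sigma_j}; quantile-based means are an
# illustrative initialization, not prescribed by the letter.
w = np.full(J, 1.0 / J)
mu = np.stack([np.quantile(y, 0.25, axis=0), np.quantile(y, 0.75, axis=0)])
Sigma = np.stack([0.1 * np.eye(D) for _ in range(J)])

for _ in range(50):
    # E-step (2): responsibilities Q[i, j] of each component for each observation.
    dens = np.column_stack([w[j] * gauss(y, mu[j], Sigma[j]) for j in range(J)])
    Q = dens / dens.sum(axis=1, keepdims=True)

    # M-step (3): weights (3a), means (3b), covariances (3c);
    # (3c) uses the previous-iteration mean mu^k, as written in the letter.
    Nj = Q.sum(axis=0)
    mu_prev = mu.copy()
    w = Nj / I
    mu = (Q.T @ y) / Nj[:, None]
    for j in range(J):
        d = y - mu_prev[j]
        Sigma[j] = (Q[:, j, None] * d).T @ d / Nj[j] + 1e-9 * np.eye(D)
```

On this synthetic data the fitted weights approach 1/2 each and the component means recover the two regime levels. The PDEM algorithm below reproduces exactly these updates, but with y^i vertically partitioned across the WFs.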
A. Private E-step
In the E-step, we assume that all WFs have acquired the θ^k = {w^k_j, μ^k_j, Σ^k_j | j ∈ Λ} updated in the (k−1)-th iteration. The aim of the private E-step is to ensure that every WF is able to calculate (2) without revealing raw data. The essence of (2) lies in the calculation of the Gaussian component:

    \mathcal{N}(y^i; \mu_j^k, \Sigma_j^k) = \frac{\exp[-\frac{1}{2}(y^i - \mu_j^k)\Phi_j^k(y^i - \mu_j^k)']}{\sqrt{(2\pi)^{M \times T} |\Sigma_j^k|}}    (4)

where the raw data y^i are required only in the quadratic form g(y^i) = (y^i − μ^k_j) Φ^k_j (y^i − μ^k_j)'. We further reorganize g(y^i) into (5):

    g(y^i) = \sum_{n=1}^{M} S_{j,n}^{i,k} - \sum_{n=1}^{M}\sum_{v=1}^{T} \mu_{j,n,v}^{k} D_{j,n,v}^{i,k}    (5a)

    S_{j,n}^{i,k} = \sum_{v=1}^{T} y_{n,v}^{i} D_{j,n,v}^{i,k}, \quad n \in \Omega    (5b)

    D_{j,n,v}^{i,k} = \sum_{m=1}^{M} C_{j,m}^{i,k} - H_{n,v}^{k}, \quad n \in \Omega, v \in \Gamma    (5c)

    C_{j,m}^{i,k} = \sum_{t=1}^{T} y_{m,t}^{i} \phi_{j,(m,t),(n,v)}^{k}, \quad m \in \Omega    (5d)

    H_{n,v}^{k} = \sum_{m=1}^{M}\sum_{t=1}^{T} \mu_{j,m,t}^{k} \phi_{j,(m,t),(n,v)}^{k}, \quad n \in \Omega, v \in \Gamma    (5e)

where (5d) and (5e) can be calculated locally by each WF. To calculate (5c), each WF has to gather the results of (5d) computed by the other WFs. Since the results of (5d) do not reveal the raw data, these values can be shared. Thereafter, (5b) can be obtained by each WF. For (5a), each WF also has to gather the results of (5b) from all WFs. Similarly, S^{i,k}_{j,n} in (5b) does not reveal any raw data, so this value can also be shared with every WF to calculate (5a). The Gaussian component in (4) is then obtainable by every WF. Finally, each WF is able to accurately complete the calculation of the E-step in (2) using the Gaussian component in (4) without revealing any raw data.

B. Private M-step
After the private E-step, each WF possesses the value of Q^{i,k+1}_j (j ∈ Λ; i ∈ Υ). Therefore, every WF is able to compute (3a) directly. However, y^i is required in (3b) and (3c). To avoid revealing raw data, we further reorganize these equations by rearranging the elements of μ_j and Σ_j into (6) and (7), where m, n ∈ Ω and t, v ∈ Γ:

    \mu_{j,m,t}^{k+1} = \frac{\sum_{i=1}^{I} Q_j^{i,k+1} y_{m,t}^{i}}{\sum_{i=1}^{I} Q_j^{i,k+1}}    (6)

    \sigma_{j,(m,t),(m,t)}^{k+1} = \frac{\sum_{i=1}^{I} Q_j^{i,k+1} (y_{m,t}^{i} - \mu_{j,m,t}^{k})^2}{\sum_{i=1}^{I} Q_j^{i,k+1}}    (7a)

    \sigma_{j,(m,t),(n,v)}^{k+1} = \frac{\sum_{i=1}^{I} Q_j^{i,k+1} y_{m,t}^{i} y_{n,v}^{i}}{\sum_{i=1}^{I} Q_j^{i,k+1}} - \mu_{j,m,t}^{k}\mu_{j,n,v}^{k}    (7b)

Equations (6) and (7a) for all T time periods are obtainable by each WF, and no WF needs to reveal raw data. Thus, the values obtained from (6) and (7a) can be shared among the WFs to compose a complete μ^{k+1}_j and all diagonal elements of Σ^{k+1}_j.

For (7b), the raw data of the m-th WF at the t-th period and of the n-th WF at the v-th period are needed to calculate the scalar product s^{k+1}_{j,(m,t),(n,v)} in (8):

    s_{j,(m,t),(n,v)}^{k+1} = \sum_{i=1}^{I} Q_j^{i,k+1} y_{m,t}^{i} y_{n,v}^{i} = X_{j,m,t}^{k+1} \cdot y_{n,v}    (8)

    X_{j,m,t}^{k+1} = [Q_j^{1,k+1} y_{m,t}^{1} \;\cdots\; Q_j^{i,k+1} y_{m,t}^{i} \;\cdots\; Q_j^{I,k+1} y_{m,t}^{I}]'

    y_{n,v} = [y_{n,v}^{1} \;\cdots\; y_{n,v}^{i} \;\cdots\; y_{n,v}^{I}]'

Since all WFs possess the value of Q^{i,k+1}_j (j = 1, ..., J; i = 1, ..., I), knowing both X^{k+1}_{j,m,t} and y_{n,v} means knowing all the raw data. To protect data privacy, we utilize the secure scalar product (SSP) technique, which securely computes the scalar product of two vectors, to calculate (8). The SSP procedure is summarized as follows [4]:

1) Both the m-th and the n-th WF choose the same random I × I/2 matrix U.
2) The m-th WF generates a random I/2 × 1 vector R and sends s_m = U × R + X^{k+1}_{j,m,t} to the n-th WF.
3) The n-th WF calculates the scalar product s_{n,1} = s_m · y_{n,v} as well as s_{n,2} = U' × y_{n,v}, and then sends s_{n,1} and s_{n,2} to the m-th WF.
4) The m-th WF finally calculates the scalar product through s^{k+1}_{j,(m,t),(n,v)} = s_{n,1} − s_{n,2} · R, and then sends it to the n-th WF.

Through the SSP technique, both the m-th and the n-th WF can acquire the scalar product s^{k+1}_{j,(m,t),(n,v)} without revealing any raw data. Then (7b) can be computed by the m-th and n-th WFs (m, n ∈ Ω). Eventually, by sharing (6) and (7a) and utilizing the SSP technique, every WF is able to accurately calculate the M-step while preserving data privacy.

IV. CONSTRUCTION OF THE CONDITIONAL PDF

Our aim is to construct the conditional PDF of y_{m,t} given the current outputs of all WFs. Let v denote the index of the current time period; the current outputs are then represented by y_v = {y_{m,v} | m ∈ Ω}. Obviously, if t = v + 1, the conditional PDF of y_{m,t} can be viewed as the predictive PDF of the m-th WF's output at the next period based on the current outputs of all WFs.

Once the joint PDF in (1) is built via the PDEM algorithm, the conditional PDF can be constructed:

    f(y_{m,t} \,|\, y_v) = \sum_{j=1}^{J} w_{j,m,t}^{c} \, \mathcal{N}(y_{m,t}; \mu_{j,m,t}^{c}, \sigma_{j,m,t}^{c})    (9)

where the parameters of the conditional PDF are specified via (10):

    w_{j,m,t}^{c} = \frac{w_j \mathcal{N}(y_v; \mu_{j,v}, \Sigma_{j,v})}{\sum_{l=1}^{J} w_l \mathcal{N}(y_v; \mu_{l,v}, \Sigma_{l,v})}    (10a)

    \mu_{j,m,t}^{c} = \mu_{j,m,t} + \Sigma_{j,v}^{m,t} (\Sigma_{j,v})^{-1} (y_v - \mu_{j,v})    (10b)

    \sigma_{j,m,t}^{c} = \sigma_{j,(m,t),(m,t)} - \Sigma_{j,v}^{m,t} (\Sigma_{j,v})^{-1} (\Sigma_{j,v}^{m,t})'    (10c)

where μ_{j,v}, Σ_{j,v} and Σ^{m,t}_{j,v} are given as follows:

    \mu_{j,v} = [\mu_{j,1,v} \;\cdots\; \mu_{j,n,v} \;\cdots\; \mu_{j,M,v}]

    \Sigma_{j,v} = \begin{bmatrix} \sigma_{j,(1,v),(1,v)} & \cdots & \sigma_{j,(1,v),(M,v)} \\ \vdots & \ddots & \vdots \\ \sigma_{j,(M,v),(1,v)} & \cdots & \sigma_{j,(M,v),(M,v)} \end{bmatrix}

    \Sigma_{j,v}^{m,t} = [\sigma_{j,(m,t),(1,v)} \;\cdots\; \sigma_{j,(m,t),(n,v)} \;\cdots\; \sigma_{j,(m,t),(M,v)}]

Apparently, each WF can compute (10c) directly with the θ of the joint PDF. However, calculating (10a) and (10b) needs y_v, which consists of raw data. To avoid revealing any private data, we further reorganize (10a) into (11) and (10b) into (12). Note that the calculation of (10a) is similar to that of (2), so the reorganization of (10a) is similar to that of (2). Due to limited space, we only detail the computational parts of (10a) that raise the data privacy issue, which are defined as S^c_{j,v} (v ∈ Γ) and C^c_{j,n,v} (n ∈ Ω, v ∈ Γ).
    S_{j,v}^{c} = \sum_{n=1}^{M} y_{n,v} \, D_{j,n,v}^{c}    (11a)

    D_{j,n,v}^{c} = C_{j,n,v}^{c} - \sum_{l=1}^{M} \mu_{j,l,v} \, \phi_{j,(l,v),(n,v)}    (11b)

    C_{j,n,v}^{c} = \sum_{l=1}^{M} y_{l,v} \, \phi_{j,(l,v),(n,v)}    (11c)

    \mu_{j,m,t}^{c} = \mu_{j,m,t} + \sigma_{j,(m,t),(m,v)} \sigma_{j,(m,v),(m,v)}^{-1} y_{m,v} - \sum_{n=1}^{M} \mu_{j,n,v} \sigma_{j,(m,t),(n,v)} \sigma_{j,(n,v),(n,v)}^{-1} + \sum_{n=1, n \neq m}^{M} \sigma_{j,(m,t),(n,v)} \sigma_{j,(n,v),(n,v)}^{-1} y_{n,v}    (12)

It can be observed that raw data are involved in the weighted sums in (11a), (11c), and the last term of (12). To avoid revealing raw data, we utilize the secure sum (SS) technique, which can securely compute a weighted sum without sacrificing data privacy. Taking (11a) as an example, the SS procedure is summarized as follows [4]:

1) Assume that the sum in (11a) lies in the range [0, N). N can be set as the sum of the capacities of all the WFs.
2) The 1st WF generates a random number Z, uniformly chosen from [0, N). Then the 1st WF sends V_1 = (D^c_{j,1,v} y_{1,v} + Z) mod N to the 2nd WF.
3) Each remaining WF (n = 2, ..., M) sends V_n = (D^c_{j,n,v} y_{n,v} + V_{n−1}) mod N to the next WF in the ring, the M-th WF sending V_M back to the 1st WF.
4) When the 1st WF receives V_M, it computes S^c_{j,v} = (V_M − Z) mod N. The value of S^c_{j,v} is then shared among the WFs.

With the SS technique, the weighted sums in (11a), (11c) and (12) can be computed without revealing any raw data. The parameters of the conditional PDF in (10) are then obtainable, and so is the conditional PDF.

It is worth noting that the m-th WF does not participate in the calculation of the last term of (12). The value of this term is calculated by the remaining WFs and is useful only to the m-th WF. Through this design, we ensure that each WF can obtain only its own conditional PDF without knowing the conditional PDFs of the others.

V. DISCUSSION
We define the centralized method as the calculation method that gathers the raw data of all the WFs to construct the PDF. Since both the SSP and SS techniques accurately and safely calculate the scalar product and the weighted sum without any approximation, the proposed method and the centralized method are mathematically equivalent; thus, the PDFs constructed by the two methods are exactly the same.

The cost of preserving privacy is an increase in communication traffic. Setting M = 10, T = 24 and I = 1000, the total upstream and downstream communication traffic of a WF over the entire calculation process of the two methods is given in Table I. Since communications occur in every iteration of the PDEM algorithm for every observation, the communication traffic of the proposed method increases significantly compared to the centralized method. However, the total communication traffic is still very small and can be fully accommodated under current bandwidth conditions.

TABLE I
COMMUNICATION TRAFFIC COMPARISON
                      Centralized Method    Proposed Method
Upstream Traffic
Downstream Traffic    . × − Mb              27.82 Mb
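The equivalence stated above rests on the SSP and SS primitives returning exact inner products. A minimal NumPy sketch of the SSP steps (Section III-B) and the SS ring (Section IV) can be checked numerically; the vector sizes, the modulus N, and the toy private values below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# ---- Secure scalar product (SSP) between WF m and WF n ----
I = 6                               # illustrative number of observations
X = rng.uniform(0, 1, I)            # WF m's private vector X_{j,m,t}
yv = rng.uniform(0, 1, I)           # WF n's private vector y_{n,v}

U = rng.uniform(0, 1, (I, I // 2))  # shared random I x I/2 matrix (step 1)
R = rng.uniform(0, 1, I // 2)       # WF m's private random vector (step 2)

s_m = U @ R + X                     # sent m -> n; masks X (step 2)
s_n1 = s_m @ yv                     # computed by WF n (step 3)
s_n2 = U.T @ yv                     # computed by WF n, sent back (step 3)
ssp = s_n1 - s_n2 @ R               # recovered by WF m (step 4): equals X . yv

# ---- Secure sum (SS) around the ring of WFs ----
terms = rng.uniform(0, 1, 5)        # each WF's private summand D^c * y
N = 100.0                           # public bound: true sum lies in [0, N)
Z = rng.uniform(0, N)               # WF 1's private random offset
V = (terms[0] + Z) % N              # WF 1 -> WF 2
for a in terms[1:]:
    V = (a + V) % N                 # each WF adds its term and passes it on
ss = (V - Z) % N                    # WF 1 removes Z: equals sum(terms)
```

Both recovered values match the plain inner product and the plain sum up to floating-point rounding, which is exactly why the PDEM construction matches the centralized one.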
The entire process of the proposed method requires no exchange of raw data between the WFs; thus, data privacy is protected. Meanwhile, the proposed method and the centralized method are mathematically equivalent. The communication traffic of the proposed method is larger, but the total traffic is still very small and can be accommodated.

REFERENCES

[1] Z. Wang, C. Shen, Y. Xu, F. Liu, X. Wu, and C. C. Liu, "Risk-limiting load restoration for resilience enhancement with intermittent energy resources," IEEE Transactions on Smart Grid, pp. 1–1, 2018.
[2] X. Lin, C. Clifton, and M. Zhu, "Privacy-preserving clustering with distributed EM mixture modeling," Knowledge and Information Systems, vol. 8, no. 1, pp. 68–81, Jul. 2005. [Online]. Available: https://doi.org/10.1007/s10115-004-0148-7
[3] R. Singh, B. C. Pal, and R. A. Jabr, "Statistical representation of distribution system loads using Gaussian mixture model," IEEE Transactions on Power Systems, vol. 25, no. 1, pp. 29–37, Feb. 2010.
[4] C. Clifton, M. Kantarcioglu, J. Vaidya, X. Lin, and M. Y. Zhu, "Tools for privacy preserving distributed data mining,"