Privacy-Preserving Probabilistic Forecasting for Temporal-spatial Correlated Wind Farms
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015
Mengshuo Jia, Student Member, IEEE, Chen Shen, Senior Member, IEEE, Zhiwen Wang, Student Member, IEEE, and Zhitong Yu
Abstract—Adopting secure scalar product and secure sum techniques, we propose a privacy-preserving method to build the joint and conditional probability distribution functions of multiple wind farms' output while accounting for their temporal-spatial correlation. The proposed method protects the raw data of wind farms (WFs) from disclosure and is mathematically equivalent to the centralized method, which needs to gather the raw data of all WFs.
Index Terms—Wind farms, privacy, temporal-spatial correlation, probabilistic forecasting, secure multi-party computation.
I. INTRODUCTION

To consider the temporal-spatial correlation of multiple wind farms' output (MWO) in probabilistic wind power forecasting, one can first construct the GMM-based joint PDF of MWO at different time periods, and then directly build the conditional PDF of the output of each wind farm (WF) in the next period with respect to the observations of MWO during the current periods [1].

The construction of the joint and conditional PDF requires complete observations, each of which gathers all the corresponding MWO data at different time periods. Since every WF can only observe its own outputs at different time periods, the complete observations are vertically partitioned among all the WFs (vertical partitioning: the attributes are divided across sites, and the sites must be joined to obtain complete information on any entity [2]). However, to protect data privacy, WFs with different stakeholders may refuse to share their raw data to compose the complete observations for constructing the PDF. To resolve this privacy issue, a privacy-preserving distributed method is a feasible alternative.

To construct the GMM-based PDF, the expectation-maximization (EM) algorithm is commonly used [3]. Nevertheless, existing research on privacy-preserving distributed EM algorithms mainly focuses on horizontally partitioned data (horizontal partitioning: each entity is represented entirely at a single site [2]). To the best of our knowledge, little literature has addressed building a GMM from vertically partitioned data. Therefore, based on the secure multi-party computation (SMC) method [4], this letter proposes a privacy-preserving method to build the GMM-based joint and conditional PDF.

II. NOTATIONS
We first define the domain Ω = {1, 2, ..., M} for M WFs, Γ = {1, 2, ..., T} for T periods (normally T = 24), and Υ = {1, 2, ..., I} for I observations. Let y_{m,t} denote the random variable of the output of the m-th WF at the t-th period, where m ∈ Ω and t ∈ Γ. We aim to construct the joint PDF of Y = {y_{m,t} | m ∈ Ω; t ∈ Γ}. The I observations of Y are represented by y^i = {y^i_{m,t} | m ∈ Ω; t ∈ Γ} (i ∈ Υ). To obtain a complete y^i, the corresponding observations of all WFs must be gathered together.

We utilize a GMM to build the joint PDF. The GMM is a parametric model represented by a convex combination of J multivariate Gaussian distribution functions. Defining the domain Λ = {1, 2, ..., J}, the parameter set of the GMM is θ = {w_j, μ_j, Σ_j | j ∈ Λ}. The GMM-based joint PDF of Y is given as follows:

    f(Y; \theta) = \sum_{j=1}^{J} w_j \mathcal{N}(Y; \mu_j, \Sigma_j)    (1)

where w_j is the weight coefficient and \mathcal{N}(Y; \mu_j, \Sigma_j) is the j-th multivariate Gaussian distribution function with mean vector μ_j and covariance matrix Σ_j. The precision matrix is defined as Φ_j = (Σ_j)^{-1}. The elements of μ_j are represented by μ_{j,m,t} (m ∈ Ω; t ∈ Γ). The diagonal elements of Σ_j and Φ_j are represented by σ_{j,(m,t),(m,t)} and φ_{j,(m,t),(m,t)} (m ∈ Ω; t ∈ Γ), and the off-diagonal elements by σ_{j,(m,t),(n,v)} and φ_{j,(m,t),(n,v)} (m, n ∈ Ω; t, v ∈ Γ).

III. CONSTRUCTION OF THE JOINT PDF

To obtain the joint PDF in (1), the key lies in estimating the θ of the GMM. We utilize the EM algorithm, which consists of an E-step and an M-step [3], to fulfill the estimation. For the k-th iteration of the j-th Gaussian component, the E-step is given in (2) and the M-step in (3):

    Q_j^{i,k+1} = \frac{w_j^k \mathcal{N}(y^i; \mu_j^k, \Sigma_j^k)}{\sum_{l=1}^{J} w_l^k \mathcal{N}(y^i; \mu_l^k, \Sigma_l^k)}, \quad i \in \Upsilon    (2)

    w_j^{k+1} = \frac{1}{I} \sum_{i=1}^{I} Q_j^{i,k+1}    (3a)

    \mu_j^{k+1} = \frac{\sum_{i=1}^{I} Q_j^{i,k+1} y^i}{\sum_{i=1}^{I} Q_j^{i,k+1}}    (3b)

    \Sigma_j^{k+1} = \frac{\sum_{i=1}^{I} Q_j^{i,k+1} (y^i - \mu_j^k)(y^i - \mu_j^k)'}{\sum_{i=1}^{I} Q_j^{i,k+1}}    (3c)

Both steps require y^i (i ∈ Υ) for calculation. To protect data privacy, we propose a privacy-preserving distributed EM (PDEM) algorithm to handle this privacy issue. The privacy preservation requirement is defined as follows: the data communicated between WFs must not divulge the raw data.
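For reference, the centralized E-step (2) and M-step (3) can be sketched in NumPy on synthetic data. The array sizes, the two-regime synthetic data, and the quantile-based initialization below are illustrative assumptions, not part of the letter; the Gaussian density is evaluated through the precision matrix, mirroring (4):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic complete observations y^i: I observations of the M*T joint outputs
# (here D stands in for M*T; two regimes play the role of J = 2 components).
I, D, J = 200, 4, 2
y = np.vstack([rng.normal(0.2, 0.05, (I // 2, D)),
               rng.normal(0.6, 0.10, (I - I // 2, D))])

def gauss(y, m, S):
    """Multivariate Gaussian density, written via the precision matrix Phi."""
    d = y - m
    Phi = np.linalg.inv(S)
    quad = np.einsum('id,de,ie->i', d, Phi, d)   # (y - mu) Phi (y - mu)'
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** D * np.linalg.det(S))

# Initial theta = {w_j, mu_j, Sigma_j}; quantile-based means are an
# illustrative initialization, not prescribed by the letter.
w = np.full(J, 1.0 / J)
mu = np.stack([np.quantile(y, 0.25, axis=0), np.quantile(y, 0.75, axis=0)])
Sigma = np.stack([0.1 * np.eye(D) for _ in range(J)])

for _ in range(50):
    # E-step (2): responsibilities Q[i, j] of each component for each observation.
    dens = np.column_stack([w[j] * gauss(y, mu[j], Sigma[j]) for j in range(J)])
    Q = dens / dens.sum(axis=1, keepdims=True)

    # M-step (3): weights (3a), means (3b), covariances (3c);
    # (3c) uses the previous-iteration mean mu^k, as written in the letter.
    Nj = Q.sum(axis=0)
    mu_prev = mu.copy()
    w = Nj / I
    mu = (Q.T @ y) / Nj[:, None]
    for j in range(J):
        d = y - mu_prev[j]
        Sigma[j] = (Q[:, j, None] * d).T @ d / Nj[j] + 1e-9 * np.eye(D)
```

On this synthetic data the fitted weights approach 1/2 each and the component means recover the two regime levels. The PDEM algorithm below reproduces exactly these updates, but with y^i vertically partitioned across the WFs.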
A. Private E-step
In the E-step, we assume that all WFs have acquired the θ^k = {w^k_j, μ^k_j, Σ^k_j | j ∈ Λ} updated in the (k−1)-th iteration. The aim of the private E-step is to ensure that every WF is able to calculate (2) without revealing raw data. The essence of (2) lies in the calculation of the Gaussian component:

    \mathcal{N}(y^i; \mu_j^k, \Sigma_j^k) = \frac{\exp[-\frac{1}{2}(y^i - \mu_j^k)\Phi_j^k(y^i - \mu_j^k)']}{\sqrt{(2\pi)^{M \times T} |\Sigma_j^k|}}    (4)

where the raw data y^i are required only in the quadratic form g(y^i) = (y^i − μ^k_j) Φ^k_j (y^i − μ^k_j)'. We further reorganize g(y^i) into (5):

    g(y^i) = \sum_{n=1}^{M} S_{j,n}^{i,k} - \sum_{n=1}^{M}\sum_{v=1}^{T} \mu_{j,n,v}^{k} D_{j,n,v}^{i,k}    (5a)

    S_{j,n}^{i,k} = \sum_{v=1}^{T} y_{n,v}^{i} D_{j,n,v}^{i,k}, \quad n \in \Omega    (5b)

    D_{j,n,v}^{i,k} = \sum_{m=1}^{M} C_{j,m}^{i,k} - H_{n,v}^{k}, \quad n \in \Omega, v \in \Gamma    (5c)

    C_{j,m}^{i,k} = \sum_{t=1}^{T} y_{m,t}^{i} \phi_{j,(m,t),(n,v)}^{k}, \quad m \in \Omega    (5d)

    H_{n,v}^{k} = \sum_{m=1}^{M}\sum_{t=1}^{T} \mu_{j,m,t}^{k} \phi_{j,(m,t),(n,v)}^{k}, \quad n \in \Omega, v \in \Gamma    (5e)

where (5d) and (5e) can be calculated locally by each WF. To calculate (5c), each WF has to gather the results of (5d) computed by the other WFs. Since the results of (5d) do not reveal the raw data, these values can be shared. Thereafter, (5b) can be obtained by each WF. For (5a), each WF also has to gather the results of (5b) from all WFs. Similarly, S^{i,k}_{j,n} in (5b) does not reveal any raw data, so this value can also be shared with every WF to calculate (5a). The Gaussian component in (4) is then obtainable by every WF. Finally, each WF is able to accurately complete the calculation of the E-step in (2) using the Gaussian component in (4) without revealing any raw data.

B. Private M-step
After the private E-step, each WF possesses the value of Q^{i,k+1}_j (j ∈ Λ; i ∈ Υ). Therefore, every WF is able to compute (3a) directly. However, y^i is required in (3b) and (3c). To avoid revealing raw data, we further reorganize these equations by rearranging the elements of μ_j and Σ_j into (6) and (7), where m, n ∈ Ω and t, v ∈ Γ:

    \mu_{j,m,t}^{k+1} = \frac{\sum_{i=1}^{I} Q_j^{i,k+1} y_{m,t}^{i}}{\sum_{i=1}^{I} Q_j^{i,k+1}}    (6)

    \sigma_{j,(m,t),(m,t)}^{k+1} = \frac{\sum_{i=1}^{I} Q_j^{i,k+1} (y_{m,t}^{i} - \mu_{j,m,t}^{k})^2}{\sum_{i=1}^{I} Q_j^{i,k+1}}    (7a)

    \sigma_{j,(m,t),(n,v)}^{k+1} = \frac{\sum_{i=1}^{I} Q_j^{i,k+1} y_{m,t}^{i} y_{n,v}^{i}}{\sum_{i=1}^{I} Q_j^{i,k+1}} - \mu_{j,m,t}^{k}\mu_{j,n,v}^{k}    (7b)

Equations (6) and (7a) for all T time periods are obtainable by each WF, and no WF needs to reveal raw data. Thus, the values obtained from (6) and (7a) can be shared among the WFs to compose a complete μ^{k+1}_j and all diagonal elements of Σ^{k+1}_j.

For (7b), the raw data of the m-th WF at the t-th period and of the n-th WF at the v-th period are needed to calculate the scalar product s^{k+1}_{j,(m,t),(n,v)} in (8):

    s_{j,(m,t),(n,v)}^{k+1} = \sum_{i=1}^{I} Q_j^{i,k+1} y_{m,t}^{i} y_{n,v}^{i} = X_{j,m,t}^{k+1} \cdot y_{n,v}    (8)

    X_{j,m,t}^{k+1} = [Q_j^{1,k+1} y_{m,t}^{1} \;\cdots\; Q_j^{i,k+1} y_{m,t}^{i} \;\cdots\; Q_j^{I,k+1} y_{m,t}^{I}]'

    y_{n,v} = [y_{n,v}^{1} \;\cdots\; y_{n,v}^{i} \;\cdots\; y_{n,v}^{I}]'

Since all WFs possess the value of Q^{i,k+1}_j (j = 1, ..., J; i = 1, ..., I), knowing both X^{k+1}_{j,m,t} and y_{n,v} means knowing all the raw data. To protect data privacy, we utilize the secure scalar product (SSP) technique, which securely computes the scalar product of two vectors, to calculate (8). The SSP procedure is summarized as follows [4]:

1) Both the m-th and the n-th WF choose the same random I × I/2 matrix U.
2) The m-th WF generates a random I/2 × 1 vector R and sends s_m = U × R + X^{k+1}_{j,m,t} to the n-th WF.
3) The n-th WF calculates the scalar product s_{n,1} = s_m · y_{n,v} as well as s_{n,2} = U' × y_{n,v}, and then sends s_{n,1} and s_{n,2} to the m-th WF.
4) The m-th WF finally calculates the scalar product through s^{k+1}_{j,(m,t),(n,v)} = s_{n,1} − s_{n,2} · R, and then sends it to the n-th WF.

Through the SSP technique, both the m-th and the n-th WF can acquire the scalar product s^{k+1}_{j,(m,t),(n,v)} without revealing any raw data. Then (7b) can be computed by the m-th and n-th WFs (m, n ∈ Ω). Eventually, by sharing (6) and (7a) and utilizing the SSP technique, every WF is able to accurately calculate the M-step while preserving data privacy.

IV. CONSTRUCTION OF THE CONDITIONAL PDF

Our aim is to construct the conditional PDF of y_{m,t} given the current outputs of all WFs. Let v denote the index of the current time period; the current outputs are then represented by y_v = {y_{m,v} | m ∈ Ω}. Obviously, if t = v + 1, the conditional PDF of y_{m,t} can be viewed as the predictive PDF of the m-th WF's output at the next period based on the current outputs of all WFs.

Once the joint PDF in (1) is built via the PDEM algorithm, the conditional PDF can be constructed:

    f(y_{m,t} \,|\, y_v) = \sum_{j=1}^{J} w_{j,m,t}^{c} \, \mathcal{N}(y_{m,t}; \mu_{j,m,t}^{c}, \sigma_{j,m,t}^{c})    (9)

where the parameters of the conditional PDF are specified via (10):

    w_{j,m,t}^{c} = \frac{w_j \mathcal{N}(y_v; \mu_{j,v}, \Sigma_{j,v})}{\sum_{l=1}^{J} w_l \mathcal{N}(y_v; \mu_{l,v}, \Sigma_{l,v})}    (10a)

    \mu_{j,m,t}^{c} = \mu_{j,m,t} + \Sigma_{j,v}^{m,t} (\Sigma_{j,v})^{-1} (y_v - \mu_{j,v})    (10b)

    \sigma_{j,m,t}^{c} = \sigma_{j,(m,t),(m,t)} - \Sigma_{j,v}^{m,t} (\Sigma_{j,v})^{-1} (\Sigma_{j,v}^{m,t})'    (10c)

where μ_{j,v}, Σ_{j,v} and Σ^{m,t}_{j,v} are given as follows:

    \mu_{j,v} = [\mu_{j,1,v} \;\cdots\; \mu_{j,n,v} \;\cdots\; \mu_{j,M,v}]

    \Sigma_{j,v} = \begin{bmatrix} \sigma_{j,(1,v),(1,v)} & \cdots & \sigma_{j,(1,v),(M,v)} \\ \vdots & \ddots & \vdots \\ \sigma_{j,(M,v),(1,v)} & \cdots & \sigma_{j,(M,v),(M,v)} \end{bmatrix}

    \Sigma_{j,v}^{m,t} = [\sigma_{j,(m,t),(1,v)} \;\cdots\; \sigma_{j,(m,t),(n,v)} \;\cdots\; \sigma_{j,(m,t),(M,v)}]

Apparently, each WF can compute (10c) directly with the θ of the joint PDF. However, calculating (10a) and (10b) needs y_v, which consists of raw data. To avoid revealing any private data, we further reorganize (10a) into (11) and (10b) into (12). Note that the calculation of (10a) is similar to that of (2), so the reorganization of (10a) is similar to that of (2). Due to limited space, we only detail the computational parts of (10a) that raise the data privacy issue, which are defined as S^c_{j,v} (v ∈ Γ) and C^c_{j,n,v} (n ∈ Ω, v ∈ Γ).
    S_{j,v}^{c} = \sum_{n=1}^{M} y_{n,v} \, D_{j,n,v}^{c}    (11a)

    D_{j,n,v}^{c} = C_{j,n,v}^{c} - \sum_{l=1}^{M} \mu_{j,l,v} \, \phi_{j,(l,v),(n,v)}    (11b)

    C_{j,n,v}^{c} = \sum_{l=1}^{M} y_{l,v} \, \phi_{j,(l,v),(n,v)}    (11c)

    \mu_{j,m,t}^{c} = \mu_{j,m,t} + \sigma_{j,(m,t),(m,v)} \sigma_{j,(m,v),(m,v)}^{-1} y_{m,v} - \sum_{n=1}^{M} \mu_{j,n,v} \sigma_{j,(m,t),(n,v)} \sigma_{j,(n,v),(n,v)}^{-1} + \sum_{n=1, n \neq m}^{M} \sigma_{j,(m,t),(n,v)} \sigma_{j,(n,v),(n,v)}^{-1} y_{n,v}    (12)

It can be observed that raw data are involved in the weighted sums in (11a), (11c), and the last term of (12). To avoid revealing raw data, we utilize the secure sum (SS) technique, which can securely compute a weighted sum without sacrificing data privacy. Taking (11a) as an example, the SS procedure is summarized as follows [4]:

1) Assume that the sum in (11a) lies in the range [0, N). N can be set as the sum of the capacities of all the WFs.
2) The 1st WF generates a random number Z, uniformly chosen from [0, N). Then the 1st WF sends V_1 = (D^c_{j,1,v} y_{1,v} + Z) mod N to the 2nd WF.
3) Each remaining WF (n = 2, ..., M) sends V_n = (D^c_{j,n,v} y_{n,v} + V_{n−1}) mod N to the next WF in the ring, the M-th WF sending V_M back to the 1st WF.
4) When the 1st WF receives V_M, it computes S^c_{j,v} = (V_M − Z) mod N. The value of S^c_{j,v} is then shared among the WFs.

With the SS technique, the weighted sums in (11a), (11c) and (12) can be computed without revealing any raw data. The parameters of the conditional PDF in (10) are then obtainable, and so is the conditional PDF.

It is worth noting that the m-th WF does not participate in the calculation of the last term of (12). The value of this term is calculated by the remaining WFs and is useful only to the m-th WF. Through this design, we ensure that each WF can obtain only its own conditional PDF without knowing the conditional PDFs of the others.

V. DISCUSSION
We define the centralized method as the calculation method that gathers the raw data of all the WFs to construct the PDF. Since both the SSP and SS techniques accurately and safely calculate the scalar product and the weighted sum without any approximation, the proposed method and the centralized method are mathematically equivalent; thus, the PDFs constructed by the two methods are exactly the same.

The cost of preserving privacy is an increase in communication traffic. Setting M = 10, T = 24 and I = 1000, the total upstream and downstream communication traffic of a WF over the entire calculation process of the two methods is given in Table I. Since communications occur in every iteration of the PDEM algorithm for every observation, the communication traffic of the proposed method increases significantly compared to the centralized method. However, the total communication traffic is still very small and can be fully accommodated under current bandwidth conditions.

TABLE I
COMMUNICATION TRAFFIC COMPARISON
                      Centralized Method    Proposed Method
Upstream Traffic
Downstream Traffic    . × − Mb              27.82 Mb
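The equivalence stated above rests on the SSP and SS primitives returning exact inner products. A minimal NumPy sketch of the SSP steps (Section III-B) and the SS ring (Section IV) can be checked numerically; the vector sizes, the modulus N, and the toy private values below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# ---- Secure scalar product (SSP) between WF m and WF n ----
I = 6                               # illustrative number of observations
X = rng.uniform(0, 1, I)            # WF m's private vector X_{j,m,t}
yv = rng.uniform(0, 1, I)           # WF n's private vector y_{n,v}

U = rng.uniform(0, 1, (I, I // 2))  # shared random I x I/2 matrix (step 1)
R = rng.uniform(0, 1, I // 2)       # WF m's private random vector (step 2)

s_m = U @ R + X                     # sent m -> n; masks X (step 2)
s_n1 = s_m @ yv                     # computed by WF n (step 3)
s_n2 = U.T @ yv                     # computed by WF n, sent back (step 3)
ssp = s_n1 - s_n2 @ R               # recovered by WF m (step 4): equals X . yv

# ---- Secure sum (SS) around the ring of WFs ----
terms = rng.uniform(0, 1, 5)        # each WF's private summand D^c * y
N = 100.0                           # public bound: true sum lies in [0, N)
Z = rng.uniform(0, N)               # WF 1's private random offset
V = (terms[0] + Z) % N              # WF 1 -> WF 2
for a in terms[1:]:
    V = (a + V) % N                 # each WF adds its term and passes it on
ss = (V - Z) % N                    # WF 1 removes Z: equals sum(terms)
```

Both recovered values match the plain inner product and the plain sum up to floating-point rounding, which is exactly why the PDEM construction matches the centralized one.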
The entire process of the proposed method requires no exchange of raw data between the WFs; thus, data privacy is protected. Meanwhile, the proposed method and the centralized method are mathematically equivalent. The communication traffic of the proposed method is larger, but the total traffic is still very small and can be accommodated.

REFERENCES

[1] Z. Wang, C. Shen, Y. Xu, F. Liu, X. Wu, and C. C. Liu, "Risk-limiting load restoration for resilience enhancement with intermittent energy resources," IEEE Transactions on Smart Grid, pp. 1–1, 2018.
[2] X. Lin, C. Clifton, and M. Zhu, "Privacy-preserving clustering with distributed EM mixture modeling," Knowledge and Information Systems, vol. 8, no. 1, pp. 68–81, Jul. 2005. [Online]. Available: https://doi.org/10.1007/s10115-004-0148-7
[3] R. Singh, B. C. Pal, and R. A. Jabr, "Statistical representation of distribution system loads using Gaussian mixture model," IEEE Transactions on Power Systems, vol. 25, no. 1, pp. 29–37, Feb. 2010.
[4] C. Clifton, M. Kantarcioglu, J. Vaidya, X. Lin, and M. Y. Zhu, "Tools for privacy preserving distributed data mining,"