Robust Recovery of Missing Data in Electricity Distribution Systems
Cristian Genes, Iñaki Esnaola, Samir M. Perlaza, Luis F. Ochoa, Daniel Coca
Abstract—The advanced operation of future electricity distribution systems is likely to require significant observability of the different parameters of interest (e.g., demand, voltages, currents, etc.). Ensuring completeness of data is, therefore, paramount. In this context, an algorithm for recovering missing state variable observations in electricity distribution systems is presented. The proposed method exploits the low rank structure of the state variables via a matrix completion approach while incorporating prior knowledge in the form of second order statistics. Specifically, the recovery method combines nuclear norm minimization with Bayesian estimation. The performance of the new algorithm is compared to the information-theoretic limits and tested through simulations using real data of an urban low voltage distribution system. The impact of the prior knowledge is analyzed when a mismatched covariance is used and for a Markovian sampling that introduces structure in the observation pattern. Numerical results demonstrate that the proposed algorithm is robust and outperforms existing state of the art algorithms.
Index Terms—recovery of missing data, distribution systems, matrix completion, Bayesian estimation
Cristian Genes and Daniel Coca are with the Department of Automatic Control and Systems Engineering, University of Sheffield, Sheffield S1 3JD, UK. Iñaki Esnaola is with the Department of Automatic Control and Systems Engineering, University of Sheffield, Sheffield S1 3JD, UK, and also with the Department of Electrical Engineering, Princeton University, Princeton NJ 08540, USA. Samir M. Perlaza is with the Institut National de Recherche en Informatique et en Automatique (INRIA), Lyon, France, and also with the Department of Electrical Engineering, Princeton University, Princeton NJ 08540, USA. Luis F. Ochoa is with the Department of Electrical and Electronic Engineering, The University of Melbourne, Melbourne 3010, Australia, and also with the School of Electrical and Electronic Engineering, The University of Manchester, Manchester M13 9PL, UK. (cgenes1@sheffield.ac.uk, esnaola@sheffield.ac.uk, [email protected], luis [email protected], d.coca@sheffield.ac.uk)

I. INTRODUCTION

THE wide-spread adoption of residential scale low carbon technologies, from PV systems to electric vehicles, will undoubtedly bring technical challenges to the electricity distribution systems, as they have been designed for passive loads [1], [2]. Therefore, and as part of the Smart Grid vision, electricity distribution systems, including low voltage circuits, are likely to adopt a more active role so as to cost-effectively manage controllable network elements as well as participants [3]. As a result, monitoring and control procedures are expected to face increasingly demanding performance requirements posed by the dynamic and unknown scenarios that the smart grid gives rise to. Advanced control strategies require timely and accurate data describing the state of the grid. In this setting, the sensing infrastructure is expected to provide complete and reliable state information of the distribution system. However, in practical scenarios, the operator faces challenges like data injection attacks [4], [5] or missing data [6], [7]. Sensor failures, unreliable communication or data storage issues are some of the causes for incomplete sets of observations. As a consequence, the state of the grid is not perfectly known and control mechanisms are difficult to implement. For instance, accurate measurements are necessary to implement a centralized control scheme for voltage regulation in distribution systems [8]. In view of this, it is vital to develop estimation procedures for the missing data using the available observations.

Missing data recovery can be cast as a minimum mean square error (MMSE) estimation problem when a probabilistic description of the underlying process governing the state variables is available. However, MMSE estimation relies on accurate second order statistics, which is an unrealistic assumption in practical scenarios [7], [9].
The increased number of nonlinear loads and the turbulent nature of distributed generation options in the locally controlled grid affect the precision of the postulated statistics for the state variables. For that reason, the efficiency of MMSE estimation is limited in the smart grid context [7].

Matrix completion (MC) was recently proposed to recover missing data from partial observations [10]. The main advantage is that recovery via MC requires mild assumptions about the setting, e.g., access to second order statistics is not required. Instead, matrix completion-based recovery exploits the fact that correlated state variable vectors give rise to approximately low rank data matrices. That being the case, the recovery of the missing entries of low rank matrices is feasible in a convex optimization context provided that a sufficient fraction of the entries is observed [10], [11], [12]. The key theoretical results therein are based on the assumption that the locations of the sampled entries are uniformly distributed. In practice, however, this assumption is not always satisfied. For instance, in electricity distribution systems missing data entries tend to display significant structure across both space and time [7]. The applicability of MC recovery for non-uniform sampling is studied in [6], [13]. Not surprisingly, low rank minimization tools are also used to address the problem of electricity price forecasting [14] and to develop a framework for efficient processing of synchrophasor data [15].

This paper compares the performance of different missing data recovery strategies with respect to the information-theoretic limit and introduces a novel algorithm that addresses the main shortcomings of existing techniques. The performance of the new algorithm is tested against singular value thresholding (SVT) recovery [16] and MMSE estimation under realistic assumptions, i.e., the postulated statistics are not accurate and the sampling pattern is not uniform.
In particular, a mismatched covariance matrix model is used, as in [17] and [18], for the case in which imperfect second order statistics are available. Non-uniform sampling is considered to account for structure in the observation pattern [15]. Numerical results show a significant gain in performance in both cases when compared to SVT recovery. Remarkably, the new algorithm is robust to mismatched statistics and to non-uniform sampling patterns.

II. SYSTEM MODEL
Consider a low voltage distribution system with L feeders. Each feeder includes a sensing unit that measures the electrical magnitudes of operational interest at predetermined time instants. These measurements, which include phase active power, phase reactive power and phase voltage, support the operator in controlling, monitoring, and managing the network. In practice, the acquisition process provides the operator with a noisy and incomplete set of state variables. For that reason, the operator needs to recover the missing data using the available observations.

A. Source Model for State Variables
For a given electrical magnitude s, let m^(s)_{i,j} be the value of that particular electrical magnitude at feeder i at time j. The matrix of state variables for s, denoted by M^(s) ∈ R^{N×L}, contains the aggregated state variable vectors from all feeders, i.e., M^(s) ≜ [m^(s)_1, m^(s)_2, ..., m^(s)_L]. The state variable vector for s, in feeder i, contains the N state variables generated in the feeder and is given by m^(s)_i = [m^(s)_{i,1}, m^(s)_{i,2}, ..., m^(s)_{i,N}]^T ∈ R^N. Without loss of generality the analysis is carried out for a particular electrical magnitude, and therefore, the index s is dropped. The resulting data matrix M contains the state variable of interest at time instants 1, 2, ..., N for all L feeders.

Real data is used to model the statistical structure of the data generated in an electricity distribution system. The real data set under consideration contains values from 200 residential secondary substations across the North West of England collected from June 2013 to January 2014. The data collection is part of the "Low Voltage Network Solutions" project run by Electricity North West Limited [19]. Each substation creates a daily file containing values of voltage, current and power levels for all three phases. An analysis of the distribution and sample covariance matrix of the voltage data set under consideration is presented in [7]. Therein it is shown that the state variables can be modelled as a multivariate Gaussian random process

m_i ∼ N(µ, Σ),   (1)

where {m_i}_{i=1}^{L} is an independent and identically distributed sequence of random variables. Consequently, M is a realization of the random process describing the value of the state variable of interest across the grid.

Fig. 1 describes the distribution system monitoring model. In this setting, the electrical magnitudes describing the state of the system are modelled as a random process that outputs a realization M ∈ R^{N×L} every N time instants.
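The source model in (1) can be sketched numerically as follows. The mean profile, covariance decay, dimensions and noise-free setting below are illustrative stand-ins, not the statistics of the real data set; the point is that strongly correlated columns produce an approximately low rank matrix M.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (the real data matrix in the paper is larger):
# N time instants per state variable vector, L feeders.
N, L = 60, 40

# Hypothetical second order statistics: a constant mean and an
# exponentially decaying covariance across time instants.
mu = 230.0 * np.ones(N)
idx = np.arange(N)
Sigma = 4.0 * 0.97 ** np.abs(idx[:, None] - idx[None, :])

# Draw the L state variable vectors m_i ~ N(mu, Sigma) i.i.d. (eq. (1))
# and aggregate them as the columns of M.
M = rng.multivariate_normal(mu, Sigma, size=L).T   # shape (N, L)

# The strong correlation shows up as a rapidly decaying singular spectrum.
s = np.linalg.svd(M, compute_uv=False)
print(M.shape, round(float(s[5] / s[0]), 4))
```

The rapid singular value decay is what makes the matrix completion viewpoint of Section IV applicable to this kind of data.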
The state of the grid is fully described by the entries of the matrix M. However, the operator observes a subset of the complete set of state variables, i.e., measurements are lost during the acquisition process. The aim of the estimation process is to recover the missing entries.

Figure 1. Block diagram describing the system model.
B. Acquisition
The sensing infrastructure introduces additive white Gaussian noise (AWGN) as a result of the thermal noise present at each sensor. The resulting measurements are given by

R = M + N,   (2)

where

(N)_{i,j} ∼ N(0, σ²),   (3)

for i ∈ {1, 2, ..., N} and j ∈ {1, 2, ..., L}. Moreover, it is also assumed that only a fraction of the complete set of measurements (entries in R) are communicated to the operator. Denote by Ω the subset of observed entries, i.e., Ω ⊆ {1, 2, ..., N} × {1, 2, ..., L}. By definition it follows that Ω is given by

Ω ≜ {(i, j) : (R)_{i,j} is observed}.   (4)

Formally, the acquisition process is modelled by the function f : R^{N×L} → R^{|Ω|} with f(M) = P_Ω(R), where

P_Ω(R) = (R)_Ω,   (5)

and |Ω| denotes the cardinality of Ω. The observations given by (5) describe all the data that is available to the operator for estimation purposes, and therefore, the recovery of the missing data is performed from the observations P_Ω(R).

C. Estimation
The estimation process of the complete matrix of state variables based on the available observations is modelled by the function g : R^{|Ω|} → R^{N×L}. The estimate M̂ = g(f(M)) is obtained by solving an optimization problem based on an optimality criterion. In this paper, the optimality criterion is the mean square error (MSE) given by

MSE(M; g) = E[‖M − g(f(M))‖²_F] / (NL),   (6)

where ‖·‖_F denotes the Frobenius norm. The normalized mean square error (NMSE) is defined as

NMSE(M; g) = MSE(M; g) NL / ‖M‖²_F.   (7)

For this optimality criterion, the optimal estimate of the missing data is given by the MMSE estimate

M̂_MMSE = E[M | f(M), Σ],   (8)

where Σ ∈ R^{N×N} is the covariance matrix defined in (1). Note that, in general, obtaining the optimal estimate M̂_MMSE requires knowledge of the probability distribution describing the state variables. If the state variables follow a Gaussian distribution, this boils down to knowledge of the second order moments, i.e., the covariance matrix Σ, which needs to be known prior to the estimation process. In practice, the operator relies on postulated statistics that typically do not match the actual statistics. Consequently, the accuracy of the estimate is a function of the difference between the real and the postulated statistics.

III. INFORMATION-THEORETIC LIMIT
In order to assess the performance of the missing data recovery techniques in absolute terms, this section introduces the optimal performance theoretically attainable (OPTA) by an estimator g when the state variables follow a multivariate Gaussian distribution. For a given number of observations, the minimum distortion achievable by any estimation method is determined by the rate-distortion function [20]. In the electricity distribution setting described above, the observed entries are corrupted by additive white Gaussian noise, which determines the finite rate at which information about the state variables is obtained from the observations. Consequently, the optimal performance is bounded by the capacity of the AWGN channel

R(D) < C,   (9)

where R is the rate at which the source needs to be observed to achieve a distortion D, and C is the capacity of the parallel AWGN channels modelling the observation process. In view of this, the OPTA for a multivariate Gaussian source is given by

R(D) ≤ (|Ω| / (NL)) (1/2) log(1 + snr),   (10)

where the signal to noise ratio, denoted by snr, is defined as

snr ≜ Tr(Σ) / (N σ²),   (11)

where σ² is defined in (3). The rate-distortion function of a multivariate Gaussian process is computed using the following parametric equations [21]:

R(θ) = (1/N) Σ_{i=0}^{N−1} max(0, (1/2) log(λ_i / θ)),
D(θ) = (1/N) Σ_{i=0}^{N−1} min(θ, λ_i),   (12)

where R is the source rate in nats/symbol, D is the mean square error distortion per entry, λ_i is the i-th largest eigenvalue of Σ, and θ is a parameter. The NMSE theoretically attainable, NMSE(M; OPTA), follows from combining (7) and (12) and is determined by

NMSE(M; OPTA) = D NL / ‖M‖²_F.   (13)

IV. RECOVERY OF MISSING DATA
In this section, the information-theoretic limit for missing data recovery presented in Section III is compared with MMSE estimation and singular value thresholding (SVT) recovery.
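The OPTA benchmark used in this comparison can be evaluated numerically from (10)-(12): since R(θ) in (12) is decreasing in the reverse water-filling parameter θ, one can bisect on θ until the source rate matches the capacity budget of (10) and then read off D(θ). The sketch below does exactly that; the covariance spectrum, noise level and observation fraction are hypothetical stand-ins.

```python
import numpy as np

def opta_distortion(eigvals, rate_budget):
    """Per-entry distortion D of the OPTA bound: bisect on the reverse
    water-filling parameter theta until R(theta) of eq. (12) matches the
    rate budget, then evaluate D(theta)."""
    eigvals = np.asarray(eigvals, dtype=float)
    N = eigvals.size

    def rate(theta):
        return np.maximum(0.0, 0.5 * np.log(eigvals / theta)).sum() / N

    lo, hi = 1e-12, eigvals.max()
    for _ in range(200):            # rate(theta) is decreasing in theta
        mid = 0.5 * (lo + hi)
        if rate(mid) > rate_budget:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    return np.minimum(theta, eigvals).sum() / N

# Hypothetical covariance spectrum and observation budget.
idx = np.arange(50)
Sigma = 4.0 * 0.97 ** np.abs(idx[:, None] - idx[None, :])
lam = np.linalg.eigvalsh(Sigma)
snr = np.trace(Sigma) / (Sigma.shape[0] * 0.1)       # eq. (11), sigma^2 = 0.1
obs_fraction = 0.6                                   # |Omega| / (N L)
budget = obs_fraction * 0.5 * np.log1p(snr)          # eq. (10)
D = opta_distortion(lam, budget)
print(round(float(D), 4))
```

Increasing the observation fraction (and hence the rate budget) drives the attainable distortion D down, which is the behaviour the OPTA curves in the figures below capture.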
A. Minimum Mean Squared Error Estimation
Linear MMSE (LMMSE) estimation achieves the optimal performance in the recovery of missing data for a given set of observations Ω when the data is generated by a multivariate Gaussian source and the optimality criterion is the MSE. However, this estimation procedure relies on access to second order statistics of the state variables. In particular, each column of the matrix P_Ω(R) is given by

P_Ω(r_i) = A_i (m_i + n_i),   (14)

where A_i is defined such that A_i m_i = P_Ω(m_i) and i ∈ {1, 2, ..., L}. Consequently, the LMMSE estimate for each state variable vector is given by

m̂_i = µ + Γ_i (P_Ω(r_i) − A_i µ),   (15)

where µ is defined in (1) and

Γ_i = Σ A_i^T (A_i Σ A_i^T + σ² I)^{−1},   (16)

for i ∈ {1, 2, ..., L}. The normalized error achieved by the LMMSE estimator is given by

NMSE(M; LMMSE) = ‖M − M̂_LMMSE‖²_F / ‖M‖²_F,   (17)

where M̂_LMMSE = [m̂_1, m̂_2, ..., m̂_L], with m̂_i defined in (15).

B. Singular Value Thresholding
Low rank matrices are recovered from a subset of the entries via rank minimization techniques under mild coherence conditions on the set of observations [10]. Specifically, the missing entries are recovered by solving the following rank minimization problem:

minimize_X rank(X) subject to P_Ω(X) = P_Ω(M).   (18)

Unfortunately, this rank minimization problem is NP-hard. Fortunately, in [10] it is shown that when the entries in Ω are sampled uniformly at random, the solution of the rank minimization problem in (18) is obtained with high probability by solving the nuclear norm minimization problem in (20). SVT is a matrix completion algorithm [16] which produces a sequence of matrices X^k that converges to the unique solution of the following optimization problem:

minimize_X τ‖X‖_* + (1/2)‖X‖²_F subject to P_Ω(X) = P_Ω(M),   (19)

where ‖X‖_* is the nuclear norm of the matrix X. Note that when τ → ∞, the optimization problem in (19) converges to the nuclear norm minimization problem proposed in [10]:

minimize_X ‖X‖_* subject to P_Ω(X) = P_Ω(M).   (20)

For large values of τ, SVT provides the solution to the nuclear norm minimization problem. Compared to alternatives like SeDuMi [22] or SDPT3 [23], SVT features a lower computational cost per iteration. This is achieved by exploiting the sparsity of Y^k and the low-rank property of X^k to reduce storage requirements. The low computational cost makes it possible to handle larger matrices. Simulation results in [16] show that SVT recovers matrices with nearly a billion entries. In comparison, SeDuMi and SDPT3 produce accurate solutions for square matrices with dimension close to fifty. In [24] the structure of the problem is exploited to reduce the memory requirements and increase the matrix size up to 350.
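An SVT-style solver for (19) alternates singular value soft-thresholding with a data-consistency correction on the observed entries. A minimal sketch, with illustrative dimensions and parameter choices rather than the exact settings of the paper, is:

```python
import numpy as np

def svt(P_Omega_R, mask, tau, delta=1.5, eps=1e-4, k_max=1000):
    """SVT-style iteration for (19): soft-threshold the singular values
    of the dual matrix Y, then push the residual on the observed entries
    back into Y. Starts from Y = 0."""
    Y = np.zeros_like(P_Omega_R)
    X = np.zeros_like(P_Omega_R)
    for _ in range(k_max):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt      # soft-thresholding of Y
        residual = np.where(mask, P_Omega_R - X, 0.0)
        if np.linalg.norm(residual) <= eps * np.linalg.norm(P_Omega_R):
            break
        Y = Y + delta * residual
    return X

# Illustrative example (not from the paper): a rank-2 matrix observed on
# roughly 60% of its entries, with the threshold tau = 5N suggested in [16].
rng = np.random.default_rng(1)
N, L = 40, 40
M = rng.normal(size=(N, 2)) @ rng.normal(size=(2, L))
mask = rng.random((N, L)) < 0.6
M_hat = svt(np.where(mask, M, 0.0), mask, tau=5 * N)
nmse = np.linalg.norm(M - M_hat) ** 2 / np.linalg.norm(M) ** 2
print(round(float(nmse), 4))
```

Note that the full SVD above is only for clarity; the low per-iteration cost reported in [16] comes from computing a truncated SVD of the sparse matrix Y.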
Because of the dimension of the data sets produced by low voltage distribution systems, the remainder of the paper focuses on SVT as the benchmark MC-based recovery. The main idea in SVT consists in the following iteration steps:

X^k = D_τ(Y^{k−1}),
Y^k = Y^{k−1} + δ_s P_Ω(M − X^k),   (21)

where Y^0 = 0, δ_s is the step size that obeys 0 < δ_s < 2, and the soft-thresholding operator, D_τ, applies a soft-thresholding rule to the singular values of Y^{k−1}, shrinking these towards zero. For a matrix Y ∈ R^{N×L} of rank r with singular value decomposition given by

Y = U S V^T,  S = diag({σ_i(Y)}_{1≤i≤r}),   (22)

where U and V are unitary matrices of size N × r and L × r, respectively, and σ_i(Y) are the singular values of the matrix Y, the soft-thresholding operator is defined as

D_τ(Y) ≜ U D_τ(S) V^T,  with  D_τ(S) = diag({(σ_i(Y) − τ)_+}),   (23)

where t_+ = max(0, t). Interestingly, the choice of τ is important to guarantee a successful recovery, since large values guarantee a low-rank matrix estimate, but for values larger than max_i(σ_i(Y)) all the singular values vanish. In [16], the proposed threshold is τ = 5N. However, simulation results presented in [7] show that τ = 5N gives suboptimal performance when the number of missing entries is large. Unfortunately, finding the optimal threshold when the matrix is sparse is still an open problem. In general, the value of the threshold for soft-thresholding based recovery algorithms is obtained via numerical optimization [7], [12]. The same soft-thresholding operator, D_τ, is used in a different framework for denoising [12], [25], [26]. In this context, the performance of the denoiser, measured in MSE, is estimated using Stein's unbiased risk estimate (SURE) [27]. In [28] a closed-form expression for the unbiased risk estimate is presented for the operator D_τ.

C. Performance Evaluation with Real Data
This subsection presents a comparison between LMMSE and SVT, and the theoretical limit, OPTA, using real electricity distribution system data. The test matrix, M, is a square matrix of size 500 × 500, i.e., N = L = 500, and contains voltage measurements covering the state of the grid for a period of 2 hours. Each column represents a different state variable vector that describes the grid on a different day and for a different feeder. The entries in Ω are sampled uniformly at random with probability

P[(i, j) ∈ Ω] = E[|Ω|] / (NL),   (24)

and the performance of the SVT-based recovery is defined in terms of the NMSE given by

NMSE(M; SVT) = ‖M − M̂_SVT‖²_F / ‖M‖²_F,   (25)

where M̂_SVT is the SVT estimate of M based on P_Ω(R). Let γ be the expected value of the ratio of missing entries for the matrix M, that is,

γ ≜ 1 − E[|Ω|] / (NL).   (26)

Since the performance of the LMMSE estimator depends on the covariance matrix Σ, a mismatched covariance matrix model is introduced to account for the difference between the postulated and actual statistics. Specifically, the postulated covariance matrix is given by

Σ* = Σ + (1 / SMR) (‖Σ‖_F / ‖∆‖_F) ∆,   (27)

where Σ is the actual covariance matrix, ∆ = H H^T with H ∈ R^{N×N}, and the entries of H are distributed as N(0, 1). The strength of the mismatch is determined by the signal to mismatch ratio (SMR), which is defined such that for SMR = 1 the norm of the mismatch is equal to the norm of the real covariance matrix, i.e., ‖Σ‖_F = ‖Σ* − Σ‖_F.

Figure 2. Real data recovery performance using SVT, LMMSE estimation, for different levels of mismatch, and the OPTA, when SNR = 20 dB.

Fig. 2 shows the performance, measured in NMSE, of the SVT-based recovery compared to the performance of the LMMSE estimator when different levels of mismatch are introduced, and to the theoretical limit given by the OPTA. Numerical results in this section are obtained for a signal to noise ratio value of SNR = 20 dB, where SNR ≜ 10 log_{10} snr. It can be seen that the performance of the SVT algorithm is closer to the theoretical limit when the number of missing entries is large. Interestingly, the LMMSE estimator gives better performance when the SMR is large. However, when SMR = 10 and for small values of γ, the SVT algorithm outperforms the LMMSE estimator. Moreover, the SVT provides a better recovery for SMR = 1 for almost all values of γ. In view of this, LMMSE estimation requires accurate second order statistics to perform competitively in this setting, which is an unrealistic assumption in a practical scenario.
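The mismatched covariance model (27) and the per-column LMMSE recovery (15)-(16) used in this comparison can be sketched as follows. The statistics, dimensions, sampling ratio and SMR value are illustrative stand-ins for the real-data setup.

```python
import numpy as np

rng = np.random.default_rng(2)
N, L = 50, 30

# Hypothetical actual statistics standing in for the real voltage data.
idx = np.arange(N)
mu = 230.0 * np.ones(N)
Sigma = 4.0 * 0.97 ** np.abs(idx[:, None] - idx[None, :])

def mismatched_cov(Sigma, smr, rng):
    """Postulated covariance of eq. (27): Sigma* = Sigma +
    (1/SMR) (||Sigma||_F / ||Delta||_F) Delta, with Delta = H H^T."""
    H = rng.normal(size=Sigma.shape)
    Delta = H @ H.T
    return Sigma + (np.linalg.norm(Sigma) / (smr * np.linalg.norm(Delta))) * Delta

def lmmse_column(r_obs, obs_idx, mu, Sigma, sigma2):
    """LMMSE estimate (15)-(16) of one state variable vector from the
    observed entries r_obs located at rows obs_idx."""
    Gamma = Sigma[:, obs_idx] @ np.linalg.inv(
        Sigma[np.ix_(obs_idx, obs_idx)] + sigma2 * np.eye(len(obs_idx)))
    return mu + Gamma @ (r_obs - mu[obs_idx])

# Generate data, observe half of the entries through AWGN, recover columns
# with a mismatched covariance (SMR = 10).
sigma2 = 0.1
M = rng.multivariate_normal(mu, Sigma, size=L).T
R = M + rng.normal(0.0, np.sqrt(sigma2), size=(N, L))
mask = rng.random((N, L)) < 0.5
Sigma_post = mismatched_cov(Sigma, smr=10.0, rng=rng)
M_hat = np.column_stack([
    lmmse_column(R[mask[:, j], j], np.flatnonzero(mask[:, j]),
                 mu, Sigma_post, sigma2)
    for j in range(L)])
nmse = np.linalg.norm(M - M_hat) ** 2 / np.linalg.norm(M) ** 2
print(round(float(nmse), 6))
```

Swapping `Sigma_post` for the true `Sigma` in the recovery reproduces the matched-statistics case against which the mismatch penalty is measured.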
Moreover, the performance of the SVT algorithm depends on the threshold τ [7], which is difficult to optimize in this case.

V. MAIN RESULT
This section introduces a novel algorithm for missing data recovery that incorporates imperfect second order statistics. The new approach is based on the SVT algorithm, but it exploits the information about the second order statistics to optimize the threshold τ at each iteration k.

A. Soft-thresholding parameter
The main shortcoming of the SVT algorithm is the lack of guidelines for tuning the threshold τ. Numerical results in [7] show that the value 5N proposed in [16] is not optimal for every scenario. In order to provide better recovery it is essential to tune the value of τ at each iteration of the algorithm. In SVT the soft-thresholding operator is applied to a sparse matrix, which increases the difficulty of the tuning process.

B. Exploiting second order statistics
In order to overcome the limitation imposed by the sparse structure of the matrix Y^k, the proposed algorithm estimates the missing entries prior to the soft-thresholding step. Thus, the available prior knowledge is exploited to produce an estimate of the entries not contained in Ω. In this case, at each iteration k of the proposed algorithm the matrix Z^k is computed as

Z^k = Y^k + L^k,   (28)

where Y^k is defined as in the SVT algorithm and L^k is the LMMSE estimate given by

L^k = P_{Ω^c}(µ) + Σ_{Ω^c Ω} Σ_{ΩΩ}^{−1} (P_Ω(Y^k) − P_Ω(µ)),   (29)

where Ω is the set of observed entries, Ω^c is the set of missing entries, Σ_{Ω^c Ω} is the covariance matrix between the entries in Ω^c and the entries in Ω, and Σ_{ΩΩ} is the covariance matrix of the entries in Ω. In a nutshell, the unknown entries are estimated using the LMMSE-based recovery at each iteration k. The result is a complete matrix Z^k for which the tuning of the threshold is feasible.

C. Optimization of thresholding parameter
Using the main result in [28], the performance of the soft-thresholding operator can be estimated when the input matrix follows the model

Z = M + W,   (30)

where the entries of W are

(W)_{i,j} iid∼ N(0, σ²_Z),   (31)

for i ∈ {1, 2, ..., N} and j ∈ {1, 2, ..., L}. In this setting, the SURE [27] is given by

SURE(D_τ)(Z) = −NLσ²_Z + Σ_{i=1}^{min(N,L)} min(τ², σ²_i(Z)) + 2σ²_Z div(D_τ(Z)),   (32)

where σ_i(Z) is the i-th singular value of Z. A closed-form expression for the divergence of this estimator is obtained in [28]. For the case in which Z ∈ R^{N×L} the divergence is given by

div(D_τ(Z)) = Σ_{i=1}^{min(N,L)} [ I(σ_i(Z) > τ) + |N − L| (σ_i(Z) − τ)_+ / σ_i(Z) ] + 2 Σ_{i≠j} σ_i(Z)(σ_i(Z) − τ)_+ / (σ²_i(Z) − σ²_j(Z)),   (33)

when Z has no repeated singular values, and is zero otherwise. Therefore, combining (32) and (33) gives a closed-form expression for the performance of the soft-thresholding operator for different values of τ and different noise levels σ²_Z.

The proposed algorithm approximates σ²_Z with the weighted sum of the noise in Ω and in Ω^c. Consequently, σ²_{Z^k} is calculated as

σ²_{Z^k} = (‖Y^k − P_Ω(M)‖²_F + |Ω^c| D_LMMSE) / (NL),   (34)

where D_LMMSE represents the average noise per entry in Ω^c. The optimal threshold for the matrix Z^k is denoted by τ^{k*} and it is calculated using

τ^{k*} = arg min_τ SURE(D_τ)(Z^k),   (35)

where σ²_{Z^k} is given by (34). Therefore, the iterations of the proposed algorithm are

X^k = D_τ(Z^{k−1}),
Y^k = Y^{k−1} + δ_b P_Ω(M − X^k),
Z^k = Y^k + L^k,   (36)

where D_τ is defined by (23) and the step size δ_b is similar to the step size δ_s in the SVT algorithm. The initial conditions are Z^0 = 0, Y^0 = 0 and τ^0 = 0. The stopping criterion is similar to that of the SVT algorithm, namely

‖P_Ω(X^k − M)‖_F / ‖P_Ω(M)‖_F ≤ ε.   (37)

The main advantage of the proposed algorithm is that the threshold is optimized at each iteration, facilitated by the prior knowledge incorporated into the structure of the algorithm. First, an initial guess of the unavailable entries is formed, at each iteration k, based on Y^k and the covariance matrix Σ. The results are aggregated in the matrix Z^k, which is approximated by the model in (30). In this case, an estimate of the noise level, σ²_{Z^k}, is needed to compute the SURE. The optimal value of τ for Z^k is obtained by minimizing SURE(D_τ)(Z^k) in (32). Admittedly, the optimization of the threshold is only possible as long as second order statistics are available. Therefore, the new approach requires additional knowledge that is not necessary when using the SVT algorithm. That being said, the SVT algorithm requires setting the value of the threshold, which in general is difficult to tune. The same amount of prior knowledge, i.e., the covariance matrix, is required by the LMMSE estimator. Still, when the postulated statistics are not accurate, the performance of the LMMSE-based recovery degrades by up to an order of magnitude in NMSE (see Fig. 2). For the proposed algorithm, the trade-off between the performance and the accuracy of the prior knowledge is studied in Section VI.

Algorithm 1 Bayesian Singular Value Thresholding
Input: observation set Ω, observed entries P_Ω(R), mean µ, covariance matrix Σ, step size δ_b, tolerance ε, and maximum iteration count k_max
Output: M̂_BSVT
  Set Y^0 = 0, Z^0 = 0, τ^0 = 0
  Set Ω^c = {1, 2, ..., N} × {1, 2, ..., L} \ Ω
  for k = 1 to k_max do
    Compute [U, S, V] = svd(Z^{k−1})
    Set X^k = Σ_{j=1}^{min(N,L)} max(0, σ_j(Z^{k−1}) − τ^{k−1}) u_j v_j^T
    if ‖P_Ω(X^k − M)‖_F / ‖P_Ω(M)‖_F ≤ ε then
      break
    end if
    Set Y^k = Y^{k−1} + δ_b P_Ω(M − X^k)
    Set L^k = P_{Ω^c}(µ) + Σ_{Ω^c Ω} Σ_{ΩΩ}^{−1} (P_Ω(Y^k) − P_Ω(µ))
    Set Z^k = Y^k + L^k
    Set σ²_{Z^k} = (‖Y^k − P_Ω(M)‖²_F + |Ω^c| D_LMMSE) / (NL)
    Set τ^k = arg min_τ SURE(D_τ)(Z^k)
  end for
  Set M̂_BSVT = X^k

VI. NUMERICAL ANALYSIS
This section analyzes the performance of the BSVT algorithm using the real data set presented in Section II-A. The data matrix M, utilized to assess the performance of the proposed algorithm, is the same used in Section IV-C and contains the voltage measurements from the electricity distribution system. Similarly, the simulations in this section assume a signal to noise ratio value of SNR = 20 dB. Moreover, the performance of the BSVT algorithm is also measured in terms of the NMSE, given by

NMSE(M; BSVT) = ‖M − M̂_BSVT‖²_F / ‖M‖²_F,   (38)

where M̂_BSVT is the output of the BSVT recovery. The performance of each recovery technique is averaged over 20 realisations of Ω for each ratio of missing entries. Numerically, the proposed algorithm is evaluated on three aspects. First, the gain in performance for the optimized threshold is assessed. Section VI-A compares the performance of the SVT-based recovery with the BSVT algorithm when accurate second order statistics are available. Secondly, the robustness of the BSVT recovery when perfect prior knowledge is not available is evaluated. A comparison between the SVT algorithm, the LMMSE estimator and the BSVT recovery is presented for different SMR values. The case in which perfect second order statistics are available is also included. Finally, the robustness of the BSVT recovery to different sampling patterns is evaluated using Markov-chain-based sampling. The numerical performance of the new algorithm is compared to the SVT algorithm for the case in which the positions of the missing entries are not uniformly distributed.

Figure 3. Real data recovery performance using SVT, LMMSE estimation and BSVT for different levels of mismatch, when SNR = 20 dB.
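The BSVT recovery evaluated in this section (Algorithm 1) can be sketched as below. This is a simplified illustration rather than the exact implementation: D_LMMSE in (34) is approximated here by the sensor noise variance, the SURE minimization (35) is a coarse grid search, and a small diagonal loading σ²I regularizes the inversion of Σ_ΩΩ; the test data are synthetic stand-ins.

```python
import numpy as np

def sure_dtau(s, tau, sigma2, N, L):
    """SURE of the soft-thresholding operator, eqs. (32)-(33),
    assuming no repeated singular values."""
    shrink = np.maximum(s - tau, 0.0)
    div = np.sum(s > tau) + abs(N - L) * np.sum(shrink / np.maximum(s, 1e-12))
    s2 = s ** 2
    diff = s2[:, None] - s2[None, :]
    np.fill_diagonal(diff, np.inf)                 # drop the i == j terms
    div += 2.0 * np.sum((s * shrink)[:, None] / diff)
    return -N * L * sigma2 + np.sum(np.minimum(tau ** 2, s2)) + 2.0 * sigma2 * div

def bsvt(P_Omega_R, mask, mu, Sigma, sigma2, delta=1.0, eps=1e-4, k_max=300):
    """Simplified sketch of Algorithm 1 (BSVT)."""
    N, L = P_Omega_R.shape
    Y = np.zeros((N, L))
    Z = np.zeros((N, L))
    tau = 0.0
    n_miss = int((~mask).sum())
    for _ in range(k_max):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt    # X^k = D_tau(Z^{k-1})
        residual = np.where(mask, P_Omega_R - X, 0.0)
        if np.linalg.norm(residual) <= eps * np.linalg.norm(P_Omega_R):
            break
        Y = Y + delta * residual                   # Y^k update of (36)
        # LMMSE estimate of the missing entries, column by column (eq. (29)).
        for j in range(L):
            o = np.flatnonzero(mask[:, j])
            c = np.flatnonzero(~mask[:, j])
            G = Sigma[np.ix_(c, o)] @ np.linalg.inv(
                Sigma[np.ix_(o, o)] + sigma2 * np.eye(o.size))
            Z[o, j] = Y[o, j]
            Z[c, j] = mu[c] + G @ (Y[o, j] - mu[o])
        # Noise level of Z^k (eq. (34)) and SURE-optimal threshold (eq. (35)).
        sig2_Z = (np.linalg.norm(np.where(mask, Y - P_Omega_R, 0.0)) ** 2
                  + n_miss * sigma2) / (N * L)
        s_Z = np.linalg.svd(Z, compute_uv=False)
        grid = np.linspace(0.0, s_Z[0], 50)
        tau = grid[int(np.argmin([sure_dtau(s_Z, t, sig2_Z, N, L)
                                  for t in grid]))]
    return X

# Synthetic usage example with matched statistics.
rng = np.random.default_rng(4)
N, L = 40, 30
idx = np.arange(N)
mu = np.zeros(N)
Sigma = 4.0 * 0.97 ** np.abs(idx[:, None] - idx[None, :])
sigma2 = 0.01
M = rng.multivariate_normal(mu, Sigma, size=L).T
R = M + rng.normal(0.0, np.sqrt(sigma2), size=(N, L))
mask = rng.random((N, L)) < 0.6
M_hat = bsvt(np.where(mask, R, 0.0), mask, mu, Sigma, sigma2)
nmse = np.linalg.norm(M - M_hat) ** 2 / np.linalg.norm(M) ** 2
print(round(float(nmse), 4))
```

Passing a mismatched covariance (e.g., built with the model of (27)) in place of `Sigma` is the experiment behind the robustness curves discussed below.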
A. Performance of the optimized threshold
In this section, the performance of the new algorithm is compared to the SVT-based recovery using the same data matrix M and the same sets of available entries, Ω, for a particular ratio of missing entries γ as defined in (26). The positions of the missing entries are sampled uniformly at random from the set of all entries.

Fig. 3 depicts the performance of both algorithms when applied in identical scenarios. Clearly, the optimized threshold and the Bayesian estimation step increase the performance of the algorithm when accurate second order statistics are available. When the postulated statistics, i.e., those available to the operator, are identical to the real statistics, the BSVT algorithm provides a better performance for all values of γ. The gain in performance is larger for small ratios of missing entries. Interestingly, the boost in performance is substantial in the region in which SVT is least efficient when compared to the fundamental limit (see Fig. 2). However, in practical scenarios the postulated and actual statistics are different. The impact of mismatched statistics is considered in the following section.

B. Robustness with respect to mismatched statistics
In order to address the problem of missing data recovery in a realistic scenario, a level of mismatch between the real covariance matrix and the one available to the operator is considered. The mismatched covariance matrix model presented in (27) is used in this section to assess the sensitivity of the proposed algorithm to inaccurate prior knowledge. Hence, the LMMSE estimator and the BSVT algorithm are compared in the no-mismatch regime and for SMR values of 100 and 10. The performance of the SVT-based recovery is included as a benchmark for comparing rank minimization based approaches.

Figure 4. Positions of the observed entries, Ω, generated by the Markovian model, when E[L] = N.

Fig. 3 depicts the performance of the different estimation methods when mismatched second order statistics are available. Remarkably, the proposed algorithm is robust to mismatch in the second order statistics. In contrast with the LMMSE estimator, the performance of the BSVT algorithm does not change significantly when mismatch occurs. Moreover, the BSVT algorithm gives better recovery than the SVT-based recovery in all mismatch regimes throughout the range of γ. In comparison with the LMMSE estimation, the BSVT algorithm performs better for SMR = 100 and small values of γ. Furthermore, for SMR = 10 the proposed approach is the best performing recovery method for almost all values of γ. In practical scenarios, when the mismatch regime is difficult to establish, the choice between LMMSE and SVT is difficult to make. BSVT is a robust alternative and gives better recovery in a wide range of missing data regimes.

C. Robustness with respect to different sampling patterns
The problem of recovering missing data when the subset of missing entries is not uniformly sampled is addressed in this section. In practical scenarios, a sensor failure or a downtime in the communication line provides the operator with a number of consecutive unavailable measurements in the state variable vectors. Let L be the number of consecutive missing entries. The expected value of L varies depending on the reliability of the sensing infrastructure. The uniform sampling model cannot capture this scenario; in contrast, a more general sampling procedure is introduced.

The proposed sampling model is based on a two-state Markov chain. In this setting, for each entry (M)_{i,j} of the matrix M, the finite state machine depicted in Fig. 6 is either in state S₁, in which case the entry (i, j) is available to the operator, or in state S₂, in which case the entry is not available. As before, the set Ω contains all the entries of the matrix M that are available to the operator. In Fig. 6, p₁ is the transition probability from state S₁ to S₂ and p₂ is the transition probability from S₂ to S₁.

Figure 5. Real data recovery performance for the Markov-chain-based sampling model, using SVT and BSVT for different levels of mismatch, when E[L] = N and SNR = 20 dB.

Figure 6. State diagram for the Markovian sampling model.

The expected value of the ratio of missing entries is given by the steady-state probability of being in S₂. Consequently, the expected value of the ratio of missing entries for the Markovian sampling model is

E[γ] = p₁ / (p₁ + p₂).  (39)

The number of consecutive missing entries, L, is geometrically distributed with parameter p₂, and therefore its expected value is

E[L] = Σ_{l=1}^{n} l p₂ (1 − p₂)^{l−1}.  (40)

Solving (40) for n → ∞ leads to

E[L] = 1 / p₂.  (41)

Therefore, for any given γ and L, using (39) and (41), p₁ and p₂ are identified such that on average the sampling model in Fig. 6 has a ratio of missing entries γ and runs of consecutive missing entries of length L. Note that the case E[L] = 1 reduces to the uniform sampling model with probability P[(i, j) ∈ Ω] = 1 − γ. In this framework, a comparison between the SVT-based and the BSVT-based recoveries is presented for the case in which the sampling pattern is not uniform. In order to consider the case in which a particular feeder does not provide any measurements, the expected length of the vectors with missing data is selected to be equal to the length of the state variable vectors, i.e., E[L] = N. Fig. 4 shows an example of a sampling pattern generated by the Markov-chain-based model when E[L] = N.

Fig. 5 compares the performance of the SVT-based recovery with the BSVT-based recovery for the case in which the matrix M is sampled using the Markov-chain-based sampling model with E[L] = N. Different levels of mismatch are introduced to assess the robustness of the new algorithm to mismatched prior knowledge when the sampling pattern is not uniform. Remarkably, the performance of the proposed approach is not significantly affected by the amount of prior knowledge in any of the missing data regimes. Moreover, BSVT performs better than SVT when the sampling pattern is not uniform. A significant gain in performance is observed for small values of γ. For instance, for a fixed tolerance in NMSE, BSVT recovers the matrix M for a substantially larger ratio of missing entries than SVT does (see Fig. 5). The improvement in data recovery performance for the same level of tolerance is significant. Numerical results in this section show that BSVT not only provides better performance than SVT when the entries are not uniformly sampled, but is also robust to mismatched statistics. The robustness of the new algorithm thus extends to different sampling patterns.
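For illustration, the sampling model above can be sketched in a few lines of Python. The helper name markov_mask is hypothetical and not part of the paper's implementation; the transition probabilities are set as p₂ = 1/E[L] and p₁ = E[γ]/((1 − E[γ]) E[L]), which yield the desired stationary missing ratio and mean run length for the two-state chain.

```python
import random

def markov_mask(n_rows, n_cols, gamma, exp_len, seed=0):
    """Availability mask for an n_rows x n_cols matrix from a two-state
    Markov chain: S1 = entry observed, S2 = entry missing.
    p1 = P(S1 -> S2) and p2 = P(S2 -> S1) are chosen so that the
    stationary missing ratio is gamma and the mean missing run is exp_len."""
    p2 = 1.0 / exp_len                      # probability of leaving the missing state
    p1 = gamma / ((1.0 - gamma) * exp_len)  # probability of entering the missing state
    rng = random.Random(seed)
    mask = []
    missing = rng.random() < gamma          # initialise from the stationary law
    for _ in range(n_rows * n_cols):        # visit the entries in one sequential scan
        mask.append(not missing)            # True means the entry is in the set Omega
        if missing:
            missing = rng.random() >= p2    # remain in S2 with probability 1 - p2
        else:
            missing = rng.random() < p1     # move to S2 with probability p1
    return mask

# For a long chain the empirical missing ratio should be close to gamma.
mask = markov_mask(48, 1000, gamma=0.3, exp_len=48)
ratio_missing = 1 - sum(mask) / len(mask)
```

Setting exp_len = 1 makes p₂ = 1, so a missing entry is always followed by an available one, recovering (approximately) the uniform sampling regime discussed above.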
In view of this, BSVT represents a better alternative to SVT and LMMSE estimation for recovering missing data in practical scenarios.

VII. CONCLUSION
A novel algorithm for recovering missing data in data sets that admit a low rank description has been presented. The proposed approach, BSVT, combines the low computational cost of SVT with the optimality of the LMMSE estimator when the data source is modelled as a multivariate Gaussian random process and second order statistics are available. The combined approach addresses the shortcomings of the individual recovery methods. The robustness of the new algorithm to both mismatched statistics and non-uniform sampling patterns was demonstrated through simulations. With respect to the SVT algorithm, the new approach addresses the issue of choosing the value of τ by calculating the optimal threshold at each iteration. Compared with the standard LMMSE estimator, the new algorithm is robust to inaccurate second order statistics. In order to assess practical scenarios, a sampling model that incorporates missing state variable vectors is introduced. The performance gain compared to SVT is significant for both uniform and non-uniform sampling models. Ultimately, the proposed algorithm is shown to provide a robust and low complexity method to recover missing data in low voltage distribution systems.

REFERENCES

[1] A. Navarro-Espinosa and L. F. Ochoa, “Probabilistic impact assessment of low carbon technologies in LV distribution systems,”
IEEE Trans. Power Syst., vol. 31, no. 3, pp. 2192–2203, May 2016.
[2] R. A. Walling, R. Saint, R. C. Dugan, J. Burke, and L. A. Kojovic, “Summary of distributed resources impact on power delivery systems,” IEEE Trans. Power Del., vol. 23, no. 3, pp. 1636–1644, Jul. 2008.
[3] C. Long and L. F. Ochoa, “Voltage control of PV-rich LV networks: OLTC-fitted transformer and capacitor banks,” IEEE Trans. Power Syst., vol. 31, no. 5, pp. 4016–4025, Sep. 2016.
[4] M. Ozay, I. Esnaola, F. T. Y. Vural, S. R. Kulkarni, and H. V. Poor, “Sparse attack construction and state estimation in the smart grid: Centralized and distributed models,” IEEE Journal on Selected Areas in Communications, vol. 31, no. 7, pp. 1306–1318, Jul. 2013.
[5] O. Kosut, L. Jia, R. J. Thomas, and L. Tong, “Malicious data attacks on smart grid state estimation: Attack strategies and countermeasures,” in Proc. of the First IEEE International Conference on Smart Grid Communications, Oct. 2010, pp. 220–225.
[6] P. Gao, M. Wang, S. G. Ghiocel, J. H. Chow, B. Fardanesh, and G. Stefopoulos, “Missing data recovery by exploiting low-dimensionality in power system synchrophasor measurements,” IEEE Trans. Power Syst., vol. 31, no. 2, pp. 1006–1013, Mar. 2016.
[7] C. Genes, I. Esnaola, S. M. Perlaza, L. F. Ochoa, and D. Coca, “Recovering missing data via matrix completion in electricity distribution systems,” in Proc. of the IEEE International Workshop on Signal Processing Advances in Wireless Communications, Jul. 2016, pp. 1–6.
[8] Y. Isozaki, S. Yoshizawa, Y. Fujimoto, H. Ishii, I. Ono, T. Onoda, and Y. Hayashi, “Detection of cyber-attacks against voltage control in distribution power grids with PVs,” IEEE Trans. Smart Grid, vol. 7, no. 4, pp. 1824–1835, 2016.
[9] I. Esnaola, A. Tulino, and J. Garcia-Frias, “Linear analog coding of correlated multivariate Gaussian sources,” IEEE Trans. Commun., vol. 61, no. 8, pp. 3438–3447, Aug. 2013.
[10] E. J. Candès and B. Recht, “Exact matrix completion via convex optimization,” Communications of the ACM, vol. 55, no. 6, pp. 111–119, 2012.
[11] E. J. Candès and T. Tao, “The power of convex relaxation: Near-optimal matrix completion,” IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2053–2080, May 2010.
[12] E. J. Candès and Y. Plan, “Matrix completion with noise,” Proceedings of the IEEE, vol. 98, no. 6, pp. 925–936, Jun. 2010.
[13] R. Meka, P. Jain, and I. S. Dhillon, “Matrix completion from power-law distributed samples,” in Proc. of the Advances in Neural Information Processing Systems 22, Dec. 2009, pp. 1258–1266.
[14] V. Kekatos, Y. Zhang, and G. B. Giannakis, “Electricity market forecasting via low-rank multi-kernel learning,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 6, pp. 1182–1193, Dec. 2014.
[15] P. Gao, M. Wang, S. Ghiocel, J. Chow, B. Fardanesh, and G. Stefopoulos, “Missing data recovery by exploiting low-dimensionality in power system synchrophasor measurements,” IEEE Trans. Power Syst., vol. 31, no. 2, pp. 1006–1013, 2016.
[16] J.-F. Cai, E. J. Candès, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM Journal on Optimization, vol. 20, no. 4, pp. 1956–1982, Mar. 2010.
[17] I. Esnaola, A. M. Tulino, and H. V. Poor, “Mismatched MMSE estimation of multivariate Gaussian sources,” in Proc. of the IEEE International Symposium on Information Theory, Jul. 2012, pp. 716–720.
[18] S. Verdú, “Mismatched estimation and relative entropy,” IEEE Trans. Inf. Theory, vol. 56, no. 8, pp. 3712–3720, Aug. 2010.
[20] T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, 2012.
[21] A. Kolmogorov, “On the Shannon theory of information transmission in the case of continuous signals,” IRE Transactions on Information Theory, vol. 2, no. 4, pp. 102–108, Dec. 1956.
[22] J. F. Sturm, “Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones,” Optimization Methods & Software, vol. 11, no. 1-4, pp. 625–653, 1999.
[23] K. C. Toh, M. J. Todd, and R. H. Tutuncu, “SDPT3 - a MATLAB software package for semidefinite programming, version 1.3,” Optimization Methods & Software, vol. 11, no. 1-4, pp. 545–581, 1999.
[24] Z. Liu and L. Vandenberghe, “Interior-point method for nuclear norm approximation with application to system identification,” SIAM Journal on Matrix Analysis and Applications, vol. 31, no. 3, pp. 1235–1256, Nov. 2009.
[25] D. Donoho and M. Gavish, “Minimax risk of matrix denoising by singular value thresholding,” Ann. Statist., vol. 42, no. 6, pp. 2413–2440, Dec. 2014.
[26] X. Jia, X. Feng, and W. Wang, “Rank constrained nuclear norm minimization with application to image denoising,” Signal Processing, vol. 129, pp. 1–11, Dec. 2016.
[27] C. M. Stein, “Estimation of the mean of a multivariate normal distribution,” The Annals of Statistics, vol. 9, no. 6, pp. 1135–1151, Nov. 1981.
[28] E. J. Candès, C. A. Sing-Long, and J. D. Trzasko, “Unbiased risk estimates for singular value thresholding and spectral estimators,”