[PDF] A Hybrid Framework for Topology Identification of Distribution Grid with Renewables Integration

Abstract

Topology identification (TI) is a key task for state estimation (SE) in distribution grids, especially the one with high-penetration renewables. The uncertainties, initiated by the time-series behavior of renewables, will almost certainly lead to bad TI results without a proper treatment. These uncertainties are analytically intractable under conventional framework--they are usually jointly spatial-temporal dependent, and hence cannot be simply treated as white noise. For this purpose, a hybrid framework is suggested in this paper to handle these uncertainties in a systematic and theoretical way; in particular, big data analytics are studied to harness the jointly spatial-temporal statistical properties of those uncertainties. With some prior knowledge, a model bank is built first to store the countable typical models of network configurations; therefore, the difference between the SE outputs of each bank model and our observation is capable of being defined as a matrix variate--the so-called random matrix. In order to gain insight into the random matrix, a well-designed metric space is needed. Auto-regression (AR) model, factor analysis (FA), and random matrix theory (RMT) are tied together for the metric space design, followed by jointly temporal-spatial analysis of those matrices which is conducted in a high-dimensional (vector) space. Under the proposed framework, some big data analytics and theoretical results are obtained to improve the TI performance. Our framework is validated using IEEE standard distribution network with some field data in practice.

Full PDF

11 A Hybrid Framework for Topology Identiﬁcation ofDistribution Grid with Renewables Integration

Xing He,

Member, IEEE , Robert C. Qiu,

Fellow, IEEE , Qian Ai,

Senior Member, IEEE , Tianyi Zhu

Abstract —Topology identiﬁcation (TI) is a key task for stateestimation (SE) in distribution grids, especially the one with high-penetration renewables. The uncertainties, initiated by the time-series behavior of renewables, will almost certainly lead to badTI results without a proper treatment . These uncertainties are analytically intractable under conventional framework—they areusually jointly spatial-temporal dependent , and hence cannot besimply treated as white noise. For this purpose, a hybrid frame-work is suggested in this paper to handle these uncertainties in a systematic and theoretical way; in particular, big data analytics arestudied to harness the jointly spatial-temporal statistical properties of those uncertainties. With some prior knowledge, a model bankis built ﬁrst to store the countable typical models of networkconﬁgurations; therefore, the difference between the SE outputsof each bank model and our observation is capable of beingdeﬁned as a matrix variate —the so-called random matrix . In orderto gain insight into the random matrix, a well-designed metricspace is needed. Auto-regression (AR) model, factor analysis (FA),and random matrix theory (RMT) are tied together for the metricspace design, followed by jointly temporal-spatial analysis of thosematrices which is conducted in a high-dimensional (vector) space .Under the proposed framework, some big data analytics and theoretical results are obtained to improve the TI performance.Our framework is validated using IEEE standard distributionnetwork with some ﬁeld data in practice.

Index Terms —topology identiﬁcation, renewables, uncertainty,random matrix theory, AR model, factor analysis, high dimension

I. I

NTRODUCTION T Opology identiﬁcation (TI) of admittance matrix Y , theso-called network topology, is a precondition for stateestimation (SE) in distribution systems. Inaccurate TI has longbeen cited as a major cause of bad SE results [1]. During adaily operation, Y may be partially reconﬁgured [2]. While theknowledge of Y is crucial, it may be unavailable or outdated(via TI) due to some reasons [3–7]. Among these reasons,the uncertainties caused by the behavior of high-penetrationrenewables [8, 9], which are analytically intractable for mosttools, are one of the main challenges. How to address theseuncertainties by harnessing their jointly spatial-temporalstatistical properties is at the heart of our study, and thisquestion threads throughout the proposed hybrid framework.

A. Related Work and Motivation of our Work

Ref. [10–12] are relevant to our paper to an extent. Ref. [10]builds a model bank, and then conducts TI task by applying arecursive Bayesian approach to identify the correct network

This work was partly supported by National Key Research & Development(R&D) plan of China (grant No. 2016YFB0901300), and National NaturalScience Foundation of China (grant No. 51907121 and No. U1866206). conﬁguration in the bank. Ref. [11] conducts TI task bycomparing the collected voltage time series with a libraryof signatures computed a priori. Ref. [12] formulates the TIproblem as a mixed integer quadratic programming (MIQP)model to ﬁnd a topology conﬁguration with weighted leastsquare (WLS) of measurement residues.Several data-driven TI approaches, as Ref. [3–7], are pro-posed recently. They are mainly based on iterations, graph the-ory, sparsity-based regularization, and so on. These approachesare feasible to TI task with very little knowledge about thenetwork. Ref. [6] tells that an accurate TI result is acquirable only if the noise is well addressed . For instance, even witha small error in measurements, the regression-based methodmay fail in TI task (see Sec. V-C in

Case Studies ). Most TIalgorithms, especially those derived from least square, relyheavily on the second-order statistics of meter data [4, 5], andhence they are applicable to (Gaussian) white noise.Renewables-derived uncertainties (e.g., randomness causedby a gust of wind), however, often exhibit themselves asnon-Gaussian noise . The conventional statistics such asﬁrst/second-order statistics (mean/variance) are even not nearlyenough to represent these non-Gaussian variables, and the (jointly spatial-temporal) dependence should be taken intoaccount . Therefore, there is an urgent need for some powerfulapproach to make these uncertainties analytically tractablewith a systematic and theoretical procedure . This is the major motivation and superiority of our proposed hybridframework. Under our framework, some statistical propertiesand theoretical results are established.

B. Our Work and its Contributions

In order to handle the renewables-derived uncertainties, we have to go back to the model bank following Ref. [10, 11]. Itis reasonable and feasible to list all the possible models inpractice with prior knowledge, since the network conﬁgurationof a particular grid must be conﬁned to only a few typicalmodels. Because of the bank, the difference between the bankmodel SE output and our observation is capable of beingdeﬁned as a matrix variate—the so-called random matrix .Then we move to the heart of our hybrid framework—high-dimensional analytics of the random matrix. Auto-regression(AR) model, factor analysis (FA), and random matrix theory(RMT) are tied together for the jointly temporal-spatialmodeling and analysis of the random matrices. And high-dimensional statistics are obtained as big data analytics. Thisframework enables us to gain insight into the (multiple)renewables-derived uncertainties, which are analytically in-tractable under conventional framework. a r X i v : . [ s t a t . A P ] S e p In particular, our framework deals with a large number(spatial space, N ) of nodes simultaneously, and each node( i = 1 , ..., N ) samples time-series within a given duration(temporal space, T ) of observation. Classical statistic theoriestreat ﬁxed N only (often small, typically N < [13]) .This ﬁxed (small) N is called the low-dimensional regime.In practice, we are interested in the case that N can varyarbitrarily in size compared with T (often T is large, typically N > , c = N/T > [13]). This fundamental requirementis the primary driving force for us to study big data analyticswith high-dimensional statistics. For jointly spatial-temporalanalysis, a (large-dimensional) data matrix , rather than avector or a scalar [14], is adopted as the basis.This work is expected to contribute some insight to the(multiple) renewables-derived uncertainties that are often an-alytically intractable. We take advantage of high-dimensionalstatistics that is made analytically tractable only recently [15,16]. To our knowledge, this type of analysis is, for theﬁrst time , conducted in the context of TI. Our big dataanalytics are motivated to improve TI performance, and maybe further expanded to other applying ﬁelds: the detectionand localization of faults [17], the detection of unmonitoredswitching of circuit breakers in network reconﬁguration [18],etc.The remainder of this paper is organized as follows.- Sec. II presents the hybrid framework and gives a generaldiscussion about it.- Sec. III, by employing the model bank, aims to convert ourobservation into a random matrix with prior knowledge.- Sec. IV studies the high-dimensional statistics of the randommatrices based on AR, FA, and RMT.- Sec. V validates our framework with case studies based onIEEE standard distribution network using some ﬁeld data.II. H YBRID F RAMEWORK OF T OPOLOGY I DENTIFICATION

A. Hybrid Framework

Fig. 1 summarizes the presented framework by illustratinghow Model Bank, AR, FA, RMT are put together coherently.The hybrid framework mainly consists of two parts—themodel-based part (Sec. III) and the data-driven part (Sec. IV).The former, with prior knowledge, converts the observed datainto “difference” in the form of random matrix. Starting fromthe random matrix and going through a rigorous mathematicprocedure , the latter aims to gain insight into the uncertaintiesthrough big data analytics, with a focus on the jointly spatial-temporal analysis and the underlying theories/tools.First, we build “bank” (referring to [10]) to store countable (often a few) virtual models mapping the possible networkconﬁgurations of a real grid. The bank can be seen as the universal set of possible models among which we try topick out the most likely one. Hence, we need a well-designedmetric space— a set together with a metric deﬁned on it .The SE for the models, mainly based on power ﬂow (PF)analysis, is the second step. We make an assumption that eachagent on distributed nodes (Agent i on Node i for instance)does collect some local information, such as power usage ( P i )and voltage magnitude ( V i ), on its own access point (Node i ). However, it has no prior information about how it isconnected via power lines in the network, not to mentionpower ﬂow on the branch ( P i,j and Q i,j ). The information on P i,j and Q i,j is often a precondition for some SE algorithms[12], but not for ours. From this aspect, our assumption is practical and ﬂexible for engineering scenarios.Then we move forwards to the difference X , which ismodeled as a non-Gaussian random matrix for further big dataanalytics. For each bank model (Model M m for instance), itsSE output ( ˆ Z m ) does provide a comparison for our observation( Z ob ), and then the difference X m is deﬁned as X m = Z ob − ˆ Z m . (1)Each X m consists of multiple time-series , which canbe generally decomposed into four components—the trend,the seasonality, the mutation, and the randomness. Featureextraction of the trend and the seasonality is a well discussedtopic in time-series analysis [19], and our previous work [20]has proposed an RMT-based mutation detection algorithm tohandle sudden changes. Here we focus on the randomness. B. Non-Gaussian Randomness Tools and Related Work

The randomness component of renewables-derived uncer-tainties cannot be simply modeled as white noise— successiveobserved data in the form of time-series usually showserial dependence . In order to formally incorporate this( temporal ) dependent structure, it is reasonable to explore ageneral class of models called auto-regressive (AR) models— x t = (cid:80) pi =1 b i x t − + (cid:15) t [21]. From the spatial aspect, FAand RMT are tied together to conduct jointly temporal-spatialanalysis of the dependence among those multiple time-series.1) Factor Analysis: FA is often used for dimension reductionin high-dimensional datasets [15]. Because of the latentconstructs (e.g., spatial-temporal independence) lying in thesampling data, FA is preferred to principal component anal-ysis (PCA) [22]. FA has already been successfully appliedin various ﬁelds such as statistics [23] and econometrics[24]. Ref. [25] employs FA to handle high-frequency datain ﬁnancial market. In power system domain, our previouswork [26] applies FA to anomaly detection and locationwith both simulated data and ﬁeld data.2) Random Matrix Theory: The entries of a random matrixare random variables and the matrix size is often verylarge, so RMT is naturally connected with our problem athand. The goal of RMT is to understand the joint eigen-value distribution in the asymptotic regime as the statisticanalytics from big data. To our best knowledge, RMT isdeveloped to address this high-dimensional regime sinceclassical statistic theories apply to low-dimensional regimeonly [13]. Recently, RMT has already been successfullyapplied in many ﬁelds of power system [20].3) ARMA+RMT: This mode is relevant to our big data ana-lytics. Ref. [27] employs the free random variables (FRV)calculus to calculate the empirical spectral density (ESD) ofthe sample covariance for several VARMA-type processes.The derivation is RMT-based and mathematically rigorous;the theoretical result is nicely matched against the spectraobtained via Monte Carlo simulations. Z Model Model Model m ... SE SE SE m ... Z ob Model Bank

State Estimation ^ Z ^ Z m ^ Jointly

Spatial- temporal

Analysis X X X m III. Utilizing prior knowledge, model-based IV. Big data analytics, data-driven (AR, RMT, FA) bp AR Models of Renewables

Time Series p = k , k = R m,k FA Residues ρ T RMTMulti r.v. ρ E arg min D ( ρ T , ρ E ) m Theoretical SpectrumEmpirical Spectrum ... ...

Grid with Renewables

Sensors Sampling Spatial- temporal

UncertaintiesMultiple RMT ^ Z ob : Observ ed data; Z m : Output of SE m (State e stimation r esult of the m -th model); X m : Difference as a random matrix; b : Given parameter for AR m odel; p : Given parameter for FA; R: Residues of FA; ρ : Spectral densities X m : Random Matrix with Spatial- temporal St atistical Information and Prop er ties Eq. (7)

Eq. (8)→(12)→(11)→(10) Eq. (1)

Eq. (15)Eq. (13): M-P Law≡ ρ T (0) Eq. (9)Eq. (4)Eq. (6)

Eq. (14)

Fig. 1: Proposed Hybrid FrameworkIII. M

ODEL - BASED P ART U TILIZING P RIOR K NOWLEDGE

This part aims to convert our observed data into “difference” in the form of random matrices . With PF analysis, the SEoutput of each bank model is computed as Z m . It suppliesa comparison for our observed data Z ob , and hence thedifference is capable of being deﬁned (Eq. 1). A. Grid Network Operation

For each node in a power grid, Node i for instance, consider-ing the node-to-ground admittance y i ( y i = g i + j · b i , j = √− ),its active power P and reactive power Q are expressed as:  P i = V i (cid:88) k (cid:54) = i V k ( G ik cos θ ik + B ik sin θ ik ) − V i (cid:88) k (cid:54) = i G ik − V i g i Q i = V i (cid:88) k (cid:54) = i V k ( G ik sin θ ik − B ik cos θ ik )+ V i (cid:88) k (cid:54) = i B ik + V i b i (2) Abstractly, a physical power system obeying Eq. (2) can beviewed as an analog engine—it takes bus voltage magnitude V and phase angel θ as inputs , conductance G and susceptance B as given parameters , and “computes” active power injec-tion P and reactive power injection Q as outputs . Thus, theentries of Jacobian matrix J , i.e. [ J ] ij , are deﬁned as the partialderivatives of the outputs, P and Q , with respect to the inputs, V and θ . All in all, J consists of four parts H , N , K , L :  H ij = V i V j ( G ij sin θ ij − B ij cos θ ij ) − δ ij · Q i + δ ij · V i b i N ij = V i V j ( G ij cos θ ij + B ij sin θ ij )+ δ ij · P i − δ ij · V i g i K ij = − V i V j ( G ij cos θ ij + B ij sin θ ij )+ δ ij · P i + δ ij · V i g i L ij = V i V j ( G ij sin θ ij − B ij cos θ ij )+ δ ij · Q i + δ ij · V i b i (3) where H ij = ∂P i ∂θ j , N ij = ∂P i ∂V j V j , K ij = ∂Q i ∂θ j , L ij = ∂Q i ∂V j V j . B. Power Flow Analysis

PF analysis deals mainly with the calculation of steady-state system status, i.e., voltage magnitude V and phase angel θ , on each network bus, for a given set of variables such asload demands, under certain assumptions such as in a balancedsystem operation [28]. Conventional PF analysis is model- and assumption-based. That is to say, the information of networktopology Y is a prerequisite for the calculation, and the input(output) variables need to be preset as one of the followingthree categories: • P and V ( Q and θ ) for voltage controlled bus/ P V bus; • P and Q ( V and θ ) for load bus/ P Q bus; • V and θ ( P and Q ) for reference bus/slack bus.Consider a power system with n buses, among which thereare m P V buses, l P Q buses, and slack bus ( n = l + m +1 ).Starting with Eq. (2), PF functions is formulated as Eq. (4). y :=  P ... P n − Q m +1 ... Q n −  = f  θ ... θ n − V m +1 ... V n −  =: f ( x ) J =  ∂y ∂x · · · ∂y ∂x K ... . . . ... ∂y K ∂x · · · ∂y K ∂x K  (4)where := is the assignment symbol in computer science.Eq. (4) builds a differentiable mapping function f : x ∈ R K → y ∈ R K . It consists of K = 2 n − m − equations, fromthe same number ( m +2 l = K ) state variables, θ and V , to thepower injections, P and Q . Following Eq. (3), J is calculatedas a K × K matrix: J = (cid:20) [ H ] n − ,n − [ N ] n − ,n − m − [ K ] n − m − ,n − [ L ] n − m − ,n − m − (cid:21) (5)To formulate the linear approximation process that the sys-tem operation point shifts from ( x ( k ) , y ( k ) ) to ( x ( k +1) , y ( k +1) ) ,the iteration is set as follows: x ( k +1) := x ( k ) + J − (cid:16) x ( k ) (cid:17) (cid:16) y ( k +1) − y ( k ) (cid:17) (6)The iteration depicts how to update the state variables from x ( k ) to x ( k +1) . y ( k ) and x ( k ) are known quantities under ourassumption in Sec. II-A. y ( k +1) , according to Eq. (4), is thedesired P, Q on P Q buses and desired P on P V buses . For

P Q buses, neither V nor θ are ﬁxed; they are state variables that needto be estimated. For P V buses, V is ﬁxed, and θ needs to be estimated. x ( k +1) is the state variables that need to be estimated throughthe iteration in this expression (Eq. 6).The above model-based deterministic PF analysis is notalways reliable in practice , since the network topology Y ,and the operation points ( x ( k ) , y ( k ) ) are required to be of highprecision and up-to-date . These requirements, unfortunately,are often unrealistic as mentioned in Sec I. C. Model Bank

During the daily operation of a distribution grid, its topol-ogy may be partially reconﬁgured due to maintenance oremergency/optimal operation. Taking IEEE 33-bus networkfor instance, the network topology is shown in Fig. 2. It is a12.66-kV distribution grid system including a substation and37 branches. The normally closed branches are represented bysolid lines, and normally opened ones by dashed lines.

11 22 33 44 55 66 77 88 99 1010 1111 1212 1313 1414 1515 1616 1717 18181919 2020 2121 2222

Fig. 2: IEEE 33-bus NetworkWith the pair switch of these normally closed/openedbranches, the grid has ‘countable’ possible network conﬁgu-rations. In practice, however, it is reasonable to study onlya few models even for a large system, since the networkconﬁguration of a particular grid must be conﬁned to severaltypical models. We employ the concept of ‘bank’ referring to[10] to store them, and the deterministic PF analysis worksout the SE results of these models as ˆ Z m . These SE outputs ˆ Z m allow of the comparison with our observed data Z ob , andhence the difference X is capable of being deﬁned as a randommatrix (Eq. 1). In this way, the TI task is converted into amatching problem under certain metric space . The designof the metric space will be discussed in Sec. IV-E.IV. H IGH -D IMENSIONAL A NALYSIS WITH

RMT

AND

FAOur motivation arises from the fact that the renewables-derived uncertainties cannot be simply modeled as whitenoise . It does contain much (latent) structural information,especially when there is an extra bias caused by a certain(although maybe unknown) poor assumption or negligence.For this purpose, FA is employed in our framework. Theentries of the resultant matrix are random variables and largein size , so RMT is naturally relevant to the problem [20].

A. RMT-based Problem Formulation

The goal of RMT is to understand joint eigenvalue dis-tribution in the asymptotic regime as big data analytics.The spectrum of a covariance matrix generally consists oftwo parts: A few spikes/outliers and the bulk. The formerrepresents common factors that mainly drive the features, andthe latter represents unique factors or error variation thatarise from idiosyncratic noise . For the noise part, we consider a minimum distance between two spectral densities—a theo-retical one ρ T from an ideal structure model, and an empirical one ρ E relevant to the (multiple time-series) observed data. B. Factor Analysis Formula

In dealing with high-dimensional datasets, FA is often usedfor dimension reduction in sampling data with underlyingconstructs that cannot be measured directly [15, 22].Regarding empirical data X ∈ R N × T , FA is formulated as X = L ( p ) F ( p ) + R . (7)where F ∈ R p × T is a matrix of common factors, L ∈ R N × p is factor loadings, p is factor numbers, and R ∈ R N × T isresidues, also called unique factors or error variation.Eq. (7) enables to decompose observed data X into system-atic information and idiosyncratic noise . Usually, only X is observable , L is composed of the ﬁrst p principal componentsof X , F = (cid:0) L T L (cid:1) − L T X , and R = X − LF . We focus on residues R , which may contain some latentconstructs and statistical information. Instead of regarding R as Gaussian noise a priori, we assume that there are cross-and auto-correlated structures . Without loss of generality, ˆ R is represented as ˆ R = A / N (cid:15) B / T (8)where (cid:15) is an N × T Gaussian matrix with independent andidentically distributed (i.i.d.) random entries, A N and B T are N × N and T × T symmetric non-negative deﬁnite matrices,representing cross- and auto- covariances, respectively. Eq. (8)leads to a separable sample covariance matrix in the sensethat A N and B T are separable. This structural assumption ofseparability is a popular assumption in the analysis of spatial-temporal data [16]. Although this assumption does not allowfor spatial-temporal interactions in the covariance matrix, inmany real data applications, the covariance matrix can bewell approximated using separable covariance matrices fora space-time covariance matrix problem. C. FA Estimation Based on Spectrum Analysis

Now the objective of the mentioned matching problem is tomatch the spectral density ρ E against ρ T . The former ρ E means the ESD of the covariance matrixof residues R constructed from empirical data . It can becontrolled by the p number of common factors to be removedfollowing Eq. (7). It is deﬁned as [29] ρ E ( λ ) = 1 N N (cid:88) i =1 δ (cid:16) λ − λ ( C N ) i (cid:17) (9)where { λ ( C N ) i } Ni =1 is the eigenvalues of C N = T RR T , and δ is the Dirac delta function.The latter ρ T means the theoretical spectral density ofthe ideal covariance matrix ˆ C N with the assumed structuralmodel, i.e., ˆ C N = T ˆ R ˆ R T = T A / N (cid:15) B T (cid:15) T A / N (Eq. 8). As-suming a parsimonious matrix structure of A N and B T , whichis determined by only a small parameter set θ . Mathematicallymotivated by the result of [30], the spectral density of ˆ C N ,under certain assumptions, converges to a certain limitingdistribution ρ T ( θ ) , as the size N tends to inﬁnity. D. Simpliﬁed Model on Covariance Structures of Residues

A difﬁculty lies in the calculation of the limiting density, ρ T ( θ ) , for general θ = ( θ A N , θ B T ) . The actual calculationof ρ T ( θ ) is quite complex, which makes the implementationdifﬁcult. A recent study of [27], fortunately, provides the directderivation of this limiting spectral density using free randomvariable (FRV) techniques. They particularly present analyticforms when the time-series follow ARMA processes. In ourtask, we employ these techniques to calculate ρ T ( θ ) . First,two assumptions are made:I. The cross-correlations of ˆ R are effectively eliminatedby removing p factors, and therefore ˆ R has sufﬁcientlynegligible cross-correlation: A N ≈ I N × N . II. The auto-correlations of ˆ R are exponentially decreasing,i.e., { B T } ij = b | i − j | , with | b | < . Under the two assumptions, we can conduct spectrumanalysis of the simpliﬁed model, and thus ρ T ( b ) is capable ofbeing computed. The major steps are brieﬂy given as follows:1. The mean spectral density can be derived from the Green’sfunction G ( z ) by using the Sokhotsky’s formula: ρ T ( λ ) = − π lim ε → + Im G ( λ + iε ) . (10)2. The Green’s function G ( z ) can be obtained from themoments’ generating function M ( z ) : G ( z ) = M ( z ) + 1 z , | z | (cid:54) = 0 . (11)3. M ( z ) can be found by solving the polynomial equation: a c M + 2 a c ( − (1 + b ) z + a c ) M + ((1 − b ) z − a c (1 + b ) z + ( c − a ) M − a M − a = 0 , (12)where a = √ − b , and c = NT . It is worth mentioning that when b = 0 , Assumptions I & IIimply that ˆ R is a standard Gaussian matrix with i.i.d. randomelements, and its spectral density is marked as ρ T (0) . On theother side, Marchenko-Pastur Law says that for a Laguerreunitary ensemble (LUE) matrix Γ ∈ C N × T ( c = N/T ≤ , itsspectral density g MP ( x ) does follow M-P Law [31]: g MP ( x ) = 12 πcx (cid:112) ( x − s ) ( s − x ) , x ∈ [ s , s ] (13)where s = (1 − √ c ) and s = (1 + √ c ) .The two spectral densities should be equivalent, i.e. ρ T (0) is equivalent to g MP . Fig. 3 displays this phenomenon.Fig. 3b also tells that the theoretical spectral densities ρ T ( b ) are distinguishable with different coefﬁcients b in the ARmodel. This property implies that the (latent) coefﬁcients b offers good potential for the metric space construction .With the help of metric space, the randomness component ofthe observed data is able to be addressed from the view ofspectrum analysis. This is equivalent to modeling residues as an AR (1) process: ˆ R it = b ˆ R i,t − + ξ it , where ξ ∼N (cid:0) , − b (cid:1) so that the variance of ˆ R t is 1. Spectral Density when c = 0.6 (a) c = N/T =0 . Spectral Density when c = 0.3 b = 0, M-P Law (b) c = N/T =0 . Fig. 3: Spectral Density of ρ T ( b ) and g MP E. Metric Space Designing

To design a metric space for solving the mentioned matchproblem, we need to assign a set and then deﬁne a distancefunction (metric) on it. What we have in practice are theobserved data Z ob in the form of multiple time-series, and SEoutputs ˆ Z m derived from Z ob and Model M m . The difference(Eq. 1: X m = Z ob − ˆ Z m ) is the ﬁrst and most obvious choicefor us to extract some statistical information from.Before designing the metric space, let us look throughthose conventional statistics indexes, e.g., ﬁrst/second moment(mean/variance). We have already argued that the proﬁleof renewables-derived uncertainties does follow AR models.Mean and variance contain enough statistical information foran i.i.d. Gaussian random variable, but insufﬁcient for an ARmodel, not to mention multiple AR processes (temporal aspect)on those connected distributed access points (spatial aspect).Some more powerful tools are needed to map the difference X m , which consists of a large number of random variables,into some indicator within a well designed metric space. Theproposed hybrid framework (Fig. 1) conducts jointly temporal-spatial analysis of X m as follows: First, X is converted into R with a given p (Eq. 7), and then the ESD ρ E is calculated(Eq. 9). On the other hand, with a given coefﬁcient b , thetheoretical spectral density ρ T ( b ) is capable of being computed(Eq. 8 → → → the metric distancesuch as Jensen-Shannon divergence can be studied: d ( Z ob , ˆ Z m ) = | X m | D = D ( ρ T ( b ) , ρ E ( p )) = (cid:88) i p i D JS ( a i , b i ) (14)where D JS ( a, b ) = a log a + b log b − v log v with v = a + b .With the metric space design, the TI task is converted intoa convex optimization problemarg min m d ( Z ob , ˆ Z m ) = arg min m D ( ρ T ( b ) , ρ E ( p )) . (15)The convex optimization can be readily calculated usingmodern software toolbox such as CVX.V. C ASE S TUDIES

A. Case Background and Model Bank

IEEE 33-bus Network (Fig. 2) is used to validate ourproposed hybrid framework. Considering a sampling datasetwith 1440 observations (4 hours with a 0.1 Hz sampling rate).This observation leads to the empirical dataset Z ob , whichconsists of local sample data from 33 access points. FollowingSec. II-A, it is assumed that there is no prior information aboutthe power ﬂow on the connected branch ( P i,j and Q i,j ). Fig 4a depicts the active power generation/consumption ateach node ( P i ∈ P ob ⊂ Z ob ). For Node 20 and Node 31, thecurves are of high variation derived from the behavior of some wind speed data in practice . For other nodes, however, thecurves are stationary since the proﬁle of routine power usagesis relatively smooth. It is noteworthy that we only discuss therandomness component as mentioned in Sec. II.

11 12

Sample Time (*120 points) -0.500.5 P o w e r D e m and ( M W ) Load ||Ver= P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P (a) P real : Load Behavior in IEEE 33-bus Network Sample Time ( * 120 points ) V o l t age ( p . u . ) P P P P Voltage ||Ver= P P P P P P P P P P P P P P P P P P P P P P P P P P P P P s (b) ˆ V : Voltage Magnitude of Model 1 Fig. 4: Dataset from 33 Points and 1440 ObservationsThe physical grid only has numerous possible operationmodels (Sec. III-C), and we arrange them to form the modelbank (Fig 5). Through parallel PF analysis, we test each model,e.g. Model M m , and work out its SE result ˆ Z m . Fig 4b depictsthe voltage magnitudes of Model M ( ˆ V ⊂ ˆ Z ).The low-dimensional statistics Mean µ and Variation σ con-tain enough statistical information about Gaussian variables,but not about the renewables-derived randomness ˆ V , of whichmultiple AR time-series contribute a major part. Moreover,Mean µ is vulnerable to ﬁxed measurement error . Toaddress those renewables-derived uncertainties is the primarymotivation for our proposed framework. B. Case Designing

We assume that at time point t = 720 , due to some reasonthere is an operation model transformation from Model M to M —the system operates under M during ∼ , andM during ∼ . We also take the measurement errorinto account, and regard it as a Gaussian random variable E ,whose statistical properties can be fully described by mean µ E and standard deviation σ E .

11 22 33 44 55 66 77 88 99 1010 1111 1212 1313 1414 1515 1616 1717 18181919

11 22 33 44 55 66 77 88 99 1010 1111 1212 1313 1414 1515 1616 1717 18181919 2020 2121 2222 M M M

411 22 33 44 55 66 77 88 99 1010 1111 1212 1313 1414 1515 1616 1717 1818 M . . . Fig. 5: Models Stored in the Model BankOur previous work [32] has already shown that the ﬁxedmeasurement error µ E has no inﬂuence to the RMT-based analysis and indicator at all. Therefore we only need toconsider σ E . Referring to [33], it is supposed that σ E = 0 . p.u.—the standard deviation of the measurement errors is0.5%. The uncertainties caused by renewables and measure-ment errors together may signiﬁcantly inﬂuence the statisticalproperties of observed data, thereby disabling TI performance.When both Z ob , the observed data, and ˆ Z m , the SE outputof Model M m , are known a priori, so is their difference X m .As the reasons given in our previous work [20], only voltagemagnitude V m ⊂ X m is discussed. Furthermore, we keep eachobservation duration 720 sampling points and thus divide thewhole observation into 5 periods: T ( ∼ ), T ( ∼ ),T ( ∼ ), T ( ∼ ), and T ( ∼ ). Weuse V m (: , T j ) to represent the voltage difference on all the 33nodes during T j , which can be denoted as V m j when there isno ambiguity. Fig. 6 shows the voltage magnitude differencein each period for Model M : V , V , · · · , V . C. Regression-based TI and its Failure when Uncertainties arenot Well Addressed

We test the TI performance by employing Jacobian matrix J (Eq. 5), a matrix variate which is strongly associated withnetwork topology Y . From Eq. (4), the estimation of J canbe naturally formulated as a regression problem . Under fairlygeneral conditions, the target J , according to Eq. (3), keepsnearly constant within some duration, called ∆ t , due to thestability of the system, or concretely, of variables V, θ, Y .During ∆ t , considering T times observation at time instants T T T T T /T T T T T Observed Order -0.04-0.03-0.02-0.0100.010.020.030.04 V o l t age ( p . u . ) v v v V : Voltage Magnitude of X The proposed framework doesn'trely on Change Point Dectection

Fig. 6: V : Voltage Magnitude Component of Difference X t i , ( i = 1 , , · · · , T, t T − t = ∆ t ) , we acquire the operationdata points in the form of ( x ( i ) , y ( i ) ) .In this case, we take the period T ( ∼ ) for study.The truth-value of J on each sampling point is calculatedvia Eq. (3) in a model-based way. The result validates that J indeed keeps nearly constant at around its mean J Mean (Fig. 7a, 20 level), and with the standard deviation J SD (Fig. 7b, 0.04 level). Therefore, it is reasonable to set J Mean as the benchmark during this observation period T . JMean -14.5 5.3 -22.5 5.3-5.1 -8.8-7-6.6 -8.2 6.3-8.37.27.2-115.3 -5.9 5.3 -6-5.5 6.3 -10.8-5.323.5 -5.6 -13.8 5.19.17.28.2 -6.3 -6.68.2-7.2-7.210.9-5.6 6.2 5.1 -5.7-6.3 10.9 -5.55.4

10 20 30 40 50 60102030405060 -20-15-10-5051015 (a) Mean: J Mean JS D

10 20 30 40 50 60102030405060 .04 (b) Standard Deviation : J SD Fig. 7: Basic Statistical Information of J in Period T Deﬁning ∆ x ( k ) (cid:44) x ( k +1) − x ( k ) and ∆ y ( k ) (cid:44) y ( k +1) − y ( k ) ,Eq. (4) is rewritten as ∆ y ( k ) ≈ J ( k ) ∆ x ( k ) . Since J keepsnearly constant during T , the expression is reformulated as B ≈ JA (16)where J ∈ R K × K , B = (cid:2) ∆ y (1) , · · · , ∆ y ( T ) (cid:3) ∈ R K × T , and A = (cid:2) ∆ x (1) , · · · , ∆ x ( T ) (cid:3) ∈ R K × T .The least square method is the ﬁrst and most obviouschoice as the solution to the regression problem formulatedas Eq. (16). It is capable of handling the scenarios where thenetwork topologies Y are unreliable or even totally unavail-able , and thus, Y are no longer essential information. Thisproperty agrees with our assumption in Sec. II-A. Conversely,the result of J estimation inherently contains the most up-to-date information about Y .In particular, ordinary least square (OLS) and total leastsquare (TLS) [34] are tested, and numerous scenarios withdifferent types of noise are studied. Fig. 8 shows the results.1. Fig. 8a and 8e tell that both OLS and TLS perform well ( J Err is at the same order as J SD ; J Err —difference between the estimated values and the benchmark J Mean ) when thereis no error on neither y side ( B in Eq 16) nor x side ( A ).2. Fig. 8b and 8f tell that their performances reduce fromgood level to acceptable level when some Gaussian error(5%) injects into y ( x is assumed to be error free ).3. Fig. 8c and 8g tell that when the Gaussian error (5%)comes from both y and x , TLS becomes the onlyoption to reach a barely-passing result . TLS is a type oferror-in-variables regression, a least squares data modelingtechnique in which observational error on both dependentand independent variables is taken into account [35].4. However, if the noise does not follow i.i.d. Gaussiandistribution , as the aforementioned renewables-deriveduncertainties, both OLS and TLS fail in this kind ofregression task. These uncertainties, which are analyticallyintractable under conventional framework, will almostcertainly lead to bad results without a proper treatment ,as illustrated in Fig. 8d and 8h. This is the primarymotivation for our proposed hybrid framework. D. Elementary RMT-based Analysis

To make these renewables-derived uncertainties analyticallytractable, we have to study the problem in a high-dimensionalspace. Under the RMT framework provided in our previouswork [20], we gain insight the uncertainties from the spectrumaspect via high-dimensional analysis.Fig. 9 depicts the analysis result for Model M m in PeriodT j . The ‘ g T ’ Curve is the theoretical M-P Law spectral densityas given in Eq. (13). The ‘Hist’ Curve means histogram forthe ESD. First, we set factor numbers p in Eq. (7) to convertdifference V into residues R . Then we calculated the ESD of C m j = T RR T according to Eq. (9). The ‘ ρ E ’ Curve is theprobability density estimate of the ‘Hist’ Curve using KernelSmoothing Function (code ‘ksdensity( · )’ in Matlab, for ModelM ) or Moving Average Function (code ‘smooth( · )’, for M ).The metric space designed in Sec. IV-E enables us to quan-tify the TI performance of each bank model in spectrum space.The outliers tend to big and evident as the correspondingmodel becomes deviant, and the deviation will lead to a large d ( V ob , ˆ V m j ) = | V m j | D as deﬁned in Eq. (14). E. FA Analysis and Time-Series Analysis

For each difference-derived random matrix, e.g. V , wecalculate its ESD with a different factor numbers p , and thenobtain the results as shown in Fig. 9d. As we increase factornumbers p , the outliers are alleviated. This phenomenon agreeswith the fact that FA is often used for dimension reduction insampling data with underlying constructs, i.e. converting V m j into L ( p ) F ( p ) following Eq. (7). However, the residues part R m j could also have some latent construct . For instance,the randomness caused by a wind following AR model withcoefﬁcients b . This statistic property cannot be eliminated simply by increasing p. Fortunately, Ref. [27] applies RMTto derive spectral density of large sample covariance matricesgenerated by multivariate ARMA processes in analytic forms (Eq. 8 → → → R m j . JErr : OLS without Errors

10 20 30 40 50 60102030405060 .07 (a)

OLS : without Error (0.07 level)

JErr : OLS with Gaussian Error from y

10 20 30 40 50 60102030405060 .3 (b) Gaussian Error from y (0.3 level) JErr : OLS with Gaussian Error from Both y and x

10 20 30 40 50 60102030405060 (c) Error from both y and x (-) JErr : OLS with NonGaussian Error from Both y and x

10 20 30 40 50 60102030405060 (d) Non-Gaussian Error from both (-) JErr : TLS without Errors

10 20 30 40 50 60102030405060 .07 (e)

TLS : without Error (0.07 level)

JErr : TLS with Gaussian Error from y

10 20 30 40 50 60102030405060 .25 (f) Gaussian Error from y (0.25 level) JErr : TLS with Gaussian Error from Both y and x

10 20 30 40 50 60102030405060 (g) Error from both y and x (4 level) JErr : TLS with NonGaussian Error from Both y and x

10 20 30 40 50 60102030405060 (h) Non-Gaussian Error from both (-) Fig. 8: Performance of OLS and TLS on J Estimation with Different Types of Noise

Spectral Density when c = 0.3 for Model 1 in Window T (a) ESD of C : Model 1 in T Spectral Density when c = 0.3 for Model 1 in Window T (b) ESD of C : Model 1 in T outliers Spectral Density when c = 0.3 for Model 1 in Window T (c) ESD of C : Model 1 in T ESD of C for Different p p = 4 p = 2 p = 0 (d) ESD of C with Different p Fig. 9: ESD of C m j for Model m in Period T j Temporal analysis is conducted ﬁrst by estimating the auto-correlation coefﬁcient b of R m j using Burg’s method (code‘arburg( · )’ in Matlab). If the picked model perfectly matchesthe real grid, the renewables-derived auto-correlation would beeliminated, and only (Gaussian) measurement error remains.Fig. 10 validates this—all the node on V (Column C ) and V (C ) are of small auto-correlation ( ˆ b ≈ ), and thereforewe should accept the hypothesis that Model M matches thereal system in Period T , and M in Period T . Auto-correlation M M M M Model & Observation N ode N o . -0.9-0.8-0.7-0.6-0.5-0.4-0.3-0.2-0.10 [X,Y] [11 21] Index -0.8697 [R,G,B] [0 0 0.9686] [X,Y] [11 29]

Index -0.8813 [R,G,B] [0 0 0.9216] [X,Y] [11 30]

Index -0.8791 [R,G,B] [0 0 0.9216]C C C Fig. 10: Estimated auto-correlation coefﬁcient ˆ b of V m j Fig. 9a also depicts the phenomenon that only measurementerror remains— V -derived ESD does closely match thetheoretical ‘ g T ’ Curve (M-P Law) and no obvious outliersexist. Besides, we can ﬁnd that the values of the nodes closeto the reference bus (e.g. Node 2, 3, 19) are usually stablearound 0. The phenomenon that these nodes are insusceptibleto renewables is consistent with our common sense. F. Jointly Temporal-spatial Analysis with Latent Structure

M-P Law can nicely model R in some sense. Then someopen questions are raised, for example: 1) How to model othercolumns, e.g., Column 11 ( R )? 2) Can we extract someinformation from them, and how? To address these questions,jointly temporal-spatial analysis is discussed. We revisit our prior information to ﬁnd out the causes whichmay decide/inﬂuence the statistical properties of R m j . Onemajor cause is the two independent renewables on Node 20and Node 31. From the local ﬁeld data we know that theirpower outputs follow AR process with some latent structure.Another major cause is the inherent topology Y , although it isunknown and may have a transformation at some time point.Then we conduct the analysis with the data from a fewnodes but not all of them. This is practical when the advancedsensors such as µ PMUs are only deployed on some importantbuses. RMT-framework inherently supports statistical analy-sis with data only from a subset of nodes —the data matrixcan be naturally divided into data blocks without additionalerror , but this is not true for mechanism models. Our previouswork [32] gives a discussion on this RMT-framework property.For Column 11 ( R ), we take the renewables-inﬂuencednodes’ data ( b ≈ . ) into account, and then make a jointlytemporal-space analysis following Sec. IV. The coefﬁcients ˆ b of these inﬂuenced nodes are similar. With the prior knowledgeof Model M stored in the bank, we divide these inﬂuencednodes into three parts: 1) Node 6, 7; 2) Node 20 ∼

22; and3) Node 29 ∼

33. Then we study their cross-correlation underthis division—the closely connected nodes must show strongcorrelation , while the separated nodes show the independence.Based on this property, we use the theoretical spectral density ρ T ( b ) to test them, and the results are given in Fig. 11. Connected

Node s Separate

Nodes

Fig. 11: Jointly Spatial-temporal Analysis to Grid NodesThe ESD of relevant data derived from separated nodes closely matches the theoretical density ρ T (0 . . This phe-nomenon is built upon the premise of the Assumption I inSec. IV-D, i.e. ˆ R has sufﬁciently negligible cross-correlation:cross-covariances matrix A N ≈ I N × N —the randomness com-ponent of these separated nodes are inﬂuenced by renewableswith independent behaviors . This independence is oftenreasonable especially for an integrated energy system (IES)with diverse sources . While for those closely connected nodes(Node 20 ∼

22 in this case), the independence condition isviolated , so there is no consistency between ρ E and ρ T (0 . . G. Test with IEEE 85-bus Network

In addition, we test our framework using IEEE 85-bus radialdistribution systems. The sampling sensors, renewable gener-ators with diverse/similar patterns are deployed as Fig. 12.

Sampling Sensors

Renewable Generators with diverse patterns (AR b =0.9)Renewable Generators with similar patterns (AR b =0.7) Fig. 12: IEEE 85-bus radial distribution systemsAs a distribution grid usually operates in open loop, we justtest the pair switch of the normally closed branches and thenormally open branches. In particular, we test the pair switchof the normally closed branches B − (closed → open) com-panied with the normally open branch B − (open → closed)in Model M , and with B − (open → closed) in M , respec-tively. The error of branch impedance is also tested—M tests B − , B − , B − , B − , and B − . Similar toFig. 10 and 11, Fig. 13a shows the time-series information, andFig. 13b shows the jointly spatial-temporal analysis results.The results in Fig. 13 validate the hybrid frameworkagain. This framework is suitable to the scenarios when therenewables-behavior dominates our observed data . For thenodes inﬂuenced by multiple sources, such as Node ∼ , however, it is hard to model them in practice. Under anideal scenario, independent component analysis (ICA) or freecomponent analysis (FCA) [36] may be applied to separate themixed signal into additive (independent)subcomponents. Thecombination of ICA/FCA and our framework offers potentialfor a more complex scenario.VI. C ONCLUSION

This paper explores several high-dimensional analytics inthe context of topology identiﬁcation. We propose a hybridframework, by tying AR model, FA, and RMT together, tohandle the renewables-derived uncertainties in the form ofmultiple time-series. Our framework, through a systematicand theoretical processing , makes these uncertainties analyt-ically tractable, and is immune to ﬁxed measurement error .Several future studies are in order. Clearly, further researchis needed to employ the more general residue modeling , forwhich we can calculate the spectral density readily. For ex-ample, as described in [27], if considering vector ARMA(1,1) Autocorration Coeffienct M M M M [4 12] Index -0.8903 [R,G,B] [0 0 0.6549] [X,Y] [4 19]

Index -0.9227 [R,G,B] [0 0 0.5137] [X,Y] [4 64]

Index -0.8919 [R,G,B] [0 0 0.6392] [X,Y] [4 71]

Index -0.8913 [R,G,B] [0 0 0.6549] [X,Y] [4 47]

Index -0.705 [R,G,B] [0 0.4039 1] [X,Y] [4 52]

Index -0.7015 [R,G,B] [0 0.4196 1] [X,Y] [4 32]

Index -0.6407 [R,G,B] [0 0.6706 1] b NodeNo.Model (a) Estimated auto-correlation coefﬁcient ˆ b of V m Sources with i.i.d Diverse PatternsSources with Similar Patterns (b) Jointly Spatial-temporal Analysis

Fig. 13: Result of IEEE 85-bus Radial Distribution Systemsprocesses, we have up to 6th-order polynomial equations. Ob-viously, compared to i.i.d. Gaussian noise, the joint temporalmodel (AR) and spatial model (FA) oftentimes provide more ﬂexible and rigours models and analyses on renewables-derived uncertainties. Besides, the framework is capable tohandle comprehensive behavior (on the nodes inﬂuenced bymultiple sources) with the help of existing algorithm such asICA. The combination of conventional tasks in power systemwith novel tools in data science is a long-term goal in ourcommunity, especially in big data era. In addition, this hybridframework can be extended to an integrated energy system, inwhich randomness and independence is more evident.R

EFERENCES [1] F. F. Wu and W. H. E. Liu, “Detection of topology errorsby state estimation [power systems],”

IEEE Transactionson Power Systems , vol. 4, no. 2, pp. 50–51, 1989.[2] S. Bolognani, N. Bof, D. Michelotti, R. Muraro, andL. Schenato, “Identiﬁcation of power distribution net-work topology via voltage correlation analysis,” in . IEEE, 2013,pp. 1659–1664.[3] G. Cavraro and V. Kekatos, “Graph algorithms fortopology identiﬁcation using power grid probing,”

IEEEcontrol systems letters , vol. 2, no. 4, pp. 689–694, 2018.[4] D. Deka, S. Backhaus, and M. Chertkov, “Structure learn-ing in power distribution networks,”

IEEE Transactionson Control of Network Systems , vol. 5, no. 3, pp. 1061–1074, 2017.[5] O. Ardakanian, V. W. Wong, R. Dobbe, S. H. Low, A. vonMeier, C. J. Tomlin, and Y. Yuan, “On identiﬁcationof distribution grids,”

IEEE Transactions on Control ofNetwork Systems , vol. 6, no. 3, pp. 950–960, 2019.[6] J. Yu, Y. Weng, and R. Rajagopal, “Patopa: A data-drivenparameter and topology joint estimation framework indistribution grids,”

IEEE Transactions on Power Systems ,vol. 33, no. 4, pp. 4335–4347, 2017.[7] Y. Yuan, O. Ardakanian, S. Low, and C. Tomlin,“On the inverse power ﬂow problem,” arXiv preprintarXiv:1610.06631 , 2016.[8] X. He, L. Chu, R. C. Qiu, Q. Ai, Z. Ling, and J. Zhang,“Invisible units detection and estimation based on ran-dom matrix theory,”

IEEE Transactions on Power Sys-tems , vol. 35, no. 3, pp. 1846–1855, 2019.[9] B. Yang, T. Yu, H. Shu, J. Dong, and L. Jiang, “Robustsliding-mode control of wind energy conversion systemsfor optimal power extraction via nonlinear perturbationobservers,”

Applied Energy , vol. 210, pp. 711–723, 2018.[10] R. Singh, E. Manitsas, B. C. Pal, and G. Strbac, “Arecursive bayesian approach for identiﬁcation of networkconﬁguration changes in distribution system state esti-mation,”

IEEE Transactions on Power Systems , vol. 25,no. 3, pp. 1329–1336, 2010.[11] G. Cavraro and R. Arghandeh, “Power distribution net-work topology detection with time-series signature veri-ﬁcation method,”

IEEE Transactions on Power Systems ,vol. 33, no. 4, pp. 3500–3509, 2017.[12] Z. Tian, W. Wu, and B. Zhang, “A mixed integerquadratic programming model for topology identiﬁcationin distribution network,”

IEEE Transactions on PowerSystems , vol. 31, no. 1, pp. 823–824, 2015.[13] R. Qiu and P. Antonik,

Smart Grid and Big Data . JohnWiley and Sons, 2015.[14] Y. C. Chen, J. Wang, A. D. Dom´ınguez-Garc´ıa, and P. W.Sauer, “Measurement-based estimation of the power ﬂowjacobian matrix,”

IEEE Transactions on Smart Grid ,vol. 7, no. 5, pp. 2507–2515, Sept 2016.[15] J. Yeo and G. Papanicolaou, “Random matrix approachto estimation of high-dimensional factor models,” arXivpreprint arXiv:1611.05571 , 2016.[16] X. Ding and F. Yang, “Spiked separable covariancematrices and principal components,” arXiv preprintarXiv:1905.13060 , 2019.[17] M. He and J. Zhang, “A dependency graph approachfor fault detection and localization towards secure smartgrid,”

IEEE Transactions on Smart Grid , vol. 2, no. 2,pp. 342–351, 2011.[18] Y. Sharon, A. M. Annaswamy, A. L. Motto, and A. Chakraborty, “Topology identiﬁcation in distributionnetwork with limited measurements,” in . IEEE,2012, pp. 1–6.[19] D. C. Montgomery, C. L. Jennings, and M. Kulahci,“Introduction to time series analysis and forecasting,”2008.[20] X. He, Q. Ai, R. C. Qiu, W. Huang, L. Piao, and H. Liu,“A big data architecture design for smart grids basedon random matrix theory,”

IEEE Transactions on SmartGrid , vol. 8, no. 2, pp. 674–686, 2017.[21] P. Bacher, H. Madsen, and H. A. Nielsen, “Online short-term solar power forecasting,”

Solar Energy , vol. 83,no. 10, pp. 1772–1783, 2009.[22] D. D. Suhr, “Principal component analysis vs. ex-ploratory factor analysis (paper 203-30),” in

Proceedingsof the thirtieth annual SAS R (cid:13) users group internationalconference , vol. 203, 2005, p. 30.[23] J. Fan, Y. Liao, and M. Mincheva, “High dimensionalcovariance matrix estimation in approximate factor mod-els,” Annals of statistics , vol. 39, no. 6, p. 3320, 2011.[24] I. I. Dimov, P. N. Kolm, L. Maclin, and D. Y. Shiber,“Hidden noise structure and random matrix models ofstock correlations,”

Quantitative Finance , vol. 12, no. 4,pp. 567–572, 2012.[25] M. Pelger, “Large-dimensional factor modeling based onhigh-frequency observations,”

Journal of econometrics ,vol. 208, no. 1, pp. 23–42, 2019.[26] X. Shi, R. Qiu, Z. Ling, F. Yang, H. Yang, and X. He,“Spatio-temporal correlation analysis of online monitor-ing data for anomaly detection and location in distri-bution networks,”

IEEE Transactions on Smart Grid ,vol. 11, no. 2, pp. 995–1006, 2019.[27] Z. Burda, A. Jarosz, M. A. Nowak, and M. Snarska,“A random matrix approach to varma processes,”

NewJournal of Physics , vol. 12, no. 7, p. 075036, 2010.[28] A. Gomez-Exposito, A. J. Conejo, and C. Canizares,

Electric energy systems: analysis and operation . CRCpress, 2018.[29] T. Rogers, “New results on the spectral density of randommatrices,” Ph.D. dissertation, King’s College London,2010.[30] L. Zhang, “Spectral analysis of large dimensional randommatrices,”

National University of Singapore PHD Thesis ,2006.[31] V. A. Marˇcenko and L. A. Pastur, “Distribution ofeigenvalues for some sets of random matrices,”

Sbornik:Mathematics , vol. 1, no. 4, pp. 457–483, 1967.[32] X. He, R. C. Qiu, Q. Ai, L. Chu, X. Xu, and Z. Ling,“Designing for situation awareness of future power grids:An indicator system based on linear eigenvalue statisticsof large random matrices,”

IEEE Access , vol. 4, pp.3557–3568, 2016.[33] X. Ma, W. Li, G. Yu, R. Cao, Q. Zhang, and X. Zhang,“Development of high voltage ac energy meter,” in .IEEE, 2012, pp. 132–133.[34] F. Passerini and A. M. Tonello, “Power line network topology identiﬁcation using admittance measurementsand total least squares estimation,” in

ICC 2017 -2017 IEEE International Conference on Communica-tions , 2017.[35] Wikipedia, “Total least squares,” 2018. [Online]. Avail-able: https://en.wikipedia.org/wiki/Total least squares[36] H. Wu and R. R. Nadakuditi, “Free component analy-sis: Theory, algorithms & applications,” arXiv preprintarXiv:1905.01713arXiv preprintarXiv:1905.01713