Fast Power system security analysis with Guided Dropout
Benjamin Donnot‡†∗, Isabelle Guyon‡•, Marc Schoenauer‡, Antoine Marot†, Patrick Panciatici†

‡ UPSud and Inria TAU, Université Paris-Saclay, France. • ChaLearn, Berkeley, California. † RTE France.
∗ Corresponding author: [email protected]
Abstract. We propose a new method to efficiently compute load-flows (the steady-state of the power grid for given productions, consumptions and grid topology), substituting conventional simulators based on differential equation solvers. We use a deep feed-forward neural network trained with load-flows precomputed by simulation. Our architecture makes it possible to train a network on so-called “n-1” problems, in which load flows are evaluated for every possible line disconnection, and then generalize to “n-2” problems without re-training (a clear advantage given the combinatorial nature of the problem). To that end, we developed a technique bearing similarity with “dropout”, which we named “guided dropout”.
Electricity is a commodity that consumers take for granted and, while governments relaying public opinion (rightfully) request that renewable energies be used increasingly, little is known about what this entails behind the scenes in additional complexity for Transmission Service Operators (TSOs) to operate the power transmission grid in security. Indeed, renewable energies such as wind and solar power are less predictable than conventional power sources (mainly thermal power plants). A power grid is considered to be operated in “security” (i.e. in a secure state) if it is outside a zone of “constraints”, which includes that power flowing in every line does not exceed given limits. To that end, it is standard practice to operate the grid in real time with the so-called “n-1” criterion: this is a preventive measure requiring that at all times the grid would remain in a safe state even if one component (generators, lines, transformers, etc.) were disconnected. Today, the complex task of dispatchers, who are highly trained engineers, consists in analyzing situations and checking their effect prospectively using sophisticated (but slow) high-end simulators. As part of a larger project to assist TSOs in their daily operations [3], our goal in this paper is to emulate the power grid with a neural network to provide fast estimations of power flows in all lines given some “injections” (electricity productions and consumptions).
Due to the combinatorial nature of changes in power grid topology, it is impractical (and slow) to train one neural network for each topology. Our idea is to train a single network with architecture variants to capture all elementary grid topology variants (occurring either by willful or accidental line disconnections) around a reference topology for which all lines are in service. We train simultaneously on samples obtained with the reference topology and elementary topology changes, limited to one line disconnection (“n-1” cases), which are encoded in the neural network by activating “conditional” hidden units. Regular generalization is evaluated by testing the neural network with additional {injections, power flows} input/output pairs for “n-1” cases. We also evaluate super-generalization for “n-2” cases, in which a pair of lines is disconnected, by activating simultaneously the corresponding conditional hidden units (though the network was never trained on such cases).

While our work was inspired by “dropout” [6] and relates to other efforts in the literature to learn to “sparsify” [2] to increase network capacity without increasing computational time (used in automatic translation [5]) and to “mixed models” (used e.g. for person identification [7] or source identification [4]), we believe that our idea to encode topological changes in the grid in the network architecture is novel, and so is the type of application that we address.

We compared the performance of our proposed Guided Dropout (GD) method with multiple baselines (Figure 1):
– One Model: One neural network is trained for each grid topology.
– One Var (OV): One single input variable encodes which line is disconnected (0 for no line disconnected, 1 for line 1 disconnected, 2 for line 2, etc.).
– One Hot (OH): If n is the number of lines in the power grid, n extra binary input variables are added, each one coding for connection/disconnection (both encodings are illustrated right after this list).
– DC approximation: A standard baseline in power systems. This is an approximation of the AC (Alternative Current) non-linear power flow equations.
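To make the two topology encodings concrete, here is a toy illustration (our own example on a hypothetical 5-line grid, not code from the paper):

```python
n = 5                         # hypothetical number of lines in a toy grid
disconnected = 3              # suppose line 3 is out of service

# "One Var": a single integer input identifies the disconnected line
# (0 denotes the reference topology with all lines in service).
one_var = [disconnected]      # -> [3]

# "One Hot": n extra binary inputs, one per line (1 = disconnected).
one_hot = [1 if k == disconnected else 0 for k in range(1, n + 1)]
# -> [0, 0, 1, 0, 0]
```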
One Model is a brute force approach that does not scale well with the size of the power grid: a grid with n lines would require training n(n−1)/2 neural networks to implement all “n-2” cases (17,205 networks for the 118-bus grid studied below). One Variable is our simplest encoding, allowing us to train one network for all “n-1” cases; however, it does not allow us to generalize to “n-2” cases.
One Hot is a reference architecture which allows us to generalize to “n-2” cases. The DC approximation of power flows neglects reactive power and permits computing, relatively fast, an approximation of power flows using a matrix inversion, given a detailed physical model of the grid. See our supplemental material for all details on neural network architectures and DC calculations. To conduct a fair comparison towards this DC baseline, we use no more input variables than active power injections; we should however perform even better if we were using additional variables such as voltages, as used in AC power flow, instead of considering them constant as in the DC approximation. The supplemental material is available at https://hal.archives-ouvertes.fr/hal-01649938v2 and in the github repository FPSSA-GuidedDropout.
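For reference, the DC approximation amounts to solving a linear system (a standard textbook formulation, written here in our own notation rather than taken from the paper). Assuming unit voltage magnitudes and small angle differences, the vector of active injections $P$ and the vector of voltage angles $\theta$ satisfy

\[ P = B\,\theta, \qquad F_{ij} = \frac{\theta_i - \theta_j}{x_{ij}}, \]

where $B$ is the nodal susceptance matrix and $F_{ij}$ is the active flow on the line of reactance $x_{ij}$ connecting buses $i$ and $j$. The “matrix inversion” mentioned above is the solve for $\theta$; it is fast but ignores reactive power, losses and voltage deviations.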
Fig. 1: Neural network architectures being compared. We overlay all types of architectures under consideration. Lines denoting trainable parameters are not present in all models, as indicated in the legend. Biases are not represented. The injection inputs are split into two submodules for productions and consumptions.
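The gating mechanism underlying Guided Dropout can be sketched as a topology-dependent binary mask over a block-structured hidden layer (a minimal illustration of our own; layer sizes, names and the ReLU/masking details are assumptions, not the authors' exact implementation):

```python
import numpy as np

def gd_mask(tau, n_shared, units_per_line):
    """Binary mask over hidden units: shared units are always active; each
    block of conditional units is active only when its line is disconnected."""
    mask = [1.0] * n_shared
    for line_is_out in tau:                         # tau[k] = 1 if line k is out
        mask += [float(line_is_out)] * units_per_line
    return np.array(mask)

def predict_flows(x, tau, W1, W2, n_shared, units_per_line):
    """Predict line flows for injections x under topology tau."""
    h = np.maximum(0.0, W1 @ x)                     # shared + conditional units
    h = h * gd_mask(tau, n_shared, units_per_line)  # guided dropout gating
    return W2 @ h                                   # one predicted flow per line
```

Training only ever activates zero or one conditional block (the reference and “n-1” cases); at test time, two blocks can be switched on jointly, which is what produces the super-generalization to “n-2” topologies studied next.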
Our goal in this section is to demonstrate empirically that multiple grid topologies can be modeled with a single neural network, with the purpose of providing fast current flow predictions (measured in Amps) for given injections and given topologies, to anticipate whether some lines might exceed their thermal limit should a contingency occur. We show a phenomenon that we call super-generalization: with the “guided dropout” topology encoding, a neural network trained with “n-1” topology cases only generalizes to “n-2” cases. We also demonstrate that our approach would be computationally viable on a grid as large as the French Extra High Voltage power grid.

We first conducted systematic experiments on small size benchmark grids from Matpower [8], a library commonly used to test power system algorithms [1]. We report results on the largest case studied: a 118-node grid with n = 186 lines. We used all 187 variants of grid topologies with zero or one disconnected line (the “n-1” dataset) and randomly sampled 200 cases of pairs of disconnected lines (the “n-2” dataset), out of the n(n−1)/2 = 17,205 possible pairs. Training and test data were obtained by generating, for each topology considered, 10,000 input vectors (including active and reactive injections). To generate semi-realistic data, we used our knowledge of the French grid to mimic the spatio-temporal behavior of real data [3]. For example, we enforced spatial correlations of productions and consumptions and mimicked fluctuations of productions, which are sometimes disconnected for maintenance or economical reasons. Target values were then obtained by computing the resulting flows in all lines with the AC power flow simulator Hades2 (a freeware version of Hades2 is available online).
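The topology sampling just described fits in a few lines (an illustrative sketch under the stated numbers; the injection generator and the simulator call are placeholders, not actual APIs):

```python
import itertools
import random

n_lines = 186  # the 118-bus Matpower case studied here

# "n-1" dataset: reference topology plus every single-line outage (187 variants)
n1_topologies = [()] + [(k,) for k in range(n_lines)]

# "n-2" dataset: 200 random pairs among n*(n-1)/2 = 17,205 possible ones
all_pairs = list(itertools.combinations(range(n_lines), 2))
n2_topologies = random.sample(all_pairs, 200)

# For each topology, 10,000 semi-realistic injection vectors are then drawn
# and fed to an AC simulator (Hades2 in the paper) to produce target flows:
#   for topo in n1_topologies:
#       for _ in range(10_000):
#           x = sample_injections()           # hypothetical generator
#           y = run_ac_power_flow(x, topo)    # hypothetical simulator call
```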
Fig. 2: Mini grid of 118 buses (nodes). We show the L2 error in Amperes on a log scale as a function of training epochs. The neural network in both cases is trained on all “n-1” cases with multiple examples of injections. (a) Regular generalization: test set made of (all) test injections for “n-1” cases; curves shown for DC approx, One Model, One Var, One hot and G. Dropout. (b) Super-generalization: test set made of a subset of test injections for “n-2” cases; curves shown for DC approx, One hot and G. Dropout. Error bars are 25-75% quantiles over 10 runs having converged.
This resulted in a “n-1” dataset of 1,870,000 samples (we include in the “n-1” dataset the samples for the reference topology) and a “n-2” dataset of 2,000,000 samples. The “n-1” dataset was split into training, hyper-parameter selection and test subsets; all “n-2” data were used solely for testing. In all experiments, input and output variables were standardized. We optimized the “L2 error” (mean-square error) using the Adam optimizer of TensorFlow.

Figure 2 shows generalization and super-generalization learning curves for the various methods. For the reasons given above, super-generalization can only be achieved by One Hot and Guided Dropout, which explains why Figure 2-b has only two curves. The DC approximation is represented as a horizontal dashed line (since it does not involve any training), and the test error is shown on a log scale. It can be seen in Figure 2-a that neural networks are very powerful at making load flow predictions, since the “One Model” approach (yellow curve) outperforms the DC approximation by an order of magnitude; we note in the yellow curve some slight over-fitting, as evidenced by the test error increase after 10 training epochs, which could be alleviated with early stopping. However, the “One Model” approach is impractical for larger grid sizes. The “One Hot” approach is significantly worse than both “Guided Dropout” and the “DC approximation”. Of all neural network approaches, “Guided Dropout” gives the best results, and it beats the DC approximation for “n-2” cases (super-generalization). The super-generalization capabilities of networks trained with “Guided Dropout” are obtained by combining, in a single network, “shared” units trained on all topologies with “conditional” units activated only for specific topology changes.

Training one model per topology at the scale of the French grid (thousands of nodes and lines) would require computing millions of power flow simulations, which would take almost half a year given the cost of a full AC power flow simulation with RTE's current production software. Conversely, RTE stores almost all “n-1” cases as part of “security analyses” conducted every 5 minutes, so the “n-1” dataset is readily available.

To check whether our method scales up to the size of a real power transmission grid, we conducted preliminary experiments using real data of the French Extra High Voltage power grid. To that end, we extracted grid state data from September 13th onwards, covering all lines, generation units and individual loads (accounting for dynamic node splitting, the number of electrical nodes varied in that period of time). We did not simulate line disconnections in these preliminary experiments (as we would normally do to apply our method): this is strictly an evaluation of computational performance at run time. We trained a network with the following dimensions (its architecture remains to be optimized): 400 units in the (first) encoder layer, 2735 conditional units in the second (guided dropout) layer, and 400 units in the last (decoder) layer; a sketch of such a model is given below. Training takes of the order of one day. With this architecture, when the data are loaded in the computer's RAM, we can perform enough load-flows per second to compute many more load-flows in the same amount of time than current AC power flow simulators.
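As an illustration, the layer sizes quoted above correspond to a model of roughly the following shape (a hedged sketch using the Keras API; the input and output dimensions are our assumptions, and the mask multiplication is one plausible way to realize the guided dropout gating, not the authors' released code):

```python
import tensorflow as tf

n_inj, n_gd, n_enc = 1000, 2735, 400  # n_inj assumed; 2735 and 400 from the text
n_out = 2735                          # assumed: one predicted flow per line

inj = tf.keras.Input(shape=(n_inj,), name="injections")     # standardized inputs
mask = tf.keras.Input(shape=(n_gd,), name="topology_mask")  # 1 = unit active

h = tf.keras.layers.Dense(n_enc, activation="relu")(inj)    # encoder layer
g = tf.keras.layers.Dense(n_gd, activation="relu")(h)       # conditional units
g = tf.keras.layers.Multiply()([g, mask])                   # guided dropout gating
h = tf.keras.layers.Dense(n_enc, activation="relu")(g)      # decoder layer
out = tf.keras.layers.Dense(n_out, name="flows")(h)         # predicted flows

model = tf.keras.Model([inj, mask], out)
model.compile(optimizer="adam", loss="mse")  # L2 error with Adam, as in the text
```

Once trained, a batched forward pass of such a model is a handful of dense matrix products, which is why it can evaluate load-flows much faster than an iterative AC solver.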
Our comparison of various approaches to approximate “load flows” using neural networks has revealed the superiority of “Guided Dropout”. This novel method we introduced allows us to train a single neural network to predict power flows for variants of grid topology. Specifically, when trained on all variants with one single disconnected line (“n-1” scenarios), the network generalizes to variants with TWO disconnected lines (“n-2” scenarios). Given the combinatorial nature of the problem, this presents significant computational advantages.

Our target application is to pre-filter serious grid contingencies, such as combinations of line disconnections that might lead to equipment damage or service discontinuity. We empirically demonstrated on standard benchmarks of AC power flows that our method compares favorably with several reference baseline methods, including the DC approximation, both in terms of predictive accuracy and computational time. In daily operations, only “n-1” situations are examined by RTE because of the computational cost of AC simulations. “Guided Dropout” would allow us to rapidly pre-filter alarming “n-2” situations, and then to further investigate them with AC simulation. Preliminary computational scaling simulations performed on the Extra High Voltage French grid indicate the viability of such a hybrid approach: a neural network would be many times faster than the currently deployed AC power flow simulator.

Given that our new method is strictly data driven (no knowledge is used of the physics of the system, e.g. reactance, resistance or admittance of lines, or of the topology of the grid), the empirical results presented in this paper are quite encouraging, since the DC approximation DOES make use of such knowledge. This prompted RTE management to commit additional efforts to pursue this line of research; further work will include incorporating such prior knowledge. The flip side of using a data driven method (compared to the DC approximation) is the need for massive amounts of training data, representative of actual situations in an ever changing environment, and the loss of explainability of the model. The first problem will be addressed by continuously fine-tuning/adapting our model. To overcome the second one, we are working on the theoretical foundations of the method. A public version of our code, to be released on Github, is under preparation.
References

[1] O. Alsac and B. Stott. Optimal load flow with steady-state security. IEEE Transactions on Power Apparatus and Systems, (3):745–751, 1974.
[2] Y. Bengio et al. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv:1308.3432, 2013.
[3] B. Donnot et al. Introducing machine learning for power system operation support. In IREP Symposium, Espinho, Portugal, August 2017.
[4] S. Ewert and M. B. Sandler. Structured dropout for weak label and multi-instance learning and its application to score-informed source separation. In Proc. ICASSP, pages 2277–2281, 2017.
[5] N. Shazeer et al. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv:1701.06538, 2017.
[6] N. Srivastava et al. Dropout: a simple way to prevent neural networks from overfitting. JMLR, 15(1):1929–1958, 2014.
[7] T. Xiao et al. Learning deep feature representations with domain guided dropout for person re-identification. In Proc. CVPR, pages 1249–1258, 2016.
[8] R. D. Zimmerman et al. Matpower.