Fast Power system security analysis with Guided Dropout
Benjamin Donnot‡†∗, Isabelle Guyon‡•, Marc Schoenauer‡, Antoine Marot†, Patrick Panciatici†

‡ UPSud and Inria TAU, Université Paris-Saclay, France. • ChaLearn, Berkeley, California. † RTE France.
∗ Corresponding author: [email protected]
Abstract. We propose a new method to efficiently compute load-flows (the steady-state of the power grid for given productions, consumptions and grid topology), substituting conventional simulators based on differential equation solvers. We use a deep feed-forward neural network trained with load-flows precomputed by simulation. Our architecture makes it possible to train a network on so-called “n-1” problems, in which load flows are evaluated for every possible line disconnection, and then generalize to “n-2” problems without re-training (a clear advantage given the combinatorial nature of the problem). To that end, we developed a technique bearing similarity with “dropout”, which we named “guided dropout”.
Electricity is a commodity that consumers take for granted and, while governments relaying public opinion (rightfully) request that renewable energies be used increasingly, little is known about what this entails behind the scenes in additional complexity for Transmission Service Operators (TSOs) to operate the power transmission grid in security. Indeed, renewable energies such as wind and solar power are less predictable than conventional power sources (mainly thermal power plants). A power grid is considered to be operated in “security” (i.e. in a secure state) if it is outside a zone of “constraints”, which includes that power flowing in every line does not exceed given limits. To that end, it is standard practice to operate the grid in real time with the so-called “n-1” criterion: this is a preventive measure requiring that at all times the grid would remain in a safe state even if one component (generators, lines, transformers, etc.) were disconnected. Today, the complex task of dispatchers, who are highly trained engineers, consists in analyzing situations and checking their effect prospectively using sophisticated (but slow) high-end simulators. As part of a larger project to assist TSOs in their daily operations [3], our goal in this paper is to emulate the power grid with a neural network to provide fast estimations of power flows in all lines given some “injections” (electricity productions and consumptions).
Due to the combinatorial nature of changes in power grid topology, it is impractical (and slow) to train one neural network for each topology. Our idea is to train a single network with architecture variants to capture all elementary grid topology variants (occurring either by willful or accidental line disconnections) around a reference topology for which all lines are in service. We train simultaneously on samples obtained with the reference topology and elementary topology changes, limited to one line disconnection (“n-1” cases), which are encoded in the neural network by activating “conditional” hidden units. Regular generalization is evaluated by testing the neural network with additional {injections, power flows} input/output pairs for “n-1” cases. We also evaluate super-generalization for “n-2” cases, in which a pair of lines is disconnected, by activating simultaneously the corresponding conditional hidden units (though the network was never trained on such cases).

While our work was inspired by “dropout” [6] and relates to other efforts in the literature to learn to “sparsify” [2] to increase network capacity without increasing computational time (used in automatic translation [5]) and to “mixed models” (used e.g. for person identification [7] or source identification [4]), we believe that our idea to encode topological changes in the grid in the network architecture is novel, and so is the type of application that we address.

We compared the performance of our proposed Guided Dropout (GD) method with multiple baselines (Figure 1):
– One Model: One neural network is trained for each grid topology.
– One Var (OV): One single input variable encodes which line is disconnected (0 for no line disconnected, 1 for line 1 disconnected, 2 for line 2, etc.).
– One Hot (OH): If n is the number of lines in the power grid, n extra binary input variables are added, each one coding for connection/disconnection (both encodings are illustrated right after this list).
– DC approximation: A standard baseline in power systems. This is an approximation of the AC (Alternative Current) non-linear power flow equations.
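To make the two topology encodings concrete, here is a toy illustration (our own example on a hypothetical 5-line grid, not code from the paper):

```python
n = 5                         # hypothetical number of lines in a toy grid
disconnected = 3              # suppose line 3 is out of service

# "One Var": a single integer input identifies the disconnected line
# (0 denotes the reference topology with all lines in service).
one_var = [disconnected]      # -> [3]

# "One Hot": n extra binary inputs, one per line (1 = disconnected).
one_hot = [1 if k == disconnected else 0 for k in range(1, n + 1)]
# -> [0, 0, 1, 0, 0]
```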
One Model is a brute force approach that does not scale well with the size of the power grid: a grid with n lines would require training n(n−1)/2 neural networks to implement all “n-2” cases (17,205 networks for the 118-bus grid studied below). One Variable is our simplest encoding, allowing us to train one network for all “n-1” cases; however, it does not allow us to generalize to “n-2” cases.
One Hot is a reference architecture which allows us to generalize to “n-2” cases. The DC approximation of power flows neglects reactive power and permits computing, relatively fast, an approximation of power flows using a matrix inversion, given a detailed physical model of the grid. See our supplemental material for all details on neural network architectures and DC calculations. To conduct a fair comparison towards this DC baseline, we use no more input variables than active power injections; we should however perform even better if we were using additional variables such as voltages, as used in AC power flow, instead of considering them constant as in the DC approximation. The supplemental material is available at https://hal.archives-ouvertes.fr/hal-01649938v2 and in the github repository FPSSA-GuidedDropout.
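For reference, the DC approximation amounts to solving a linear system (a standard textbook formulation, written here in our own notation rather than taken from the paper). Assuming unit voltage magnitudes and small angle differences, the vector of active injections $P$ and the vector of voltage angles $\theta$ satisfy

\[ P = B\,\theta, \qquad F_{ij} = \frac{\theta_i - \theta_j}{x_{ij}}, \]

where $B$ is the nodal susceptance matrix and $F_{ij}$ is the active flow on the line of reactance $x_{ij}$ connecting buses $i$ and $j$. The “matrix inversion” mentioned above is the solve for $\theta$; it is fast but ignores reactive power, losses and voltage deviations.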
Fig. 1: Neural network architectures being compared. We overlay all types of architectures under consideration. Lines denoting trainable parameters are not present in all models, as indicated in the legend. Biases are not represented. The injection inputs are split into two submodules for productions and consumptions.
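The gating mechanism underlying Guided Dropout can be sketched as a topology-dependent binary mask over a block-structured hidden layer (a minimal illustration of our own; layer sizes, names and the ReLU/masking details are assumptions, not the authors' exact implementation):

```python
import numpy as np

def gd_mask(tau, n_shared, units_per_line):
    """Binary mask over hidden units: shared units are always active; each
    block of conditional units is active only when its line is disconnected."""
    mask = [1.0] * n_shared
    for line_is_out in tau:                         # tau[k] = 1 if line k is out
        mask += [float(line_is_out)] * units_per_line
    return np.array(mask)

def predict_flows(x, tau, W1, W2, n_shared, units_per_line):
    """Predict line flows for injections x under topology tau."""
    h = np.maximum(0.0, W1 @ x)                     # shared + conditional units
    h = h * gd_mask(tau, n_shared, units_per_line)  # guided dropout gating
    return W2 @ h                                   # one predicted flow per line
```

Training only ever activates zero or one conditional block (the reference and “n-1” cases); at test time, two blocks can be switched on jointly, which is what produces the super-generalization to “n-2” topologies studied next.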
Our goal in this section is to demonstrate empirically that multiple grid topologies can be modeled with a single neural network, with the purpose of providing fast current flow predictions (measured in Amps) for given injections and given topologies, to anticipate whether some lines might exceed their thermal limit should a contingency occur. We show a phenomenon that we call super-generalization: with the “guided dropout” topology encoding, a neural network trained with “n-1” topology cases only generalizes to “n-2” cases. We also demonstrate that our approach would be computationally viable on a grid as large as the French Extra High Voltage power grid.

We first conducted systematic experiments on small size benchmark grids from Matpower [8], a library commonly used to test power system algorithms [1]. We report results on the largest case studied: a 118-node grid with n = 186 lines. We used all 187 variants of grid topologies with zero or one disconnected line (the “n-1” dataset) and randomly sampled 200 cases of pairs of disconnected lines (the “n-2” dataset), out of the n(n−1)/2 = 17,205 possible pairs. Training and test data were obtained by generating, for each topology considered, 10,000 input vectors (including active and reactive injections). To generate semi-realistic data, we used our knowledge of the French grid to mimic the spatio-temporal behavior of real data [3]. For example, we enforced spatial correlations of productions and consumptions and mimicked fluctuations of productions, which are sometimes disconnected for maintenance or economical reasons. Target values were then obtained by computing the resulting flows in all lines with the AC power flow simulator Hades2 (a freeware version of Hades2 is available online).
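The topology sampling just described fits in a few lines (an illustrative sketch under the stated numbers; the injection generator and the simulator call are placeholders, not actual APIs):

```python
import itertools
import random

n_lines = 186  # the 118-bus Matpower case studied here

# "n-1" dataset: reference topology plus every single-line outage (187 variants)
n1_topologies = [()] + [(k,) for k in range(n_lines)]

# "n-2" dataset: 200 random pairs among n*(n-1)/2 = 17,205 possible ones
all_pairs = list(itertools.combinations(range(n_lines), 2))
n2_topologies = random.sample(all_pairs, 200)

# For each topology, 10,000 semi-realistic injection vectors are then drawn
# and fed to an AC simulator (Hades2 in the paper) to produce target flows:
#   for topo in n1_topologies:
#       for _ in range(10_000):
#           x = sample_injections()           # hypothetical generator
#           y = run_ac_power_flow(x, topo)    # hypothetical simulator call
```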
Fig. 2: Mini grid of 118 buses (nodes). We show the L2 error in Amperes on a log scale as a function of training epochs. The neural network in both cases is trained on all “n-1” cases with multiple examples of injections. (a) Regular generalization: test set made of (all) test injections for “n-1” cases; curves shown for DC approx, One Model, One Var, One hot and G. Dropout. (b) Super-generalization: test set made of a subset of test injections for “n-2” cases; curves shown for DC approx, One hot and G. Dropout. Error bars are 25-75% quantiles over 10 runs having converged.
This resulted in a “n-1” dataset of 1,870,000 samples (we include in the “n-1” dataset the samples for the reference topology) and a “n-2” dataset of 2,000,000 samples. The “n-1” dataset was split into training, hyper-parameter selection and test subsets; all “n-2” data were used solely for testing. In all experiments, input and output variables were standardized. We optimized the “L2 error” (mean-square error) using the Adam optimizer of TensorFlow.

Figure 2 shows generalization and super-generalization learning curves for the various methods. For the reasons given above, super-generalization can only be achieved by One Hot and Guided Dropout, which explains why Figure 2-b has only two curves. The DC approximation is represented as a horizontal dashed line (since it does not involve any training), and the test error is shown on a log scale. It can be seen in Figure 2-a that neural networks are very powerful at making load flow predictions, since the “One Model” approach (yellow curve) outperforms the DC approximation by an order of magnitude; we note in the yellow curve some slight over-fitting, as evidenced by the test error increase after 10 training epochs, which could be alleviated with early stopping. However, the “One Model” approach is impractical for larger grid sizes. The “One Hot” approach is significantly worse than both “Guided Dropout” and the “DC approximation”. Of all neural network approaches, “Guided Dropout” gives the best results, and it beats the DC approximation for “n-2” cases (super-generalization). The super-generalization capabilities of networks trained with “Guided Dropout” are obtained by combining, in a single network, “shared” units trained on all topologies with “conditional” units activated only for specific topology changes.

Training one model per topology at the scale of the French grid (thousands of nodes and lines) would require computing millions of power flow simulations, which would take almost half a year given the cost of a full AC power flow simulation with RTE's current production software. Conversely, RTE stores almost all “n-1” cases as part of “security analyses” conducted every 5 minutes, so the “n-1” dataset is readily available.

To check whether our method scales up to the size of a real power transmission grid, we conducted preliminary experiments using real data of the French Extra High Voltage power grid. To that end, we extracted grid state data from September 13th onwards, covering all lines, generation units and individual loads (accounting for dynamic node splitting, the number of electrical nodes varied in that period of time). We did not simulate line disconnections in these preliminary experiments (as we would normally do to apply our method): this is strictly an evaluation of computational performance at run time. We trained a network with the following dimensions (its architecture remains to be optimized): 400 units in the (first) encoder layer, 2735 conditional units in the second (guided dropout) layer, and 400 units in the last (decoder) layer; a sketch of such a model is given below. Training takes of the order of one day. With this architecture, when the data are loaded in the computer's RAM, we can perform enough load-flows per second to compute many more load-flows in the same amount of time than current AC power flow simulators.
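As an illustration, the layer sizes quoted above correspond to a model of roughly the following shape (a hedged sketch using the Keras API; the input and output dimensions are our assumptions, and the mask multiplication is one plausible way to realize the guided dropout gating, not the authors' released code):

```python
import tensorflow as tf

n_inj, n_gd, n_enc = 1000, 2735, 400  # n_inj assumed; 2735 and 400 from the text
n_out = 2735                          # assumed: one predicted flow per line

inj = tf.keras.Input(shape=(n_inj,), name="injections")     # standardized inputs
mask = tf.keras.Input(shape=(n_gd,), name="topology_mask")  # 1 = unit active

h = tf.keras.layers.Dense(n_enc, activation="relu")(inj)    # encoder layer
g = tf.keras.layers.Dense(n_gd, activation="relu")(h)       # conditional units
g = tf.keras.layers.Multiply()([g, mask])                   # guided dropout gating
h = tf.keras.layers.Dense(n_enc, activation="relu")(g)      # decoder layer
out = tf.keras.layers.Dense(n_out, name="flows")(h)         # predicted flows

model = tf.keras.Model([inj, mask], out)
model.compile(optimizer="adam", loss="mse")  # L2 error with Adam, as in the text
```

Once trained, a batched forward pass of such a model is a handful of dense matrix products, which is why it can evaluate load-flows much faster than an iterative AC solver.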
Our comparison of various approaches to approximate “load flows” using neural networks has revealed the superiority of “Guided Dropout”. This novel method we introduced allows us to train a single neural network to predict power flows for variants of grid topology. Specifically, when trained on all variants with one single disconnected line (“n-1” scenarios), the network generalizes to variants with TWO disconnected lines (“n-2” scenarios). Given the combinatorial nature of the problem, this presents significant computational advantages.

Our target application is to pre-filter serious grid contingencies, such as combinations of line disconnections that might lead to equipment damage or service discontinuity. We empirically demonstrated on standard benchmarks of AC power flows that our method compares favorably with several reference baseline methods, including the DC approximation, both in terms of predictive accuracy and computational time. In daily operations, only “n-1” situations are examined by RTE because of the computational cost of AC simulations. “Guided Dropout” would allow us to rapidly pre-filter alarming “n-2” situations, and then to further investigate them with AC simulation. Preliminary computational scaling simulations performed on the Extra High Voltage French grid indicate the viability of such a hybrid approach: a neural network would be many times faster than the currently deployed AC power flow simulator.

Given that our new method is strictly data driven (no knowledge is used of the physics of the system, e.g. reactance, resistance or admittance of lines, or of the topology of the grid), the empirical results presented in this paper are quite encouraging, since the DC approximation DOES make use of such knowledge. This prompted RTE management to commit additional efforts to pursue this line of research; further work will include incorporating such prior knowledge. The flip side of using a data driven method (compared to the DC approximation) is the need for massive amounts of training data, representative of actual situations in an ever changing environment, and the loss of explainability of the model. The first problem will be addressed by continuously fine-tuning/adapting our model. To overcome the second one, we are working on the theoretical foundations of the method. A public version of our code, to be released on Github, is under preparation.
References

[1] O. Alsac and B. Stott. Optimal load flow with steady-state security. IEEE Transactions on Power Apparatus and Systems, (3):745–751, 1974.
[2] Y. Bengio et al. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv:1308.3432, 2013.
[3] B. Donnot et al. Introducing machine learning for power system operation support. In IREP Symposium, Espinho, Portugal, August 2017.
[4] S. Ewert and M. B. Sandler. Structured dropout for weak label and multi-instance learning and its application to score-informed source separation. In Proc. ICASSP, pages 2277–2281, 2017.
[5] N. Shazeer et al. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv:1701.06538, 2017.
[6] N. Srivastava et al. Dropout: a simple way to prevent neural networks from overfitting. JMLR, 15(1):1929–1958, 2014.
[7] T. Xiao et al. Learning deep feature representations with domain guided dropout for person re-identification. In Proc. CVPR, pages 1249–1258, 2016.
[8] R. D. Zimmerman et al. Matpower.