Voltage Instability Prediction Using a Deep Recurrent Neural Network
Hannes Hagmar, Student Member, IEEE, Lang Tong, Fellow, IEEE, Robert Eriksson, Senior Member, IEEE, and Le Anh Tuan, Member, IEEE
Abstract—This paper develops a new method for voltage instability prediction using a recurrent neural network with long short-term memory. The method is aimed to be used as a supplementary warning system for system operators, capable of assessing whether the current state will cause voltage instability issues several minutes into the future. The proposed method uses a long sequence-based network, where both real-time and historic data are used to enhance the classification accuracy. The network is trained and tested on the Nordic32 test system, where combinations of different operating conditions and contingency scenarios are generated using time-domain simulations. The results show that almost all N-1 contingency test cases were predicted correctly, and N-1-1 contingency test cases were predicted with over 93 % accuracy only seconds after a disturbance. Further, the impact of sequence length is examined, showing that the proposed long sequence-based method provides significantly better classification accuracy than both a feedforward neural network and a network using a shorter sequence.
Index Terms—Dynamic security assessment, long short-term memory, recurrent neural network, voltage instability prediction, voltage security assessment.
I. INTRODUCTION

VOLTAGE instability is one of the main limitations for secure operation of a modern power system [1]. A voltage instability event can often be deceiving, where the system may seem stable for several minutes after a disturbance, only ending up in an unstable state within a short time period [2]. When instability finally is detected, the system may already have become severely degraded and the risks of an extended blackout may have increased significantly.

To ensure a secure operation, system operators often use an approach called dynamic security assessment (DSA). DSA includes time-domain analysis to test the power system's dynamic response after a set of contingencies [3]. Assessment of the dynamic stability is a complex task, and even with recent progress in high-performance computing, it is generally not feasible to assess the dynamic stability in real time [3].

To overcome this issue, various machine learning (ML) methods have been proposed in the literature. The main advantage of using ML is that high-cost computations can be performed off-line. Once the ML algorithm is trained, it can almost instantaneously provide estimations and warnings to operators that otherwise would require time-consuming
computations. Examples of DSA methods based on ML are found in [3]–[8], where mainly various decision tree (DT) or neural network (NN) methods are utilized.

The work presented in this paper has been financially supported by Energimyndigheten (Swedish Energy Agency) and Svenska kraftnät (Swedish National Grid) within the SamspEL program with project number 44358-1. The work of Lang Tong was supported in part by the U.S. National Science Foundation under grants 1932501 and 1809830.

Voltage security assessment (VSA) is a branch of DSA that specifically examines the impact of voltage instability events. This paper deals with the emergency applications of VSA, where the current system state is assessed. Here, the stability of the system is not tested with respect to a set of contingencies; rather, the system may already have suffered a disturbance. The aim of these methods is to perform voltage instability prediction (VIP), allowing system operators to trigger fast remedial actions. The emergency applications of VSA using ML have been less examined in the literature, but examples include DT [4], [9], [10] and NN [11] methods.

Previously developed methods for VIP all have in common that only instantaneous measurements are used as inputs to the VIP algorithms. These inputs represent the "state signal" that the ML algorithm uses to predict the future state. Ideally, the state signal should summarize all relevant information required to determine the future state of the system. A state signal achieving this is said to have the Markov property [12]. However, the dynamic response of a power system cannot be modeled as a first-order Markov process using only the static states provided by available measurements in the power system. Rather, the future state of the system also depends on a range of unknown state variables such as the rotor speed of generators, tap positions, or rotor slips of induction motors.

In response to these limitations, we propose a new method based on a recurrent neural network (RNN) with long short-term memory (LSTM). LSTM networks excel at capturing long-term dependencies [13], which is an inherent aspect of long-term voltage stability [2]. The method is, to the authors' knowledge, the first of its kind to use current and past data with the aim to enhance the available state signal and implicitly take into account unknown state variables.

The main contributions of this paper are the following:

• A methodology for VIP using an LSTM network is developed. The LSTM network can utilize previous measurements, such as the trend of bus voltage magnitudes, tap changes, or fault locations, to improve the accuracy for VIP. The performance using the sequence-based approach is compared with an LSTM network using a shorter sequence and a conventional NN.

• A new training approach is developed to provide operators with an online assessment tool for potential voltage instability. As time progresses after a voltage instability event, the network is capable of incorporating new observations and continuously updating the assessment.

• A methodology for including consecutive contingencies (N-1-1) into the training data is presented. The paper also examines the ability of the LSTM network to generalize for VIP under N-1-1 contingencies. Such ability is especially valuable in overcoming the combinatorial increase of complexity in training.

The rest of the paper is organized as follows. In Section II, the theory regarding RNNs and LSTM is presented.
In Section III, the proposed method is presented along with the steps for developing the training data and the training of the LSTM network. In Section IV, the results and discussion are presented. Concluding remarks are presented in Section V.

II. LONG SHORT-TERM MEMORY NETWORKS
Neural networks are a class of machine learning algorithms, highly capable of accurately approximating nonlinear functions, mapping a set of inputs to a corresponding set of target values. RNNs represent a specific type of NN adapted for processing sequential input data [14]. However, the standard implementation of an RNN has difficulties in capturing long-term dependencies of events that are significantly separated in time. In an LSTM network, such information can be propagated through time within an internal state memory cell, making the network capable of memorizing features of significance [15].

A typical LSTM block is illustrated in Fig. 1. The state memory cell is controlled by nonlinear gating units that regulate the flow in and out of the cell [13]. Following [15] and [13], the forward operation of an LSTM block is summarized below. It should be noted that each block consists of a number of hidden LSTM cells. Vector notation is used, meaning that, for instance, the hidden state vector h_t is not the output of a single LSTM cell at time t, but the output of a vector of N LSTM cells. The operation of an LSTM block at a time t may then be summarized by:

f_t = σ(W_f x_t + U_f h_{t−1} + b_f)        (1)
i_t = σ(W_i x_t + U_i h_{t−1} + b_i)        (2)
c̃_t = tanh(W_c x_t + U_c h_{t−1} + b_c)     (3)
c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t             (4)
o_t = σ(W_o x_t + U_o h_{t−1} + b_o)        (5)
h_t = o_t ⊙ tanh(c_t)                        (6)

where element-wise multiplication is denoted by ⊙, σ is the logistic sigmoid function, tanh is the hyperbolic tangent function, W, U, and b represent the weight matrices and bias vectors for each gate, and with the following variables:

• x_t ∈ R^M: input vector to an LSTM block
• h_t, h_{t−1} ∈ R^N: output vectors at time t and t−1, respectively
• f_t ∈ R^N: activation vector of the forget gate
• i_t ∈ R^N: activation vector of the input gate
• c̃_t ∈ R^N: vector of the candidate gate
• c_t ∈ R^N: cell state memory vector
• o_t ∈ R^N: activation vector of the output gate

The superscripts M and N refer to the number of inputs and hidden LSTM cells in each LSTM block, respectively.

Fig. 1. Detailed schematics of an LSTM block.

By the operation of (1), the forget gate controls what information should be stored from the previous memory cell state, and what can be discarded as irrelevant. The input gate and candidate gate control and update the memory cell state with new information by (2)–(4). Equations (5)–(6) show how the hidden state is updated by the operation of the output gate, modulated by the updated cell state memory vector.

An LSTM network may then be constructed by creating a sequence of several LSTM blocks. A partition of an LSTM sequence is illustrated in Fig. 2, where each block has a directed connection to the following block in the sequence. If the block is the first one in the sequence, the past system state is initialized with a preset value. For a deep LSTM network, with several stacked layers, the inputs to the deeper layers consist of the hidden states of LSTM blocks of previous layers. The cell state memory is only passed along the time sequence between LSTM blocks of the same layer. Typically, for classification purposes, an output vector y is generated by applying a nonlinear function of the hidden state implemented by a separate feedforward NN.
Depending on the application of the network, output vectors may be computed for a single, or for several, LSTM blocks' hidden states.

Fig. 2. An LSTM sequence with a directed connection between the blocks.
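To make the forward operation in (1)–(6) concrete, the following is a minimal NumPy sketch of a single LSTM-block step. It is an illustrative re-implementation rather than code from the paper, and the dictionary-based parameter layout is an assumed convention for readability.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_block_step(x_t, h_prev, c_prev, W, U, b):
    """One forward step of an LSTM block, following (1)-(6).

    x_t     : input vector, shape (M,)
    h_prev  : hidden state h_{t-1}, shape (N,)
    c_prev  : cell state memory c_{t-1}, shape (N,)
    W, U, b : dicts with keys 'f', 'i', 'c', 'o' holding weight
              matrices of shapes (N, M), (N, N) and bias vectors (N,)
    """
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])      # (1) forget gate
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])      # (2) input gate
    c_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])  # (3) candidate gate
    c_t = f_t * c_prev + i_t * c_tilde                          # (4) cell state update
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])      # (5) output gate
    h_t = o_t * np.tanh(c_t)                                    # (6) hidden state
    return h_t, c_t
```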
The LSTM network can then be trained using a supervised approach, where a set of training sequences and an optimization algorithm are used to update and learn suitable values for the weight matrices and bias vector parameters.

III. METHODOLOGY
The proposed method for real-time VIP is based on off-line training of an LSTM network on a large data set consisting of time-domain simulation responses following a set of credible contingencies. The method is aimed to be used as a supplementary warning system that can assess the current state of the system in real time. The LSTM network takes real-time and historic measurements and attempts to assess whether the current state will cause voltage stability issues several minutes into the future. As time progresses and if new events occur in the system, the network updates the assessment continuously. The network is also adapted to be able to indicate where in the system instability emerges, following the approach developed in [11], allowing more cost-effective countermeasures.

Fig. 3. One-line diagram of the Nordic32 system with subareas.

The first step of the method is the off-line generation of credible operating conditions (OCs) and contingency scenarios using time-domain simulations. The method is tested on the Nordic32 test system with all data and models as presented in [16]. After a representative training set is generated, training of the LSTM network is performed. Each step in the methodology is described in the following subsections.
A. Generation of training data
The generation of a training set is a critical step, and a range of different initial OCs and contingencies were included to generate a representative training set. Dynamic simulations were performed using PSS®E 34.2.0 with its built-in models [17]. The steps of generating the training data are illustrated as a flowchart in Fig. 4 and can be summarized as follows:
1) Initial OCs:
For the Nordic32 system, the initial OCs were randomly generated around the stable operation point denoted as "operating point B" in [16]. A large number of possible OCs were simulated by randomly initiating the loads from a uniform distribution around the base case load levels (80 % of original load as lower limit and 120 % as upper limit), while the power factor of the loads was kept constant. The total load change was distributed among the generators based on a weighted random distribution, where a higher rated capacity of a generator results in a higher probability to cover a larger share of the total load change. All generation that could not be supplied by the regular generators was distributed to the slack bus generator g20, see Fig. 3. A sketch of this sampling logic is given below.

In real applications, more delicate methods for efficient database generation and more careful generation of relevant OCs should be used [3], [18], where for instance the impact of unit commitment and topology changes is taken into account.
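The sampling described above can be sketched as follows. This is an illustrative approximation: function and variable names are hypothetical, and it omits details such as generator limits and reactive power handling.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def randomize_operating_condition(base_loads, gen_capacities):
    """Sample one initial OC around the base case (illustrative sketch).

    base_loads     : base-case active load per load bus (MW)
    gen_capacities : rated capacity per dispatchable generator (MW)
    Returns the scaled loads and each generator's share of the load change.
    """
    # Uniform scaling of each load in [80 %, 120 %] of its base value;
    # constant power factor means Q is scaled by the same factors.
    scale = rng.uniform(0.8, 1.2, size=base_loads.shape)
    new_loads = scale * base_loads

    # Weighted random dispatch of the total load change: a larger rated
    # capacity gives a higher probability of covering a larger share.
    draws = rng.random(gen_capacities.shape) * gen_capacities
    shares = draws / draws.sum()
    gen_change = shares * (new_loads.sum() - base_loads.sum())
    # Any change the regular generators cannot supply would be picked up
    # by the slack bus generator (g20) in the power flow solution.
    return new_loads, gen_change
```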
2) Solve and check for feasibility:
The generated OCs were solved with a power flow simulator, which served as a starting point for the dynamic simulation. If the system load flow did not converge, the initial OC was re-initialized.
3) Start dynamic simulation and introduce contingencies:
Two separate dynamic simulations were then initiated for the N-1 and the N-1-1 cases. The process is illustrated in Fig. 5.

Fig. 4. Flowchart for generating input data and target values.

For each of the two cases, the system runs without any contingencies for 66 seconds to generate a sufficient amount of N-0 data for the network to train on. At t = 66 seconds, the same first contingency was applied to both of the cases. After an additional random time period after the first contingency, uniformly distributed with a lower bound of 10 seconds, a second consecutive contingency was applied for the N-1-1 cases; a schematic sketch of this event timing is given below. Events resulting in several (near-)simultaneous contingencies were not taken into account (N-k events).

The considered contingencies in the simulations were either (i) tripping of a generator, or (ii) a three-phase fault lasting a fraction of a second, followed by tripping of the faulted line, which was then kept tripped during the remaining time of the simulation. The first contingency was chosen to be a major fault, meaning a fault on any transmission line connecting the different main areas in the system (excluding the "Eq." area, see Fig. 3), or any larger thermal generator in the "Central" area. The second contingency, for the N-1-1 cases, included tripping of any transmission line in the whole system, excluding lines in the "Eq." area. No variations of load and generation were taken into account during the dynamic simulations as these, in the relatively short time period of the simulation, are presumed to have a small impact on system stability.
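The event timing can be sketched as below. The contingency lists are placeholders, and the upper bound of the randomized delay is not reproduced here, so it is left as an assumed parameter.

```python
import random

FIRST_CONTINGENCY_T = 66.0  # seconds of N-0 operation before the first fault

def draw_event_schedule(major_contingencies, line_trips,
                        n_1_1=False, max_delay_s=20.0):
    """Return a list of (time_s, contingency) events for one simulated case.

    max_delay_s is an assumed upper bound; the 10 s lower bound after the
    first contingency follows the description above.
    """
    events = [(FIRST_CONTINGENCY_T, random.choice(major_contingencies))]
    if n_1_1:
        delay = random.uniform(10.0, max_delay_s)
        events.append((FIRST_CONTINGENCY_T + delay, random.choice(line_trips)))
    return events

# Example usage with placeholder contingency labels:
# schedule = draw_event_schedule(["trip line A"], ["trip line B"], n_1_1=True)
```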
4) Sample inputs and run until stopping criteria:
For each of the two cases, an input vector x_t, consisting of measurements of all bus voltage magnitudes and angles, and active and reactive power flows, was sampled every second and saved in a data file. No information regarding the type and location of the applied contingencies was sampled, as this information can implicitly be learned by the LSTM network. For instance, the LSTM network should be able to correlate a zero power flow in a transmission line with that line being out of service.

Each dynamic simulation ran for a total of 560 seconds, but was, in the case of a major voltage collapse, stopped in advance. The simulation interval of 560 seconds was chosen to allow time for all dynamic events to occur and for the system to either stabilize or collapse.
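A minimal sketch of assembling one input sample x_t from the sampled quantities follows; the concatenation order is an assumed convention and only has to be kept identical between training and on-line operation.

```python
import numpy as np

def sample_input_vector(bus_vm, bus_va, branch_p, branch_q):
    """Stack one second's measurements into a single input vector x_t.

    bus_vm, bus_va     : bus voltage magnitudes (pu) and angles
    branch_p, branch_q : active and reactive branch power flows
    """
    return np.concatenate([bus_vm, bus_va,
                           branch_p, branch_q]).astype(np.float32)
```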
5) Classification:
For each case, a sequence of true target value vectors y_1, ..., y_560 was generated, one for every time step in the time-domain simulation. Each y_t in these sequences represents the classification of the system if the system is allowed to run from time t up until 560 seconds without any changes to the current system. As time progresses and new events occur, the class of y_t may change. The sequences consist of multidimensional vectors where the actual class is encoded using one-hot (binary) encoding.

The classification was performed according to both the severity and the location of the system degradation at the end of the time-domain simulation. The system was defined as stable if all transmission bus voltage magnitudes were above or equal to a stability limit (in pu), in an alert state if any transmission bus voltage magnitude lay between the emergency limit and the stability limit, and in an emergency state if any transmission bus voltage magnitude was below the emergency limit. Overvoltages were not taken into account.

The target values for the alert cases were also classified according to where the lowest bus voltage magnitudes were found at the end of each dynamic simulation. The Nordic32 test system was therefore divided into different regions, as illustrated in Fig. 3. The regions "North", "South", and "Eq." were found to be stable regions, and no alert events were found in these regions for any of the simulated cases. Thus, for the classification of the alert cases, only the other three regions (indicated by C1, C2, C3 in Fig. 3) were used. The classification for each time step of each simulation then belonged to one of five different possibilities: either the whole system was predicted stable; it ended up in an emergency state; or an alert state was predicted in one of the three defined regions where the lowest occurring transmission bus voltage was found.

Fig. 5. Example of classification of an N-1 and an N-1-1 case.

The classification process is illustrated in Fig. 5. The target values are always classified as stable up until the first contingency. From different combinations of OCs and contingencies, the system may then end up being in a stable state, an alert state in area C1, C2, or C3, or in an emergency state. For the N-1 case, the sequence of true target value vectors from the time of the contingency to the end of the simulation is classified depending on which of these five states the system ends up in. For the example of the N-1 case in Fig. 5, the system ends up in an alert state in the C1 area. For the N-1-1 case, the target values are classified as stable up until the first contingency. The target values are then gathered from the N-1 case, using the end state of that simulation for classifying the state between the first and the consecutive contingency. After the second consecutive contingency, the system runs until it either collapses or until 560 seconds. Depending on this final state, the sequence of true target value vectors from the second contingency until the end of the simulation is classified. In the example in Fig. 5, an emergency state is reached. Note that the scales in Fig. 5 are different from those in the simulations for easier interpretation. In real-life applications, more intricate stability limits could be used to allow a more detailed classification.
6) Reiteration:
The described steps are reiterated until a sufficiently large training set is generated.
B. Architecture of the LSTM network
The proposed LSTM network architecture, shown in Fig. 6, is generally referred to as a "many-to-one" architecture, where previous measurements in the time sequence are used for the classification in the final block. The network consists of three stacked LSTM layers which are used to capture different levels of features from the inputs. Each LSTM block consists of 32 individual LSTM cells. The first layer of LSTM blocks takes a generated sequence of input vectors as inputs; then, by the mathematical operations presented in Section II, the output of each block is forwarded both to the following block in the sequence and to the upper layer of LSTM blocks. The third layer of LSTM blocks only passes the output forward along the time sequence. The output layer at time t is a fully connected network with softmax activation for classification. In training, the network uses the true target vector y_t at time t, while during the test or prediction phase, the network estimates a prediction vector ŷ_t at time t. The interpretation of the prediction problem is further explained in Section III-D.

Fig. 6. The proposed LSTM network architecture.
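As an illustration, the described architecture maps directly onto a few lines in a modern deep learning framework. The sketch below uses Keras as an assumed framework; the paper does not state which library was used.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN, N_FEATURES, N_CLASSES = 60, 364, 5

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, N_FEATURES)),
    # Three stacked LSTM layers with 32 hidden cells each; the first two
    # return the full hidden-state sequence to feed the layer above.
    layers.LSTM(32, return_sequences=True, dropout=0.5, recurrent_dropout=0.5),
    layers.LSTM(32, return_sequences=True, dropout=0.5, recurrent_dropout=0.5),
    # "Many-to-one": the last LSTM layer passes only its final hidden state.
    layers.LSTM(32, dropout=0.5, recurrent_dropout=0.5),
    # Fully connected output layer with softmax over the five classes.
    layers.Dense(N_CLASSES, activation="softmax"),
])
```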
C. Training the LSTM network

Different data sets were used for training, validation, and testing of the method on a mix of N-1 and N-1-1 cases. The training data set has the dimension (135,000 × 364 × 560), where the dimensions represent the number of training cases, the number of inputs, and the total interval in seconds for each simulation, respectively.

Before training, a process generally referred to as sequence preprocessing was performed to prepare batches of sequences with suitable length. The network is designed to take a sequence of 60 time steps of measurements as inputs, and subsequences with a length of 60 time steps (x_{t−59}, ..., x_t) were thus extracted from the 560 second long simulation intervals, for different values of t. For each subsequence of input vectors, a corresponding target value (y_t) at time t was gathered. The sequence preprocessing was performed multiple times for each training and validation case by varying t, starting from the lower bound t = 60. The lower bound of t is required to always allow historic data to be included in the sequence. The LSTM network could have been trained on the whole simulation interval by increasing the upper bound of t up to 560. However, since the method is proposed to be used in fast VIP applications, there is less usefulness in predicting instability long after the contingencies have occurred.

TABLE I
DESIGN AND HYPERPARAMETERS USED IN TRAINING

Data:
  Simulation interval: 560 s
  Feature dimension: 364
  Target classes: 5
  Training cases (N-1 + N-1-1): 45,000 + 90,000
  Validation cases (N-1 / N-1-1): 5,000 / 10,000
  Test cases (N-1 / N-1-1): 10,000 / 10,000

Architecture:
  LSTM layers: 3
  LSTM sequence length: 60
  FC activation function: Softmax
  LSTM hidden cells: 32
  LSTM activation function: Tanh

Training:
  Max epochs: 400
  Learning rate (α): 0.0001
  Dropout / recurrent dropout: 50 % / 50 %
  Optimizer: Adam [19]
  Loss metric: Categorical cross-entropy

The generated subsequences were then used to train the LSTM network. Due to memory limitations, a method called mini-batch gradient descent was utilized, where mini-batches of subsequences were used separately to train the network. The training was performed for a maximum of 400 epochs. An epoch is finished when all generated batches have been used to update the network parameters. Adam [19], an adaptable algorithm suitable for gradient-based optimization of stochastic objective functions, was used in training the network. The algorithm used default parameters according to [19], except for the learning rate, which was tuned. The loss function which the optimizer is applied on is the categorical cross-entropy function, which is suitable for multi-class classification problems. To avoid overfitting the data, two regularization techniques were used during training. First, early stopping was implemented, and the training of the network was stopped in case the performance on the validation set did not improve after six epochs. Second, a technique called dropout was applied, where a certain percentage of the connections between inputs and the LSTM cells were randomly masked (or "dropped") with the aim of reducing overfitting on the data. Both conventional dropout and recurrent dropout between consecutive blocks were applied during the training phase. All other parameters related to the training of the network are presented in Table I. The architecture and parameters used to train the network have been iteratively tuned to increase the classification accuracy. However, the tuning could be extended even further to allow an even better classification accuracy.
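The sequence preprocessing and the Table I training setup can be sketched as follows, continuing the Keras model sketch above. The mini-batch size is not stated in the paper and is assumed here; the placeholder arrays stand in for the real subsequence data.

```python
import numpy as np
import tensorflow as tf

SEQ_LEN, N_FEATURES, N_CLASSES = 60, 364, 5

def extract_subsequences(case_x, case_y, t_values):
    """Rolling-window sequence preprocessing for one simulated case.

    case_x   : (total_seconds, N_FEATURES) sampled input vectors
    case_y   : (total_seconds, N_CLASSES) one-hot target vectors
    t_values : 1-based time steps t >= 60 at which to cut subsequences
    """
    xs = np.stack([case_x[t - SEQ_LEN:t] for t in t_values])  # (n, 60, 364)
    ys = np.stack([case_y[t - 1] for t in t_values])          # (n, 5)
    return xs, ys

# Placeholder arrays standing in for the stacked training/validation data.
x_train, y_train = extract_subsequences(
    np.zeros((560, N_FEATURES), np.float32),
    np.zeros((560, N_CLASSES), np.float32),
    range(60, 121),
)
x_val, y_val = x_train.copy(), y_train.copy()

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # tuned rate
    loss="categorical_crossentropy",                         # multi-class loss
    metrics=["categorical_accuracy"],
)
# Early stopping after six epochs without validation improvement.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=6)
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=400, batch_size=128, callbacks=[early_stop])
```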
D. Interpretation and intuition of the VIP problem

By the proposed training and architecture of the LSTM network, a classification problem is solved where the current system state space is separated into different regions. Every state on a trajectory to a stable, alert (in C1, C2, or C3), or emergency state is labelled accordingly. The LSTM network is then trained on this data to implicitly learn these asymptotic properties of solutions and the trajectories of the system state. Once trained, the network can correlate the inputs, current and historic measurements, with a certain state space region and trajectory, allowing instant warnings of voltage instability only moments after a contingency has occurred in a system.

The classification is performed under the assumption that the current system is unchanged, meaning that no additional contingencies or changes in generation and load configuration will occur. However, as time progresses, new observations are used as inputs to the LSTM network to continuously update and incorporate such changes in the system.

This VIP problem should be interpreted as a fixed-horizon prediction problem, where the prediction horizon always is the final state given by the trajectories of the (dynamical) system. This interpretation assumes that the simulation horizon of the generated time-domain simulations is sufficiently long so that extending the simulation horizon even further, for this particular system beyond 560 seconds, would not change the partitioning of the state space.

IV. RESULTS AND DISCUSSION
A. Test results
The developed VIP methodology was tested on two separate test sets, one containing only N-1 cases, the other containing N-1-1 cases. Each test set was composed of 10,000 cases of dynamic simulations. The test results of the predictions are presented using categorical accuracy, where the indices of the true target values are compared to the argument maxima of the predictions. The accuracy at each time step is then calculated over time for each of the two test sets.

The data were fed into the network in the form of a rolling window, with subsequences generated in the exact same manner as described in Section III-C. As time t progresses, new measurements entered the network from the rightmost block in the input layer and were shifted to the left in each time increment. Since the LSTM network requires a sequence of 60 time steps of data, no predictions were made before t = 60. To facilitate the presentation in the following figures, a new time index T is introduced here. The relationship between the two time indices is T = t − 60. The classification accuracy is only plotted for 120 seconds to better visualize the changes in accuracy after the contingencies.

The classification accuracy over time is presented in Fig. 7. The classification accuracy for the N-1 test set dropped significantly at T = 6 seconds, which is the same instant that the first contingency is applied. The large drop in classification accuracy can be attributed to low bus voltages instantaneously following the first contingency, which the LSTM network has learned to correlate to a voltage instability event.

TABLE II
CONFUSION TABLE SHOWING PREDICTION RESULTS AND ACCURACY OF THE LSTM NETWORK EVALUATED AT T = 50 SECONDS
Entries are given as N-1 / N-1-1 counts; rows are actual states and columns are predicted states.

Actual state            | Stable (all areas) | Alert C1     | Alert C2     | Alert C3   | Emergency (all areas) | Accuracy
Stable (all areas)      | 2766 / 1171        | 0 / 20       | 0 / 19       | 0 / 0      | 0 / 0                 | 100 / 96.8 %
Alert C1                | 0 / 0              | 856 / 565    | 0 / 5        | 0 / 0      | 0 / 0                 | 100 / 99.1 %
Alert C2                | 0 / 5              | 0 / 8        | 1874 / 1237  | 0 / 1      | 0 / 90                | 100 / 92.2 %
Alert C3                | 0 / 0              | 0 / 0        | 0 / 42       | 0 / 178    | 0 / 0                 | - / 89.9 %
Emergency (all areas)   | 0 / 0              | 0 / 0        | 0 / 34       | 0 / 0      | 4504 / 6625           | 100 / 99.5 %
Accuracy                | 100 / 99.6 %       | 100 / 95.3 % | 100 / 92.5 % | - / 99.4 % | 100 / 98.7 %          | 100 / 97.7 %
Fig. 7. Classification accuracy over time for the proposed LSTM network.

After the first contingency, the classification accuracy increased and remained constant at 100 % for the rest of the simulations. The classification accuracy for the N-1-1 test set was identical up until the consecutive contingencies were randomly applied. During this time span, illustrated by the arrows in Fig. 7, the classification accuracy decreased slightly. Since these contingencies do not occur at the same time instant in each test case, the same large drop in accuracy as for the N-1 cases was not seen. The accuracy then gradually increased and stabilized at around 97–98 %.

The results show that the LSTM network can classify and predict future stability almost perfectly for the N-1 contingency cases and with good accuracy for the N-1-1 cases. To examine which cases were misclassified, the prediction accuracy for the two test sets, evaluated at T = 50 seconds, is presented in Table II in the form of a confusion table. Each number in a column of the table represents instances of the predicted classes and each number in a row represents the instances of the actual classes. The (empirical) conditional probabilities of correctly classifying a certain state are presented in the rightmost column. Similarly, the conditional probability of a state actually belonging to the predicted state is presented in the bottom row of the table. The total accuracy is presented in the lower right corner of the table. The accuracy for all N-1 cases is 100 % and no cases are falsely classified. For the N-1-1 test set, the lowest classification accuracy occurred for the alert states. After inspection of the falsely classified cases, it was found that several of these were borderline cases where the transmission bus voltage magnitude used in the classification was very close to what was used in the other classes. The highest classification accuracy occurred for the emergency cases, with 99.5 %.

It should be noted that the test and training sets were weighted with more cases ending up in certain classes than others. It is thus probable that the results are slightly biased with higher accuracy for these classes, and that the classification accuracy of the other classes may be lower as an effect.
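The rolling-window evaluation described above can be sketched as follows; array shapes and names are illustrative assumptions.

```python
import numpy as np

def rolling_class_predictions(model, case_inputs, seq_len=60):
    """Predict a class index for every time step t >= seq_len of one case.

    case_inputs : (total_seconds, n_features) samples of one test case
    """
    windows = np.stack([case_inputs[t - seq_len:t]
                        for t in range(seq_len, len(case_inputs) + 1)])
    probs = model.predict(windows, verbose=0)  # softmax outputs y_hat
    return probs.argmax(axis=1)                # argument maxima

def categorical_accuracy_over_time(true_idx, pred_idx):
    """Share of correctly classified cases at each time step.

    true_idx, pred_idx : (n_cases, n_steps) arrays of class indices
    """
    return (true_idx == pred_idx).mean(axis=0)
```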
B. Impact of sequence length

In this section, the performance of the sequence-based approach is tested and compared against a conventional feedforward NN, which only uses a single snapshot of measurements as inputs. Further, to test the impact of a shorter time sequence, the results of an LSTM network using a time sequence of fewer time steps than the 60 used in the proposed network are presented.

To allow a fair comparison between the two approaches, the feedforward NN used in this comparison was designed to be as similar as possible to the LSTM network. Essentially, the design of the NN in the comparison is identical to the final time step in the LSTM network presented in Fig. 6, with the difference that each layer consists of a hidden layer of neurons. The designed NN thus has three hidden layers, each with 32 hidden nodes. The same FC layer with a softmax activation function was used. The training for the NN was performed identically as for the LSTM network, with the exception that instead of a sequence of input values, a single snapshot was used. The LSTM network using a shorter time sequence was trained identically to the longer LSTM network, with the exception of the shorter sequence length.

In Fig. 8, the classification accuracy on the N-1-1 test set is presented for the two LSTM networks with the different time sequence lengths and for the conventional NN. The classification accuracy for the conventional NN was around 93 % after all the consecutive contingencies had been applied, while that of the proposed LSTM network is around 97–98 %. The results clearly indicate that the performance of the LSTM network using 60 time steps in the sequence significantly exceeded that of the conventional NN, generally providing better classification accuracy over the whole time frame of the simulation cases.

The classification accuracy of the LSTM network using a shorter sequence was similar to the one using a longer sequence, with the difference of a large drop in classification accuracy occurring earlier in time, see Fig. 8. The same decline in classification accuracy, though less significant, can be noted for the LSTM network using the longer time sequence.

Fig. 8. Impact of sequence length on classification accuracy.

For the LSTM network with the 60 time step long sequence, the accuracy drop started at T = 76 seconds, declined for a short period, and was then restored to around 97 % accuracy. For the LSTM network using the shorter sequence, the decline started correspondingly earlier, by the difference in sequence length. Once again, the classification accuracy decreased for a short period, and was then restored.

One explanation of these results is that the LSTM networks utilize information concerning the contingency and pre-contingency state to enhance the classification accuracy. It should be noted that the decline in classification accuracy started exactly one sequence length after the consecutive contingencies were introduced (at T = 16), and that the duration of the decline in classification accuracy corresponded to the exact time frame during which the consecutive contingencies were introduced. Thus, when the networks lose the information about the pre-contingency state, the chance of a misclassification increases. These results strengthen the hypothesis that a long-sequence LSTM network can be used to enhance the state signal to provide better classification accuracy. Theoretically, an even longer sequence could be used to increase the accuracy even further. However, this would increase the computational cost of training, and a balance between classification accuracy and computational cost should be sought.

C. Generalization capability and training set requirement
The generalization capability of a ML method refers to the capability to generalize the learning from the actual training set to other, yet unseen cases. Such capability is especially valuable in overcoming the combinatorial increase of complexity in the training when N-1-1 cases are also considered [20].

In Fig. 9, the classification accuracy on the N-1-1 test set is presented when the LSTM network has been trained on three different training sets. The results are presented when the network was trained on i) the full training set with all N-1 and N-1-1 cases included, ii) a smaller training set with all N-1 cases but where only a small batch of N-1-1 cases has been included, and iii) a training set where the network is only trained on N-1 cases. The same training approach as previously described was used. According to Fig. 9, the classification accuracy was significantly reduced when no N-1-1 cases were included in the training set. When including the small batch of N-1-1 cases, the classification accuracy increased significantly. However, the accuracy is still lower than when the full training set is used. Thus, the importance of obtaining a representative training set is still imperative if a high classification accuracy is to be achieved.

Fig. 9. Classification accuracy over time when varying the number of N-1-1 cases included in the training data.

D. Practical applications and requirements
The method is proposed to be used as an online tool for system operators to monitor the current state of a power system. It should be stressed that the method is not proposed to replace conventional voltage instability detection methods, but rather to function as a supplementary tool to provide early warnings. The instantaneous prediction capability of the proposed method has to be weighed against the possibility of misclassification of the system's future stability. When comparing the proposed method to other conventional indicators for voltage instability detection (see [2]), it is important to remember that these might be more accurate once instability is detected, but generally take significantly longer time to indicate instability, thus reducing the time frame that system operators have to steer the system back into stable operation.

For the proposed method to be effective, measurement updates should be available within a few seconds. In this paper, a measurement update rate of one sample per second is assumed to be available. To assure that errors and missing values are filtered out, measurements should always be preceded by a state estimator. However, state estimates from a non-linear state estimator based on remote terminal units may be too slow to be effective. Thus, time-synchronized measurements from wide-area phasor measurements filtered through a linear state estimator would be preferred.

The softmax classifier of the LSTM network outputs a probability vector, where each class is given a certain probability. It should be noted that this probability vector does not provide a true representation of the model confidence. However, it can still be useful as a proxy by system operators to track the network's confidence in each prediction. Thus, the operator can use the probability vector directly in an online interface to track the network's belief in each prediction. Alternatively, argument maxima or other functions could be used to present the most probable prediction of the network, or, for instance, to avoid predictions of falsely labelled stable states.

The practical classification accuracy of the proposed method will be affected by many aspects and will generally be lower than on a simulated test set. One of the more important aspects is modeling errors, including erroneous system parameters or inaccurate modeling of parameter values for dynamic models. Such aspects will introduce a difference between the simulated and the actual dynamic response after a contingency. However, it should be noted that such limitations are not limited only to ML-based approaches for VIP. All methods for DSA require that the dynamic models used in assessing the system response are accurately modeled.

V. CONCLUSIONS
This paper presents a new approach for online voltage instability prediction using an LSTM network capable of utilizing a sequence of measurements to improve classification accuracy. Once trained, the LSTM network can allow system operators to continuously assess and predict whether the present system state is stable, or will evolve into an alert or an emergency state in the near future. The network is also adapted to be able to indicate where instability emerges, allowing system operators to perform more cost-effective control measures.

The LSTM network was proposed with the aim of improving the available state signal by implicitly learning the long-term dependencies of voltage instability events. The results presented in the paper are highly encouraging and the proposed method is shown to have high accuracy in predicting voltage instability. The impact of the sequence length of the LSTM network was tested and showed that a longer sequence provided significantly better classification capability than both a feedforward NN and a network using a shorter sequence. The paper also examined the generalization capability of the proposed LSTM network, where the classification accuracy on N-1-1 cases was assessed when the network was only trained on N-1 cases. It was found that this reduced the classification accuracy significantly, whereas including a smaller subset of N-1-1 cases into the training set resulted in significantly better performance.

REFERENCES

[1] P. Kundur et al., "Definition and classification of power system stability IEEE/CIGRE joint task force on stability terms and definitions," IEEE Trans. Power Syst., vol. 19, no. 3, pp. 1387–1401, Aug. 2004.
[2] M. Glavic and T. Van Cutsem, "A short survey of methods for voltage instability detection," in Proc. IEEE PES General Meeting, Detroit, MI, Jul. 2011, pp. 1–8.
[3] I. Konstantelos et al., "Implementation of a massively parallel dynamic security assessment platform for large-scale grids," IEEE Trans. Smart Grid, vol. 8, no. 3, pp. 1417–1426, May 2017.
[4] T. Van Cutsem et al., "Decision tree approaches to voltage security assessment," IEE Proceedings C - Generation, Transmission and Distribution, vol. 140, no. 3, pp. 189–198, May 1993.
[5] Y. Mansour et al., "Large scale dynamic security screening and ranking using neural networks," IEEE Trans. Power Syst., vol. 12, no. 2, pp. 954–960, May 1997.
[6] K. Sun et al., "An online dynamic security assessment scheme using phasor measurements and decision trees," IEEE Trans. Power Syst., vol. 22, no. 4, pp. 1935–1943, Nov. 2007.
[7] H. Khoshkhoo and S. M. Shahrtash, "Fast online dynamic voltage instability prediction and voltage stability classification," IET Generation, Transmission & Distribution, vol. 8, no. 5, pp. 957–965, May 2014.
[8] C. Liu, F. Tang, and C. L. Bak, "An accurate online dynamic security assessment scheme based on random forest," Energies, vol. 11, no. 7, 2018.
[9] R. Diao et al., "Decision tree-based online voltage security assessment using PMU measurements," IEEE Trans. Power Syst., vol. 24, no. 2, pp. 832–839, May 2009.
[10] H. Khoshkhoo and S. M. Shahrtash, "On-line dynamic voltage instability prediction based on decision tree supported by a wide-area measurement system," IET Generation, Transmission & Distribution, vol. 6, no. 11, pp. 1143–1152, Nov. 2012.
[11] H. Hagmar et al., "On-line voltage instability prediction using an artificial neural network," June 2019 (accepted), pp. 1–6.
[12] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: The MIT Press, 2015.
[13] K. Greff et al., "LSTM: A search space odyssey," IEEE Trans. Neural Netw. & Learning Syst., vol. 28, no. 10, pp. 2222–2232, Oct. 2017.
[14] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, p. 533, 1986.
[15] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[16] T. Van Cutsem et al., "Test systems for voltage stability analysis and security assessment," IEEE/PES Task Force, Tech. Rep. PES-TR19, Aug. 2015. [Online]. Available: http://resourcecenter.ieee-pes.org/pes/product/technical-publications/PESTR19
[17] PSS®E 34.2.0 Model Library, Siemens Power Technologies International, Schenectady, NY, Apr. 2017.
[18] F. Thams et al., "Efficient database generation for data-driven security assessment of power systems," IEEE Trans. Power Syst., pp. 1–1, 2019.
[19] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv e-prints, arXiv:1412.6980, Dec. 2014.
[20] P. Mitra et al., "A systematic approach to N-1-1 analysis for power system security assessment," IEEE Power and Energy Technology Systems Journal, vol. 3, no. 2, pp. 71–80, June 2016.
Hannes Hagmar (S'17) received the M.Sc. degree in electric power engineering from Chalmers University of Technology, Gothenburg, Sweden, in 2016. Between 2016 and 2017, he worked at RISE Research Institutes of Sweden with research in electric transmission systems and measurement technology. He is currently pursuing the Ph.D. degree at Chalmers University of Technology. His research interests include power system dynamics and stability, integration of renewables, and machine learning.
Lang Tong (S'87, M'91, SM'01, F'05) is the Irwin and Joan Jacobs Professor of Engineering at Cornell University and the Cornell site Director of the Power Systems Engineering Research Center (PSERC). He received the B.E. degree from Tsinghua University and the Ph.D. degree in electrical engineering from the University of Notre Dame. He was a Postdoctoral Research Affiliate at the Information Systems Laboratory, Stanford University, and has held visiting positions at Stanford University, the University of California at Berkeley, the Delft University of Technology, and the Chalmers University of Technology in Sweden. Lang Tong's current research focuses on optimization, machine learning, AI, and economic problems in energy and power systems. He has received several IEEE society transaction prize paper and conference best paper awards. He was a Distinguished Lecturer of the IEEE Signal Processing Society and the 2018 Fulbright Distinguished Chair in Alternative Energy.
Robert Eriksson (SM'16) received the M.Sc. and Ph.D. degrees in electrical engineering from the KTH Royal Institute of Technology, Stockholm, Sweden, in 2005 and 2011, respectively. He held an associate professor position at the Center for Electric Power and Energy, DTU Technical University of Denmark, from 2013 to 2015. He is currently with the Swedish National Grid, Department of Markets and System Development. His current research interests include power system dynamics and stability, automatic control, HVDC systems, and DC grids.