DeVLearn: A Deep Visual Learning Framework for Localizing Temporary Faults in Power Systems
Shuchismita Biswas, Rounak Meyur and Virgilio Centeno
Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA, USA
{suchi, rounakm8, virgilio}@vt.edu

Abstract—Frequently recurring transient faults in a transmission network may be indicative of impending permanent failures. Hence, determining their location is a critical task. This paper proposes a novel image-embedding-aided deep learning framework called DeVLearn for faulted line location using PMU measurements at generator buses. Inspired by breakthroughs in computer vision, DeVLearn represents measurements (one-dimensional time series data) as two-dimensional unthresholded Recurrence Plot (RP) images. These RP images preserve the temporal relationships present in the original time series and are used to train a deep Variational Auto-Encoder (VAE). The VAE learns the distribution of latent features in the images. Our results show that for faults on two different lines in the IEEE 68-bus network, DeVLearn is able to project PMU measurements into a two-dimensional space such that data for faults at different locations separate into well-defined clusters. This compressed representation may then be used with off-the-shelf classifiers for determining fault location. The efficacy of the proposed framework is demonstrated using local voltage magnitude measurements at two generator buses.
Index Terms—fault localization, deep learning, dimensionality reduction, image embedding, variational autoencoders, recurrence plots, CNN
I. INTRODUCTION
In the power transmission network, temporary faults may be caused by momentary line contact with vegetation or animals. Resultant faults may have high fault impedance and are cleared without the action of any protective element, making the localization task especially challenging. Frequently recurring disturbances in close proximity to each other might indicate the presence of system vulnerabilities, which may result in catastrophic failures in the future. Therefore, localizing disturbances and rectifying them in time is a critical requirement for safe and reliable grid operations. Traditional methods for disturbance localization have included impedance measurement and travelling wave based approaches, which are sensitive to line parameters or have high sampling requirements [1]. The large-scale deployment of Phasor Measurement Units (PMUs) in recent years motivates exploring data-driven approaches for localizing transient faults [2]–[4]. In [3], the authors propose locating load increase events using logistic regression. A neural network based approach is also proposed in [4] for complementing conventional methods.

In recent years, deep learning (DL) has been gaining acceptance and popularity in the power systems community due to its capability of learning highly non-linear relationships among variables. In recent literature, DL has been used for applications like event classification [5], dynamic security assessment [6], and load forecasting [7]. This paper proposes a Deep Visual Learning framework (DeVLearn) for the transient fault localization task, particularly identifying the faulted line. Once the line is identified, analytical methods, like the one proposed in [2], may be used to pinpoint the exact fault location. Our primary objective is to compute a compressed representation of measurement data in a lower dimensional space by training a deep Variational Auto-Encoder (VAE) [8].

Fig. 1: The DeVLearn framework

In DeVLearn (Fig. 1), this is done by first embedding a time series of length n into an n × n image using unthresholded Recurrence Plots (RP) [9]. The image is then compressed to a single point in a lower order k-dimensional space, also called the 'latent space'. The image embedding step enables direct application of DL models from the computer vision domain. Performance of DeVLearn is demonstrated using measurements from two generator buses of the IEEE 68-bus network. Temporary three-phase faults with different clearing times and fault impedances are simulated on two different lines. Here, we explicitly choose only generator buses as they are more likely to be instrumented with PMUs in reality. As DeVLearn is trained, machine responses to different faults separate into well-defined clusters in latent space.

The main contributions of the paper are summarized as follows. 1) We show how the unthresholded RP method may be used to represent univariate time series as images that preserve the temporal relationship in the original time series. 2) We introduce a DL framework to compute a compressed representation of RP images in low dimensional latent space. We show that the deep VAE is able to separate fault measurements into discernible clusters in this latent space, and hence off-the-shelf classifiers may be used to localize the faults. This dimensionality reduction or feature learning technique is a novel contribution in the power system domain and has immense potential even beyond the fault localization task.

The remainder of the paper is organized as follows. In Section II we explain the DeVLearn framework and its components in detail. Section III describes our experimental results. Section IV discusses limitations of the present work, outlines future research directions and concludes the paper.

Fig. 2: Procedure for constructing unthresholded RP images from time-series data, reproduced from [10]. (a) Time series data; (b) two-dimensional phase space trajectory; (c) unthresholded recurrence plot. The left panel shows a simple univariate time series f(t) with 12 samples. The middle panel shows its two-dimensional phase space trajectory with delay embedding 1. The dots are system states such that s_i : (f(i), f(i+1)). The right panel shows the unthresholded RP for f(t). It is a square matrix whose (i, j)-th entry is the Euclidean distance between s_i and s_j in the phase space.

II. METHODOLOGY
Recent years have seen DL achieve major breakthroughs in the fields of computer vision and speech recognition [11], [12]. DL approaches for time series analysis, however, have been comparatively limited so far. Some deep generative models have been proposed for learning underlying structures in one-dimensional time series data [13], but their performance is heavily dependent on hyperparameter tuning. We propose to bring image processing advancements to the power domain by first converting measurements to RP images and then training a DL model to recognize latent structures in them. The efficiency of using RP-based learning for time series classification (TSC) has been demonstrated in [10], where the authors show that RP embedding is more efficient for TSC than other benchmark methods as well as other image embedding methods proposed in the literature [14].
A. Recurrence Plots
Time series data are characterized by distinct behaviors like periodicity, trends and cyclicities. Dynamic nonlinear systems exhibit recurrence of states, which may be visualized through RPs. First introduced in [9], RPs explore the m-dimensional phase space trajectory of a system by representing its recurrences in two dimensions. They capture how frequently a system returns to or deviates from its past states. Mathematically, this may be expressed as below.

R_{i,j} = ||s_i − s_j||,  i, j = 1, 2, . . . , K    (1)

Here, s_i and s_j represent the system states at time instants i and j respectively, and K is the number of system states considered. In the original RP method, the R matrix is binary, i.e. its entries are 1 if the value of ||s_i − s_j|| is above a pre-determined threshold and 0 otherwise. We do away with the thresholding, since unthresholded RPs capture more information. Images so obtained capture patterns which may not be immediately discernible to the naked eye. A detailed procedure for constructing the RP of a simple time series is shown in Fig. 2.

Fig. 3: VAE architecture
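The construction in Eq. (1) and Fig. 2 can be sketched in a few lines of NumPy. The function names and the toy series below are ours, not from the paper's code; we use the paper's settings of a two-dimensional phase space with delay embedding 1.

```python
import numpy as np

def phase_space_states(f, m=2, delay=1):
    """Embed a 1-D series into m-dimensional states s_i = (f(i), f(i+delay), ...)."""
    f = np.asarray(f, dtype=float)
    K = len(f) - (m - 1) * delay          # number of system states
    return np.stack([f[j * delay : j * delay + K] for j in range(m)], axis=1)

def unthresholded_rp(f, m=2, delay=1):
    """R[i, j] = ||s_i - s_j|| as in Eq. (1), with no binarizing threshold."""
    s = phase_space_states(f, m, delay)
    diff = s[:, None, :] - s[None, :, :]  # pairwise state differences
    return np.linalg.norm(diff, axis=-1)

f = np.sin(np.linspace(0, 4 * np.pi, 50))  # toy series standing in for a PMU signal
R = unthresholded_rp(f)                    # 49 x 49 symmetric distance matrix
```

For a length-n series with delay embedding 1, this yields a symmetric matrix with a zero diagonal (of size n−1, i.e. approximately the n × n image described above), which is then treated as an image.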
Fig. 4: Pipeline showing DeVLearn operation: measurements are downsampled via PAA, converted to unthresholded RPs, and passed to the VAE and classifier to estimate the fault location.
B. Variational Autoencoder
An autoencoder (AE) is an unsupervised learning technique where a neural network (NN) is trained to generate outputs that replicate its inputs [11]. Particularly, a 'bottleneck' in the NN architecture is leveraged to create a lower dimensional representation of the inputs in the latent space. AEs comprise two components: a) an encoder that learns a compressed representation of the input data (dimensionality reduction), and b) a decoder that learns to reconstruct the input data from the compressed representation.

A VAE uses variational inference to generate the distribution of latent variables in the lower dimensional space [15]. The distribution of latent variables z for a given input data x follows the posterior distribution p(z|x). Computing p(z|x) in closed form results in an intractable integral. To this end, the variational inference method uses a different distribution q(z|x, λ) to approximately infer the computationally intractable distribution p(z|x). This is ensured by training the VAE such that the KL-divergence between the distributions q(z|x, λ) and p(z|x) is minimized. It can be shown that the distribution q(z|x, λ) which best approximates p(z|x) minimizes the expression in (2).

arg min_{q(z|x,λ)}  −E_{q(z|x,λ)}[log p(x|z)] + KL(q(z|x, λ) || p(z))    (2)

The first term in (2) is the reconstruction loss or expected negative log-likelihood for input data x. A small value of reconstruction loss denotes that the VAE is able to accurately reconstruct x̂ from input data x. The second term is the KL-divergence between the learned distribution q(z|x, λ) and the prior distribution p(z), which acts as a regularizer.
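A numerical sketch of the two terms in (2), under the Gaussian assumptions adopted here: the KL term has a closed form against a standard normal prior, and the reconstruction term is taken as MSE. The `beta` weight and all names are illustrative, not from the paper's implementation.

```python
import numpy as np

def mse_reconstruction_loss(x, x_hat):
    """Reconstruction term: mean squared error between input x and reconstruction x_hat."""
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    return np.mean((x - x_hat) ** 2)

def gaussian_kl(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions.

    The encoder is assumed to output the mean and log-variance of q(z|x, lambda).
    """
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))

def vae_loss(x, x_hat, mu, log_var, beta=0.1):
    """Weighted sum of the two terms in Eq. (2); beta < 1 down-weights the KL regularizer."""
    return mse_reconstruction_loss(x, x_hat) + beta * gaussian_kl(mu, log_var)
```

A posterior that exactly matches the prior (mu = 0, log_var = 0) contributes zero KL penalty, illustrating why a large KL weight pushes q(z|x, λ) toward p(z).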
λ is the variational parameter which indexes a family of distributions. In our case, we assume the prior distribution p(z) to be Gaussian, which makes the variational parameters the mean and variance of the latent variables, λ = (μ_z, σ_z), for each input data point x.

Just like an AE, a VAE also consists of an encoder, a decoder and a loss function. The encoder is a NN which generates parameters λ for the distribution q(z|x, λ). The decoder is another NN trained to reconstruct the input data x from a given latent representation z. The loss function is a weighted sum of the reconstruction loss and KL-divergence terms. Choosing a weight for the reconstruction loss that is significantly higher than that of the KL-divergence term results in overfitting of the VAE, whereas a higher weight for the KL-divergence term enforces the distribution q(z|x, λ) to follow the prior distribution p(z).

C. DeVLearn Framework
The DeVLearn framework puts the components discussed above together to achieve a powerful latent space representation. The pipeline of the framework is shown in Fig. 4. We reiterate the steps involved, for clarity.
Step 1:
To reduce the computation burden, we first downsample the time series data using Piecewise Aggregate Approximation (PAA). In this paper, we have downsampled the original signal by a factor of five.
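PAA replaces each block of consecutive samples with its mean. A minimal sketch of the factor-of-five downsampling (our own; it assumes the series length is a multiple of the factor):

```python
import numpy as np

def paa(series, factor=5):
    """Piecewise Aggregate Approximation: mean of each block of `factor` samples."""
    x = np.asarray(series, dtype=float)
    assert len(x) % factor == 0, "series length must be a multiple of the factor"
    return x.reshape(-1, factor).mean(axis=1)

x = np.arange(20.0)        # toy series
y = paa(x, factor=5)       # block means: [2., 7., 12., 17.]
```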
Step 2:
The down-sampled data is converted to unthresholded RP images following the procedure described in Section II-A. We have used a delay embedding of 1.
Step 3:
A Convolutional Neural Network (CNN) based deep VAE model is trained to learn the latent space distribution of the fault data [16]. The latent space is considered to be two dimensional. The encoder has two hidden layers while the decoder has a single hidden layer. The structure of the deep VAE is similar to the example available in [17].

As explained in Section II-B, the VAE loss function has two elements: the reconstruction loss (RL) and a KL-divergence term (KLD). The mean squared error (MSE) metric between input data x and reconstructed data x̂ has been used for the RL term. Since our downstream application desires well-separated clusters in the latent space, we reduce the relative weight of the KLD term. Potential for other downstream applications also exists. VAEs have been employed to generate synthetic images [11], and methods of recovering original signals from unthresholded RPs have already been proposed in the literature [18]. Therefore, DeVLearn may be modified to generate realistic synthetic PMU data. This is an exciting research direction that the authors want to pursue in the future.

Step 4:
In the latent space learned in Step 3, each signal is compressed to a single point in two-dimensional space. The novelty of DeVLearn is in its capability to learn a latent space where measurements corresponding to different fault locations are automatically separated into disentangled clusters, even when the DL model has no explicit knowledge of the data labels. Any classifier, like a Support Vector Machine (SVM), can now be used to determine the location of an unseen fault.

III. RESULTS AND DISCUSSIONS
Fig. 5: IEEE-68 bus power system with colored edges denoting the two locations of temporary faults. The colored nodes are the generator buses with PMU measurements.
A. Experimental Setup
In this paper, we analyse transient faults on two transmission lines A and B in the standard IEEE-68 bus power system. To this end, three-phase faults are simulated for each line and the voltage magnitudes at two generator buses (Generators 1 and 4) are recorded. The locations of the generator buses and faulted lines are shown in Fig. 5. The fault impedance for each event is randomly sampled from a uniform distribution between 0 and 1000 Ohms. Similarly, the fault duration is assumed to be uniformly distributed over 10 to 20 cycles of the power system frequency. The simulated events are split into training and testing datasets, with 1800 and 200 events respectively. All power system simulations are carried out in PSS/E. DeVLearn is trained using the GPU hardware acceleration option available on Google Colaboratory. Training with a batch size of 100 took around 550 µs for a single epoch.

It must be mentioned here that detecting the presence of temporary faults has not been considered in the scope of DeVLearn. Multiple methodologies to detect temporary disturbances have been proposed in the literature [19].

B. Recurrence Plots for Faults
In order to better understand how generator responses to fault events translate to RP images, let us look at the RPs for voltage magnitude measurements at generator buses 1 and 4 for two faults at lines A and B. Fig. 6 and 7 respectively show
Fig. 6: Voltage at Gen. 1 for faults at lines A and B
Fig. 7: Voltage at Gen. 4 for faults at lines A and B
Gen. 1: Fault at Line A; Gen. 1: Fault at Line B; Gen. 4: Fault at Line A; Gen. 4: Fault at Line B
Fig. 8: Unthresholded RP images for the measurements shown in Fig. 6 and Fig. 7.

the time series measurements (downsampled by a factor of 5) at buses 1 and 4, while Fig. 8 shows the corresponding RP images. In these images, time progresses in a diagonal manner, from the upper left corner to the lower right. It is evident that RP images for the different events may be distinguished, even by the naked eye. Preliminary exploration revealed that images for similar events indeed look similar, even for high impedance faults, where the voltage deviation at generator buses is not very high. The objective now is to teach DeVLearn to recognize the RP images and associate them with the events they correspond to.
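Before the VAE sees them, the RP distance matrices are rendered as grayscale images. One plausible conversion is min–max scaling to 8-bit intensities; the paper does not specify its exact normalization, so the following sketch is an assumption on our part.

```python
import numpy as np

def rp_to_grayscale(R):
    """Scale an unthresholded RP distance matrix to 8-bit grayscale pixel values.

    Min-max normalization is an assumed choice, not taken from the paper.
    """
    R = np.asarray(R, dtype=float)
    span = R.max() - R.min()
    if span == 0:                      # constant signal -> blank image
        return np.zeros_like(R, dtype=np.uint8)
    return np.round(255 * (R - R.min()) / span).astype(np.uint8)

# Toy distance matrix standing in for a real RP
R = np.array([[0.0, 1.0], [1.0, 0.0]])
img = rp_to_grayscale(R)               # entries in {0, 255}
```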
C. Training the VAE
The deep VAE component of DeVLearn is trained using 1800 grayscale RP images. Each training epoch uses a batch size of 100 data points. A separate DeVLearn framework is trained for each of the generators, but the VAE architecture and loss functions were not altered. The encoder projects the training set into a compressed two dimensional latent space, whose evolution with training epochs is shown in Fig. 9. It can be clearly seen that data from different faults start separating into clusters as training progresses, and significant separation is achieved at 500 epochs for both models. Therefore, we are able to form an estimate of fault location using only local voltage magnitude measurements.

D. Determining Fault Location
We check the performance of an SVM classifier with a linear kernel on the latent space learnt by the DeVLearn framework after 1000 training epochs. The resultant decision boundary is shown in Fig. 10. It is evident that in the latent space, fault data for the two lines are almost linearly separable. With a linear SVM classifier, we obtain training accuracies of 99.33% and 99.72% for generators 1 and 4 respectively. Testing accuracy for both generators is 99.5%. Although a classifier with a non-linear kernel (a Radial Basis Function or RBF kernel, for instance) might have achieved higher accuracy, we intend to show that sophisticated classifiers are not required to achieve good performance.

IV. CONCLUSION
This paper provides a proof of concept that image embedding aided deep learning may be used to determine temporary fault locations in power systems with high accuracy. We demonstrate the capability of the proposed framework DeVLearn in learning useful information from unlabeled univariate time series data in the context of distinguishing faults at different locations. This is attractive, keeping in mind the limited availability of labeled data in the power domain. Research scope exists in devising image embedding strategies for multivariate time series data. Of course, more tests with faults at different lines, network topologies and operating conditions are required to place higher confidence in DeVLearn, and this is a direction that the authors are pursuing. The idea is to validate the DeVLearn framework with actual PMU data and expand it to applications beyond fault localization, for example generating realistic synthetic PMU data.

Fig. 9: Evolution of the latent attribute distribution in the compressed two dimensional space over different training epochs. Figs. 9a-9c show the evolution of the latent space when the DeVLearn framework is trained with voltage magnitude measurements at generator bus 1. The distribution of latent space attributes for measurements at generator bus 4 is shown in Figs. 9d-9f. It can be seen that as training epochs progress, measurements corresponding to faults at different locations separate into discernible clusters in the latent space.

Fig. 10: Using an SVM classifier with linear kernel to classify the latent space learned by DeVLearn. (a) Classifying measurements from Gen. bus 1: training data; (b) Gen. bus 1: testing data; (c) Gen. bus 4: training data; (d) Gen. bus 4: testing data.

REFERENCES

[1] R. J. Hamidi and H. Livani, “Traveling-wave-based fault-location algorithm for hybrid multiterminal circuits,” IEEE Transactions on Power Delivery, vol. 32, no. 1, pp. 135–144, Feb 2017.
[2] Q. Jiang, B. Wang, and X. Li, “An efficient PMU-based fault-location technique for multiterminal transmission lines,” IEEE Transactions on Power Delivery, vol. 29, no. 4, pp. 1675–1682, 2014.
[3] H.-W. Lee, J. Zhang, and E. Modiano, “Data-driven localization and estimation of disturbance in the interconnected power system,” 2018.
[4] W. Li, D. Deka, M. Chertkov, and M. Wang, “Real-time fault localization in power grids with convolutional neural networks,” CoRR, vol. abs/1810.05247, 2018. [Online]. Available: http://arxiv.org/abs/1810.05247
[5] Y. Zhu, C. Liu, and K. Sun, “Image Embedding of PMU Data for Deep Learning Towards Transient Disturbance Classification,” in IEEE International Conference on Energy Internet (ICEI), May 2018, pp. 169–174.
[6] J.-M. Hidalgo-Arteaga, F. Hancharou, F. Thams, and S. Chatzivasileiadis, “Deep Learning for Power System Security Assessment,” in IEEE Milan PowerTech, June 2019, pp. 1–6.
[7] S. Ryu, J. Noh, and H. Kim, “Deep Neural Network Based Demand Side Short Term Load Forecasting,” Nov 2016, pp. 308–313.
[8] D. P. Kingma and M. Welling, “An Introduction to Variational Autoencoders,” ArXiv, vol. abs/1906.02691, June 2019.
[9] N. Marwan, M. C. Romano, M. Thiel, and J. Kurths, “Recurrence Plots for the Analysis of Complex Systems,” Physics Reports, vol. 438, no. 5, pp. 237–329, Jan 2007.
[10] N. Hatami, Y. Gavet, and J. Debayle, “Classification of Time-Series Images Using Deep Convolutional Neural Networks,” ArXiv, vol. abs/1710.00886, 2017.
[11] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
[13] Physical Review E, vol. 97, no. 6, Jun 2018.
[14] Z. Wang and T. Oates, “Imaging Time-series to Improve Classification and Imputation,” in Proceedings of the 24th International Conference on Artificial Intelligence, 2015, pp. 3939–3945.
[15] Q. Xu, Y. Yang, Z. Wu, and L. Zhang, “Different latent variables learning in variational autoencoder,” July 2017, pp. 508–511.
[16] D. Hafner, “Building Variational Auto-Encoders in TensorFlow,” Blog post, 2018. [Online]. Available: https://danijar.com/building-variational-auto-encoders-in-tensorflow/
[17] Keras, “CNN based Variational Autoencoder Example,” GitHub repository, 2013. [Online]. Available: https://github.com/keras-team/keras/blob/master/examples/variational_autoencoder_deconv.py
[18] A. Sipers, P. Borm, and R. Peeters, “On the Unique Reconstruction of a Signal from its Unthresholded Recurrence Plot,” Physics Letters A, vol. 375, no. 24, pp. 2309–2321, 2011.
[19] M. J. B. Reddy, R. K. Raghupathy, K. Venkatesh, and D. Mohanta, “Power Quality Analysis using Discrete Orthogonal S-transform (DOST),”