



An Online Power System Stability Monitoring System Using Convolutional Neural Networks

Ankita Gupta, Gurunath Gurrala, Senior Member, IEEE, and P. S. Sastry, Senior Member, IEEE

Abstract—A continuous Online Monitoring System (OMS) for power system stability based on phasor measurements (PMU measurements) at all the generator buses is proposed in this paper. Unlike the state-of-the-art methods, the proposed OMS does not require information about fault clearance. This paper proposes a convolutional neural network, whose input is the heatmap representation of the measurements, for instability prediction. Through extensive simulations on the standard IEEE 118-bus and IEEE 145-bus systems, the effectiveness of the proposed OMS is demonstrated under varying loading conditions, fault scenarios, topology changes, and generator parameter variations. Two different methods are also proposed to identify the set of critical generators that are most impacted in the unstable cases.

Index Terms—Transient stability, phasor measurements, convolutional neural networks, principal component analysis.

I. INTRODUCTION

MONITORING the rotor angle stability and early recognition of potentially dangerous conditions is crucial for reliable operation of power systems. Various techniques have traditionally been used to assess rotor angle stability. Time domain simulations (TDS) rely on solving the nonlinear differential algebraic equations (DAE) that model power systems [1], but they are computationally intensive and require accurate system data. Transient energy function (TEF) methods [2] compare the potential and kinetic energy of the system against a reference value; however, there are difficulties in estimating these energies in practical scenarios due to unavailability of some state variable measurements [2], [3]. The equal area criterion (EAC) and the extended equal area criterion (EEAC) assess transient stability based on single machine connected to infinite bus (SMIB) model approximations [4], [5], but they allow only classical generator models. The SIME (SIngle Machine Equivalent) method [6] is a hybrid approach which combines the advantages of TDS and EEAC and allows the use of detailed models. This method is computationally more efficient than TDS, but at the cost of reduced accuracy.

Manuscript received August 4, 2017; revised March 1, 2018 and June 1, 2018; accepted September 22, 2018. Date of publication October 9, 2018; date of current version February 18, 2019. This work was supported by the Fund for Improvement of S&T Infrastructure program, Department of Science and Technology, India, No. SR/FST/ETII-063/2015 (C) & (G). Paper no. TPWRS-01200-2017. (Corresponding author: Gurunath Gurrala.)

The authors are with the Department of Electrical Engineering, Indian Institute of Science, Bengaluru 560012, India (e-mail: [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TPWRS.2018.2872505

In recent years, many machine learning (ML) techniques have been explored for many problems in power systems, including fault detection and classification, stability and control problems [7]–[15]. In the past few years many SVM-based methods have been proposed for transient stability prediction with a good amount of success [3], [16]–[18]. Compared to the other ML methods (such as decision trees, multilayer perceptrons or rule-based methods), SVM-based methods are seen to be more effective for assessing transient stability [18]. However, all these classifiers need to use post-fault values of system variables as input, which means that they need accurate information about the instant of fault clearance. Recently, in [19], the authors have shown that the performance of the SVM proposed in [3] degrades significantly when tested with data measurements delayed by even one or two samples with respect to the fault clearance and also under various noise conditions. Thus, there is a need for a more robust system whose performance is independent of accurate fault information.

In this paper a continuous Online Monitoring System (OMS) is proposed for assessment of rotor angle stability in a power system. The OMS uses voltage magnitudes and voltage angles at all the generator buses as inputs, which are usually available from PMU measurements. The system is based on a convolutional neural network (CNN) classifier, and the input to the CNN is a heatmap¹ representation of the measured values of system variables in the current time window. To the best of our knowledge, this is the first time that a heatmap representation and deep neural network techniques are used for this problem. The proposed OMS continuously analyzes data measurements in overlapping time windows and does not need knowledge of the instant of fault occurrence or its clearance. In case of instability, the OMS further identifies the generator(s) most impacted by the disturbance, called critical generators. Two different methods are proposed for identifying these critical generator(s). The first method is based on Principal Component Analysis (PCA) of the output of the CNN. The second method uses a second neural network, whose input is the representation learnt by the CNN, to predict the entire set of critical generators that cause instability. The performance of the proposed OMS is validated by extensive simulations on the IEEE 118-bus [20] and IEEE 145-bus [21] test systems under various loading conditions, topology changes, generator parameter variations and measurement noise.

¹A heatmap is a representation of data as an image where each value in the data matrix is represented as a color intensity.



Fig. 1. General layout of OMS.

II. CONTINUOUS ONLINE MONITORING SYSTEM

The proposed OMS has three main components: an input stage, a deep neural network (DNN) based classifier, and a critical generator identifier. Fig. 1 shows the general layout of the proposed OMS.

A. Input Stage

The OMS assumes availability of PMU measurements of all the generator buses at a central location. It uses the voltage magnitudes and voltage phase angles of all the generator buses. In addition, the rate of change of angle is used as the third variable and will hereafter be referred to as frequency. Before calculating the derivative of the phase angles, the sampled phase angle signal is passed through a median filter with kernel size three. A median filter replaces each entry in the input signal with the median of the neighboring entries covered by the kernel. As derivatives amplify noise in a signal, filtering before calculating derivatives helps in reducing the effect of noise. The training data for the CNN is obtained through a detailed simulation of the power system. In the simulator, all the phase angles are required to be referred to a common reference. This reference cannot be defined based on any particular generator, because loss of that generator or any instability in that generator makes the relative angles meaningless. To make the OMS independent of system parameters, the center of inertia is not considered. An average of all the phase angles is used as the reference, as shown below:

$$\phi_{Ref,k} = \frac{1}{G}\sum_{i=1}^{G}\phi_{i,k} \qquad (1)$$

where φi,k is the kth sample of the phase angle corresponding to the ith generator and G is the total number of generators in the system under consideration. If synchronized PMU measurements are available, then this step is not required.

Since disturbances occur at random times and no knowledge about the faults is assumed, the OMS has to continuously keep analyzing the power system variables over a few cycles. At any given instant, the OMS observes all the variables over a sliding window of, say, s cycles. It is assumed that the PMUs provide one measurement per cycle. At every sampling instant, the sliding window consists of (s − 1) past measurements and 1 current measurement.

1) Heatmaps: If G is the number of generators, the measured data for each window of size s is a G × s matrix, with each element of the matrix being a vector of three values. By considering these three values as the R-G-B channels (colour intensities), the data of a window can be rendered as a colour image of size G × s. This image is called the heatmap of the measurements or the data matrix. (One can use the Python command "imshow" to convert the data into a heatmap.)

Fig. 2. Heatmap for stable cases.

Fig. 3. Heatmap for unstable cases.

Fig. 2 and Fig. 3 show the heatmaps of two representative stable and unstable cases for the IEEE 118-bus system. Here the Y-axis corresponds to the generators and the X-axis to the samples. The figures show the heatmaps that depict the temporal profiles of voltage magnitude, phase angles and rate of change of phase angles for 1000 samples (20 seconds). It can be observed that the heatmaps of stable cases are quite similar and they are quite different from those of the unstable cases. It may be noted that this visual similarity is noticeable only when the data matrix is taken over a longer observation period. It is not easy to precisely state in words what this similarity is. The OMS has to distinguish between the cases based on only a few tens of milliseconds of data. The visual dissimilarity of heatmaps of stable and unstable cases provides good motivation for using CNNs, which are currently the most successful ML method for image-based pattern recognition. The CNNs can be trained to identify these visual patterns.
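For instance, a window of preprocessed measurements can be rendered as a heatmap along the following lines. This is only a sketch: the per-channel min-max normalization is an assumption, since the paper does not state how the channels are scaled before being treated as colour intensities.

import numpy as np
import matplotlib.pyplot as plt

def window_to_heatmap(window):
    """window: (G, s, 3) array of |V|, angle and frequency for one sliding window.
    Scales each channel to [0, 1] and shows it as a G x s RGB image."""
    img = np.empty_like(window, dtype=float)
    for c in range(3):
        ch = window[:, :, c]
        rng = ch.max() - ch.min()
        img[:, :, c] = (ch - ch.min()) / rng if rng > 0 else 0.0
    plt.imshow(img, aspect='auto')   # rows: generators, columns: samples
    plt.xlabel('Sample')
    plt.ylabel('Generator')
    return img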

B. Development of CNN Based Classifier

1) Convolutional Neural Networks: Convolutional Neural Networks (CNNs) [22] are a family of neural networks specialized for processing data that has a grid-like topology, like time series (a 1D grid of samples at regular time intervals) and images (a 2D grid of pixels). CNNs use local connectivity to efficiently learn useful features from image data and are highly successful in many image classification tasks [22], [23].

Fig. 4. Architecture of a standard CNN.

2) Basic CNN Components: A typical convolutional neural network for image recognition is shown in Fig. 4. The input to the network is an image. The input layer is followed by convolutional layers. Each unit in a convolutional layer is connected only to a few neighbourhood neurons in the previous layer, allowing for local connectivity [22]. The neurons in the first convolutional layer are meant for extracting some elementary features. These features are then combined by higher layers to form complex features. In order to incorporate invariance to translations or distortions, all neurons in a convolutional layer share the same weights. This causes the same elementary feature detector to be applied across the entire input image, which is equivalent to a convolution operation with a suitably sized kernel. The output of such a set of units is called a feature map and the common weight vector is called a filter. Units in a feature map extract the same feature across the entire image. To extract multiple features, a convolutional layer is composed of several feature maps. Mathematically, the feature value at the (i, j)th location in the kth feature map of the lth layer, $m^{l}_{i,j,k}$, is given by

$$m^{l}_{i,j,k} = {w^{l}_{k}}^{T} x^{l}_{i,j} + b^{l}_{k} \qquad (2)$$

where $w^{l}_{k}$ and $b^{l}_{k}$ are the weight vector and bias of the kth filter of the lth layer, and $x^{l}_{i,j}$ is the input patch centered at the (i, j)th location (represented as a vector). Here, the kernel $w^{l}_{k}$ that generates the feature map $m^{l}_{i,j,k}$ is the same for all i, j. This sharing of weights reduces model complexity and makes training easier. The feature map is then passed through a nonlinear activation function. The activation $a^{l}_{i,j,k}$ of feature $m^{l}_{i,j,k}$ is given by

$$a^{l}_{i,j,k} = a(m^{l}_{i,j,k}) \qquad (3)$$

where a(·) is the activation function (the $a^{l}_{i,j,k}$ become the input to the next layer). Some popular activation functions are sigmoid, tanh [24] and ReLU [25]. Each convolutional layer may be followed by a pooling layer which performs subsampling or local averaging to reduce the sensitivity of the output to shifts and distortions. After several convolutional and pooling layers, the input image gets transformed into a more suitable representation. This is then input to a few fully connected layers which act as the final classifier. The final output of the last layer of the entire network is usually obtained through a softmax operator [23] as follows:

$$f_{j}(z) = \frac{\exp z_{j}}{\sum_{k}\exp z_{k}} \qquad (4)$$

where $f_{j}$ is the output of the jth neuron in the output layer and z is a vector (with components $z_{i}$) of real-valued scores which is the input to the final layer. The function f(·), with components $f_{j}(·)$, is called the softmax function. The vector f(z) has every component between 0 and 1 and its components sum to one. As these scores are normalized, a higher score for a class denotes more confidence in that class. In this sense, these scores can be viewed as estimates of probabilities for the classes. Thus, the output of the softmax layer can be viewed as the probabilities for the different classes and hence is useful for classification tasks. However, it may be noted that these are not the true posterior probabilities and are useful only in deciding the most probable class for each input pattern. Theoretically, these scores would be a good approximation of the true posterior probabilities only if the gradient descent algorithm (specifically the Adam optimizer) reaches the global optimum of the empirical risk and the capacity of the CNN is sufficient. Thus, one cannot take the output of the CNN as the actual probability of instability. The exact CNN architecture used as the OMS is described in Section III-A.
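A toy NumPy rendering of (2)–(4), with ReLU assumed as the activation a(·), may help fix the notation. It is illustrative only and is not the trained network.

import numpy as np

def feature_value(w, b, x_patch):        # Eq. (2): one filter applied to one input patch
    return np.dot(w, x_patch.ravel()) + b

def relu(m):                             # Eq. (3): one common choice for a(.)
    return np.maximum(m, 0.0)

def softmax(z):                          # Eq. (4): normalized class scores
    e = np.exp(z - z.max())              # subtract the max for numerical stability
    return e / e.sum()

scores = softmax(np.array([1.2, -0.3, 0.5, 0.0, 2.1]))   # five window classes
print(scores, scores.sum())              # components in (0, 1), summing to one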

3) Learning CNN Parameters: Let θ denote the vector of all the parameters (weights and biases) of a CNN. Consider n data samples denoted as {(xi, yi) : i = 1, 2, ..., n}. The optimal parameters are obtained by minimization of the empirical risk defined by

$$\mathcal{L}(\theta) = \frac{1}{n}\sum_{i=1}^{n} l(y_i, o_i(x_i)\,|\,\theta) \qquad (5)$$

Here, yi is the true or desired output, oi(xi) is the output obtained by the CNN on the input vector xi, and l(yi, oi(xi)|θ) is the loss for the ith sample (given the parameters θ). For a classification problem, l(·) is usually taken as the cross entropy loss [26].

Gradient descent and its variations [27]–[30] are commonly used for minimizing the empirical risk. Gradient descent updates the weights as follows:

$$\theta_{t+1} = \theta_t - \eta\, g(\theta_t) \qquad (6)$$

where θt denotes the values of the parameters at time-step t, $g(\theta_t) = \nabla_{\theta}\mathcal{L}(\theta_t)$ is the gradient of the empirical risk with respect to the parameters θ at t, and η is the learning rate or step size. The gradient is computed efficiently using the so-called error backpropagation algorithm [26].

There are many methods to speed up this gradient descent; in this paper, the Adam optimizer proposed in [30] is used. This algorithm uses an adaptively determined step size in the gradient descent. The step size to be used at each iteration is determined using estimated first and second moments of the gradient vector [30]. Let $g_t = g(\theta_t)$ and let $g_t^2$ denote the vector whose components are the squares of the components of $g_t$. Let $\theta^{i}_{t}$ denote the ith parameter value at time t. Then the Adam algorithm obtains estimates of the first ($m_t$) and second ($\nu_t$) order moments of the gradient as

$$m_{t+1} = \beta_1 m_t + (1-\beta_1)\, g_t \qquad (7)$$

$$\nu_{t+1} = \beta_2 \nu_t + (1-\beta_2)\, g_t^2 \qquad (8)$$

where β1, β2 are user-defined constants and the estimates $m_t$, $\nu_t$ are initialized as zero vectors. These estimated moments are then used to update the parameters at time t as

$$\theta^{i}_{t+1} = \theta^{i}_{t} - \frac{\eta}{\sqrt{\nu^{i}_{t}} + \epsilon}\, m^{i}_{t} \qquad (9)$$

where $m^{i}_{t}$, $\nu^{i}_{t}$ are the ith components of the corresponding vectors.
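A minimal sketch of updates (7)–(9) as written here (i.e., without the bias-correction terms of the full algorithm in [30]) is shown below; the default constants are assumptions corresponding to the usual Adam settings.

import numpy as np

def adam_step(theta, grad, m, v, eta=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One parameter update following Eqs. (7)-(9)."""
    m = beta1 * m + (1 - beta1) * grad            # Eq. (7): first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2       # Eq. (8): second-moment estimate
    theta = theta - eta * m / (np.sqrt(v) + eps)  # Eq. (9): per-parameter step
    return theta, m, v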

4) Preparation of Training Data: For training and testing the CNN, the temporal profiles of the voltage magnitudes, phase angles and rate of change of phase angles are obtained using offline dynamic simulation code developed in MATLAB [20]. The widely used IEEE 118-bus system and IEEE 145-bus system are used to generate the profiles. The 118-bus system comprises 118 buses, 19 generating units, 91 loads, and 177 transmission lines [20], whereas the 145-bus system comprises 145 buses, 50 generating units, 60 loads, and 453 transmission lines [31].

In the simulations, three-phase-to-ground faults on each bus as well as on each transmission line at three locations (at 25%, 50%, and 75% of the length) are created. The clearing time for all the contingencies is randomly picked from 4–8 cycles. The above contingencies were repeated for converged load flow cases at four different loading levels (base load plus 1%, 5%, 7%, and 10%). The MATLAB simulator uses a partitioned approach with the RK-4 numerical method [20]. A 5 ms time step is used for solving the differential equations. A sampling frequency of 50 Hz is used to obtain synchronously sampled measurements. Each simulation is carried out for a time duration of 1000 cycles (or 20 s). For labelling the temporal profiles as stable or unstable cases and identifying the critical generators, the transient stability index is used.

Transient Stability Index (TSI): Let a transient disturbance occur at time instant tF and be cleared at tC. The system variables are observed until a later time tM > tC. Then, theoretically, the post-contingency stability status of the system is obtained using the Transient Stability Index (TSI), η, defined as [3]

$$\eta = \frac{360^{\circ} - |\Delta\delta|_{\max}}{360^{\circ} + |\Delta\delta|_{\max}} \qquad (10)$$

where |Δδ|max is the absolute value of the maximum rotor angle separation between any two generators during the post-fault period {t : tF < t < tM}. The system is considered stable if η > 0; otherwise, the system is transiently unstable. The system profiles obtained through the simulator are labelled as stable or unstable based on the η value. For an unstable case, the generators having a separation of more than 360° from the rest of the generators form the set of critical generators (CGs) for that profile. The generator that separates earliest among the CGs is labelled as the most critical generator.
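A sketch of how (10) could be evaluated from simulated rotor angles is given below. The function is illustrative; in particular, using the distance from the median trajectory as a proxy for "separation from the rest of the generators" is an assumption and not the paper's exact criterion.

import numpy as np

def tsi_and_cgs(delta_post):
    """delta_post: (G, T) rotor angles (degrees) over the post-fault period.
    Returns the TSI of Eq. (10) and indices of candidate critical generators."""
    spread = delta_post.max(axis=0) - delta_post.min(axis=0)   # max pairwise separation at each instant
    d_max = np.abs(spread).max()
    eta = (360.0 - d_max) / (360.0 + d_max)                    # Eq. (10)
    cgs = []
    if eta < 0:                                                # unstable case
        centre = np.median(delta_post, axis=0)                 # proxy for "the rest of the generators"
        cgs = np.where(np.abs(delta_post - centre).max(axis=1) > 360.0)[0].tolist()
    return eta, cgs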

Let (Xi, Yi, Si), i = 1, ..., n, be the dataset, where Xi are the temporal profiles generated by the simulator, Yi is 0/1 depending on the sign of η, and Si is the set of CGs. Si is empty for stable cases. Each Xi is a time series of length 1000, where each element is a 3G-dimensional vector (consisting of the magnitude, phase angle and derivative of phase angle of the voltages of the G generators).

As mentioned earlier, the input to the CNN is a time window of size s cycles. Windows of size s, with a stride of 1 sample, over the entire time horizon are considered. A window is specified as w = [tS, tE], where tS and tE are the start and end times of the window. While the final objective is to predict instability by monitoring such successive windows, the CNN has to give a classification label for each window. Here a CNN is designed such that it classifies each window into one of 5 classes, which are shown in Table I. The idea is that the CNN would be trained to classify each time window based on what is happening to the system during that time window. During its operation (that is, for test cases) the sequence of class labels output by the CNN would be converted into a prediction on instability as explained later. To train the CNN, every window obtained from the measured voltage profiles of the training data needs to be labelled. This is explained below.

TABLE I
POSSIBLE SYSTEM STATES AS LABELS FOR WINDOWS

All the windows in the stable pre-fault operation period would be labelled as Class 0. If the window covers the instant of fault occurrence, it would be labelled as Class 1. If a window has all samples after the fault occurrence but before the fault clearance, it would be labelled as Class 2. If a window covers the instant of fault clearance, it would be labelled as Class 3. In the current profile Xi, if the system is stable after fault clearance, all windows after fault clearance are again categorized as Class 0; otherwise, all the windows after fault clearance would be labelled as Class 4. For an unstable profile Xi, every window labelled as Class 4 is also associated with the set of CGs as Si. All other windows are assigned an empty Si. In the training data of voltage profiles obtained through simulation, the instants of fault onset, fault clearance, etc., are known and this information would be used for labelling the windows. Note that this is needed only in the training data. Let tF and tC denote the onset and clearance times of a fault in a profile. As per the above description, a window w = [tS, tE] is labelled as yw:

$$y_w = \begin{cases}
0\ \text{(Class 0)} & \text{if } (t_E \le t_F) \text{ or } (t_S \ge t_C \text{ and } \eta > 0)\\
1\ \text{(Class 1)} & \text{if } t_S \le t_F \le t_E\\
2\ \text{(Class 2)} & \text{if } t_F \le t_S \le t_E \le t_C\\
3\ \text{(Class 3)} & \text{if } t_F \le t_S \le t_C \le t_E\\
4\ \text{(Class 4)} & \text{if } t_S \ge t_C \text{ and } \eta < 0
\end{cases} \qquad (11)$$

The training data for the CNN consists of the windows (as heatmaps) with these labels and the set of critical generators.
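A direct transcription of the labelling rule (11) might look like this (the function name and arguments are illustrative):

def label_window(t_s, t_e, t_f, t_c, eta):
    """Assign the Class 0-4 label of Eq. (11) to window w = [t_s, t_e]."""
    if t_e <= t_f or (t_s >= t_c and eta > 0):
        return 0          # pre-fault, or stable post-clearance
    if t_s <= t_f <= t_e:
        return 1          # window covers the fault occurrence
    if t_f <= t_s and t_e <= t_c:
        return 2          # window lies entirely within the fault-on period
    if t_s <= t_c <= t_e:
        return 3          # window covers the fault clearance
    return 4              # post-clearance window of an unstable profile (eta < 0)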

Fig. 5. Y-Net architecture.

5) The Final Output of OMS: As explained earlier, the output layer of the CNN is a softmax layer which gives a normalized score for the five classes for each window. This output has to be converted into a prediction of instability (or an alarm). An alarm indicating an imminent instability is raised if the normalized score for Class 4 in the softmax layer output is above a certain threshold, τ (confidence level), for N consecutive windows (wait period). This strategy is used (rather than raising an alarm the moment a window is classified as Class 4) because, in a system that predicts instability, reducing false alarms is very important. The hyperparameters τ and N can be decided through cross-validation [32]. In this paper, N = 15 and τ = 0.9 are used as they resulted in low false alarm and missed detection rates. The performance of the OMS under variation of these hyperparameter values is also investigated in the results section. When the OMS predicts an alert state, it also predicts the corresponding set of CGs. Two methods, one based on PCA and another using a second neural network, referred to as Y-Net, are proposed for this in the following sections. PCA identifies the most critical generator while Y-Net identifies the complete set of CGs.
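The alarm rule itself is simple enough to state as a short sketch; the function name is illustrative, while the defaults τ = 0.9 and N = 15 are the values used in the paper.

def raise_alarm(class4_scores, tau=0.9, n_consecutive=15):
    """Return the index of the first window at which an instability alarm
    is raised: Class-4 softmax score above tau for N consecutive windows."""
    run = 0
    for k, p in enumerate(class4_scores):   # one score per sliding window
        run = run + 1 if p > tau else 0
        if run >= n_consecutive:
            return k                        # alarm raised at this window
    return None                             # no alarm for this profile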

C. Critical Generator Identification

The information about critical generators is useful for the operator for planning mitigation strategies to arrest the propagation of the instability in an interconnected system.

1) PCA Based Method: Consider the data between the first instant, say tF, when the CNN predicts Class 1 (which is the onset of a fault) and the instant, say tA, when the CNN first predicts Class 4 or an alert state. Since the critical generator(s) are those going out of synchronism with the rest, one should be looking for 'outliers' based on this data. This is done using PCA as follows.

The measurements for all the generators over the time interval tF to tA can be viewed as a data matrix of dimension G × 3NAF, where NAF denotes the number of samples between tF and tA. Each row in this matrix is a vector of measurements pertaining to one generator. Using principal component analysis (PCA), the first two principal directions are obtained. Then each of the G data vectors is projected onto the two-dimensional space spanned by the first two principal directions. Now the objective is to determine an outlier among these vectors.

A two-dimensional Gaussian density is fitted to the projected data with its parameters estimated by the sample mean and the Minimum Covariance Determinant (MCD) [33]. The MCD is a robust estimator of the covariance matrix, which is not affected much by outliers. The generator corresponding to the data vector with minimum likelihood under the estimated density is identified as the outlier and hence as the most critical generator.
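A sketch of this outlier search, assuming scikit-learn and SciPy implementations of PCA, the MCD estimator and the Gaussian density (libraries not named in the paper), could be:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.covariance import MinCovDet
from scipy.stats import multivariate_normal

def most_critical_generator(X):
    """X: (G, 3*N_AF) matrix of measurements between t_F and t_A, one row per generator.
    Projects onto the first two principal directions, fits a Gaussian with the sample
    mean and a robust (MCD) covariance, and returns the lowest-likelihood generator."""
    Z = PCA(n_components=2).fit_transform(X)          # G points in 2-D
    mcd = MinCovDet().fit(Z)                          # robust covariance estimate
    density = multivariate_normal(mean=Z.mean(axis=0), cov=mcd.covariance_,
                                  allow_singular=True)
    return int(np.argmin(density.pdf(Z)))             # outlier = most critical generator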

2) Y-Net Architecture: The PCA based approach automatically identifies the most critical generator. In addition, when the projected vectors corresponding to all the generators are visualized in 2-D, all the outliers may be detected manually (through visual inspection). In order to detect all CGs automatically, a neural network, referred to as Y-Net, is proposed, whose architecture is shown in Fig. 5. It is an integrated framework that can detect instability in the power system as well as the set of CGs that are responsible for such instability. As can be observed from Fig. 5, in the Y-Net architecture the output of the last convolutional layer of the CNN is fed to a second classifier (Classifier-2) to identify the set of CGs, apart from being input to the classifier discussed in Section II-B2. The idea is that the representation learnt by the feature extractor part of the CNN (that is, the convolutional layers of the CNN) should be useful both for predicting the state (or class, as in Table I) of the current window and for identifying the generators which are most affected in case of instability.

The classifier-2 in the Y-Net is a multi-label classifier. Multi-label classification is a classification problem where an input data sample (a window, in this case) can have more than one class label. In an unstable case, a given window can have multiple CGs, and hence if the generator bus numbers are taken as labels, the problem of mapping a window to its corresponding CGs is a multi-label classification task. This property makes classifier-2 different from classifier-1, where an input window can map to only one of the five possible class labels. Classifier-2 has G + 1 nodes in its output layer. In case of instability, any subset of the G generators may be losing synchronism; in addition, one more label is needed to identify the case of no instability. Unlike classifier-1, which has a softmax output layer, in the classifier-2 network each of the output nodes has a sigmoidal activation function and thus all the outputs lie between 0 and 1. All the nodes with output greater than 0.5 are considered as the predicted critical generators for the given input. This network is also trained using backpropagation. The desired output for any window is a binary vector with a 1 corresponding to only those generators that are in the set of CGs for that window.

In order to train the Y-Net as an integrated network, both classifier-1 and classifier-2 should be trained simultaneously, i.e., the errors of both classifiers should be backpropagated at the same time. To simplify the training process, for every batch of training data, classifier-1 and classifier-2 are trained alternately. This makes the training faster and simpler without affecting the final performance much.

D. Assessing OMS Performance

One of the outputs of the OMS is the instability prediction, which can be assessed using the False Alarm (FA) and Missed Detection (MD) rates. The other output of the OMS is the set of CGs, which can be assessed using the Jaccard similarity. The Jaccard similarity J(S1, S2) between two sets S1 and S2 is defined as

$$J(S_1, S_2) = \frac{|S_1 \cap S_2|}{|S_1 \cup S_2|} \qquad (12)$$

Here, S1 is the set of actual CGs obtained using the TSI and S2 is the set of CGs predicted by classifier 2 of the Y-Net. This measure should be close to 1 for every unstable case. The percentage of unstable cases for which J(S1, S2) is exactly equal to 1 is reported as the JS accuracy.
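The CG set predicted by classifier-2 and the Jaccard similarity of (12) reduce to a few lines; the handling of two empty sets is an assumption for the stable cases, which the paper does not spell out.

def predicted_cgs(sigmoid_outputs, threshold=0.5):
    """Indices of output nodes (generators) whose sigmoid activation exceeds 0.5."""
    return {i for i, p in enumerate(sigmoid_outputs) if p > threshold}

def jaccard(s1, s2):
    """Jaccard similarity of Eq. (12); taken as 1.0 when both sets are empty (assumption)."""
    if not s1 and not s2:
        return 1.0
    return len(s1 & s2) / len(s1 | s2)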

The time at which the OMS predicts Class 4 with 0.9 probability for the first time in an unstable profile, relative to the fault clearance, is also used as a measure, termed ζ. A measure to quantify how early the OMS can predict the instability when compared to the TSI is also used. This is the so-called average latency of prediction, ϒ, proposed in [19]. It is the difference between the instant of alert state prediction by the OMS and the time instant at which the instability could be predicted by the TSI (η), averaged over all test profiles. In the results presented in the next section, all these measures are reported.

III. RESULTS AND DISCUSSIONS

A. CNN Architecture

A CNN architecture with two convolutional layers as the feature extractor, followed by two fully-connected layers for classifier 1 and three fully-connected layers for classifier 2, is considered, as shown in Fig. 5. The implementation is done in TensorFlow 0.8 with NVIDIA GeForce Titan X, 12 GB GPU (CUDA 7.5) support. The network is trained with the Adam optimizer [30] (with the default Adam adaptive learning rate) with a batch size of 64. The inputs to the CNN are the heatmaps of the windows. The window size is set as s = 5.

For the IEEE 118-bus system, the number of generators (G) is 19. Thus, the input layer of the CNN has dimensions of 19 × 5 × 3. The first convolutional layer filters the input image with 32 kernels of size 19 × 3 × 3 with a stride of 1 pixel. In standard CNN applications, the image size is large (e.g., 512 × 512 × 3, 1024 × 1024 × 3) and the kernel size is small in comparison to the image size (e.g., 3 × 3 × 3, 5 × 5 × 3). This is because, in an image based object detection task, the algorithm looks for local features (e.g., edges) which may be present at many different locations in the image. However, here a comparable image size (19 × 5 × 3) and kernel size (19 × 3 × 3) are selected because one needs to look for global features in the heatmap of a stable/unstable case. Since the size of the input image to the CNN is small, having a kernel size comparable to the input size does not increase the computational burden.

The output of the first convolutional layer, of size 32 × 3 × 1, is reshaped across the time axis to 3 × 32. The input to the second convolutional layer is the reshaped output of the first convolutional layer. The second layer filters it with 16 kernels of size 3 × 3. The output of the second convolutional layer is then fed to classifier 1, consisting of two fully-connected layers with 32 and 5 neurons respectively. The last layer with 5 neurons is connected to the softmax output layer. The output of the second convolutional layer is also fed to classifier 2, which has three fully-connected layers with 256, 64 and 20 neurons respectively. The output layer with 20 neurons uses a sigmoidal activation function. In Fig. 5, on the convolutional layers, the numbers shown outside the braces correspond to the network used for the 118-bus system.
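The description above can be summarized in a tf.keras sketch for the 118-bus case. It is a sketch under assumptions: the paper's implementation used TensorFlow 0.8 rather than Keras, and the padding of the second convolutional layer and the in-layer activations are not stated in the paper.

import tensorflow as tf
from tensorflow.keras import layers

inp = layers.Input(shape=(19, 5, 3))                     # G x s x 3 heatmap, 118-bus case
x = layers.Conv2D(32, (19, 3), activation='relu')(inp)   # 32 kernels of size 19 x 3 -> (1, 3, 32)
x = layers.Reshape((3, 32, 1))(x)                        # reshape across the time axis to 3 x 32
x = layers.Conv2D(16, (3, 3), padding='same', activation='relu')(x)   # 16 kernels of size 3 x 3 (padding assumed)
feat = layers.Flatten()(x)                               # shared representation for both heads
# Classifier-1: window state (5 classes, softmax output)
c1 = layers.Dense(32, activation='relu')(feat)
state = layers.Dense(5, activation='softmax', name='state')(c1)
# Classifier-2: multi-label critical-generator head (19 generators + "no instability" node)
c2 = layers.Dense(256, activation='relu')(feat)
c2 = layers.Dense(64, activation='relu')(c2)
cgs = layers.Dense(20, activation='sigmoid', name='cgs')(c2)
model = tf.keras.Model(inp, [state, cgs])
model.compile(optimizer='adam',
              loss={'state': 'sparse_categorical_crossentropy', 'cgs': 'binary_crossentropy'})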

For the 145-bus system, the number of generators (G) is 50 and, with window size s = 5, the input dimensions are 50 × 5 × 3. The first convolutional layer filters the input image with 64 kernels of size 50 × 3 × 3 with a stride of 1 pixel. The output of this layer, of size 64 × 3 × 1, is reshaped across the time axis to 3 × 64. The second convolutional layer takes the reshaped values and filters them with 32 kernels of size 3 × 3. The classifier 1 and classifier 2 architectures are the same as those for the 118-bus system, except the classifier 2 output layer, which has 51 nodes here. In Fig. 5, the numbers inside the braces correspond to the 145-bus system.

B. The Simulation Set-up

For the 118-bus system, a total of 26,000 profiles are generated as described in Section II-B4, out of which 2309 are detected as unstable based on the TSI (η). Similarly, for the 145-bus system, a total of 33,073 profiles are generated, out of which 10,099 are unstable based on η. The dataset of each system is randomly split into training and test sets 10 times and all results shown are averages over these trials. Every time, 450 unstable cases and 4550 stable cases are chosen for the test set for the 118-bus system, and 450 unstable cases and 3250 stable cases are chosen to form the test set for the 145-bus system. The remaining data is used as the training set in each case. From the training datasets, overlapping windows (heatmaps) along with the corresponding class labels and sets of CGs are generated as described in Section II-B4. It is found that the number of windows of Class 0 would be much larger than those of any other class in most of the simulations. Hence, subsampling without replacement is used such that all classes are equally represented in the training dataset. 15,000 heatmaps for training and 750 heatmaps for validation are used for the 118-bus system, while 16,000 heatmaps for training and 4525 for validation are used for the 145-bus system.

C. OMS Performance Results

The performance of the OMS is tested under zero noise and the base case topology. The OMS trained with the base topology is also tested on new data created under N − 1 transmission line contingency scenarios for all the fault conditions discussed in Section II-B4. The trained OMS is also tested with new datasets created by considering ±5% uniform random variations simultaneously in all parameters for all the generators. The performance of the OMS is also tested under varying measurement noise levels. For this, ±1% to ±3% uniform random noise is added to both the training as well as the test data in the base case. This noise is added to every measurement at every sampling instant and is referred to as Type I noise. The OMS trained on the base case under noise conditions is also tested on data for topology changes and parameter variations under similar noise conditions.

Various performance measures described in Section II-D, namely the false alarm (FA) rate, missed detection (MD) rate, ζ, the average latency of prediction ϒ, and the percentage of unstable profiles that have Jaccard similarity exactly equal to 1 (JS acc), are reported for the 118- and 145-bus systems in Table II.

TABLE II
Y-NET BASED OMS PERFORMANCE

For the 118-bus system, it can be observed that with the base topology and zero noise, the FA and MD rates are below 0.5%. The FA and MD rates increased slightly with increasing noise levels. The JS accuracy was not affected much in the presence of measurement noise; however, a slight increase in ζ and a slight decrease in ϒ can be observed. The JS accuracy is more than 85% for all the cases, which means that, for approximately 85% of unstable cases, the exact set of CGs was identified correctly. The ζ is found to be less than 7.4 cycles for all cases. This means that the instability can be predicted (in the sense of labelling the window as Class 4) with high confidence within 7.4 cycles after the fault clearance. (Note that the OMS has no knowledge of fault onset or clearance times.) The ϒ for all the cases is found to be greater than 25 cycles, which means the OMS declares possible instability to the operator at least 25 cycles before the system actually becomes unstable. Before declaring the instability, the OMS is configured to wait for N = 15 cycles from the instance it labels a window Class 4, as discussed in Section II-B4. This wait period is essentially introduced to reduce the FA rate. It means that in all the above cases, the Class 4 status is detected at least [25 (ϒ) + 15 (wait period)] cycles before the system actually becomes unstable. This wait period can be changed as per operational requirements of the system.

Fig. 6. Sensitivity analysis w.r.t. hyperparameters τ and N.

Fig. 6(a) and Fig. 6(b) show how the FA and MD rates vary with changes in the hyperparameters N (wait period) and τ (confidence level). As can be seen from the figures, varying N from 0 to 30 can change the MD rate by about 2% and the FA rate by about 10%. The choice of N = 15 is a good compromise to reduce the FA rate.

Fig. 7. Robustness of OMS: Type II noise.

With N − 1 topology changes, the FA and MD rates increased slightly with the same trend as compared to the base case. The ζ, ϒ and JS accuracy decreased with the same trend as the base case. With generator parameter changes, the FA and MD rates are slightly higher than the base case; however, ζ and ϒ are reduced further with not much difference in JS accuracy. The trend remains the same as the base case with increasing noise levels.

Similar performance trends can be observed for the 145-bus system. From the two systems under study, it is observed that the performance measures ζ, ϒ and JS accuracy are system specific, and the FA and MD rates do not change significantly with various system configurations, which is a desirable feature for a stability monitoring system. A significant difference can be seen in ϒ for the 145-bus system as compared to the 118-bus system. This is because the critical clearing time of the 118-bus system is found to be much lower than that of the 145-bus system. Hence, instability manifests much more slowly in the 145-bus system than in the 118-bus system.

Another type of noise, referred to as Type II noise, is also considered, where at any given time instant each generator has probability 0.1 of its measured values being noisy and, when they are, the noise is uniform with rate varying between ±1% and ±3%. This kind of noise reflects abrupt spikes in the measurements, which is more realistic than uniform noise. Fig. 7 shows the FA and MD rates under varying levels of noise in the training and testing data for the 118-bus and 145-bus systems with the base topology. In these figures, the y-axis represents the FA and MD rates in percentage and the x-axis represents the % noise in the training data. The results are shown for ±1%, ±2% and ±3% noise in the test data for each noise level in the training data. It can be observed that for both the 118- and 145-bus systems, the FA rates increase slightly for Type II noise as compared to Type I noise. However, there is not much change in the MD rates.
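A minimal sketch of how such Type II noise could be injected into a window of measurements is shown below; the function name and the per-channel noise draw are assumptions for illustration.

import numpy as np

def add_type2_noise(window, p=0.1, max_rate=0.03, rng=None):
    """Type II noise sketch: at each instant, each generator's measurements are
    corrupted with probability p by uniform noise of up to +/- max_rate."""
    rng = np.random.default_rng() if rng is None else rng
    G, s, _ = window.shape
    noisy = window.copy()
    hit = rng.random((G, s)) < p                                  # which (generator, instant) pairs are noisy
    scale = rng.uniform(-max_rate, max_rate, size=window.shape)   # per-channel noise rate
    noisy[hit] *= (1.0 + scale[hit])
    return noisy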


TABLE III
Y-NET BASED OMS PERFORMANCE WITH MISSING PMU MEASUREMENTS

The SVM method in [3] needs knowledge of the exact fault clearance time. As reported in [19], for the SVM proposed in [3] the FA and MD rates become 5.8% and 18.83% respectively when a random delay of up to ±2 samples is introduced in the fault clearance time. If noise is introduced in the test data, the FA and MD rates of the SVM method become as high as 12.64% and 29.3%; if noise is introduced in both the training and testing data, the FA and MD rates become 4.31% and 47.83% [19]. The proposed OMS does not need any fault information since it continuously monitors the system. Also, as discussed above, the performance of the OMS is very robust to noise in measurements, topology changes, etc.

For the OMS, the availability of PMU measurements at all generator buses is assumed. Hence an interesting question is what happens if some PMU measurements are missing. Table III shows the performance of the OMS under different scenarios of missing measurements. It is assumed that all measurements are available during training. However, at test time (that is, during the actual operation of the OMS), the following three scenarios are considered: (i) measurements for a random 10% of generators (145-bus system: 5 generators; 118-bus system: 2 generators) are missing with 0.1 probability at every time instance; (ii) measurements for any one of the stable generators (whether the case is actually stable or unstable) are missing at all time instances; (iii) measurements for any one of the unstable generators are missing at all time instances. As can be seen from the table, the performance of the OMS is quite robust with respect to missing measurements.

The OMS is based on an ML method that essentially recognizes a signature in the voltage profiles that indicates instability. All the training data considered here is from three-phase faults. To test the robustness of the OMS, its performance on temporal profiles obtained under single-line-to-ground (SLG) faults is also tested. Since the signature for instability should be similar, good performance is expected even in this case. 140 SLG fault cases (for the 145-bus system) were generated, out of which 125 are stable and 15 are unstable. CNNs trained on different subsets of the original training data were tested on these cases. Averaged over 10 trials, the FA and MD rates are, respectively, 1.11% and 0.67%. This amply demonstrates the effectiveness of the ML method.

The OMS is meant to be an online monitoring system. Hence another important question is the time taken to compute the output of the CNN classifier. The time for calculating the final output of the OMS for a given window is 4.94 μs for the 118-bus system and 6.94 μs for the 145-bus system. Thus, the system can easily work in real time. Training the CNN took 2.67 minutes for the 118-bus system and 21.84 minutes for the 145-bus system. However, this does not affect the practical deployment because training is offline.

Fig. 8. Projected measurements in ℝ² showing outliers.

Fig. 9. Projected measurements in ℝ² for false alarms.

1) Identification of Critical Generators: Fig. 8 shows the 2-D visualization of the projected vectors corresponding to the 118-bus and 145-bus systems after the application of PCA as described in Section II-C1. The most critical generator(s) identified by PCA and Y-Net are shown by the solid circles and dotted rectangles respectively. In Fig. 8(a) and Fig. 8(c), there is only one outlier and hence both PCA and Y-Net identify the same generator. In Fig. 8(b), there are two outliers. Although both are visible in the visualization, PCA numerically gives only one outlier, whereas Y-Net gives both the critical generators. In Fig. 8(d), there are five outliers. PCA gives only one critical generator, whereas Y-Net gives all the critical generators. One advantage of the PCA method is that the visualization it creates can help reduce the effective false alarm rate of the OMS. Fig. 9 shows two stable cases which are detected as unstable by the OMS, i.e., they are false alarms. As can be seen from Fig. 9, there are no outliers. Thus, the operator can easily disregard the false alarms generated by the OMS using this visualization.

IV. CONCLUSION

In this paper, a continuous online monitoring system (OMS), based on a convolutional neural network (CNN), for predicting instability in a power system is proposed. To the best of our knowledge, this is the first instance of using convolutional neural networks for instability prediction. The CNN based OMS continuously monitors the voltage profiles through sliding windows (whose width is 5 samples) and generates a prediction on stability. The OMS also provides the set of most impacted generators. To aid the operators in reducing false alarms, a PCA based visualization is also proposed. The performance of the OMS is tested under topology changes, generator parameter variations and various noise levels in measurements using the widely used IEEE 118-bus and IEEE 145-bus systems. Both in terms of false alarm and missed detection rates as well as in terms of the accuracy in predicting all the CGs, the performance of the OMS is quite good and is better than the state-of-the-art methods based on SVMs. It is also seen that the performance of the OMS is robust to noise and variations in the system.

While there are many classical criteria for stability, they are based on approximate models of the system and need information such as the system topology. As shown here, the proposed ML-based technique is quite robust to topology changes in the system and it also detects instability much earlier. This may be one of the main benefits of using ML techniques on this problem. However, since stability is a safety-critical issue, one should be careful about relying solely on an ML based method. In practice, one can use the ML system in conjunction with any classical stability criterion. Then the missed detection rate would only decrease and we get the added benefits of the ML based method.

An assumption of the system proposed here is that synchronous measurements of all generator bus voltages are available at a central location. It is shown that the proposed system is robust to some level of missing data. However, the processing of data is still centralized. Though such an assumption is common to all methods that need to monitor system instability, it would be desirable to have a decentralised system. Since deep neural network based methods are seen to be good at learning many latent dependencies in data, one can ask whether such neural networks can be used for distributed monitoring of the system. For example, one may want multiple OMSs, each with access to measurements of one or a few generators, and each required to predict if any of these generators is likely to lose synchronism following a transient disturbance, including a sudden drop in renewables. This would be an interesting and important direction to extend the work presented here.

REFERENCES

[1] B. Stott, "Power system dynamic response calculations," Proc. IEEE, vol. 67, no. 2, pp. 219–241, Feb. 1979.
[2] M. A. Pai, Energy Function Analysis for Power System Stability. Berlin, Germany: Springer, 2012.
[3] F. R. Gomez, A. D. Rajapakse, U. D. Annakkage, and I. T. Fernando, "Support vector machine-based algorithm for post-fault transient stability status prediction using synchronized measurements," IEEE Trans. Power Syst., vol. 26, no. 3, pp. 1474–1483, Aug. 2011.
[4] D. Ruiz-Vega and M. Pavella, "A comprehensive approach to transient stability control. I. Near optimal preventive control," IEEE Trans. Power Syst., vol. 18, no. 4, pp. 1446–1453, Nov. 2003.
[5] Y. Xue et al., "Extended equal area criterion revisited (EHV power systems)," IEEE Trans. Power Syst., vol. 7, no. 3, pp. 1012–1022, Aug. 1992.
[6] M. Pavella, D. Ernst, and D. Ruiz-Vega, Transient Stability of Power Systems: A Unified Approach to Assessment and Control. Berlin, Germany: Springer, 2012.
[7] A. Vaccaro and C. A. Canizares, "A knowledge-based framework for power flow and optimal power flow analyses," IEEE Trans. Smart Grid, vol. 9, no. 1, pp. 230–239, Jan. 2018.
[8] C. A. Jensen, M. A. El-Sharkawi, and R. J. Marks, "Power system security assessment using neural networks: Feature selection using Fisher discrimination," IEEE Trans. Power Syst., vol. 16, no. 4, pp. 757–763, Nov. 2001.
[9] C. Zheng, V. Malbasa, and M. Kezunovic, "Regression tree for stability margin prediction using synchrophasor measurements," IEEE Trans. Power Syst., vol. 28, no. 2, pp. 1978–1987, May 2013.
[10] I. Kamwa, S. Samantaray, and G. Joos, "Development of rule-based classifiers for rapid stability assessment of wide-area post-disturbance records," IEEE Trans. Power Syst., vol. 24, no. 1, pp. 258–270, Feb. 2009.
[11] L. Wehenkel, M. Pavella, E. Euxibie, and B. Heilbronn, "Decision tree based transient stability method a case study," IEEE Trans. Power Syst., vol. 9, no. 1, pp. 459–469, Feb. 1994.
[12] L. Wehenkel, T. Van Cutsem, and M. Ribbens-Pavella, "An artificial intelligence framework for online transient stability assessment of power systems," IEEE Trans. Power Syst., vol. 4, no. 2, pp. 789–800, May 1989.
[13] L. S. Moulin, A. A. Da Silva, M. El-Sharkawi, and R. J. Marks, "Support vector machines for transient stability analysis of large-scale power systems," IEEE Trans. Power Syst., vol. 19, no. 2, pp. 818–825, May 2004.
[14] A. Gavoyiannis, D. Vogiatzis, D. Georgiadis, and N. Hatziargyriou, "Combined support vector classifiers using fuzzy clustering for dynamic security assessment," in Proc. IEEE Power Eng. Soc. Summer Meeting, 2001, vol. 2, pp. 1281–1286.
[15] K. Chen, J. Hu, and J. He, "Detection and classification of transmission line faults based on unsupervised feature learning and convolutional sparse autoencoder," IEEE Trans. Smart Grid, vol. 9, no. 3, pp. 1748–1758, May 2018.
[16] A. D. Rajapakse, F. Gomez, K. Nanayakkara, P. A. Crossley, and V. V. Terzija, "Rotor angle instability prediction using post-disturbance voltage trajectories," IEEE Trans. Power Syst., vol. 25, no. 2, pp. 947–956, May 2010.
[17] L. Ji, J. Wu, Y. Zhou, and L. Hao, "Using trajectory clusters to define the most relevant features for transient stability prediction based on machine learning method," Energies, vol. 9, no. 11, pp. 898–917, 2016.
[18] Y. Zhou, J. Wu, Z. Yu, L. Ji, and L. Hao, "A hierarchical method for transient stability prediction of power systems using the confidence of a SVM-based ensemble classifier," Energies, vol. 9, no. 10, pp. 778–798, 2016.
[19] A. Gupta, G. Gurrala, and P. S. Sastry, "Instability prediction in power systems using recurrent neural networks," in Proc. 26th Int. Joint Conf. Artif. Intell., AAAI Press, 2017, pp. 1795–1801.
[20] G. Gurrala, D. Dinesha, A. Dimitrovski, S. Simunovic, S. Pannala, and M. Starke, "Large multi-machine power system simulations using multi-stage Adomian decomposition," IEEE Trans. Power Syst., vol. 32, no. 5, pp. 3594–3606, Sep. 2017.
[21] P. Dehghanian, Y. Wang, G. Gurrala, E. Moreno-Centeno, and M. Kezunovic, "Flexible implementation of power system corrective topology control," Elect. Power Syst. Res., vol. 128, pp. 79–89, 2015.
[22] Y. LeCun et al., "Convolutional networks for images, speech, and time series," Handbook Brain Theory Neural Netw., vol. 3361, no. 10, pp. 1995–2009, 1995.
[23] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[24] Y. A. LeCun, L. Bottou, G. B. Orr, and K.-R. Muller, "Efficient backprop," in Neural Networks: Tricks of the Trade. Berlin, Germany: Springer, 2012, pp. 9–48.
[25] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proc. 27th Int. Conf. Mach. Learn., 2010, pp. 807–814.
[26] C. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics Series). Berlin, Germany: Springer, 2006. [Online]. Available: https://books.google.it/books?id=kTNoQgAACAAJ
[27] N. Qian, "On the momentum term in gradient descent learning algorithms," Neural Netw., vol. 12, no. 1, pp. 145–151, 1999.
[28] J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," J. Mach. Learn. Res., vol. 12, pp. 2121–2159, 2011.
[29] M. D. Zeiler, "Adadelta: An adaptive learning rate method," unpublished paper, 2012. [Online]. Available: https://arxiv.org/abs/1212.5701
[30] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. Int. Conf. Learn. Representations, 2014.
[31] V. Vittal et al., "Transient stability test systems for direct stability methods," IEEE Trans. Power Syst., vol. 7, no. 1, pp. 37–43, Feb. 1992.
[32] R. Kohavi et al., "A study of cross-validation and bootstrap for accuracy estimation and model selection," in Proc. Int. Joint Conf. Artif. Intell., Stanford, CA, USA, 1995, vol. 14, no. 2, pp. 1137–1145.
[33] P. J. Rousseeuw and K. V. Driessen, "A fast algorithm for the minimum covariance determinant estimator," Technometrics, vol. 41, no. 3, pp. 212–223, 1999.

Authors’ photographs and biographies not available at the time of publication.