8/12/2019 AI 07_chapter 2
CHAPTER 2
ARTIFICIAL INTELLIGENCE TECHNIQUES
2.1 INTRODUCTION
Artificial intelligence (AI) techniques are widely used for almost all power system problems. Due to the non-linear nature of power systems, their operation and control involve complex computations. In a real-time system the number of buses is large, which further complicates the problem. The degree of uncertainty associated with power system components is also high. For these reasons, AI techniques find major applications in solving power system problems such as load forecasting and unit commitment. The major advantages of AI techniques over conventional methods are that they require simpler calculations and comparatively less computation time. Moreover, even with insufficient or vague data, AI techniques can give reasonably accurate results. This is especially valuable in the restructured power system, where the OASIS and the spot market require information about the state of the system, voltage magnitudes and transactions through various interfaces at regular time intervals. This chapter discusses the various AI techniques that are applied for the estimation of Available Transfer Capability (ATC). The AI techniques used in this thesis are the Support Vector Machine (SVM), fuzzy logic, the Back Propagation Neural Network (BPNN) variant of the Artificial Neural Network (ANN) and the Generalized Regression Neural Network (GRNN).
2.2 SUPPORT VECTOR MACHINE (SVM)
Many real world scenarios in pattern classification suffer from
missing or incomplete data irrespective of the field of application. For
example, wireless sensor networks suffer from incomplete data sets for different reasons such as a power outage at a sensor node, random occurrences of local interference or a higher bit error rate of the wireless radio transmissions. In power systems, estimating unknown values from the available data is an important problem; load forecasting and state estimation are two examples of such problems.
Iffat Gheyas (2009) presented a detailed analysis of imputation techniques used for data mining and knowledge discovery. SVM has been used successfully for data classification and regression. This thesis focuses on estimating the ATC value from the given data sets based on the Weighted K-Nearest Neighbors (WKNN) algorithm, one of the most popular approaches for solving incomplete data problems. The process of finding unknown or missing values is called imputation. There are many approaches, varying from naive methods such as mean imputation to more robust methods based on relationships among attributes. This section briefly surveys some popular imputation methods and explains the WKNN imputation algorithm in detail. The WKNN imputation algorithm replaces missing values with a weighted average of the K nearest neighbors. The SVM estimates the unknown or missing value from the inputs by calculating the Euclidean Distance (ED) of the new inputs from the given inputs and mapping the unknown values.
2.2.1 Missing Data Imputation
Mean and mode imputation (Mimpute)
Mean and mode imputation consists of replacing the unknown
value for a given attribute by the mean (quantitative attribute) or mode
(qualitative attribute) of all known values of that attribute. Replacing all
missing records with a single value distorts the input data distribution.
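Mean imputation can be sketched in a few lines. The thesis develops its models in MATLAB; the following Python snippet (with made-up data) only illustrates the idea and its drawback: every missing entry receives the same value.

```python
# Illustrative sketch (not from the thesis): mean imputation replaces a
# missing entry with the mean of the observed values of that attribute.
def mean_impute(column):
    """Replace None entries with the mean of the known values."""
    known = [v for v in column if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in column]

print(mean_impute([5, 2, 7, 2, None]))  # the missing value becomes 4.0
```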
Hot deck imputation (HDimpute)
Tshilidzi Marwala (2009) explained in detail the computational intelligence techniques used for missing data imputation.
Given an incomplete data pattern, HDimpute replaces the missing data with
the values from the input vector that is closest in terms of the attributes that
are known in both patterns.
KNN Imputation
The KNN algorithm is part of a family of learning methods known
as instance-based. Instance based learning methods are conceptually
straightforward approaches to approximate real-valued or discrete-valued
target functions. These methods are based on the principle that instances within a data set generally lie in close proximity to other instances that have similar properties. Learning in these algorithms consists of simply
storing the presented training data set. When a new instance is encountered, a
set of similar training instances is retrieved from memory and used to make a
local approximation of the target function.
The KNN algorithm imputes a missing value by the average value of the K nearest patterns, as given by Equation (2.1):

x_ij = (1/K) Σ_{k=1}^{K} x_kj        (2.1)

where x_ij is the unknown value of the jth variable in the ith data set, K is the number of nearest neighbors and x_kj is the jth variable of the kth nearest neighbor data set.
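Equation (2.1) can be sketched as follows. The thesis uses MATLAB; this Python sketch simply averages the missing feature over the K records nearest in Euclidean distance on the known features, using the data of Table 2.1 below.

```python
import math

# Hedged sketch of Equation (2.1): impute the missing j-th feature of a
# record as the plain average of that feature over the K nearest neighbors
# (nearness measured by Euclidean distance on the remaining features).
def knn_impute(data, missing_row, missing_col, K):
    def dist(row):
        return math.sqrt(sum((row[c] - data[missing_row][c]) ** 2
                             for c in range(len(row))
                             if c != missing_col))
    others = [i for i in range(len(data)) if i != missing_row]
    nearest = sorted(others, key=lambda i: dist(data[i]))[:K]
    return sum(data[i][missing_col] for i in nearest) / K

# Table 2.1 with the unknown X marked as None; K = 3 picks data sets 1, 3 and 4.
table = [[5, 7, 2], [2, 1, 1], [7, 7, 3], [2, 3, 4], [4, None, 5]]
print(round(knn_impute(table, 4, 1, 3), 2))  # (3 + 7 + 7) / 3 = 5.67
```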
Weighted K Nearest Neighbour (WKNN) Algorithm
The WKNN algorithm finds the unknown value as a weighted average of the K nearest neighbors, as given by Equation (2.2). Assume that the value of the jth variable of the ith data set, x_ij, is unknown. The unknown value can be calculated using the following formula:

x_ij = ( Σ_{k=1}^{K} w_k x_kj ) / ( Σ_{k=1}^{K} w_k )        (2.2)

where K is the number of nearest neighbors, k indexes the nearest neighbors, w_k is the weight associated with the kth nearest neighbor, taken as the reciprocal of d_ik, d_ik is the Euclidean distance between the ith data set and the kth nearest neighbor data set, and x_kj is the jth variable of the kth nearest neighbor data set.
The process of imputation can be understood by the following example. Consider an example with five data sets and three features A, B and C, given in Table 2.1. Assume that for the fifth data set the value of feature B is unknown. The unknown value is denoted as X.

Table 2.1 The original data set with unknown value (X)

Data set   A   B   C
1          5   7   2
2          2   1   1
3          7   7   3
4          2   3   4
5          4   X   5
The data set obtained by removing the column corresponding to feature B is presented in Table 2.2.

Table 2.2 Data set ignoring the column corresponding to the unknown value

Data set   A   C
1          5   2
2          2   1
3          7   3
4          2   4
5          4   5
The ED between the fifth data set and the first data set is calculated as

ED_51 = sqrt((4 − 5)² + (5 − 2)²) = 3.16

Similarly, the EDs between the 5th data set and data sets 2, 3 and 4 are calculated and presented in Table 2.3. The weight for data set 1 is computed as

w1 = 1/3.16 = 0.316

Similarly, the weights for the other data sets are calculated and presented in Table 2.3.
Table 2.3 Data sets with Euclidean Distance (ED)

Data set   A   C   ED to the fifth data set   Weight w_k
1          5   2   3.16                       0.31
2          2   1   4.47                       0.22
3          7   3   3.6                        0.27
4          2   4   2.23                       0.44
From Table 2.3 it is observed that data sets 1, 3 and 4 are closest to data set 5, as their Euclidean distances are the smallest. The number of nearest neighbors (K) is chosen as 3. The values of feature B corresponding to data sets 1, 3 and 4 (presented in Table 2.1) are used in Equation (2.2) to compute X. The value of the unknown X is computed as:

X = (0.31 × 7 + 0.27 × 7 + 0.44 × 3) / (0.31 + 0.27 + 0.44) = 5.27

In the above equation 0.31, 0.27 and 0.44 are the weights corresponding to data sets 1, 3 and 4 respectively, and 7, 7 and 3 are the values of feature B corresponding to data sets 1, 3 and 4.
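The worked example above can be reproduced with a short Python sketch (the thesis itself uses MATLAB; this is only an illustration). With exact, unrounded weights the result is 5.28, close to the hand-rounded value of 5.27.

```python
import math

# Sketch of the WKNN imputation of Equation (2.2), reproducing the worked
# example of Tables 2.1-2.3: weights w_k = 1/ED, K = 3.
def wknn_impute(data, missing_row, missing_col, K):
    def dist(row):
        return math.sqrt(sum((row[c] - data[missing_row][c]) ** 2
                             for c in range(len(row))
                             if c != missing_col))
    others = [i for i in range(len(data)) if i != missing_row]
    nearest = sorted(others, key=lambda i: dist(data[i]))[:K]
    weights = [1.0 / dist(data[i]) for i in nearest]
    values = [data[i][missing_col] for i in nearest]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

table = [[5, 7, 2], [2, 1, 1], [7, 7, 3], [2, 3, 4], [4, None, 5]]
x = wknn_impute(table, 4, 1, 3)
print(round(x, 2))  # 5.28 with exact weights (5.27 with the rounded hand weights)
```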
The SVM can compute more than one unknown value simultaneously by choosing data sets based on ED. This thesis uses the WKNN algorithm for ATC estimation because it uses both the distances and the weights to estimate the unknown value. As the weights are the reciprocals of the Euclidean distances, closer data sets are given more weight and farther neighbors less. Hence the WKNN algorithm gives better results than the other imputation methods.
The flow chart explaining the process of imputation is given in Figure 2.1. Its steps are: read the data set with input and output features; identify the data set with an unknown value; select the feature with the unknown value; calculate the Euclidean distance (ED) between the data set with the unknown value and the other data sets; calculate the weights w = 1/ED; select K data sets based on the EDs calculated; and estimate the unknown value using Equation (2.2).

Figure 2.1 SVM WKNN Imputation Flow Chart
2.3 FUZZY LOGIC
Fuzzy logic has been applied successfully to many power system problems. Khairuddin et al (2004) proposed a method for ATC estimation using fuzzy logic. Tae Kyung Hahn et al (2008) described a fuzzy logic approach to parallelizing contingency-constrained optimal power flow. The fuzzy multi-objective problem is formulated for ATC estimation. Sung Sukim et al (2008) aimed to determine the available transfer capability (ATC) based on fuzzy set theory for continuation power flow (CPF), thereby capturing uncertainty.
This section presents the salient features of fuzzy logic and its
application to ATC estimation. The flow diagram of fuzzy logic is given in
Figure 2.2.
Figure 2.2 Fuzzy Logic Flow Diagram (Input → Fuzzification → Rule base with IF-AND-THEN operators → Defuzzification by the centroid method → Output)
To develop a fuzzy model, the following steps are followed:

Selection of input and output variables
Fuzzification
Developing the rule base
Defuzzification
Selection of input and output variables
The selection of inputs is a very important stage for any AI model. Including a less significant variable in the input vector increases the size of the input vector unnecessarily, while omitting the most significant variable may reduce the accuracy of the AI model.
Fuzzification
After identifying the input and output variables, the next step is fuzzification. The number of linguistic variables for the input and output may be chosen appropriately depending on the accuracy requirement. In this thesis, seven linguistic variables are used for the input and output variables. Membership functions characterize the fuzziness in a fuzzy set. There are infinitely many ways to characterize fuzziness; as the membership function essentially embodies all the fuzziness of a particular fuzzy set, its description is the essence of a fuzzy property or operation. Membership functions may be symmetrical or asymmetrical and are typically defined on a one-dimensional universe. In the present study, a one-dimensional triangular membership function is chosen for each input and output linguistic variable. The membership functions and the ranges of values for the input and output linguistic variables are shown in Figures 2.3 and 2.4 respectively.
Input = {L1, L2, L3, L4, L5, L6, L7}
Output = {L1, L2, L3, L4, L5, L6, L7}
The input and output variables are assumed to range from 0 to 2. The width of each label is the same and is assumed to be 0.5.
Figure 2.3 Triangular membership function for Input

Figure 2.4 Triangular membership function for Output
Membership Value
The two inputs that are given to the fuzzy model are assumed to be
1.55 and 1.75 respectively.
For input 1, the membership values can be written as

Input 1 = {0/L1, 0/L2, 0/L3, 0/L4, 0.8/L5, 0.2/L6, 0/L7}

The membership value of input 1 (1.55) is calculated as follows. Input 1 lies between L5 and L6 (refer Figure 2.3). The membership values for L5 and L6 can be calculated using the following formula:
(y − y1) / (y2 − y1) = (x − x1) / (x2 − x1)        (2.3)

By substituting x = 1.55, x1 = 1.5, x2 = 1.75, y1 = 1 and y2 = 0, the membership value for L5 is calculated as 0.8. Similarly, the membership value for L6 is calculated as 0.2.
The membership values of input 2 are written as

Input 2 = {0/L1, 0/L2, 0/L3, 0/L4, 0/L5, 1/L6, 0/L7}

Here L6 = 1.0, as input 2 (1.75) lies exactly at the peak of L6.
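The fuzzification step above can be sketched in a few lines. This Python illustration (the thesis uses the MATLAB Fuzzy Logic Toolbox) assumes triangular membership functions with half-width 0.25, consistent with the worked example in which L5 peaks at 1.5 and L6 at 1.75.

```python
# Hedged sketch of fuzzification with a triangular membership function of
# peak c and half-width 0.25 (the spacing implied by the worked example).
def tri_membership(x, c, half_width=0.25):
    return max(0.0, 1.0 - abs(x - c) / half_width)

# Input 1 = 1.55 lies between the peaks of L5 (1.5) and L6 (1.75).
print(round(tri_membership(1.55, 1.5), 2))   # 0.8 for L5
print(round(tri_membership(1.55, 1.75), 2))  # 0.2 for L6
```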
Rule base
In the fuzzy logic based approach, decisions are made by forming a series of rules that relate the input variables to the output variable using IF-AND-THEN statements. These decision rules are expressed using linguistic variables. The fuzzy table is formed using all the fuzzy rules.
Defuzzification
The process of obtaining a crisp value from the fuzzy model is called defuzzification. The centroid method of defuzzification is commonly used to obtain the crisp value from the fuzzy table.
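On a discretized output universe, the centroid method reduces to a weighted average. The sample points and membership values below are made-up illustration values, not from the thesis.

```python
# Illustrative sketch of centroid defuzzification on a sampled output
# universe: crisp output = sum(mu(x) * x) / sum(mu(x)).
def centroid_defuzzify(xs, mus):
    return sum(m * x for m, x in zip(mus, xs)) / sum(mus)

# Example: an aggregated fuzzy output sampled at three points.
xs  = [1.25, 1.5, 1.75]
mus = [0.2, 0.8, 0.2]
print(round(centroid_defuzzify(xs, mus), 2))  # 1.5 (symmetric set, centroid at the peak)
```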
2.4 FEED FORWARD BACK PROPAGATION NEURAL
NETWORK
The feed forward back propagation neural network consists of two layers. The first layer, or hidden layer, has a hyperbolic tangent sigmoid (tansig) activation function as shown in Figure 2.5, and the second layer, or output layer, has a linear activation function as shown in Figure 2.6. Thus the first layer limits its output to a narrow range, from which the linear layer can produce all values. The output of each layer can be represented by

Y_{N×1} = f(W_{N×M} X_{M×1} + b_{N×1})        (2.4)

where Y is a vector containing the output of each of the N neurons in a given layer, W is a matrix containing the weights for each of the M inputs of all N neurons, X is a vector containing the inputs, b is a vector containing the biases and f(·) is the activation function.
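Equation (2.4) can be sketched for a tiny layer. The weights, inputs and biases below are made-up illustration values (the thesis builds its networks with the MATLAB neural network toolbox).

```python
import math

# Sketch of Equation (2.4) for N = 2 tansig neurons with M = 3 inputs.
# tansig(x) = (e^x - e^-x) / (e^x + e^-x), i.e. the hyperbolic tangent.
def layer_output(W, x, b):
    net = [sum(w * xi for w, xi in zip(row, x)) + bi
           for row, bi in zip(W, b)]
    return [math.tanh(n) for n in net]

W = [[0.5, -0.2, 0.1],
     [0.3, 0.8, -0.5]]
x = [1.0, 2.0, 3.0]
b = [0.1, -0.1]
y = layer_output(W, x, b)
print([round(v, 4) for v in y])  # [0.4621, 0.2913], i.e. tanh(0.5) and tanh(0.3)
```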
Figure 2.5 Tan-sigmoid Transfer Function
Figure 2.6 Linear Transfer function
In the back propagation network, two steps are used alternately during training. The back propagation step calculates the error gradient and propagates it backwards to each neuron, first in the output layer and then in the hidden layer. In the second step, the weights and biases are recomputed, and the output from the activated neurons is propagated forward from the hidden layer to the output layer. The network is initialized with random weights and biases, and then trained using the Levenberg-Marquardt algorithm. The weights and biases are updated according to

D_{n+1} = D_n − [JᵀJ + μI]⁻¹ Jᵀe        (2.5)

where D_n is a matrix containing the current weights and biases, D_{n+1} is a matrix containing the new weights and biases, e is the network error, J is the Jacobian matrix containing the first derivatives of e with respect to the current weights and biases, I is the identity matrix and μ is a variable that is increased or decreased based on the performance function. The gradient of the error surface, g, is equal to Jᵀe.
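The update of Equation (2.5) can be illustrated numerically. This hedged sketch uses a one-parameter model y = a·x, so JᵀJ + μI reduces to a scalar; the data, starting point and μ are made-up illustration values, not the thesis's training setup.

```python
# Numeric sketch of the Levenberg-Marquardt update of Equation (2.5)
# for a single-parameter model y = a * x.
def lm_step(a, xs, ys, mu):
    e = [a * x - y for x, y in zip(xs, ys)]   # residuals (network error)
    J = xs                                    # d(e_i)/d(a) = x_i
    JtJ = sum(x * x for x in J)
    Jte = sum(x * ei for x, ei in zip(J, e))
    return a - Jte / (JtJ + mu)               # D_{n+1} = D_n - (J^T J + mu I)^{-1} J^T e

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]     # underlying slope is 2
a = 0.0
for _ in range(5):
    a = lm_step(a, xs, ys, mu=0.1)
print(round(a, 3))  # converges towards 2.0
```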
Each input is weighted with an appropriate W. The sum of the
weighted inputs and the bias forms the input to the transfer function f.
Neurons can use any differentiable transfer function f to generate their output.
A feed forward network often has one or more hidden layers of sigmoid neurons, for example a layer of S logistic sigmoid (logsig) neurons with R inputs, followed by an output layer of linear neurons. Multiple layers of neurons with non-linear transfer functions allow the network to learn both non-linear and linear relationships between input and output vectors. The linear output layer lets the network produce values outside the range −1 to +1. On the other hand, to constrain the outputs of the network (for example between 0 and 1), the output layer should use a sigmoid transfer function (such as logsig).
As noted in Neuron Model and Network Architectures, for
multiple-layer networks the number of layers determines the subscript on the
weight matrices. The appropriate notation is used in the two layer
tansig/purelin network shown in the Figure 2.7.
Figure 2.7 Structure of feed forward back propagation network
The transfer functions tansig and purelin can be expressed as follows:

tansig(x) = (eˣ − e⁻ˣ) / (eˣ + e⁻ˣ)        (2.6)

purelin(n) = n        (2.7)
This network can be used as a general function approximator. It can
approximate any function with a finite number of discontinuities arbitrarily
well, given sufficient neurons in the hidden layer.
The flow diagram to explain the training procedure of BPA is given
in Figure 2.8
Figure 2.8 shows the training steps: initialize the weights; calculate the output value Y_{N×1} = f(W_{N×M} X_{M×1} + b_{N×1}); calculate the error; and repeat the weight update until the error criterion is satisfied.

Figure 2.8 Flow Chart for BPA training procedure
2.5 GENERALIZED REGRESSION NEURAL NETWORK (GRNN)

Figure 2.9 Schematic diagram of GRNN
The first layer is connected to the second, pattern layer, where each unit represents a training pattern and its output is a measure of the distance of the input from the stored patterns. Each pattern layer unit is connected to two neurons in the summation layer: the S-summation neuron and the D-summation neuron. The former computes the sum of the weighted outputs of the pattern layer, while the latter calculates the unweighted outputs of the pattern neurons. The connection weight between the ith neuron in the pattern layer and the S-summation neuron is y_i, the target output value corresponding to the ith input pattern. For the D-summation neuron, the connection weight is unity. The output layer merely divides the output of each S-summation neuron by that of each D-summation neuron, yielding the predicted value for an unknown input vector x as

ŷ(x) = Σ_{i=1}^{n} y_i exp[−D(x, x_i)] / Σ_{i=1}^{n} exp[−D(x, x_i)]        (2.8)
where n indicates the number of training patterns and the Gaussian D function is defined as

D(x, x_i) = Σ_{j=1}^{p} ((x_j − x_ij) / σ)²        (2.9)

where p indicates the number of elements of an input vector. The terms x_j and x_ij represent the jth element of x and x_i respectively. The term σ is generally
referred to as the spread factor. The GRNN method is used for estimation of
continuous variables, as in standard regression techniques. It is related to the
radial basis function network and is based on a standard statistical technique
called kernel regression. The joint probability density function (pdf) of x and
y is estimated during a training process in the GRNN. Because the pdf is
derived from the training data with no preconceptions about its form, the
system is perfectly general. The success of the GRNN method depends
heavily on the spread factors. The larger that spread is, the smoother the
function approximation. Too large a spread means a lot of neurons will be
required to fit a fast changing function. Too small a spread means many
neurons will be required to fit a smooth function, and the network may not
generalize well.
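The GRNN prediction of Equations (2.8) and (2.9) can be sketched directly. The training pairs and the spread σ below are made-up illustration values; the thesis itself uses the MATLAB GRNN toolbox.

```python
import math

# Sketch of GRNN prediction: Gaussian kernels (Eq. 2.9) feed the
# S-summation and D-summation neurons, whose ratio is the output (Eq. 2.8).
def grnn_predict(x, train_x, train_y, sigma):
    def D(x, xi):                              # Equation (2.9)
        return sum(((xj - xij) / sigma) ** 2 for xj, xij in zip(x, xi))
    kernels = [math.exp(-D(x, xi)) for xi in train_x]     # pattern layer
    s_sum = sum(y * k for y, k in zip(train_y, kernels))  # S-summation
    d_sum = sum(kernels)                                  # D-summation
    return s_sum / d_sum                                  # output layer

train_x = [[0.0], [1.0], [2.0]]
train_y = [0.0, 1.0, 4.0]
# With a small spread, the estimate at a training point stays near its target.
print(round(grnn_predict([1.0], train_x, train_y, sigma=0.3), 2))  # ≈ 1.0
```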
The GRNN needs only a fraction of the training samples that a back propagation neural network would need. The GRNN is advantageous due to its ability to converge to the underlying function of the data with only a few samples available. This makes the GRNN very useful and handy for problems with inadequate data.
The flow chart of GRNN training procedure is given in Figure 2.10
The steps are: read the input vector at the input neurons; memorize the relationship between input and response; compute the output of each pattern unit using the transfer function θ_i = exp[−(X − U_i)ᵀ(X − U_i) / (2σ²)]; compute the simple arithmetic summation S_s = Σ_i θ_i and the weighted summation S_w = Σ_i w_i θ_i; the output is S_w / S_s.

Figure 2.10 Flow Chart for GRNN training procedure
2.6 CONCLUSIONS
The AI methods discussed in this chapter are used in this thesis for ATC estimation. The effectiveness of the SVM based model in estimating the unknown values of more than one data set will be tested in the forthcoming chapters. The GRNN can give accurate results even with a smaller number of training data sets; this feature will be tested by comparing the results of the GRNN and BPNN models. Fuzzy logic is one of the model based AI techniques, so it is chosen for ATC estimation. The respective MATLAB toolboxes are used to develop the AI models for ATC estimation.