
    CHAPTER 2

    ARTIFICIAL INTELLIGENCE TECHNIQUES

    2.1 INTRODUCTION

Artificial intelligence techniques are widely used for almost all power system problems. Due to the non-linear nature of power systems, their operation and control involve complex computations. In a real-time system the number of buses is large, which further complicates the problem. The degree of uncertainty associated with the power system components is also high. For these reasons, AI techniques find major applications in solving power system problems such as load forecasting, unit commitment, etc. The major advantages of AI techniques over conventional methods are that they require simpler calculations and comparatively less computation time. Moreover, even with insufficient or vague data, AI techniques can give reasonably accurate results. This is especially true in the restructured power system, where the OASIS and the spot market require information about the state of the system, the voltage magnitudes and the transactions through various interfaces at regular time intervals.

This Chapter discusses the various AI techniques that are applied for the estimation of ATC. The AI techniques used in this thesis are the Support Vector Machine (SVM), Fuzzy logic, the Back Propagation Neural Network (BPNN) of the Artificial Neural Network (ANN) family and the Generalized Regression Neural Network (GRNN).

    2.2 SUPPORT VECTOR MACHINE (SVM)

    Many real world scenarios in pattern classification suffer from

    missing or incomplete data irrespective of the field of application. For


example, wireless sensor networks suffer from incomplete data sets due to different reasons such as a power outage at the sensor node, random occurrences of local interference or a higher bit error rate of the wireless radio transmissions. In power systems, estimating unknown values from the available data is one of the important problems; load forecasting and state estimation are a few examples of such problems.

Iffat Gheyas (2009) presented a detailed analysis of the imputation techniques used for data mining and knowledge discovery. SVM is successfully used for data classification and regression. This thesis focuses on estimating the ATC value from the given data sets based on the Weighted K-Nearest Neighbors (WKNN) algorithm, which is one of the most popular approaches for solving incomplete data problems. The process of finding unknown or missing values is called imputation. The approaches vary from naive methods such as mean imputation to more robust methods based on relationships among attributes. This section briefly surveys some popular imputation methods and explains in detail the WKNN imputation algorithm. The WKNN imputation algorithm replaces missing values with a weighted average of the K nearest neighbors. The SVM estimates the unknown or missing value from the inputs by calculating the Euclidean Distance (ED) of the new inputs from the given inputs and mapping the unknown values.

    2.2.1 Missing Data Imputation

    Mean and mode imputation (Mimpute)

Mean and mode imputation consists of replacing the unknown value of a given attribute by the mean (quantitative attribute) or mode (qualitative attribute) of all known values of that attribute. Replacing all missing records with a single value distorts the input data distribution.
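As a minimal illustration (added here, not part of the original thesis), the following Python sketch imputes a missing numeric entry with the column mean and a missing categorical entry with the column mode; the example data are hypothetical.

```python
from collections import Counter

def mean_impute(values):
    """Replace None entries of a numeric list with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def mode_impute(values):
    """Replace None entries of a categorical list with the most frequent known value."""
    known = [v for v in values if v is not None]
    mode = Counter(known).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

print(mean_impute([5.0, 2.0, 7.0, None]))           # None becomes 4.67, the mean of 5, 2 and 7
print(mode_impute(["low", "high", "high", None]))    # None becomes "high", the mode
```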


    Hot deck imputation (HDimpute)

Tshilidzi Marwala (2009) explained in detail the Computational Intelligence techniques used for missing data imputation. Given an incomplete data pattern, HDimpute replaces the missing data with the values from the input vector that is closest in terms of the attributes that are known in both patterns.

    KNN Imputation

The KNN algorithm belongs to a family of learning methods known as instance-based methods. Instance-based learning methods are conceptually straightforward approaches to approximating real-valued or discrete-valued target functions. These methods are based on the principle that the instances within a data set will generally exist in close proximity to other instances that have similar properties. Learning in these algorithms consists simply of storing the presented training data set. When a new instance is encountered, a set of similar training instances is retrieved from memory and used to make a local approximation of the target function.

The KNN algorithm imputes a missing value as the average value of the K nearest patterns, as given by Equation (2.1):

x_{ij} = \frac{1}{K} \sum_{k=1}^{K} x_{kj}      (2.1)

where x_{ij} is the unknown value of the j-th variable in the i-th data set,
K is the number of nearest neighbors, and
x_{kj} is the j-th variable of the k-th nearest neighbor data set.
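A minimal Python sketch of Equation (2.1) is given below (an illustration added here, not part of the original thesis; the function name and the small data set are hypothetical). It averages the missing feature over the K nearest complete data sets, with nearness measured by the Euclidean distance over the known features.

```python
import numpy as np

def knn_impute(incomplete, complete_rows, missing_idx, k=3):
    """Equation (2.1): impute the missing feature as the plain average of the
    same feature over the K nearest complete data sets."""
    known = [j for j in range(len(incomplete)) if j != missing_idx]
    target = np.asarray(incomplete, dtype=float)[known]
    dists = [np.linalg.norm(np.asarray(row, dtype=float)[known] - target)
             for row in complete_rows]
    nearest = np.argsort(dists)[:k]
    return float(np.mean([complete_rows[i][missing_idx] for i in nearest]))

# Hypothetical two-feature data sets; the second feature of the new pattern is unknown.
complete = [[1.0, 2.0], [2.0, 4.0], [4.0, 8.0]]
print(knn_impute([3.0, np.nan], complete, missing_idx=1, k=2))  # 6.0, the mean of 4 and 8
```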


Weighted K Nearest Neighbour (WKNN) Algorithm

The WKNN algorithm finds the unknown value as a weighted average of the K nearest neighbors, as given by Equation (2.2). Assume that the value of the j-th variable of the i-th data set, x_{ij}, is unknown. The unknown value can be calculated using the following formula:

x_{ij} = \frac{\sum_{k=1}^{K} W_k \, x_{kj}}{\sum_{k=1}^{K} W_k}      (2.2)

where K is the number of nearest neighbors,
k is the index of the nearest neighbor,
W_k is the weight associated with the k-th nearest neighbor, which is the reciprocal of d_{ik},
d_{ik} is the Euclidean distance between the i-th data set and the k-th nearest neighbor data set, and
x_{kj} is the j-th variable of the k-th nearest neighbor data set.

The process of imputation can be understood through the following example. Consider an example with five data sets and three features A, B and C, given in Table 2.1. Assume that for the fifth data set the value of feature B is unknown. The unknown value is denoted as X.

Table 2.1 The original data set with unknown value (X)

Data set    A    B    C
   1        5    7    2
   2        2    1    1
   3        7    7    3
   4        2    3    4
   5        4    X    5


The data set obtained by removing the column corresponding to feature B is presented in Table 2.2.

Table 2.2 Data set ignoring the column corresponding to the unknown value

Data set    A    C
   1        5    2
   2        2    1
   3        7    3
   4        2    4
   5        4    5

The ED between the fifth data set and the first data set is calculated as

ED_{5,1} = \sqrt{(4-5)^2 + (5-2)^2} = 3.16

Similarly, the EDs between the fifth data set and data sets 2, 3 and 4 are calculated and presented in Table 2.3.

The weight for data set 1 is computed as

w_1 = 1 / 3.16 = 0.316

Similarly, the weights for the other data sets are calculated and presented in Table 2.3.


Table 2.3 Data sets with Euclidean Distance (ED)

Data set    A    C    ED to the fifth data set    Weight W_k
   1        5    2    3.16                        0.31
   2        2    1    4.47                        0.22
   3        7    3    3.6                         0.27
   4        2    4    2.23                        0.44

From Table 2.3 it is observed that data sets 1, 3 and 4 are closer to data set 5, as their Euclidean distances are small. The number of nearest neighbors (K) is chosen as 3. The values of feature B corresponding to data sets 1, 3 and 4 (presented in Table 2.1) are used in the following formula to compute X. The value of the unknown X is computed as

X = \frac{0.31 \times 7 + 0.27 \times 7 + 0.44 \times 3}{0.31 + 0.27 + 0.44} = 5.27

In the above equation 0.31, 0.27 and 0.44 are the weights corresponding to data sets 1, 3 and 4 respectively, and 7, 7 and 3 are the values of feature B corresponding to data sets 1, 3 and 4.
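The following Python sketch (an illustration added here, not part of the original thesis; the function name and interface are hypothetical) reproduces the worked example of Tables 2.1 to 2.3 using the WKNN formula of Equation (2.2).

```python
import numpy as np

def wknn_impute(incomplete, complete_rows, missing_idx, k=3):
    """WKNN imputation (Equation 2.2): weighted average of the K nearest rows,
    with weights equal to the reciprocal of the Euclidean distance."""
    known = [j for j in range(len(incomplete)) if j != missing_idx]
    target = np.asarray(incomplete, dtype=float)[known]
    dists = np.array([np.linalg.norm(np.asarray(r, dtype=float)[known] - target)
                      for r in complete_rows])
    nearest = np.argsort(dists)[:k]            # indices of the K nearest data sets
    weights = 1.0 / dists[nearest]             # W_k = 1 / d_ik
    values = np.array([complete_rows[i][missing_idx] for i in nearest])
    return np.sum(weights * values) / np.sum(weights)

# Data sets 1 to 4 of Table 2.1; data set 5 has feature B unknown.
complete = [[5, 7, 2], [2, 1, 1], [7, 7, 3], [2, 3, 4]]
print(round(wknn_impute([4, np.nan, 5], complete, missing_idx=1, k=3), 2))
# 5.28, marginally different from the 5.27 in the text because the weights are not rounded
```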

The SVM can compute more than one unknown value simultaneously by choosing data sets based on the ED. This thesis uses the WKNN algorithm for ATC estimation because the WKNN algorithm uses both the distances and the weights to estimate the unknown value. As the weights are the reciprocals of the Euclidean distances, the closer data sets are given more weightage and the farther neighbors are given less weightage. Hence the WKNN algorithm gives better results compared to the other imputation methods.


The flow chart explaining the process of imputation is given in Figure 2.1. The steps are: read the data set with input and output features; identify the data set with the unknown value; select the feature with the unknown value; calculate the Euclidean Distance (ED) between the data set with the unknown value and the other data sets; calculate the weights w = 1/ED; select K data sets based on the calculated EDs; and estimate the unknown value using Equation (2.2).

Figure 2.1 SVM WKNN Imputation Flow Chart


    2.3 FUZZY LOGIC

Fuzzy logic has been applied successfully to many power system problems. Khairuddin et al (2004) proposed a method for ATC estimation using fuzzy logic. Tae Kyung Hahn et al (2008) described a fuzzy logic approach to parallelizing contingency-constrained optimal power flow, in which a fuzzy multi-objective problem is formulated for ATC estimation. Sung Sukim et al (2008) aimed to determine the available transfer capability (ATC) based on fuzzy set theory for the continuation power flow (CPF), thereby capturing uncertainty.

    This section presents the salient features of fuzzy logic and its

    application to ATC estimation. The flow diagram of fuzzy logic is given in

    Figure 2.2.

Figure 2.2 Fuzzy Logic Flow Diagram (Input, Fuzzification, Rule base with IF-AND-THEN operators, Defuzzification by the centroid method, Output)

The following steps are followed to develop a fuzzy model:

Selection of input and output variables

Fuzzification

Development of the rule base

Defuzzification


    Selection of input and output variables

The selection of inputs is a very important stage for any AI model. Inclusion of a less significant variable will unnecessarily increase the size of the input vector, whereas omission of the most significant variable may reduce the accuracy of the AI model.

    Fuzzification

After identifying the input and output variables, the next step is fuzzification. The number of linguistic variables for the input and output may be chosen appropriately depending on the accuracy requirement. In this thesis, seven linguistic variables are used for the input and output variables. Membership functions characterize the fuzziness in a fuzzy set. There are an infinite number of ways to characterize fuzziness. As the membership function essentially embodies all fuzziness for a particular fuzzy set, its description is the essence of a fuzzy property or operation. Membership functions may be symmetrical or asymmetrical and are typically defined on a one-dimensional universe. In the present study, a one-dimensional triangular membership function is chosen for each input and output linguistic variable. The membership functions and the range of values for the input and output linguistic variables are shown in Figures 2.3 and 2.4 respectively.

Input = {L1, L2, L3, L4, L5, L6, L7}

Output = {L1, L2, L3, L4, L5, L6, L7}

The input and output variables are assumed to range from 0 to 2. The width of each label is the same and is assumed to be 0.5.


Figure 2.3 Triangular membership function for Input

Figure 2.4 Triangular membership function for Output

    Membership Value

The two inputs given to the fuzzy model are assumed to be 1.55 and 1.75 respectively.

For input 1, the membership values can be written as follows:

Input 1 = {0/L1, 0/L2, 0/L3, 0/L4, 0.8/L5, 0.2/L6, 0/L7}

The membership value of input 1 (1.55) is calculated as follows. Input 1 lies between L5 and L6 (refer to Figure 2.3). The membership values of L5 and L6 can be calculated using the following formula:


\frac{y - y_1}{y_2 - y_1} = \frac{x - x_1}{x_2 - x_1}      (2.3)

By substituting x_1 = 1.5, x_2 = 1.75, y_1 = 1 and y_2 = 0, the membership value of L5 is calculated as 0.8. Similarly, the membership value of L6 is calculated as 0.2.

The membership values of input 2 are written as follows:

Input 2 = {0/L1, 0/L2, 0/L3, 0/L4, 0/L5, 1/L6, 0/L7}

The membership value of L6 is 1.0, as input 2 (1.75) lies exactly at the peak of L6.
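A small Python sketch (added for illustration, not from the thesis) reproduces these membership values with a triangular membership function; the peak positions of L5 (1.5) and L6 (1.75) and the label width of 0.5 are taken from the text, while the function and variable names are assumptions.

```python
def tri_membership(x, peak, half_width=0.25):
    """Triangular membership value of x for a label peaking at `peak`
    with base width 2 * half_width (label width 0.5, as in the text)."""
    return max(0.0, 1.0 - abs(x - peak) / half_width)

# Input 1 = 1.55 lies between L5 (peak 1.5) and L6 (peak 1.75).
print(round(tri_membership(1.55, peak=1.5), 2))    # 0.8 -> membership in L5
print(round(tri_membership(1.55, peak=1.75), 2))   # 0.2 -> membership in L6
# Input 2 = 1.75 sits exactly at the peak of L6.
print(round(tri_membership(1.75, peak=1.75), 2))   # 1.0
```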

Rule base

In the fuzzy logic based approach, decisions are made by forming a series of rules that relate the input variables to the output variable using IF-AND-THEN statements. These decision rules are expressed using linguistic variables. The fuzzy table is formed using all the fuzzy rules.

Defuzzification

The process of obtaining the crisp value from the fuzzy model is called defuzzification. The centroid method of defuzzification is commonly used to obtain the crisp value from the fuzzy table.
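As a hedged illustration of the centroid method (this sketch is not from the thesis; the fired output labels, their firing strengths and the max aggregation are hypothetical choices), the crisp output is the centroid of the aggregated output membership function over a sampled universe.

```python
import numpy as np

def centroid_defuzzify(ys, mu):
    """Discretized centroid: y* = sum(mu * y) / sum(mu) over the sampled output universe."""
    ys, mu = np.asarray(ys, dtype=float), np.asarray(mu, dtype=float)
    return float(np.sum(mu * ys) / np.sum(mu))

def tri(x, peak, half_width=0.25):
    return np.maximum(0.0, 1.0 - np.abs(x - peak) / half_width)

# Hypothetical aggregated output: label L5 (peak 1.5) fired at 0.8 and
# label L6 (peak 1.75) fired at 0.2; aggregation by max of the clipped sets.
ys = np.linspace(0.0, 2.0, 401)                      # sampled output universe
agg = np.maximum(np.minimum(0.8, tri(ys, 1.5)),      # clip L5 at 0.8
                 np.minimum(0.2, tri(ys, 1.75)))     # clip L6 at 0.2
print(round(centroid_defuzzify(ys, agg), 2))         # crisp output, roughly 1.56
```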

    2.4 FEED FORWARD BACK PROPAGATION NEURAL

    NETWORK

The feed forward back propagation neural network consists of two layers. The first layer, or hidden layer, has a hyperbolic tangent sigmoid (tansig)


activation function, as shown in Figure 2.5, and the second layer, or output layer, has a linear activation function, as shown in Figure 2.6. Thus the first layer limits the output to a narrow range, from which the linear layer can produce all values. The output of each layer can be represented by

Y_{N \times 1} = f(W_{N \times M} X_{M \times 1} + b_{N \times 1})      (2.4)

where Y is a vector containing the outputs of each of the N neurons in a given layer, W is a matrix containing the weights for each of the M inputs of all N neurons, X is a vector containing the inputs, b is a vector containing the biases and f(.) is the activation function.

    Figure 2.5 Tan-sigmoid Transfer Function

    Figure 2.6 Linear Transfer function
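A minimal numpy sketch of Equation (2.4) follows (added for illustration; the layer sizes, random weights and input are arbitrary assumptions). It computes the output of a tansig hidden layer followed by a purelin output layer.

```python
import numpy as np

def tansig(x):
    """Hyperbolic tangent sigmoid (tansig) transfer function."""
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def layer_output(W, x, b, f):
    """Equation (2.4): Y = f(W x + b)."""
    return f(W @ x + b)

rng = np.random.default_rng(0)
M, N_hidden, N_out = 4, 6, 1          # 4 inputs, 6 hidden neurons, 1 output (arbitrary)
W1, b1 = rng.normal(size=(N_hidden, M)), rng.normal(size=N_hidden)
W2, b2 = rng.normal(size=(N_out, N_hidden)), rng.normal(size=N_out)

x = rng.normal(size=M)                              # an arbitrary input vector
hidden = layer_output(W1, x, b1, tansig)            # tansig hidden layer
y = layer_output(W2, hidden, b2, lambda n: n)       # purelin output layer, f(n) = n
print(y)
```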

    In the back propagation network, there are two steps during training

    that are used alternately. The back propagation step calculates the error in the


gradient descent and propagates it backwards to each neuron, first in the output layer and then in the hidden layer. In the second step, the weights and biases are recomputed, and the output from the activated neurons is propagated forward from the hidden layer to the output layer. The network is initialized with random weights and biases, and then trained using the Levenberg-Marquardt algorithm. The weights and biases are updated according to

D_{n+1} = D_n - [J^T J + \mu I]^{-1} J^T e      (2.5)

where D_n is a matrix containing the current weights and biases, D_{n+1} is a matrix containing the new weights and biases, e is the network error, J is the Jacobian matrix containing the first derivatives of e with respect to the current weights and biases, I is the identity matrix and \mu is a variable that is increased or decreased based on the performance function. The gradient of the error surface, g, is equal to J^T e.
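A hedged numpy sketch of the Levenberg-Marquardt update of Equation (2.5) is shown below (illustrative only; the Jacobian is for a simple linear least-squares residual, the data are made up, and the constant damping value is a simplification rather than the thesis implementation).

```python
import numpy as np

def lm_step(D, J, e, mu):
    """One Levenberg-Marquardt update, Equation (2.5): D_new = D - (J^T J + mu*I)^-1 J^T e."""
    return D - np.linalg.solve(J.T @ J + mu * np.eye(D.size), J.T @ e)

# Illustrative problem: fit y = a*x + b, so the residual is e = (a*x + b) - y
# and the Jacobian of e with respect to D = [a, b] has rows [x, 1].
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.1, 4.9, 7.2])          # made-up data, roughly y = 2x + 1
D = np.zeros(2)                              # start from a = b = 0
mu = 0.01                                    # fixed damping for this sketch
for _ in range(20):
    e = D[0] * x + D[1] - y                  # model error
    J = np.column_stack([x, np.ones_like(x)])
    D = lm_step(D, J, e, mu)
print(D)                                     # converges close to [2.04, 0.99]
```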

    Each input is weighted with an appropriate W. The sum of the

    weighted inputs and the bias forms the input to the transfer function f.

    Neurons can use any differentiable transfer function f to generate their output.

Feed forward networks often have one or more hidden layers of sigmoid neurons, such as a single layer of S logistic sigmoid (logsig) neurons with R inputs, followed by an output layer of linear neurons. Multiple layers of neurons with non-linear transfer functions allow the network to learn non-linear and linear relationships between the input and output vectors. The linear output layer lets the network produce values outside the range -1 to +1. On the other hand, to constrain the outputs of a network (for example, between 0 and 1) the output layer should use a sigmoid transfer function (such as logsig).


    As noted in Neuron Model and Network Architectures, for

    multiple-layer networks the number of layers determines the subscript on the

weight matrices. The appropriate notation is used in the two-layer tansig/purelin network shown in Figure 2.7.

    Figure 2.7 Structure of feed forward back propagation network

The transfer functions tansig and purelin can be expressed as follows:

\mathrm{tansig}(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}      (2.6)

\mathrm{purelin}(n) = n      (2.7)

    This network can be used as a general function approximator. It can

    approximate any function with a finite number of discontinuities arbitrarily

    well, given sufficient neurons in the hidden layer.

    The flow diagram to explain the training procedure of BPA is given

    in Figure 2.8


Figure 2.8 Flow Chart for BPA training procedure (initialize the weights; calculate the output value Y_{N \times 1} = f(W_{N \times M} X_{M \times 1} + b_{N \times 1}); calculate the error; repeat until the error criterion is satisfied)


2.5 GENERALIZED REGRESSION NEURAL NETWORK (GRNN)

The Generalized Regression Neural Network consists of an input layer, a pattern layer, a summation layer and an output layer. The schematic diagram of the GRNN is shown in Figure 2.9.

Figure 2.9 Schematic diagram of GRNN

The first layer is connected to the second, pattern layer, where each unit represents a training pattern and its output is a measure of the distance of the input from the stored patterns. Each pattern layer unit is connected to two neurons in the summation layer: the S-summation neuron and the D-summation neuron. The former computes the sum of the weighted outputs of the pattern layer while the latter calculates the unweighted outputs of the pattern neurons. The connection weight between the i-th neuron in the pattern layer and the S-summation neuron is y_i, the target output value corresponding to the i-th input pattern. For the D-summation neuron, the connection weight is unity. The output layer merely divides the output of each S-summation neuron by that of each D-summation neuron, yielding the predicted value for an unknown input vector x as

\hat{y}(x) = \frac{\sum_{i=1}^{n} y_i \exp[-D(x, x_i)]}{\sum_{i=1}^{n} \exp[-D(x, x_i)]}      (2.8)


where n indicates the number of training patterns and the Gaussian function D is defined as

D(x, x_i) = \sum_{j=1}^{p} \left( \frac{x_j - x_{ij}}{\sigma} \right)^2      (2.9)

where p indicates the number of elements of an input vector, and the terms x_j and x_{ij} represent the j-th element of x and x_i respectively. The term \sigma is generally referred to as the spread factor. The GRNN method is used for the estimation of

    continuous variables, as in standard regression techniques. It is related to the

    radial basis function network and is based on a standard statistical technique

    called kernel regression. The joint probability density function (pdf) of x and

    y is estimated during a training process in the GRNN. Because the pdf is

    derived from the training data with no preconceptions about its form, the

    system is perfectly general. The success of the GRNN method depends

heavily on the spread factor. The larger the spread, the smoother the

    function approximation. Too large a spread means a lot of neurons will be

    required to fit a fast changing function. Too small a spread means many

    neurons will be required to fit a smooth function, and the network may not

    generalize well.
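The following Python sketch (added for illustration, not from the thesis) implements the GRNN prediction of Equations (2.8) and (2.9) for a small hypothetical training set; the spread value, data and function name are assumptions.

```python
import numpy as np

def grnn_predict(x, X_train, y_train, sigma=0.5):
    """GRNN estimate of Equation (2.8) with the distance term of Equation (2.9):
    D(x, x_i) = sum_j ((x_j - x_ij) / sigma)^2, y_hat = sum(y_i*exp(-D_i)) / sum(exp(-D_i))."""
    D = np.sum(((x - X_train) / sigma) ** 2, axis=1)   # one distance term per training pattern
    w = np.exp(-D)                                     # pattern layer outputs
    return np.sum(y_train * w) / np.sum(w)             # S-summation divided by D-summation

# Hypothetical data: two input features and a continuous target.
X_train = np.array([[0.2, 0.1], [0.5, 0.4], [0.9, 0.8]])
y_train = np.array([1.0, 2.0, 3.0])
print(grnn_predict(np.array([0.55, 0.45]), X_train, y_train, sigma=0.5))
# 2.0: the middle pattern dominates and the two outer patterns balance out
```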

The GRNN needs only a fraction of the training samples that a back propagation neural network would need. The GRNN is advantageous due to its ability to converge to the underlying function of the data with only a few samples available. This makes the GRNN very useful and handy for problems with inadequate data.

    The flow chart of GRNN training procedure is given in Figure 2.10


Figure 2.10 Flow Chart for GRNN training procedure (read the input vector; memorize the relationship between the input and the response; estimate the pattern unit outputs using the transfer function \theta_i = \exp[-(X - U_i)'(X - U_i)/(2\sigma^2)]; compute the simple arithmetic summation S_s = \sum_i \theta_i; compute the weighted summation S_w = \sum_i w_i \theta_i; Output = S_w / S_s)


    2.6 CONCLUSIONS

The AI methods discussed in this Chapter are used in this thesis for ATC estimation. The effectiveness of the SVM based model for estimating the unknown values of more than one data set will be tested in the forthcoming Chapters. The GRNN can give accurate results even with a smaller number of training data sets; this feature will be tested by comparing the results of the GRNN and BPNN models. Fuzzy logic is one of the model based AI techniques and is therefore also chosen for ATC estimation. The respective MATLAB toolboxes are used to develop the AI models for ATC estimation.