
    CHAPTER 2

    ARTIFICIAL INTELLIGENCE TECHNIQUES

    2.1 INTRODUCTION

Artificial intelligence techniques are widely used for almost all power system problems. Due to the non-linear nature of power systems, their operation and control involve complex computations. In a real-time system the number of buses is large, which further complicates the problem. The degree of uncertainty associated with the power system components is also high. For these reasons, AI techniques find major applications in solving power system problems such as load forecasting, unit commitment, etc. The major advantages of AI techniques over conventional methods are that they require simpler calculations and comparatively less computation time. Moreover, even with insufficient or vague data, AI techniques can give reasonably accurate results. This is especially true in the restructured power system, where the OASIS and the spot market require information about the state of the system, the voltage magnitudes and the transactions through various interfaces at regular time intervals.

This Chapter discusses the various AI techniques that are applied for the estimation of ATC. The AI techniques used in this thesis are the Support Vector Machine (SVM), Fuzzy logic, the Back Propagation Neural Network (BPNN) of the Artificial Neural Network (ANN) family and the Generalized Regression Neural Network (GRNN).

    2.2 SUPPORT VECTOR MACHINE (SVM)

    Many real world scenarios in pattern classification suffer from

    missing or incomplete data irrespective of the field of application. For


example, wireless sensor networks suffer from incomplete data sets due to different reasons such as a power outage at the sensor node, random occurrences of local interference or a higher bit error rate of the wireless radio transmissions. In power systems, estimating unknown values from the available data is one of the important problems; load forecasting and state estimation are a few examples of such problems.

Iffat Gheyas (2009) presented a detailed analysis of the imputation techniques used for data mining and knowledge discovery. SVM is successfully used for data classification and regression. This thesis focuses on estimating the ATC value from the given data sets based on the Weighted K-Nearest Neighbors (WKNN) algorithm, which is one of the most popular approaches for solving incomplete data problems. The process of finding unknown or missing values is called imputation. The approaches vary from naive methods such as mean imputation to more robust methods based on relationships among attributes. This section briefly surveys some popular imputation methods and explains in detail the WKNN imputation algorithm. The WKNN imputation algorithm replaces missing values with a weighted average of the K nearest neighbors. The SVM estimates the unknown or missing value from the inputs by calculating the Euclidean Distance (ED) of the new inputs from the given inputs and mapping the unknown values.

    2.2.1 Missing Data Imputation

    Mean and mode imputation (Mimpute)

Mean and mode imputation consists of replacing the unknown value of a given attribute by the mean (quantitative attribute) or mode (qualitative attribute) of all known values of that attribute. Replacing all missing records with a single value distorts the input data distribution.
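As a minimal illustration (added here, not part of the original thesis), the following Python sketch imputes a missing numeric entry with the column mean and a missing categorical entry with the column mode; the example data are hypothetical.

```python
from collections import Counter

def mean_impute(values):
    """Replace None entries of a numeric list with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def mode_impute(values):
    """Replace None entries of a categorical list with the most frequent known value."""
    known = [v for v in values if v is not None]
    mode = Counter(known).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

print(mean_impute([5.0, 2.0, 7.0, None]))           # None becomes 4.67, the mean of 5, 2 and 7
print(mode_impute(["low", "high", "high", None]))    # None becomes "high", the mode
```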


    Hot deck imputation (HDimpute)

Tshilidzi Marwala (2009) explained in detail the Computational Intelligence techniques used for missing data imputation. Given an incomplete data pattern, HDimpute replaces the missing data with the values from the input vector that is closest in terms of the attributes that are known in both patterns.

    KNN Imputation

The KNN algorithm belongs to a family of learning methods known as instance-based methods. Instance-based learning methods are conceptually straightforward approaches to approximating real-valued or discrete-valued target functions. These methods are based on the principle that the instances within a data set will generally exist in close proximity to other instances that have similar properties. Learning in these algorithms consists simply of storing the presented training data set. When a new instance is encountered, a set of similar training instances is retrieved from memory and used to make a local approximation of the target function.

The KNN algorithm imputes a missing value as the average value of the K nearest patterns, as given by Equation (2.1):

x_{ij} = \frac{1}{K} \sum_{k=1}^{K} x_{kj}      (2.1)

where x_{ij} is the unknown value of the j-th variable in the i-th data set,
K is the number of nearest neighbors, and
x_{kj} is the j-th variable of the k-th nearest neighbor data set.
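A minimal Python sketch of Equation (2.1) is given below (an illustration added here, not part of the original thesis; the function name and the small data set are hypothetical). It averages the missing feature over the K nearest complete data sets, with nearness measured by the Euclidean distance over the known features.

```python
import numpy as np

def knn_impute(incomplete, complete_rows, missing_idx, k=3):
    """Equation (2.1): impute the missing feature as the plain average of the
    same feature over the K nearest complete data sets."""
    known = [j for j in range(len(incomplete)) if j != missing_idx]
    target = np.asarray(incomplete, dtype=float)[known]
    dists = [np.linalg.norm(np.asarray(row, dtype=float)[known] - target)
             for row in complete_rows]
    nearest = np.argsort(dists)[:k]
    return float(np.mean([complete_rows[i][missing_idx] for i in nearest]))

# Hypothetical two-feature data sets; the second feature of the new pattern is unknown.
complete = [[1.0, 2.0], [2.0, 4.0], [4.0, 8.0]]
print(knn_impute([3.0, np.nan], complete, missing_idx=1, k=2))  # 6.0, the mean of 4 and 8
```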


Weighted K Nearest Neighbour (WKNN) Algorithm

The WKNN algorithm finds the unknown value as a weighted average of the K nearest neighbors, as given by Equation (2.2). Assume that the value of the j-th variable of the i-th data set, x_{ij}, is unknown. The unknown value can be calculated using the following formula:

x_{ij} = \frac{\sum_{k=1}^{K} W_k \, x_{kj}}{\sum_{k=1}^{K} W_k}      (2.2)

where K is the number of nearest neighbors,
k is the index of the nearest neighbor,
W_k is the weight associated with the k-th nearest neighbor, which is the reciprocal of d_{ik},
d_{ik} is the Euclidean distance between the i-th data set and the k-th nearest neighbor data set, and
x_{kj} is the j-th variable of the k-th nearest neighbor data set.

The process of imputation can be understood through the following example. Consider an example with five data sets and three features A, B and C, given in Table 2.1. Assume that for the fifth data set the value of feature B is unknown. The unknown value is denoted as X.

Table 2.1 The original data set with unknown value (X)

Data set    A    B    C
   1        5    7    2
   2        2    1    1
   3        7    7    3
   4        2    3    4
   5        4    X    5


The data set obtained by removing the column corresponding to feature B is presented in Table 2.2.

Table 2.2 Data set ignoring the column corresponding to the unknown value

Data set    A    C
   1        5    2
   2        2    1
   3        7    3
   4        2    4
   5        4    5

The ED between the fifth data set and the first data set is calculated as

ED_{5,1} = \sqrt{(4-5)^2 + (5-2)^2} = 3.16

Similarly, the EDs between the fifth data set and data sets 2, 3 and 4 are calculated and presented in Table 2.3.

The weight for data set 1 is computed as

w_1 = 1 / 3.16 = 0.316

Similarly, the weights for the other data sets are calculated and presented in Table 2.3.


Table 2.3 Data sets with Euclidean Distance (ED)

Data set    A    C    ED to the fifth data set    Weight W_k
   1        5    2    3.16                        0.31
   2        2    1    4.47                        0.22
   3        7    3    3.6                         0.27
   4        2    4    2.23                        0.44

From Table 2.3 it is observed that data sets 1, 3 and 4 are closer to data set 5, as their Euclidean distances are small. The number of nearest neighbors (K) is chosen as 3. The values of feature B corresponding to data sets 1, 3 and 4 (presented in Table 2.1) are used in the following formula to compute X. The value of the unknown X is computed as

X = \frac{0.31 \times 7 + 0.27 \times 7 + 0.44 \times 3}{0.31 + 0.27 + 0.44} = 5.27

In the above equation 0.31, 0.27 and 0.44 are the weights corresponding to data sets 1, 3 and 4 respectively, and 7, 7 and 3 are the values of feature B corresponding to data sets 1, 3 and 4.
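The following Python sketch (an illustration added here, not part of the original thesis; the function name and interface are hypothetical) reproduces the worked example of Tables 2.1 to 2.3 using the WKNN formula of Equation (2.2).

```python
import numpy as np

def wknn_impute(incomplete, complete_rows, missing_idx, k=3):
    """WKNN imputation (Equation 2.2): weighted average of the K nearest rows,
    with weights equal to the reciprocal of the Euclidean distance."""
    known = [j for j in range(len(incomplete)) if j != missing_idx]
    target = np.asarray(incomplete, dtype=float)[known]
    dists = np.array([np.linalg.norm(np.asarray(r, dtype=float)[known] - target)
                      for r in complete_rows])
    nearest = np.argsort(dists)[:k]            # indices of the K nearest data sets
    weights = 1.0 / dists[nearest]             # W_k = 1 / d_ik
    values = np.array([complete_rows[i][missing_idx] for i in nearest])
    return np.sum(weights * values) / np.sum(weights)

# Data sets 1 to 4 of Table 2.1; data set 5 has feature B unknown.
complete = [[5, 7, 2], [2, 1, 1], [7, 7, 3], [2, 3, 4]]
print(round(wknn_impute([4, np.nan, 5], complete, missing_idx=1, k=3), 2))
# 5.28, marginally different from the 5.27 in the text because the weights are not rounded
```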

The SVM can compute more than one unknown value simultaneously by choosing data sets based on the ED. This thesis uses the WKNN algorithm for ATC estimation because the WKNN algorithm uses both the distances and the weights to estimate the unknown value. As the weights are the reciprocals of the Euclidean distances, the closer data sets are given more weightage and the farther neighbors are given less weightage. Hence the WKNN algorithm gives better results compared to the other imputation methods.


The flow chart explaining the process of imputation is given in Figure 2.1. The steps are: read the data set with input and output features; identify the data set with the unknown value; select the feature with the unknown value; calculate the Euclidean Distance (ED) between the data set with the unknown value and the other data sets; calculate the weights w = 1/ED; select K data sets based on the calculated EDs; and estimate the unknown value using Equation (2.2).

Figure 2.1 SVM WKNN Imputation Flow Chart


    2.3 FUZZY LOGIC

Fuzzy logic has been applied successfully to many power system problems. Khairuddin et al (2004) proposed a method for ATC estimation using fuzzy logic. Tae Kyung Hahn et al (2008) described a fuzzy logic approach to parallelizing contingency-constrained optimal power flow, in which a fuzzy multi-objective problem is formulated for ATC estimation. Sung Sukim et al (2008) aimed to determine the available transfer capability (ATC) based on fuzzy set theory for the continuation power flow (CPF), thereby capturing uncertainty.

    This section presents the salient features of fuzzy logic and its

    application to ATC estimation. The flow diagram of fuzzy logic is given in

    Figure 2.2.

Figure 2.2 Fuzzy Logic Flow Diagram (Input, Fuzzification, Rule base with IF-AND-THEN operators, Defuzzification by the centroid method, Output)

The following steps are followed to develop a fuzzy model:

Selection of input and output variables

Fuzzification

Development of the rule base

Defuzzification


    Selection of input and output variables

The selection of inputs is a very important stage for any AI model. Inclusion of a less significant variable will unnecessarily increase the size of the input vector, whereas omission of the most significant variable may reduce the accuracy of the AI model.

    Fuzzification

After identifying the input and output variables, the next step is fuzzification. The number of linguistic variables for the input and output may be chosen appropriately depending on the accuracy requirement. In this thesis, seven linguistic variables are used for the input and output variables. Membership functions characterize the fuzziness in a fuzzy set. There are an infinite number of ways to characterize fuzziness. As the membership function essentially embodies all fuzziness for a particular fuzzy set, its description is the essence of a fuzzy property or operation. Membership functions may be symmetrical or asymmetrical and are typically defined on a one-dimensional universe. In the present study, a one-dimensional triangular membership function is chosen for each input and output linguistic variable. The membership functions and the range of values for the input and output linguistic variables are shown in Figures 2.3 and 2.4 respectively.

Input = {L1, L2, L3, L4, L5, L6, L7}

Output = {L1, L2, L3, L4, L5, L6, L7}

The input and output variables are assumed to range from 0 to 2. The width of each label is the same and is assumed to be 0.5.


Figure 2.3 Triangular membership function for Input

Figure 2.4 Triangular membership function for Output

    Membership Value

The two inputs given to the fuzzy model are assumed to be 1.55 and 1.75 respectively.

For input 1, the membership values can be written as follows:

Input 1 = {0/L1, 0/L2, 0/L3, 0/L4, 0.8/L5, 0.2/L6, 0/L7}

The membership value of input 1 (1.55) is calculated as follows. Input 1 lies between L5 and L6 (refer to Figure 2.3). The membership values of L5 and L6 can be calculated using the following formula:


\frac{y - y_1}{y_2 - y_1} = \frac{x - x_1}{x_2 - x_1}      (2.3)

By substituting x_1 = 1.5, x_2 = 1.75, y_1 = 1 and y_2 = 0, the membership value of L5 is calculated as 0.8. Similarly, the membership value of L6 is calculated as 0.2.

The membership values of input 2 are written as follows:

Input 2 = {0/L1, 0/L2, 0/L3, 0/L4, 0/L5, 1/L6, 0/L7}

The membership value of L6 is 1.0, as input 2 (1.75) lies exactly at the peak of L6.
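A small Python sketch (added for illustration, not from the thesis) reproduces these membership values with a triangular membership function; the peak positions of L5 (1.5) and L6 (1.75) and the label width of 0.5 are taken from the text, while the function and variable names are assumptions.

```python
def tri_membership(x, peak, half_width=0.25):
    """Triangular membership value of x for a label peaking at `peak`
    with base width 2 * half_width (label width 0.5, as in the text)."""
    return max(0.0, 1.0 - abs(x - peak) / half_width)

# Input 1 = 1.55 lies between L5 (peak 1.5) and L6 (peak 1.75).
print(round(tri_membership(1.55, peak=1.5), 2))    # 0.8 -> membership in L5
print(round(tri_membership(1.55, peak=1.75), 2))   # 0.2 -> membership in L6
# Input 2 = 1.75 sits exactly at the peak of L6.
print(round(tri_membership(1.75, peak=1.75), 2))   # 1.0
```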

Rule base

In the fuzzy logic based approach, decisions are made by forming a series of rules that relate the input variables to the output variable using IF-AND-THEN statements. These decision rules are expressed using linguistic variables. The fuzzy table is formed using all the fuzzy rules.

Defuzzification

The process of obtaining the crisp value from the fuzzy model is called defuzzification. The centroid method of defuzzification is commonly used to obtain the crisp value from the fuzzy table.
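As a hedged illustration of the centroid method (this sketch is not from the thesis; the fired output labels, their firing strengths and the max aggregation are hypothetical choices), the crisp output is the centroid of the aggregated output membership function over a sampled universe.

```python
import numpy as np

def centroid_defuzzify(ys, mu):
    """Discretized centroid: y* = sum(mu * y) / sum(mu) over the sampled output universe."""
    ys, mu = np.asarray(ys, dtype=float), np.asarray(mu, dtype=float)
    return float(np.sum(mu * ys) / np.sum(mu))

def tri(x, peak, half_width=0.25):
    return np.maximum(0.0, 1.0 - np.abs(x - peak) / half_width)

# Hypothetical aggregated output: label L5 (peak 1.5) fired at 0.8 and
# label L6 (peak 1.75) fired at 0.2; aggregation by max of the clipped sets.
ys = np.linspace(0.0, 2.0, 401)                      # sampled output universe
agg = np.maximum(np.minimum(0.8, tri(ys, 1.5)),      # clip L5 at 0.8
                 np.minimum(0.2, tri(ys, 1.75)))     # clip L6 at 0.2
print(round(centroid_defuzzify(ys, agg), 2))         # crisp output, roughly 1.56
```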

    2.4 FEED FORWARD BACK PROPAGATION NEURAL

    NETWORK

The feed forward back propagation neural network consists of two layers. The first layer, or hidden layer, has a hyperbolic tangent sigmoid (tansig)


activation function, as shown in Figure 2.5, and the second layer, or output layer, has a linear activation function, as shown in Figure 2.6. Thus the first layer limits the output to a narrow range, from which the linear layer can produce all values. The output of each layer can be represented by

Y_{N \times 1} = f(W_{N \times M} X_{M \times 1} + b_{N \times 1})      (2.4)

where Y is a vector containing the outputs of each of the N neurons in a given layer, W is a matrix containing the weights for each of the M inputs of all N neurons, X is a vector containing the inputs, b is a vector containing the biases and f(.) is the activation function.

    Figure 2.5 Tan-sigmoid Transfer Function

    Figure 2.6 Linear Transfer function
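A minimal numpy sketch of Equation (2.4) follows (added for illustration; the layer sizes, random weights and input are arbitrary assumptions). It computes the output of a tansig hidden layer followed by a purelin output layer.

```python
import numpy as np

def tansig(x):
    """Hyperbolic tangent sigmoid (tansig) transfer function."""
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def layer_output(W, x, b, f):
    """Equation (2.4): Y = f(W x + b)."""
    return f(W @ x + b)

rng = np.random.default_rng(0)
M, N_hidden, N_out = 4, 6, 1          # 4 inputs, 6 hidden neurons, 1 output (arbitrary)
W1, b1 = rng.normal(size=(N_hidden, M)), rng.normal(size=N_hidden)
W2, b2 = rng.normal(size=(N_out, N_hidden)), rng.normal(size=N_out)

x = rng.normal(size=M)                              # an arbitrary input vector
hidden = layer_output(W1, x, b1, tansig)            # tansig hidden layer
y = layer_output(W2, hidden, b2, lambda n: n)       # purelin output layer, f(n) = n
print(y)
```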

    In the back propagation network, there are two steps during training

    that are used alternately. The back propagation step calculates the error in the


gradient descent and propagates it backwards to each neuron, first in the output layer and then in the hidden layer. In the second step, the weights and biases are recomputed, and the output from the activated neurons is propagated forward from the hidden layer to the output layer. The network is initialized with random weights and biases, and then trained using the Levenberg-Marquardt algorithm. The weights and biases are updated according to

D_{n+1} = D_n - [J^T J + \mu I]^{-1} J^T e      (2.5)

where D_n is a matrix containing the current weights and biases, D_{n+1} is a matrix containing the new weights and biases, e is the network error, J is the Jacobian matrix containing the first derivatives of e with respect to the current weights and biases, I is the identity matrix and \mu is a variable that is increased or decreased based on the performance function. The gradient of the error surface, g, is equal to J^T e.
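A hedged numpy sketch of the Levenberg-Marquardt update of Equation (2.5) is shown below (illustrative only; the Jacobian is for a simple linear least-squares residual, the data are made up, and the constant damping value is a simplification rather than the thesis implementation).

```python
import numpy as np

def lm_step(D, J, e, mu):
    """One Levenberg-Marquardt update, Equation (2.5): D_new = D - (J^T J + mu*I)^-1 J^T e."""
    return D - np.linalg.solve(J.T @ J + mu * np.eye(D.size), J.T @ e)

# Illustrative problem: fit y = a*x + b, so the residual is e = (a*x + b) - y
# and the Jacobian of e with respect to D = [a, b] has rows [x, 1].
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.1, 4.9, 7.2])          # made-up data, roughly y = 2x + 1
D = np.zeros(2)                              # start from a = b = 0
mu = 0.01                                    # fixed damping for this sketch
for _ in range(20):
    e = D[0] * x + D[1] - y                  # model error
    J = np.column_stack([x, np.ones_like(x)])
    D = lm_step(D, J, e, mu)
print(D)                                     # converges close to [2.04, 0.99]
```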

    Each input is weighted with an appropriate W. The sum of the

    weighted inputs and the bias forms the input to the transfer function f.

    Neurons can use any differentiable transfer function f to generate their output.

Feed forward networks often have one or more hidden layers of sigmoid neurons, such as a single layer of S logistic sigmoid (logsig) neurons with R inputs, followed by an output layer of linear neurons. Multiple layers of neurons with non-linear transfer functions allow the network to learn non-linear and linear relationships between the input and output vectors. The linear output layer lets the network produce values outside the range -1 to +1. On the other hand, to constrain the outputs of a network (for example, between 0 and 1) the output layer should use a sigmoid transfer function (such as logsig).


    As noted in Neuron Model and Network Architectures, for

    multiple-layer networks the number of layers determines the subscript on the

weight matrices. The appropriate notation is used in the two-layer tansig/purelin network shown in Figure 2.7.

    Figure 2.7 Structure of feed forward back propagation network

The transfer functions tansig and purelin can be expressed as follows:

\mathrm{tansig}(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}      (2.6)

\mathrm{purelin}(n) = n      (2.7)

    This network can be used as a general function approximator. It can

    approximate any function with a finite number of discontinuities arbitrarily

    well, given sufficient neurons in the hidden layer.

    The flow diagram to explain the training procedure of BPA is given

    in Figure 2.8


Figure 2.8 Flow Chart for BPA training procedure (initialize the weights; calculate the output value Y_{N \times 1} = f(W_{N \times M} X_{M \times 1} + b_{N \times 1}); calculate the error; repeat until the error criterion is satisfied)


2.5 GENERALIZED REGRESSION NEURAL NETWORK (GRNN)

The Generalized Regression Neural Network consists of an input layer, a pattern layer, a summation layer and an output layer. The schematic diagram of the GRNN is shown in Figure 2.9.

Figure 2.9 Schematic diagram of GRNN

The first layer is connected to the second, pattern layer, where each unit represents a training pattern and its output is a measure of the distance of the input from the stored patterns. Each pattern layer unit is connected to two neurons in the summation layer: the S-summation neuron and the D-summation neuron. The former computes the sum of the weighted outputs of the pattern layer while the latter calculates the unweighted outputs of the pattern neurons. The connection weight between the i-th neuron in the pattern layer and the S-summation neuron is y_i, the target output value corresponding to the i-th input pattern. For the D-summation neuron, the connection weight is unity. The output layer merely divides the output of each S-summation neuron by that of each D-summation neuron, yielding the predicted value for an unknown input vector x as

\hat{y}(x) = \frac{\sum_{i=1}^{n} y_i \exp[-D(x, x_i)]}{\sum_{i=1}^{n} \exp[-D(x, x_i)]}      (2.8)


where n indicates the number of training patterns and the Gaussian function D is defined as

D(x, x_i) = \sum_{j=1}^{p} \left( \frac{x_j - x_{ij}}{\sigma} \right)^2      (2.9)

where p indicates the number of elements of an input vector, and the terms x_j and x_{ij} represent the j-th element of x and x_i respectively. The term \sigma is generally referred to as the spread factor. The GRNN method is used for the estimation of

    continuous variables, as in standard regression techniques. It is related to the

    radial basis function network and is based on a standard statistical technique

    called kernel regression. The joint probability density function (pdf) of x and

    y is estimated during a training process in the GRNN. Because the pdf is

    derived from the training data with no preconceptions about its form, the

    system is perfectly general. The success of the GRNN method depends

heavily on the spread factor. The larger the spread, the smoother the

    function approximation. Too large a spread means a lot of neurons will be

    required to fit a fast changing function. Too small a spread means many

    neurons will be required to fit a smooth function, and the network may not

    generalize well.
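The following Python sketch (added for illustration, not from the thesis) implements the GRNN prediction of Equations (2.8) and (2.9) for a small hypothetical training set; the spread value, data and function name are assumptions.

```python
import numpy as np

def grnn_predict(x, X_train, y_train, sigma=0.5):
    """GRNN estimate of Equation (2.8) with the distance term of Equation (2.9):
    D(x, x_i) = sum_j ((x_j - x_ij) / sigma)^2, y_hat = sum(y_i*exp(-D_i)) / sum(exp(-D_i))."""
    D = np.sum(((x - X_train) / sigma) ** 2, axis=1)   # one distance term per training pattern
    w = np.exp(-D)                                     # pattern layer outputs
    return np.sum(y_train * w) / np.sum(w)             # S-summation divided by D-summation

# Hypothetical data: two input features and a continuous target.
X_train = np.array([[0.2, 0.1], [0.5, 0.4], [0.9, 0.8]])
y_train = np.array([1.0, 2.0, 3.0])
print(grnn_predict(np.array([0.55, 0.45]), X_train, y_train, sigma=0.5))
# 2.0: the middle pattern dominates and the two outer patterns balance out
```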

The GRNN needs only a fraction of the training samples that a back propagation neural network would need. The GRNN is advantageous due to its ability to converge to the underlying function of the data with only a few samples available. This makes the GRNN very useful and handy for problems with inadequate data.

    The flow chart of GRNN training procedure is given in Figure 2.10


Figure 2.10 Flow Chart for GRNN training procedure (read the input vector; memorize the relationship between the input and the response; estimate the pattern unit outputs using the transfer function \theta_i = \exp[-(X - U_i)'(X - U_i)/(2\sigma^2)]; compute the simple arithmetic summation S_s = \sum_i \theta_i; compute the weighted summation S_w = \sum_i w_i \theta_i; Output = S_w / S_s)


    2.6 CONCLUSIONS

The AI methods discussed in this Chapter are used in this thesis for ATC estimation. The effectiveness of the SVM based model for estimating the unknown values of more than one data set will be tested in the forthcoming Chapters. The GRNN can give accurate results even with a smaller number of training data sets; this feature will be tested by comparing the results of the GRNN and BPNN models. Fuzzy logic is one of the model based AI techniques and is therefore also chosen for ATC estimation. The respective MATLAB toolboxes are used to develop the AI models for ATC estimation.