
Advance Topics in Mathematical Methods ME7100

Artificial Neural Network


Introduction

An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information.

http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html

• A neuron collects signals from other neurons through structures called dendrites.
• The neuron sends out signals through a thin strand known as an axon.
• A synapse converts the activity from the axon into electrical effects that excite or inhibit the connected neurons.


[Figure: model of an artificial neuron — input values x_1, x_2, …, x_m with weights w_1, w_2, …, w_m feed a summing function with bias b, followed by an activation function producing the output y = f(∑ w_i x_i + b)]



Introduction

● An Artificial Neural Network encompasses:
− a neuron model: the type of activation function
− an architecture: the network structure (the number of neurons, the number of layers, the weight at each neuron)
− a learning algorithm: training of the ANN by modifying the weights in order to mimic the known observations (input, output), such that the response to unknown inputs can be predicted


Activation Function

− sigmoid
− rational function
− hyperbolic tangent
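For concreteness, a minimal sketch of these three activations in Python (NumPy); the "rational function" is taken here to be the softsign form v/(1 + |v|), which is one common rational activation — an assumption, since the slide does not give its formula:

import numpy as np

def sigmoid(v):
    # Logistic sigmoid: squashes v into (0, 1)
    return 1.0 / (1.0 + np.exp(-v))

def rational(v):
    # Assumed rational-function activation (softsign): squashes v into (-1, 1)
    return v / (1.0 + np.abs(v))

def hyperbolic_tangent(v):
    # Hyperbolic tangent: squashes v into (-1, 1)
    return np.tanh(v)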


Network architectures

● Three different classes of network architectures:
− single-layer feed-forward
− multi-layer feed-forward
− recurrent

[Figure: single-layer feed-forward and multi-layer feed-forward network layouts]

http://codebase.mql4.com/5738


Network architectures

● Recurrent: in a recurrent network, the weight matrix for each layer contains input weights from all other neurons in the network, not just from the neurons of the previous layer.

http://en.wikibooks.org/wiki/Artificial_Neural_Networks/Recurrent_Networks


Network architectures

− single-layer feed-forward
− e.g. the Perceptron, Rosenblatt (1958): classification into one of two categories
● used for binary classification: geometrically, finding a hyper-plane (v = c) that separates the examples into two classes

http://codebase.mql4.com/5738


Network architectures

− single-layer feed-forward

Case: Can we predict heart disease on the basis of age, sex (M/F), smoking frequency, cholesterol, BP, and weight?

Age  Sex (M=1, F=0)  Smoking frequency  Cholesterol  BP   Weight  Heart patient (0 = non-patient, 1 = patient)
55   0               3                  143          109  66      0
41   0               1                  145          91   43      0
45   1               1                  224          126  46      1
60   0               8                  237          83   85      1
22   0               3                  140          83   56      0
53   1               4                  163          94   73      1
34   0               5                  188          88   53      1
41   1               5                  192          120  46      1
39   1               6                  222          126  75      1
52   1               8                  179          99   72      1
58   0               7                  165          122  58      1
58   1               6                  182          117  47      1
37   1               3                  174          113  46      0
49   0               2                  190          126  45      1


Network architectures

− single-layer feed-forward
− e.g. the Perceptron, Rosenblatt (1958): classification into one of two categories
− A perceptron uses a step function:

f(v) = 1 if v ≥ 0.25
f(v) = 0 if v < 0.25

v = ∑ w_i x_i + b,   b = 1.55,   Y = 0 or 1

X (input)          W (weight)
Age                 0.880052
Sex (M=1, F=0)     -1.13407
Smoking frequency   1.275656
Cholesterol         0.870191
BP                  0.124578
Weight              0.759339

http://codebase.mql4.com/5738
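As an illustration, a minimal sketch of this step-function perceptron in Python, using the weights and bias above; it assumes the six inputs have already been scaled to comparable ranges (the raw table values would need such preprocessing):

import numpy as np

# Weights in the order: age, sex, smoking frequency, cholesterol, BP, weight
w = np.array([0.880052, -1.13407, 1.275656, 0.870191, 0.124578, 0.759339])
b = 1.55

def predict(x):
    # v = sum of w_i * x_i + b
    v = np.dot(w, x) + b
    # Step activation with threshold 0.25: 1 = patient, 0 = non-patient
    return 1 if v >= 0.25 else 0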


Network Training

Method of Gradient Descent: an algorithm for finding the nearest local minimum (or maximum) of a function, which presupposes that the gradient of the function can be computed.

[Figure: a function with a maximum at x = x_max, approached from an arbitrary starting point x = x_1]


Network Training

Method of Gradient Descent (maximization):

Choose an arbitrary point x = x_1. For maximization we require f(x_2) − f(x_1) > 0:
If f′(x) < 0, choose x_2 < x_1
If f′(x) > 0, choose x_2 > x_1

Thus, the following update will always yield movement towards the maximum:

x_2 = x_1 + η f′(x_1),   η > 0

where η is the learning rate.

[Figure: successive steps x_1, x_2 climbing towards x = x_max]


Network Training

Method of Gradient Descent (minimization):

Choose an arbitrary point x = x_1. For minimization we require f(x_2) − f(x_1) < 0:
If f′(x) < 0, choose x_2 > x_1
If f′(x) > 0, choose x_2 < x_1

Thus, the following update will always yield movement towards the minimum:

x_2 = x_1 − η f′(x_1),   η > 0

where η is the learning rate.

[Figure: successive steps x_1, x_2 descending towards the minimum]
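A minimal sketch of this one-dimensional gradient descent in Python; the function being minimized is an illustrative placeholder, not from the slides:

# Minimize f(x) = (x - 3)^2 by gradient descent
def f_prime(x):
    # Derivative of the example function
    return 2.0 * (x - 3.0)

eta = 0.1    # learning rate
x = 0.0      # arbitrary starting point x_1
for _ in range(100):
    x = x - eta * f_prime(x)   # x_2 = x_1 - eta * f'(x_1)
print(x)  # converges towards the minimum at x = 3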


Multivariable, single response:

Risk of getting trapped in a local maximum or minimum!

Heuristic methods can help avoid this.

http://bayen.eecs.berkeley.edu/bayen/?q=webfm_send/246


Network Training

Method of Gradient Descent for ANN: the network is optimized by adjusting the weights such that the error in prediction is minimized.


Network Training

Linear Perceptron

[Figure: perceptron with inputs x_1, x_2, …, x_n, weights w_1, w_2, …, w_n, bias b, and output y = f(v), v = ∑ w_i x_i + b]

E = ½ (t − y)²    (t is the target output)

Δw_i = −η ∂E/∂w_i = η (t − y) f′(v) x_i
Δb = η (t − y) f′(v)

Repeat till the solution converges.


Network Training

Linear Perceptron (s = number of training samples)

[Figure: perceptron with inputs x_1, x_2, …, x_n, weights w_1, w_2, …, w_n, bias b, and output y = f(v), v = ∑ w_i x_i + b]

E = ½ ∑_{p=1}^{s} (t_p − y_p)²

Δw_i = −η ∂E/∂w_i = η ∑_{p=1}^{s} (t_p − y_p) f′(v_p) x_{i,p}
Δb = η ∑_{p=1}^{s} (t_p − y_p) f′(v_p)

Repeat till the solution converges.
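A minimal sketch of this batch training rule in Python for the linear case (f(v) = v, so f′(v) = 1); the data arrays are illustrative placeholders:

import numpy as np

# X: s samples x n inputs, t: s target outputs (placeholder data)
X = np.array([[0.1, 0.7], [0.5, 0.2], [0.9, 0.4]])
t = np.array([0.8, 0.6, 1.2])

w = np.zeros(X.shape[1])
b = 0.0
eta = 0.05  # learning rate

for epoch in range(1000):    # repeat till the solution converges
    y = X @ w + b             # y_p = f(v_p) = v_p for a linear unit
    err = t - y               # (t_p - y_p)
    w += eta * X.T @ err      # dw_i = eta * sum_p (t_p - y_p) x_ip
    b += eta * err.sum()      # db   = eta * sum_p (t_p - y_p)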


Network Training

Linear single-layer perceptron (m inputs & n outputs)

v_j = ∑_{i=1}^{m} w_ij x_i + b_j,   y_j = f(v_j)

E = ½ ∑_{j=1}^{n} (t_j − y_j)²

Δw_ij = −η ∂E/∂w_ij = η (t_j − y_j) f′(v_j) x_i
Δb_j = η (t_j − y_j) f′(v_j)

Repeat till the solution converges.


Network Training

Linear single-layer perceptron (m inputs & n outputs, s training samples)

v_{j,p} = ∑_{i=1}^{m} w_ij x_{i,p} + b_j

E = ½ ∑_{p=1}^{s} ∑_{j=1}^{n} (t_{j,p} − y_{j,p})²

Δw_ij = η ∑_{p=1}^{s} (t_{j,p} − y_{j,p}) f′(v_{j,p}) x_{i,p}
Δb_j = η ∑_{p=1}^{s} (t_{j,p} − y_{j,p}) f′(v_{j,p})
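The same rule in vectorized form for m inputs and n outputs, again for the linear case (f(v) = v, f′ = 1) and with synthetic placeholder data; W is m × n and B has n entries:

import numpy as np

# X: s x m inputs, T: s x n targets (synthetic placeholder data)
X = np.random.default_rng(0).random((20, 3))
T = X @ np.array([[1.0, -0.5], [0.3, 0.8], [-0.2, 0.4]])

W = np.zeros((3, 2)); B = np.zeros(2); eta = 0.05
for epoch in range(2000):
    Y = X @ W + B             # y_jp = f(v_jp) = v_jp (linear)
    E = T - Y                 # (t_jp - y_jp)
    W += eta * X.T @ E        # dw_ij = eta * sum_p (t_jp - y_jp) x_ip
    B += eta * E.sum(axis=0)  # db_j  = eta * sum_p (t_jp - y_jp)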


Network Training

Non-linear Perceptron

[Figure: perceptron with inputs x_1, x_2, …, x_n, weights w_1, w_2, …, w_n, bias b, and output y = f(v), v = ∑ w_i x_i + b]

With the sigmoid activation y = f(v) = 1 / (1 + e^{−v}), we have f′(v) = y (1 − y).

E = ½ (t − y)²

Δw_i = −η ∂E/∂w_i = η (t − y) [y (1 − y)] x_i
Δb = η (t − y) [y (1 − y)]
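A minimal sketch of one update step for this sigmoid perceptron in Python (placeholder data):

import numpy as np

x = np.array([0.2, 0.8, 0.5])        # one input sample
t = 1.0                               # target output
w = np.zeros(3); b = 0.0; eta = 0.5   # weights, bias, learning rate

v = np.dot(w, x) + b
y = 1.0 / (1.0 + np.exp(-v))          # sigmoid: y = f(v)
delta = (t - y) * y * (1.0 - y)       # (t - y) f'(v), with f'(v) = y(1 - y)
w += eta * delta * x                  # dw_i = eta (t - y) y(1 - y) x_i
b += eta * delta                      # db   = eta (t - y) y(1 - y)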



Network Training

Linear multi-layer perceptron (2 inputs, 2 hidden neurons and one output)

[Figure: inputs X1, X2 feed hidden neurons f1, f2 through weights w11, w21, w12, w22 and biases b1, b2; the hidden outputs feed the output neuron through weights w1, w2 and bias b]

f1 = f(w11 x1 + w21 x2 + b1)
f2 = f(w12 x1 + w22 x2 + b2)
y = f(w1 f1 + w2 f2 + b)

E = ½ (t − y)²

Output weight scheme:

Δw1 = η (t − y) f′(w1 f1 + w2 f2 + b) f1
Δw2 = η (t − y) f′(w1 f1 + w2 f2 + b) f2

and similarly

Δb = η (t − y) f′(w1 f1 + w2 f2 + b)


Network Training

Linear multi-layer perceptron (2 inputs, 2 hidden neurons and one output)

Input weight scheme:

With v1 = w11 x1 + w21 x2 + b1, v2 = w12 x1 + w22 x2 + b2, f1 = f(v1), f2 = f(v2), and v = w1 f1 + w2 f2 + b:

Δw11 = η (t − y) f′(v) w1 f′(v1) x1
Δw21 = η (t − y) f′(v) w1 f′(v1) x2
Δb1 = η (t − y) f′(v) w1 f′(v1)

and similarly

Δw12 = η (t − y) f′(v) w2 f′(v2) x1
Δw22 = η (t − y) f′(v) w2 f′(v2) x2
Δb2 = η (t − y) f′(v) w2 f′(v2)


Network Training

Propagating the output error backwards in this way, from the output weights to the input (hidden-layer) weights, is known as Back Propagation.
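A minimal sketch of back propagation for this 2-input, 2-hidden-neuron, 1-output network in Python, using the sigmoid activation so that f′(v) = f(v)(1 − f(v)); the training pair and initial weights are illustrative placeholders:

import numpy as np

def f(v):                       # sigmoid activation
    return 1.0 / (1.0 + np.exp(-v))

x1, x2, t = 0.3, 0.9, 1.0       # one training pair (placeholder)
w11, w21, b1 = 0.1, -0.2, 0.0   # hidden neuron 1
w12, w22, b2 = 0.4, 0.3, 0.0    # hidden neuron 2
w1, w2, b = 0.2, -0.1, 0.0      # output neuron
eta = 0.5

for _ in range(1000):
    # Forward pass
    v1 = w11 * x1 + w21 * x2 + b1; f1 = f(v1)
    v2 = w12 * x1 + w22 * x2 + b2; f2 = f(v2)
    v = w1 * f1 + w2 * f2 + b;     y = f(v)
    # Error terms: d = (t - y) f'(v); d1, d2 propagate d back through w1, w2
    d = (t - y) * y * (1.0 - y)
    d1 = d * w1 * f1 * (1.0 - f1)
    d2 = d * w2 * f2 * (1.0 - f2)
    # Output weight scheme
    w1 += eta * d * f1; w2 += eta * d * f2; b += eta * d
    # Input weight scheme
    w11 += eta * d1 * x1; w21 += eta * d1 * x2; b1 += eta * d1
    w12 += eta * d2 * x1; w22 += eta * d2 * x2; b2 += eta * d2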


Network Training

Linear multi-layer perceptron (m inputs, h hidden neurons and one output; i-th input, j-th hidden neuron)

With v_j = ∑_{i=1}^{m} w_ij x_i + b_j, f_j = f(v_j), and v = ∑_{j=1}^{h} w_j f_j + b, y = f(v):

Δw_j = η (t − y) f′(v) f_j
Δb = η (t − y) f′(v)
Δw_ij = η (t − y) f′(v) w_j f′(v_j) x_i
Δb_j = η (t − y) f′(v) w_j f′(v_j)


Network Training

Linear multi-layer perceptron (m inputs & n outputs; h hidden neurons; i-th input, j-th hidden neuron, k-th output)

v_j = ∑_{i=1}^{m} w_ij x_i + b_j,   f_j = f(v_j)
v_k = ∑_{j=1}^{h} w_jk f_j + b_k,   y_k = f(v_k)


Network Training

Derive the weight change scheme for:

1. Sigmoidal multi-layer perceptron (s training samples)
2. Sigmoidal single-layer perceptron (m inputs & n outputs)
3. Sigmoidal single-layer perceptron (m inputs & n outputs, s training samples)


Optimal ANN Architecture

A good architecture should:
• Map the known data
• Generalize to new data

[Figure: error vs. number of iterations]


Optimal ANN Architecture

• Generalization Techniques
 – Splitting Technique: divide the available data into Training, Validation, and Testing sets

[Figure: the data set split into Training, Validation, and Testing subsets]
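A minimal sketch of such a split in Python; the 70/15/15 proportions are illustrative, not taken from the slides:

import numpy as np

def split_data(X, y, f_train=0.70, f_val=0.15, seed=0):
    # Shuffle (X, y are NumPy arrays), then slice into training / validation / testing sets
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_tr = int(f_train * len(X))
    n_va = int(f_val * len(X))
    tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])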


Predictive Process Models: ANN

[Figure: ANN as a predictive process model — inputs X1, X2, X3 pass through an Input Layer, a Hidden Layer, and an Output Layer to produce the output; each layer computes F([W][X] + [B]) with weights W1, W2, W3 and biases B]


Optimal ANN Architecture

• Generalization Techniques
 – Cross Validation

Divide the data set into n groups of size k each.
Train on (n − 1) of the groups and check the error on the remaining group.
Repeat the process with different initial weights and average the results (ensemble method).
Calculate the error with each of the n groups taken in turn as the testing set.
Calculate the error of cross validation:

E_cv = (1/N) ∑_{i=1}^{n} ∑_{j=1}^{k} (y_{ij, Actual} − y_{ij, Calculated})²,   where N = n·k is the total number of samples.
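A minimal sketch of this cross-validation error in Python; train_and_predict is a hypothetical stand-in for training the ANN on the (n − 1) training groups and predicting on the held-out group:

import numpy as np

def cross_validation_error(X, y, n_groups, train_and_predict, seed=0):
    # Divide the data set into n groups of (roughly) equal size
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    groups = np.array_split(idx, n_groups)
    sq_err = 0.0
    for i in range(n_groups):   # each group in turn is the testing set
        test = groups[i]
        train = np.concatenate([g for j, g in enumerate(groups) if j != i])
        y_pred = train_and_predict(X[train], y[train], X[test])
        sq_err += np.sum((y[test] - y_pred) ** 2)
    return sq_err / len(X)      # E_cv = (1/N) * sum of squared errors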
