
Advance Topics in Mathematical Methods ME7100

Artificial Neural Network


Introduction

An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information.

http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html

• A neuron collects signals from other neurons through structures called dendrites.
• The neuron sends out signals through a thin strand known as an axon.
• A synapse converts the activity from the axon into electrical effects that excite or inhibit the connected neurons.


[Figure: model of an artificial neuron — input values x_1, x_2, …, x_m with weights w_1, w_2, …, w_m feed a summing function with bias b, followed by an activation function producing the output y = f(∑ w_i x_i + b)]



Introduction

● An Artificial Neural Network encompasses:
− a neuron model: the type of activation function
− an architecture: the network structure (the number of neurons, the number of layers, the weight at each neuron)
− a learning algorithm: training of the ANN by modifying the weights in order to mimic the known observations (input, output), such that the response to unknown inputs can be predicted


Activation Function

− sigmoid
− rational function
− hyperbolic tangent
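For concreteness, a minimal sketch of these three activations in Python (NumPy); the "rational function" is taken here to be the softsign form v/(1 + |v|), which is one common rational activation — an assumption, since the slide does not give its formula:

import numpy as np

def sigmoid(v):
    # Logistic sigmoid: squashes v into (0, 1)
    return 1.0 / (1.0 + np.exp(-v))

def rational(v):
    # Assumed rational-function activation (softsign): squashes v into (-1, 1)
    return v / (1.0 + np.abs(v))

def hyperbolic_tangent(v):
    # Hyperbolic tangent: squashes v into (-1, 1)
    return np.tanh(v)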


Network architectures

● Three different classes of network architectures:
− single-layer feed-forward
− multi-layer feed-forward
− recurrent

[Figure: single-layer feed-forward and multi-layer feed-forward network layouts]

http://codebase.mql4.com/5738


Network architectures

● Recurrent: in a recurrent network, the weight matrix for each layer contains input weights from all other neurons in the network, not just from the neurons of the previous layer.

http://en.wikibooks.org/wiki/Artificial_Neural_Networks/Recurrent_Networks


Network architectures

− single-layer feed-forward
− e.g. the Perceptron, Rosenblatt (1958): classification into one of two categories
● used for binary classification: geometrically, finding a hyper-plane (v = c) that separates the examples into two classes

http://codebase.mql4.com/5738


Network architectures

− single-layer feed-forward

Case: Can we predict heart disease on the basis of age, sex (M/F), smoking frequency, cholesterol, BP, and weight?

Age  Sex (M=1, F=0)  Smoking frequency  Cholesterol  BP   Weight  Heart patient (0 = non-patient, 1 = patient)
55   0               3                  143          109  66      0
41   0               1                  145          91   43      0
45   1               1                  224          126  46      1
60   0               8                  237          83   85      1
22   0               3                  140          83   56      0
53   1               4                  163          94   73      1
34   0               5                  188          88   53      1
41   1               5                  192          120  46      1
39   1               6                  222          126  75      1
52   1               8                  179          99   72      1
58   0               7                  165          122  58      1
58   1               6                  182          117  47      1
37   1               3                  174          113  46      0
49   0               2                  190          126  45      1


Network architectures

− single-layer feed-forward
− e.g. the Perceptron, Rosenblatt (1958): classification into one of two categories
− A perceptron uses a step function:

f(v) = 1 if v ≥ 0.25
f(v) = 0 if v < 0.25

v = ∑ w_i x_i + b,   b = 1.55,   Y = 0 or 1

X (input)          W (weight)
Age                 0.880052
Sex (M=1, F=0)     -1.13407
Smoking frequency   1.275656
Cholesterol         0.870191
BP                  0.124578
Weight              0.759339

http://codebase.mql4.com/5738
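As an illustration, a minimal sketch of this step-function perceptron in Python, using the weights and bias above; it assumes the six inputs have already been scaled to comparable ranges (the raw table values would need such preprocessing):

import numpy as np

# Weights in the order: age, sex, smoking frequency, cholesterol, BP, weight
w = np.array([0.880052, -1.13407, 1.275656, 0.870191, 0.124578, 0.759339])
b = 1.55

def predict(x):
    # v = sum of w_i * x_i + b
    v = np.dot(w, x) + b
    # Step activation with threshold 0.25: 1 = patient, 0 = non-patient
    return 1 if v >= 0.25 else 0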


Network Training

Method of Gradient Descent: an algorithm for finding the nearest local minimum (or maximum) of a function, which presupposes that the gradient of the function can be computed.

[Figure: a function with a maximum at x = x_max, approached from an arbitrary starting point x = x_1]


Network Training

Method of Gradient Descent (maximization):

Choose an arbitrary point x = x_1. For maximization we require f(x_2) − f(x_1) > 0:
If f′(x) < 0, choose x_2 < x_1
If f′(x) > 0, choose x_2 > x_1

Thus, the following update will always yield movement towards the maximum:

x_2 = x_1 + η f′(x_1),   η > 0

where η is the learning rate.

[Figure: successive steps x_1, x_2 climbing towards x = x_max]


Network Training

Method of Gradient Descent (minimization):

Choose an arbitrary point x = x_1. For minimization we require f(x_2) − f(x_1) < 0:
If f′(x) < 0, choose x_2 > x_1
If f′(x) > 0, choose x_2 < x_1

Thus, the following update will always yield movement towards the minimum:

x_2 = x_1 − η f′(x_1),   η > 0

where η is the learning rate.

[Figure: successive steps x_1, x_2 descending towards the minimum]
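A minimal sketch of this one-dimensional gradient descent in Python; the function being minimized is an illustrative placeholder, not from the slides:

# Minimize f(x) = (x - 3)^2 by gradient descent
def f_prime(x):
    # Derivative of the example function
    return 2.0 * (x - 3.0)

eta = 0.1    # learning rate
x = 0.0      # arbitrary starting point x_1
for _ in range(100):
    x = x - eta * f_prime(x)   # x_2 = x_1 - eta * f'(x_1)
print(x)  # converges towards the minimum at x = 3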


Multivariable, single response:

Risk of getting trapped in a local maximum or minimum!

Heuristic methods can help avoid this.

http://bayen.eecs.berkeley.edu/bayen/?q=webfm_send/246


Network Training

Method of Gradient Descent for ANN: the network is optimized by adjusting the weights such that the error in prediction is minimized.


Network Training

Linear Perceptron

[Figure: perceptron with inputs x_1, x_2, …, x_n, weights w_1, w_2, …, w_n, bias b, and output y = f(v), v = ∑ w_i x_i + b]

E = ½ (t − y)²    (t is the target output)

Δw_i = −η ∂E/∂w_i = η (t − y) f′(v) x_i
Δb = η (t − y) f′(v)

Repeat till the solution converges.


Network Training

Linear Perceptron (s = number of training samples)

[Figure: perceptron with inputs x_1, x_2, …, x_n, weights w_1, w_2, …, w_n, bias b, and output y = f(v), v = ∑ w_i x_i + b]

E = ½ ∑_{p=1}^{s} (t_p − y_p)²

Δw_i = −η ∂E/∂w_i = η ∑_{p=1}^{s} (t_p − y_p) f′(v_p) x_{i,p}
Δb = η ∑_{p=1}^{s} (t_p − y_p) f′(v_p)

Repeat till the solution converges.
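A minimal sketch of this batch training rule in Python for the linear case (f(v) = v, so f′(v) = 1); the data arrays are illustrative placeholders:

import numpy as np

# X: s samples x n inputs, t: s target outputs (placeholder data)
X = np.array([[0.1, 0.7], [0.5, 0.2], [0.9, 0.4]])
t = np.array([0.8, 0.6, 1.2])

w = np.zeros(X.shape[1])
b = 0.0
eta = 0.05  # learning rate

for epoch in range(1000):    # repeat till the solution converges
    y = X @ w + b             # y_p = f(v_p) = v_p for a linear unit
    err = t - y               # (t_p - y_p)
    w += eta * X.T @ err      # dw_i = eta * sum_p (t_p - y_p) x_ip
    b += eta * err.sum()      # db   = eta * sum_p (t_p - y_p)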


Network Training

Linear single-layer perceptron (m inputs & n outputs)

v_j = ∑_{i=1}^{m} w_ij x_i + b_j,   y_j = f(v_j)

E = ½ ∑_{j=1}^{n} (t_j − y_j)²

Δw_ij = −η ∂E/∂w_ij = η (t_j − y_j) f′(v_j) x_i
Δb_j = η (t_j − y_j) f′(v_j)

Repeat till the solution converges.


Network Training

Linear single-layer perceptron (m inputs & n outputs, s training samples)

v_{j,p} = ∑_{i=1}^{m} w_ij x_{i,p} + b_j

E = ½ ∑_{p=1}^{s} ∑_{j=1}^{n} (t_{j,p} − y_{j,p})²

Δw_ij = η ∑_{p=1}^{s} (t_{j,p} − y_{j,p}) f′(v_{j,p}) x_{i,p}
Δb_j = η ∑_{p=1}^{s} (t_{j,p} − y_{j,p}) f′(v_{j,p})
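The same rule in vectorized form for m inputs and n outputs, again for the linear case (f(v) = v, f′ = 1) and with synthetic placeholder data; W is m × n and B has n entries:

import numpy as np

# X: s x m inputs, T: s x n targets (synthetic placeholder data)
X = np.random.default_rng(0).random((20, 3))
T = X @ np.array([[1.0, -0.5], [0.3, 0.8], [-0.2, 0.4]])

W = np.zeros((3, 2)); B = np.zeros(2); eta = 0.05
for epoch in range(2000):
    Y = X @ W + B             # y_jp = f(v_jp) = v_jp (linear)
    E = T - Y                 # (t_jp - y_jp)
    W += eta * X.T @ E        # dw_ij = eta * sum_p (t_jp - y_jp) x_ip
    B += eta * E.sum(axis=0)  # db_j  = eta * sum_p (t_jp - y_jp)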


Network Training

Non-linear Perceptron

[Figure: perceptron with inputs x_1, x_2, …, x_n, weights w_1, w_2, …, w_n, bias b, and output y = f(v), v = ∑ w_i x_i + b]

With the sigmoid activation y = f(v) = 1 / (1 + e^{−v}), we have f′(v) = y (1 − y).

E = ½ (t − y)²

Δw_i = −η ∂E/∂w_i = η (t − y) [y (1 − y)] x_i
Δb = η (t − y) [y (1 − y)]
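A minimal sketch of one update step for this sigmoid perceptron in Python (placeholder data):

import numpy as np

x = np.array([0.2, 0.8, 0.5])        # one input sample
t = 1.0                               # target output
w = np.zeros(3); b = 0.0; eta = 0.5   # weights, bias, learning rate

v = np.dot(w, x) + b
y = 1.0 / (1.0 + np.exp(-v))          # sigmoid: y = f(v)
delta = (t - y) * y * (1.0 - y)       # (t - y) f'(v), with f'(v) = y(1 - y)
w += eta * delta * x                  # dw_i = eta (t - y) y(1 - y) x_i
b += eta * delta                      # db   = eta (t - y) y(1 - y)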



Network Training

Linear multi-layer perceptron (2 inputs, 2 hidden neurons and one output)

[Figure: inputs X1, X2 feed hidden neurons f1, f2 through weights w11, w21, w12, w22 and biases b1, b2; the hidden outputs feed the output neuron through weights w1, w2 and bias b]

f1 = f(w11 x1 + w21 x2 + b1)
f2 = f(w12 x1 + w22 x2 + b2)
y = f(w1 f1 + w2 f2 + b)

E = ½ (t − y)²

Output weight scheme:

Δw1 = η (t − y) f′(w1 f1 + w2 f2 + b) f1
Δw2 = η (t − y) f′(w1 f1 + w2 f2 + b) f2

and similarly

Δb = η (t − y) f′(w1 f1 + w2 f2 + b)


Network Training

Linear multi-layer perceptron (2 inputs, 2 hidden neurons and one output)

Input weight scheme:

With v1 = w11 x1 + w21 x2 + b1, v2 = w12 x1 + w22 x2 + b2, f1 = f(v1), f2 = f(v2), and v = w1 f1 + w2 f2 + b:

Δw11 = η (t − y) f′(v) w1 f′(v1) x1
Δw21 = η (t − y) f′(v) w1 f′(v1) x2
Δb1 = η (t − y) f′(v) w1 f′(v1)

and similarly

Δw12 = η (t − y) f′(v) w2 f′(v2) x1
Δw22 = η (t − y) f′(v) w2 f′(v2) x2
Δb2 = η (t − y) f′(v) w2 f′(v2)


Network Training

Propagating the output error backwards in this way, from the output weights to the input (hidden-layer) weights, is known as Back Propagation.
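A minimal sketch of back propagation for this 2-input, 2-hidden-neuron, 1-output network in Python, using the sigmoid activation so that f′(v) = f(v)(1 − f(v)); the training pair and initial weights are illustrative placeholders:

import numpy as np

def f(v):                       # sigmoid activation
    return 1.0 / (1.0 + np.exp(-v))

x1, x2, t = 0.3, 0.9, 1.0       # one training pair (placeholder)
w11, w21, b1 = 0.1, -0.2, 0.0   # hidden neuron 1
w12, w22, b2 = 0.4, 0.3, 0.0    # hidden neuron 2
w1, w2, b = 0.2, -0.1, 0.0      # output neuron
eta = 0.5

for _ in range(1000):
    # Forward pass
    v1 = w11 * x1 + w21 * x2 + b1; f1 = f(v1)
    v2 = w12 * x1 + w22 * x2 + b2; f2 = f(v2)
    v = w1 * f1 + w2 * f2 + b;     y = f(v)
    # Error terms: d = (t - y) f'(v); d1, d2 propagate d back through w1, w2
    d = (t - y) * y * (1.0 - y)
    d1 = d * w1 * f1 * (1.0 - f1)
    d2 = d * w2 * f2 * (1.0 - f2)
    # Output weight scheme
    w1 += eta * d * f1; w2 += eta * d * f2; b += eta * d
    # Input weight scheme
    w11 += eta * d1 * x1; w21 += eta * d1 * x2; b1 += eta * d1
    w12 += eta * d2 * x1; w22 += eta * d2 * x2; b2 += eta * d2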


Network Training

Linear multi-layer perceptron (m inputs, h hidden neurons and one output; i-th input, j-th hidden neuron)

With v_j = ∑_{i=1}^{m} w_ij x_i + b_j, f_j = f(v_j), and v = ∑_{j=1}^{h} w_j f_j + b, y = f(v):

Δw_j = η (t − y) f′(v) f_j
Δb = η (t − y) f′(v)
Δw_ij = η (t − y) f′(v) w_j f′(v_j) x_i
Δb_j = η (t − y) f′(v) w_j f′(v_j)


Network Training

Linear multi-layer perceptron (m inputs & n outputs; h hidden neurons; i-th input, j-th hidden neuron, k-th output)

v_j = ∑_{i=1}^{m} w_ij x_i + b_j,   f_j = f(v_j)
v_k = ∑_{j=1}^{h} w_jk f_j + b_k,   y_k = f(v_k)


Network Training

Derive the weight change scheme for:

1. Sigmoidal multi-layer perceptron (s training samples)
2. Sigmoidal single-layer perceptron (m inputs & n outputs)
3. Sigmoidal single-layer perceptron (m inputs & n outputs, s training samples)


Optimal ANN Architecture

A good architecture should:
• Map the known data
• Generalize to new data

[Figure: error vs. number of iterations]


Optimal ANN Architecture

• Generalization Techniques
 – Splitting Technique: divide the available data into Training, Validation, and Testing sets

[Figure: the data set split into Training, Validation, and Testing subsets]
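A minimal sketch of such a split in Python; the 70/15/15 proportions are illustrative, not taken from the slides:

import numpy as np

def split_data(X, y, f_train=0.70, f_val=0.15, seed=0):
    # Shuffle (X, y are NumPy arrays), then slice into training / validation / testing sets
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_tr = int(f_train * len(X))
    n_va = int(f_val * len(X))
    tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])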


Predictive Process Models: ANN

[Figure: ANN as a predictive process model — inputs X1, X2, X3 pass through an Input Layer, a Hidden Layer, and an Output Layer to produce the output; each layer computes F([W][X] + [B]) with weights W1, W2, W3 and biases B]


Optimal ANN Architecture

• Generalization Techniques
 – Cross Validation

Divide the data set into n groups of size k each.
Train on (n − 1) of the groups and check the error on the remaining group.
Repeat the process with different initial weights and average the results (ensemble method).
Calculate the error with each of the n groups taken in turn as the testing set.
Calculate the error of cross validation:

E_cv = (1/N) ∑_{i=1}^{n} ∑_{j=1}^{k} (y_{ij, Actual} − y_{ij, Calculated})²,   where N = n·k is the total number of samples.
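A minimal sketch of this cross-validation error in Python; train_and_predict is a hypothetical stand-in for training the ANN on the (n − 1) training groups and predicting on the held-out group:

import numpy as np

def cross_validation_error(X, y, n_groups, train_and_predict, seed=0):
    # Divide the data set into n groups of (roughly) equal size
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    groups = np.array_split(idx, n_groups)
    sq_err = 0.0
    for i in range(n_groups):   # each group in turn is the testing set
        test = groups[i]
        train = np.concatenate([g for j, g in enumerate(groups) if j != i])
        y_pred = train_and_predict(X[train], y[train], X[test])
        sq_err += np.sum((y[test] - y_pred) ** 2)
    return sq_err / len(X)      # E_cv = (1/N) * sum of squared errors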
