
Page 1:

Research on Advanced Training Algorithms of Neural Networks

Hao Yu, Ph.D. Defense, Aug 17th, 2011

Supervisor: Bogdan Wilamowski
Committee Members: Hulya Kirkici, Vishwani D. Agrawal, Vitaly Vodyanoy
University Reader: Weikuan Yu

Page 2:

Outlines

• Why Neural Networks

• Network Architectures

• Training Algorithms

• How to Design Neural Networks

• Problems in Second Order Algorithms

• Proposed Second Order Computation

• Proposed Forward-Only Algorithm

• Neural Network Trainer

• Conclusion & Recent Research

Page 3:

What is a Neural Network

• Classification: separate the two groups (red circles and blue stars) of twisted points [1].

Page 4:

What is a Neural Network

• Interpolation: with the given 25 points (red), find the values of points A and B (black)

Page 5:

What is a Neural Network

• Human Solutions

• Neural Network Solutions

Page 6:

What is a Neural Network

• Recognition: restore the noisy digit images (left) to the original images (right)


Page 7:

What is a Neural Network

• “Learn to Behave”

• Build any relationship between inputs and outputs [2]

(Diagram: learning process → "behave")

Page 8:

Why Neural Networks

• What makes neural networks different

Given Patterns (5×5=25) Testing Patterns (41×41=1,681)

Page 9:

Different Approximators

• Test Results of Different Approximators

Mamdani fuzzy TSK fuzzy Neuro-fuzzy SVM-RBF SVM-Poly

Nearest Linear Spline Cubic Neural Network

Matlab Function: Interp2

Page 10:

Comparison

• Neural networks potentially behave as the best approximators

Methods of computational intelligence            Sum of squared errors
Fuzzy inference system – Mamdani                 319.7334
Fuzzy inference system – TSK                     35.1627
Neuro-fuzzy system                               27.3356
Support vector machine – RBF kernel              28.9595
Support vector machine – polynomial kernel       176.1520
Interpolation – nearest                          197.7494
Interpolation – linear                           28.6683
Interpolation – spline                           11.0874
Interpolation – cubic                            3.2791
Neural network – 4 neurons in FCC network        2.3628
Neural network – 5 neurons in FCC network        0.4648

Page 11:

Outlines

• Why Neural Networks

• Network Architectures

• Training Algorithms

• How to Design Neural Networks

• Problems in Second Order Algorithms

• Proposed Second Order Computation

• Proposed Forward-Only Algorithm

• Neural Network Trainer

• Conclusion & Recent Research

Page 12:

A Single Neuron

• Two basic computations

(Figure: a single neuron with inputs x_1 ... x_7, a bias input +1, weights w_1 ... w_7 and w_0, net value net, and output y = f(x).)

net = Σ_{i=1}^{7} x_i w_i + w_0    (1)

y = f(net)    (2)

where f is the activation function, for example a linear function f(x) = gain × x (with gain = 1) or a bipolar function f(x) = tanh(gain × x).

Page 13:

Network Architectures

• Multilayer perceptron (MLP) networks are the most popular architecture

• Networks with connections across layers, such as bridged multilayer perceptron (BMLP) networks and fully connected cascade (FCC) networks, are much more powerful than MLP networks.

• B. M. Wilamowski, D. Hunter, and A. Malinowski, "Solving parity-N problems with feedforward neural networks," Proc. 2003 IEEE IJCNN, pp. 2546-2551, IEEE Press, 2003.

• M. E. Hohil, D. Liu, and S. H. Smith, "Solving the N-bit parity problem using neural networks," Neural Networks, vol. 12, pp. 1321-1323, 1999.

• Example: smallest networks for solving parity-7 problem (analytical results)

MLP network

BMLP network

FCC network

Page 14:

Outlines

• Why Neural Networks

• Network Architectures

• Training Algorithms

• How to Design Neural Networks

• Problems in Second Order Algorithms

• Proposed Second Order Computation

• Proposed Forward-Only Algorithm

• Neural Network Trainer

• Conclusion & Recent Research

Page 15:

Error Back Propagation Algorithm

• The most popular algorithm for neural network training

• Update rule of EBP algorithm [3]

• Developed based on gradient optimization

• Advantages: – Easy

– Stable

• Disadvantages: – Very limited power

– Slow convergence

w_{k+1} = w_k - α g_k

where α is the learning constant, w = [w_1, w_2, w_3, ...]^T is the weight vector, and the gradient vector is g = [∂E/∂w_1, ∂E/∂w_2, ∂E/∂w_3, ...]^T.
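A minimal sketch of this update in Python/NumPy (illustrative only, not the trainer's actual code; the gradient routine loss_grad and the quadratic example are assumptions):

import numpy as np

def ebp_step(w, loss_grad, alpha=0.1):
    # One EBP (steepest-descent) update: w_{k+1} = w_k - alpha * g_k
    g = loss_grad(w)
    return w - alpha * g

# usage: for E(w) = ||w||^2 the gradient is 2w, so the weights decay toward zero
w = np.array([1.0, -2.0, 0.5])
for _ in range(100):
    w = ebp_step(w, lambda v: 2 * v, alpha=0.1)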

Page 16:

Improvement of EBP

• Improved gradient using momentum [4]

• Adjusted learning constant [5-6]

w_{k+1} = w_k - α g_k + m (w_k - w_{k-1}),    0 < m < 1

where m is the momentum term.

(Figure: comparison of the weight-update trajectories without and with momentum, cases A and B.)
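A matching sketch of the momentum variant (again illustrative; loss_grad is a hypothetical gradient routine and the update form is the standard momentum rule stated above):

import numpy as np

def ebp_momentum_step(w, w_prev, loss_grad, alpha=0.1, m=0.5):
    # w_{k+1} = w_k - alpha * g_k + m * (w_k - w_{k-1})
    g = loss_grad(w)
    w_next = w - alpha * g + m * (w - w_prev)
    return w_next, w  # new weights, and the current weights become w_prev next time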

Page 17:

Newton Algorithm

• Newton algorithm: using the derivative of gradient to evaluate the change of gradient, then select proper learning constants in each direction [7]

• Advantages: – Fast convergence

• Disadvantages: – Not stable – Requires computation of second order derivatives

w_{k+1} = w_k - H_k^{-1} g_k

where the gradient components are g_i = ∂E/∂w_i, the total error over P patterns and M outputs is

E = Σ_{p=1}^{P} Σ_{m=1}^{M} e_{pm}^2

and H is the Hessian matrix of second derivatives:

H = | ∂²E/∂w_1²       ∂²E/∂w_1∂w_2    ...   ∂²E/∂w_1∂w_N |
    | ∂²E/∂w_2∂w_1    ∂²E/∂w_2²       ...   ∂²E/∂w_2∂w_N |
    | ...                                                |
    | ∂²E/∂w_N∂w_1    ∂²E/∂w_N∂w_2    ...   ∂²E/∂w_N²    |
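A minimal sketch of one Newton step (illustrative; grad and hess are hypothetical callables returning the gradient vector and the Hessian matrix of E at w):

import numpy as np

def newton_step(w, grad, hess):
    # w_{k+1} = w_k - H_k^{-1} g_k; solve H dw = g rather than forming the inverse
    g = grad(w)
    H = hess(w)
    return w - np.linalg.solve(H, g)
# without any damping this step can diverge, which is why the method is not stable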

Page 18:

Gauss-Newton Algorithm

• Gauss-Newton algorithm: eliminates the second order derivatives of the Newton method by introducing the Jacobian matrix

• Advantages: – Fast convergence

• Disadvantages: – Not stable

w_{k+1} = w_k - (J_k^T J_k)^{-1} J_k^T e_k

with the approximations H ≈ J^T J and g = J^T e. The Jacobian matrix J collects the first derivatives of every individual error with respect to every weight, and e is the error vector:

J = | ∂e_{1,1}/∂w_1    ∂e_{1,1}/∂w_2    ...   ∂e_{1,1}/∂w_N |
    | ∂e_{1,2}/∂w_1    ∂e_{1,2}/∂w_2    ...   ∂e_{1,2}/∂w_N |
    | ...                                                   |
    | ∂e_{1,M}/∂w_1    ∂e_{1,M}/∂w_2    ...   ∂e_{1,M}/∂w_N |
    | ...                                                   |
    | ∂e_{P,1}/∂w_1    ∂e_{P,1}/∂w_2    ...   ∂e_{P,1}/∂w_N |
    | ...                                                   |
    | ∂e_{P,M}/∂w_1    ∂e_{P,M}/∂w_2    ...   ∂e_{P,M}/∂w_N |

e = [ e_{1,1}, e_{1,2}, ..., e_{1,M}, ..., e_{P,1}, ..., e_{P,M} ]^T

where e_{p,m} is the error at output m for pattern p.

Page 19:

Levenberg Marquardt Algorithm

• LM algorithm: blends the EBP algorithm and the Gauss-Newton algorithm [8-9]

– When the evaluation error increases, μ increases and the LM algorithm switches toward the EBP algorithm

– When the evaluation error decreases, μ decreases and the LM algorithm switches toward the Gauss-Newton method

• Advantages: – Fast convergence

– Stable training

• Compared with first order algorithms, the LM algorithm has a much more powerful search ability, but it also requires more complex computation

w_{k+1} = w_k - (J_k^T J_k + μ I)^{-1} J_k^T e_k
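A minimal sketch of one LM iteration (illustrative only; jacobian_and_errors is a hypothetical routine returning J and e for the current weights, and the multiply/divide-by-10 schedule for μ is just one common choice):

import numpy as np

def lm_step(w, jacobian_and_errors, mu, factor=10.0):
    # One Levenberg-Marquardt iteration: w <- w - (J^T J + mu*I)^{-1} J^T e
    J, e = jacobian_and_errors(w)          # J: (P*M) x N, e: length P*M
    sse = float(e @ e)                     # current sum of squared errors
    Q, g, I = J.T @ J, J.T @ e, np.eye(len(w))
    while mu < 1e10:
        w_new = w - np.linalg.solve(Q + mu * I, g)
        _, e_new = jacobian_and_errors(w_new)
        if float(e_new @ e_new) < sse:     # error decreased: accept, move toward Gauss-Newton
            return w_new, mu / factor
        mu *= factor                       # error increased: raise mu, move toward an EBP-like step
    return w, mu                           # give up on this iteration if mu grows too large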

Page 20:

Comparison of Different Algorithms

• Training XOR patterns using different algorithms

XOR problem – EBP algorithm
                      α = 0.1      α = 10
success rate          100%         18%
average iteration     17,845.44    179.00
average time (ms)     3,413.26     46.83

XOR problem – EBP with momentum
                      α = 0.1, m = 0.5    α = 10, m = 0.5
success rate          100%                100%
average iteration     18,415.84           187.76
average time (ms)     4,687.79            39.27

XOR problem – EBP with adjusted learning constant
success rate          100%
average iteration     170.23
average time (ms)     41.19

XOR problem – Gauss-Newton algorithm
success rate          6%
average iteration     1.29
average time (ms)     2.29

XOR problem – LM algorithm
success rate          100%
average iteration     5.49
average time (ms)     4.35

Page 21:

Outlines

• Why Neural Networks

• Network Architectures

• Training Algorithms

• How to Design Neural Networks

• Problems in Second Order Algorithms

• Proposed Second Order Computation

• Proposed Forward-Only Algorithm

• Neural Network Trainer

• Conclusion & Recent Research

Page 22:

How to Design Neural Networks

• Traditional design: – Most popular training algorithm: EBP algorithm

– Most popular network architecture: MLP network

• Results: – Large neural networks

– Poor generalization ability

– Many engineers moved to other methods, such as fuzzy systems

Page 23:

How to Design Neural Networks

• B. M. Wilamowski, "Neural Network Architectures and Learning Algorithms: How Not to Be Frustrated with Neural Networks," IEEE Ind. Electron. Mag., vol. 3, no. 4, pp. 56-63, 2009.

– Over-fitting problem

– Mismatch between the number of training patterns and the network size

• Recommended design policy: compact networks benefit generalization ability

– Powerful training algorithm: LM algorithm

– Efficient network architecture: BMLP network and FCC network

2 neurons 3 neurons 4 neurons 5 neurons

6 neurons 7 neurons 8 neurons 9 neurons

Page 24:

Outlines

• Why Neural Networks

• Network Architectures

• Training Algorithms

• How to Design Neural Networks

• Problems in Second Order Algorithms

• Proposed Second Order Computation

• Proposed Forward-Only Algorithm

• Neural Network Trainer

• Conclusion & Recent Research

Page 25:

Problems in Second Order Algorithms

• Matrix inversion

– Nature of second order algorithms

– The size of matrix is proportional to the size of networks

– As the size of networks increases, second order algorithms may not be as efficient as first order algorithms

(J^T J + μ I)^{-1}

Page 26:

Problems in Second Order Algorithms

• Architecture limitation

• M. T. Hagan and M. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Trans. on Neural Networks, vol. 5, no. 6, pp. 989-993, 1994. (2,474 citations)

– Only developed for training MLP networks

– Not suitable for designing compact networks

• Neuron-by-Neuron (NBN) Algorithm

• B. M. Wilamowski, N. J. Cotton, O. Kaynak and G. Dundar, "Computing Gradient Vector and Jacobian Matrix in Arbitrarily Connected Neural Networks," IEEE Trans. on Industrial Electronics, vol. 55, no. 10, pp. 3784-3790, Oct. 2008.

– SPICE computation routines

– Capable of training arbitrarily connected neural networks

– Compact neural network design: NBN algorithm + BMLP (FCC) networks

– Very complex computation

Page 27:

Problems in Second Order Algorithms

• Memory limitation: – The size of the Jacobian matrix J is P × M × N

– P is the number of training patterns

– M is the number of outputs

– N is the number of weights

• In practice, the number of training patterns is huge, and using as many patterns as possible is encouraged

• MNIST handwritten digit database [10]: 60,000 training patterns, 784 inputs and 10 outputs. Using the simplest network architecture (1 neuron per output), the required memory could be nearly 35 GB.

• Such a matrix cannot be allocated by most Windows compilers.
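A quick back-of-the-envelope check of that figure (assuming 8-byte double-precision storage and the stated single-layer architecture of 10 output neurons, each with 784 inputs plus a bias):

# rough size of the full MNIST Jacobian under the stated assumptions
P, M = 60_000, 10             # training patterns, outputs
N = (784 + 1) * 10            # weights of the simplest architecture: 1 neuron per output
bytes_needed = P * M * N * 8  # 8 bytes per double-precision element
print(bytes_needed / 2**30)   # roughly 35 GB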

(J^T J + μ I)^{-1}

J = | ∂e_{1,1}/∂w_1    ∂e_{1,1}/∂w_2    ...   ∂e_{1,1}/∂w_N |
    | ∂e_{1,2}/∂w_1    ∂e_{1,2}/∂w_2    ...   ∂e_{1,2}/∂w_N |
    | ...                                                   |
    | ∂e_{P,M}/∂w_1    ∂e_{P,M}/∂w_2    ...   ∂e_{P,M}/∂w_N |

(P × M rows and N columns)

Page 28:

Problems in Second Order Algorithms

• Computational duplication: – Forward computation: calculate errors

– Backward computation: error backpropagation

• In second order algorithms, in both the Hagan-Menhaj LM algorithm and the NBN algorithm, the error backpropagation process has to be repeated for each output.

– Very complex

– Inefficient for networks with multiple outputs

(Figure: a multilayer network with bias inputs; the forward computation runs from the inputs to the outputs, the backward computation from the outputs back toward the inputs.)

Page 29:

Outlines

• Why Neural Networks

• Network Architectures

• Training Algorithms

• How to Design Neural Networks

• Problems in Second Order Algorithms

• Proposed Second Order Computation

• Proposed Forward-Only Algorithm

• Neural Network Trainer

• Conclusion & Recent Research

Page 30:

Proposed Second Order Computation – Basic Theory

• Matrix Algebra [11]

• In neural network training, considering that: – Each pattern is related to one row of the Jacobian matrix

– Patterns are independent of each other

(Figure: row-column multiplication forms H = J^T J directly from the full (P·M) × N Jacobian J; column-row multiplication forms the same N × N matrix as a sum of outer products q = j^T j of individual 1 × N rows j of J.)

Memory comparison
Multiplication method    Elements for storage
Row-column               (P × M) × N + N × N + N
Column-row               N × N + N
Difference               (P × M) × N

Computation comparison
Multiplication method    Additions            Multiplications
Row-column               (P × M) × N × N      (P × M) × N × N
Column-row               N × N × (P × M)      N × N × (P × M)
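A small NumPy check of the column-row idea (the random Jacobian and toy sizes are purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
P, M, N = 100, 2, 7                     # toy sizes: patterns, outputs, weights
J = rng.standard_normal((P * M, N))

H_rowcol = J.T @ J                      # row-column: needs the whole (P*M) x N matrix at once

H_colrow = np.zeros((N, N))             # column-row: one 1 x N row at a time
for j_row in J:
    H_colrow += np.outer(j_row, j_row)

print(np.allclose(H_rowcol, H_colrow))  # True: both give the same N x N matrix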

Page 31:

Proposed Second Order Computation – Derivation

• Hagan and Menhaj LM algorithm or NBN algorithm

Δw = (J^T J + μ I)^{-1} J^T e

• Improved Computation

Δw = (Q + μ I)^{-1} g

For pattern p and output m, one row of the Jacobian matrix is the 1 × N vector

j_{pm} = [ ∂e_{pm}/∂w_1   ∂e_{pm}/∂w_2   ...   ∂e_{pm}/∂w_N ]

Its contributions to the quasi-Hessian matrix and to the gradient vector are

q_{pm} = j_{pm}^T j_{pm}    (an N × N sub-matrix)

η_{pm} = j_{pm}^T e_{pm}    (an N × 1 sub-vector)

and both are accumulated over all patterns and outputs:

Q = Σ_{p=1}^{P} Σ_{m=1}^{M} q_{pm},    g = Σ_{p=1}^{P} Σ_{m=1}^{M} η_{pm}

so that Q = J^T J and g = J^T e are obtained without ever forming the (P·M) × N Jacobian matrix J or the error vector e.

Page 32:

Proposed Second Order Computation – Pseudo Code

• Properties: – No need for Jacobian matrix storage

– Vector operation instead of matrix operation

• Main contributions: – Significant memory reduction

– Memory reduction benefits computation speed

– NO tradeoff !

• The memory limitation caused by Jacobian matrix storage in second order algorithms is solved

• Again, considering the MNIST problem, the memory cost for storing the Jacobian elements could be reduced from more than 35 gigabytes to nearly 30.7 kilobytes

% Initialization
Q = 0; g = 0
% Improved computation
for p = 1:P          % number of patterns
    % Forward computation
    ...
    for m = 1:M      % number of outputs
        % Backward computation
        ...
        calculate vector j_pm;
        calculate sub-matrix q_pm;
        calculate sub-vector η_pm;
        Q = Q + q_pm;
        g = g + η_pm;
    end;
end;

Pseudo Code
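A NumPy rendering of the same loop (illustrative; row_jacobian_and_error is a hypothetical routine returning the row j_pm and the error e_pm for one pattern/output pair):

import numpy as np

def accumulate_q_and_g(w, P, M, row_jacobian_and_error):
    # Build Q = sum j_pm^T j_pm and g = sum j_pm^T e_pm pattern by pattern,
    # so the (P*M) x N Jacobian never has to be stored
    N = len(w)
    Q = np.zeros((N, N))
    g = np.zeros(N)
    for p in range(P):
        for m in range(M):
            j_pm, e_pm = row_jacobian_and_error(w, p, m)  # 1 x N row, scalar error
            Q += np.outer(j_pm, j_pm)
            g += j_pm * e_pm
    return Q, g  # the weight update then follows dw = solve(Q + mu*I, g)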

Page 33:

Proposed Second Order Computation – Experimental Results

• Memory Comparison

• Time Comparison

Memory comparison
Parity-N problems        N = 14        N = 16
Patterns                 16,384        65,536
Structures               15 neurons    17 neurons
Jacobian matrix sizes    5,406,720     27,852,800
Weight vector sizes      330           425
Average iterations       99.2          166.4
Success rate             13%           9%
Actual memory cost:
  Traditional LM         79.21 MB      385.22 MB
  Improved LM            3.41 MB       4.30 MB

Time comparison
Parity-N problems        N = 9     N = 11    N = 13      N = 15
Patterns                 512       2,048     8,192       32,768
Neurons                  10        12        14          16
Weights                  145       210       287         376
Average iterations       38.51     59.02     68.08       126.08
Success rate             58%       37%       24%         12%
Averaged training time (s):
  Traditional LM         0.78      68.01     1,508.46    43,417.06
  Improved LM            0.33      22.09     173.79      2,797.93

Page 34:

Outlines

• Why Neural Networks

• Network Architectures

• Training Algorithms

• How to Design Neural Networks

• Problems in Second Order Algorithms

• Proposed Second Order Computation

• Proposed Forward-Only Algorithm

• Neural Network Trainer

• Conclusion & Recent Research

Page 35:

Traditional Computation – Forward Computation

• For each training pattern p:

• Calculate net for neuron j:

net_j = Σ_{i=1}^{ni} y_{j,i} w_{j,i} + w_{j,0}

• Calculate the output of neuron j:

y_j = f_j(net_j)

• Calculate the slope (derivative) of neuron j:

s_j = ∂y_j/∂net_j = ∂f_j(net_j)/∂net_j

• Calculate the output at network output m:

o_m = F_{m,j}(y_j)

• Calculate the error at output m:

e_{pm} = o_{pm} - d_{pm}

(Figure: neuron j with inputs y_{j,1} ... y_{j,ni}, weights w_{j,i}, bias weight w_{j,0}, net value net_j, output y_j and slope s_j; F_{m,j} denotes the relationship between the output of neuron j and the network output o_m.)
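A minimal sketch of these per-neuron steps (illustrative; the tanh activation is only an example choice):

import numpy as np

def neuron_forward(inputs, weights, bias, gain=1.0):
    # net, output y = f(net) = tanh(gain*net), and slope s = dy/dnet for one neuron
    net = float(np.dot(weights, inputs)) + bias
    y = np.tanh(gain * net)
    slope = gain * (1.0 - y * y)
    return net, y, slope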

Page 36:

Traditional Computation – Backward Computation

• For first order algorithms:

• Calculate delta [12]:

δ_j = s_j Σ_{m=1}^{no} F'_{m,j} e_m

• Compute the gradient elements:

g_{j,i} = ∂E/∂w_{j,i} = δ_j × y_{j,i}

• For second order algorithms:

• Calculate delta:

δ_{m,j} = s_j F'_{m,j}

• Calculate the Jacobian elements:

∂e_{p,m}/∂w_{j,i} = δ_{m,j} × y_{j,i}

Page 37:

Proposed Forward-Only Algorithm

• Extend the concept of the backpropagation factor δ:

– Original definition: δ_{m,j} is backpropagated from network output m to neuron j:

δ_{m,j} = s_j F'_{m,j}

– Our definition: δ_{k,j} is backpropagated from neuron k to neuron j:

δ_{k,j} = s_j F'_{k,j}

where F'_{k,j} is the derivative of the nonlinear relationship between neuron k and neuron j.

(Figure: two neurons j and k, each with its net, slope s and output y, between the network inputs and the network outputs o_1 ... o_m.)

Page 38:

Proposed Forward-Only Algorithm

• Regular table:

– Lower triangular elements (k ≥ j): the matrix δ has a triangular shape

– Diagonal elements: δ_{k,k} = s_k

– Upper triangular elements: the weight connections w_{j,k} between neurons

• General formula, for k > j:

δ_{k,j} = s_k Σ_{i=j}^{k-1} w_{i,k} δ_{i,j}

with δ_{k,k} = s_k and δ_{k,j} = 0 for k < j.

(Table: rows and columns indexed by neurons 1, 2, ..., j, ..., k, ..., nn; the lower triangle holds the δ_{k,j} values, the diagonal the slopes s_k, and the upper triangle the inter-neuron weights w_{j,k}.)
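A sketch of the general formula (assumptions: neurons are indexed in feedforward order, and w[i, k] holds the weight from the output of neuron i to neuron k, 0 when no connection exists):

import numpy as np

def build_delta_table(s, w):
    # delta[k, j] = s_k * sum_{i=j}^{k-1} w[i, k] * delta[i, j]  for k > j,
    # delta[k, k] = s_k, and delta[k, j] = 0 for k < j
    nn = len(s)
    delta = np.zeros((nn, nn))
    for j in range(nn):
        delta[j, j] = s[j]
        for k in range(j + 1, nn):
            acc = 0.0
            for i in range(j, k):
                acc += w[i, k] * delta[i, j]
            delta[k, j] = s[k] * acc
    return delta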

Page 39:

Proposed Forward-Only Algorithm

• Train arbitrarily connected neural networks

(Figure: three example topologies of a six-neuron, arbitrarily connected network, each shown with its 6 × 6 regular table: the diagonal holds the slopes s_1 ... s_6, the lower triangle the δ values, the upper triangle the inter-neuron weights, and missing connections appear as zeros.)

Page 40:

Proposed Forward-Only Algorithm

• Train networks with multiple outputs

• The more outputs the networks have, the more efficient the forward-only algorithm will be

1 output 2 outputs

3 outputs 4 outputs

Page 41:

Proposed Forward-Only Algorithm

• Pseudo codes of two different algorithms

• In the forward-only computation, the backward computation of the traditional algorithm is replaced by extra computation in the forward process (compare the two listings below)

Traditional forward-backward algorithm:

for all patterns
  % Forward computation
  for all neurons (nn)
    for all weights of the neuron (nx)
      calculate net;
    end;
    calculate neuron output;
    calculate neuron slope;
  end;
  for all outputs (no)
    calculate error;
    % Backward computation
    initial delta as slope;
    for all neurons starting from output neurons (nn)
      for the weights connected to other neurons (ny)
        multiply delta through weights;
        sum the backpropagated delta at proper nodes;
      end;
      multiply delta by slope (for hidden neurons);
    end;
  end;
end;

Forward-only algorithm:

for all patterns (np)
  % Forward computation
  for all neurons (nn)
    for all weights of the neuron (nx)
      calculate net;
    end;
    calculate neuron output;
    calculate neuron slope;
    set current slope as delta;
    for weights connected to previous neurons (ny)
      for previous neurons (nz)
        multiply delta through weights then sum;
      end;
      multiply the sum by the slope;
    end;
    related Jacobian elements computation;
  end;
  for all outputs (no)
    calculate error;
  end;
end;

Page 42:

Proposed Forward-Only Algorithm

• Computation cost estimation

• Properties of the forward-only algorithm: – Simplified computation, organized in a regular table with a general formula

– Easily adapted for training arbitrarily connected neural networks

– Improved computation efficiency for networks with multiple outputs

• Tradeoff – Extra memory is required to store the extended δ array

Hagan and Menhaj computation
        Forward part                Backward part
+/-     nn×nx + 3nn + no            no×nn×ny
×/÷     nn×nx + 4nn                 no×nn×ny + no×(nn - no)
exp     nn                          0

Forward-only computation
        Forward part                             Backward part
+/-     nn×nx + 3nn + no + nn×ny×nz              0
×/÷     nn×nx + 4nn + nn×ny + nn×ny×nz           0
exp     nn                                       0

Difference (traditional minus forward-only)
+/-     nn×ny×(no - 1)
×/÷     nn×ny×(no - 1) + no×(nn - no) - nn×ny×nz
exp     0

(Plot: ratio of time consumption versus the number of hidden neurons, for 1 to 10 outputs; MLP networks with one hidden layer and 20 inputs.)

Page 43:

Proposed Forward-Only Algorithm

• Experiments: training compact neural networks with good generalization ability

Neurons   Success rate        Average iterations       Average time (s)
          EBP       FO        EBP          FO          EBP        FO
8         0%        5%        Failing      222.5       Failing    0.33
9         0%        25%       Failing      214.6       Failing    0.58
10        0%        61%       Failing      183.5       Failing    0.70
11        0%        76%       Failing      177.2       Failing    0.93
12        0%        90%       Failing      149.5       Failing    1.08
13        35%       96%       573,226      142.5       624.88     1.35
14        42%       99%       544,734      134.5       651.66     1.76
15        56%       100%      627,224      119.3       891.90     1.85

8 neurons, FO: SSE_train = 0.0044, SSE_verify = 0.0080

8 neurons, EBP: SSE_train = 0.0764, SSE_verify = 0.1271 (under-fitting)

12 neurons, EBP: SSE_train = 0.0018, SSE_verify = 0.4909 (over-fitting)

Page 44:

Proposed Forward-Only Algorithm

• Experiments: comparison of computation efficiency

ASCII to Images
Computation methods    Time cost (ms/iteration)       Relative time
                       Forward       Backward
Traditional            8.24          1,028.74         100.0%
Forward-only           61.13         0.00             5.9%

Error Correction (8-bit signal)
Computation methods    Time cost (ms/iteration)       Relative time
                       Forward       Backward
Traditional            40.59         468.14           100.0%
Forward-only           175.72        0.00             34.5%

Forward Kinematics [13]
Computation methods    Time cost (ms/iteration)       Relative time
                       Forward       Backward
Traditional            0.307         0.771            100.0%
Forward-only           0.727         0.00             67.4%

(Figure: two-link planar manipulator with link lengths L1 and L2, joint angles α and β, and the end effector position.)

Page 45:

Outlines

• Why Neural Networks

• Network Architectures

• Training Algorithms

• How to Design Neural Networks

• Problems in Second Order Algorithms

• Proposed Second Order Computation

• Proposed Forward-Only Algorithm

• Neural Network Trainer

• Conclusion & Recent Research

Page 46:

Software

• The NBN Trainer tool is developed in Visual C++ and used for training neural networks

• Pattern classification and recognition

• Function approximation

• Available online (currently free): http://www.eng.auburn.edu/~wilambm/nnt/index.htm

Page 47:

Parity-2 Problem

• Parity-2 Patterns

Page 48:

Outlines

• Why Neural Networks

• Network Architectures

• Training Algorithms

• How to Design Neural Networks

• Problems in Second Order Algorithms

• Proposed Second Order Computation

• Proposed Forward-Only Algorithm

• Neural Network Trainer

• Conclusion & Recent Research

Page 49:

Conclusion

• Second order algorithms are more efficient and more powerful for training neural networks

• The proposed second order computation removes Jacobian matrix storage and multiplication; it solves the memory limitation

• The proposed forward-only algorithm simplifies the computation process in second order training: a regular table + a general formula

• The proposed forward-only algorithm can handle arbitrarily connected neural networks

• The proposed forward-only algorithm has speed benefit for networks with multiple outputs

Page 50:

Recent Research

• RBF networks: – ErrCor algorithm: hierarchical training algorithm

– Network size increases based on the training information

– No more trial-and-error network sizing

• Applications of Neural Networks (future work): – Dynamic controller design

– Smart grid distribution systems

– Pattern recognition in EDA software design

Page 51:

References

[1] J. X. Peng, K. Li, and G. W. Irwin, "A New Jacobian Matrix for Optimal Learning of Single-Layer Neural Networks," IEEE Trans. on Neural Networks, vol. 19, no. 1, pp. 119-129, Jan. 2008.

[2] K. Hornik, M. Stinchcombe and H. White, "Multilayer Feedforward Networks Are Universal Approximators," Neural Networks, vol. 2, issue 5, pp. 359-366, 1989.

[3] D. E. Rumelhart, G. E. Hinton and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, pp. 533-536, 1986.

[4] V. V. Phansalkar, P.S. Sastry, "Analysis of the back-propagation algorithm with momentum," IEEE Trans. on Neural Networks, vol. 5, no. 3, pp. 505-506, March 1994.

[5] M. Riedmiller, H. Braun, "A direct adaptive method for faster backpropagation learning: The RPROP algorithm". Proc. International Conference on Neural Networks, San Francisco, CA, 1993, pp. 586-591.

[6] S. E. Fahlman, "Faster-learning variations on back-propagation: An empirical study," in T. J. Sejnowski, G. E. Hinton, and D. S. Touretzky, Eds., Proc. 1988 Connectionist Models Summer School, San Mateo, CA: Morgan Kaufmann, 1988.

[7] M. R. Osborne, "Fisher’s method of scoring," Internat. Statist. Rev., 86 (1992), pp. 271-286.

[8] K. Levenberg, "A method for the solution of certain problems in least squares," Quarterly of Applied Mathematics, vol. 5, pp. 164-168, 1944.

[9] D. Marquardt, "An algorithm for least-squares estimation of nonlinear parameters," SIAM J. Appl. Math., vol. 11, no. 2, pp. 431-441, Jun. 1963.

[10] L. J. Cao, S. S. Keerthi, Chong-Jin Ong, J. Q. Zhang, U. Periyathamby, Xiu Ju Fu, H. P. Lee, "Parallel sequential minimal optimization for the training of support vector machines," IEEE Trans. on Neural Networks, vol. 17, no. 4, pp. 1039- 1049, April 2006.

[11] D. C. Lay, Linear Algebra and Its Applications, 3rd ed., Addison-Wesley, p. 124, July 2005.

[12] R. Hecht-Nielsen, "Theory of the Back Propagation Neural Network," Proc. 1989 IEEE IJCNN, pp. 1593-1605, IEEE Press, New York, 1989.

[13] N. J. Cotton and B. M. Wilamowski, "Compensation of Nonlinearities Using Neural Networks Implemented on Inexpensive Microcontrollers" IEEE Trans. on Industrial Electronics, vol. 58, No 3, pp. 733-740, March 2011.

Page 52:

Prepared Publications – Journals

• H. Yu, T. T. Xie, Stanisław Paszczyñski and B. M. Wilamowski, "Advantages of Radial Basis Function Networks for Dynamic System Design," IEEE Trans. on Industrial Electronics (accepted, scheduled for publication in December 2011)

• H. Yu, T. T. Xie and B. M. Wilamowski, "Error Correction – A Robust Learning Algorithm for Designing Compact Radial Basis Function Networks," IEEE Trans. on Neural Networks (Major revision)

• T. T. Xie, H. Yu, J. Hewllet, Pawel Rozycki and B. M. Wilamowski, "Fast and Efficient Second Order Method for Training Radial Basis Function Networks," IEEE Trans. on Neural Networks (Major revision)

• A. Malinowski and H. Yu, "Comparison of Various Embedded System Technologies for Industrial Applications," IEEE Trans. on Industrial Informatics, vol. 7, issue 2, pp. 244-254, May 2011

• B. M. Wilamowski and H. Yu, "Improved Computation for Levenberg Marquardt Training," IEEE Trans. on Neural Networks, vol. 21, no. 6, pp. 930-937, June 2010 (14 citations)

• B. M. Wilamowski and H. Yu, "Neural Network Learning Without Backpropagation," IEEE Trans. on Neural Networks, vol. 21, no.11, pp. 1793-1803, Nov. 2010 (5 citations)

• Pierluigi Siano, Janusz Kolbusz, H. Yu and Carlo Cecati, "Real Time Operation of a Smart Microgrid via FCN Networks and Optimal Power Flow," IEEE Trans. on Industrial Informatics (under review)

Page 53:

Prepared Publications – Conferences

• H. Yu and B. M. Wilamowski, "Efficient and Reliable Training of Neural Networks," IEEE Human System Interaction Conference, HSI 2009, Catania. Italy, May 21-23, 2009, pp. 109-115. (Best paper award in Computational Intelligence section) (11 citations)

• H. Yu and B. M. Wilamowski, "C++ Implementation of Neural Networks Trainer," 13th IEEE Intelligent Engineering Systems Conference, INES 2009, Barbados, April 16-18, 2009, pp. 237-242 (8 citations)

• H. Yu and B. M. Wilamowski, "Fast and efficient training of neural networks," in Proc. 3rd IEEE Human System Interaction Conf. HSI 2010, Rzeszow, Poland, May 13-15, 2010, pp. 175-181 (2 citations)

• H. Yu and B. M. Wilamowski, "Neural Network Training with Second Order Algorithms," monograph by Springer on Human-Computer Systems Interaction. Background and Applications, 31st October, 2010. (Accepted)

• H. Yu, T. T. Xie, M. Hamilton and B. M. Wilamowski, "Comparison of Different Neural Network Architectures for Digit Image Recognition," in Proc. 3rd IEEE Human System Interaction Conf. HSI 2011, Yokohama, Japan, pp. 98-103, May 19-21, 2011

• N. Pham, H. Yu and B. M. Wilamowski, "Neural Network Trainer through Computer Networks," 24th IEEE International Conference on Advanced Information Networking and Applications, AINA 2010, Perth, Australia, April 20-23, 2010, pp. 1203-1209 (1 citation)

• T. T. Xie, H. Yu and B. M. Wilamowski, "Replacing Fuzzy Systems with Neural Networks," in Proc. 3rd IEEE Human System Interaction Conf. HSI 2010, Rzeszow, Poland, May 13-15, 2010, pp. 189-193.

• T. T. Xie, H. Yu and B. M. Wilamowski, "Comparison of Traditional Neural Networks and Radial Basis Function Networks," in Proc. 20th IEEE International Symposium on Industrial Electronics, ISIE2011, Gdansk, Poland, 27-30 June 2011 (Accepted)

Page 54:

Prepared Publications – Chapters for IE Handbook (2nd Edition)

• H. Yu and B. M. Wilamowski, "Levenberg Marquardt Training," Industrial Electronics Handbook, vol. 5 – INTELLIGENT SYSTEMS, 2nd Edition, 2010, chapter 12, pp. 12-1 to 12-16, CRC Press.

• H. Yu and M. Carroll, "Interactive Website Design Using Python Script," Industrial Electronics Handbook, vol. 4 – INDUSTRIAL COMMUNICATION SYSTEMS, 2nd Edition, 2010, chapter 62, pp. 62-1 to 62-8, CRC Press.

• B. M. Wilamowski, H. Yu and N. Cotton, "Neuron by Neuron Algorithm," Industrial Electronics Handbook, vol. 5 – INTELLIGENT SYSTEMS, 2nd Edition, 2010, chapter 13, pp. 13-1 to 13-24, CRC Press.

• T. T. Xie, H. Yu and B. M. Wilamowski, "Neuro-fuzzy System," Industrial Electronics Handbook, vol. 5 – INTELLIGENT SYSTEMS, 2nd Edition, 2010, chapter 20, pp. 20-1 to 20-9, CRC Press.

• B. M. Wilamowski, H. Yu and K. T. Chung, "Parity-N problems as a vehicle to compare efficiency of neural network architectures," Industrial Electronics Handbook, vol. 5 – INTELLIGENT SYSTEMS, 2nd Edition, 2010, chapter 10, pp. 10-1 to 10-8, CRC Press.

Page 55:

Thanks