Supervised learning

1. Early learning algorithms
2. First order gradient methods
3. Second order gradient methods



Page 1:

Supervised learning

1. Early learning algorithms

2. First order gradient methods

3. Second order gradient methods

Page 2:

Early learning algorithms

• Designed for single layer neural networks

• Generally more limited in their applicability

• Some of them are:

– Perceptron learning

– LMS or Widrow–Hoff learning

– Grossberg learning

Page 3:

Perceptron learning

1. Randomly initialize all the network's weights.

2. Apply inputs and compute outputs (feedforward).

3. Compute the errors.

4. Update each weight as

5. Repeat steps 2 to 4 until the errors reach a satisfactory level.

w_ij(k+1) = w_ij(k) + η e_j(k) p_i(k)

where p_i(k) is the i-th input, e_j(k) the error of output neuron j, and η the learning rate.
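The five steps above can be sketched in code. This is a minimal illustration, not the slides' implementation: the function names, the AND task, and η = 1 are assumptions, and a hard-limit activation is used.

```python
import numpy as np

def perceptron_train(P, T, epochs=100, eta=1.0, seed=0):
    """Perceptron learning with the update w_ij(k+1) = w_ij(k) + eta*e_j(k)*p_i(k)."""
    rng = np.random.default_rng(seed)
    n_in, n_out = P.shape[1], T.shape[1]
    W = rng.normal(scale=0.1, size=(n_in, n_out))   # step 1: random initialization
    b = np.zeros(n_out)
    for _ in range(epochs):
        total_error = 0.0
        for p, t in zip(P, T):
            a = (p @ W + b >= 0.0).astype(float)    # step 2: feedforward (hard limit)
            e = t - a                               # step 3: compute the error
            W += eta * np.outer(p, e)               # step 4: update each weight
            b += eta * e
            total_error += np.abs(e).sum()
        if total_error == 0.0:                      # step 5: stop when errors vanish
            break
    return W, b

# Toy task (assumed): learn the AND function, which is linearly separable
P = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [0], [0], [1]], dtype=float)
W, b = perceptron_train(P, T)
```

Because AND is linearly separable, the perceptron convergence theorem guarantees the loop terminates with zero errors.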

Page 4:

Performance Optimization: Gradient-Based Methods

Page 5:

Basic Optimization Algorithm

Page 6:

Steepest Descent (first order Taylor expansion)
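The steepest descent iteration x(k+1) = x(k) − α ∇F(x(k)) can be sketched as below. The quadratic performance index is an assumed example; the slide's own figures and equations are not recoverable from this transcript.

```python
import numpy as np

# Assumed quadratic performance index: F(x) = x1^2 + 2*x2^2
def F(x):
    return x[0] ** 2 + 2.0 * x[1] ** 2

def grad_F(x):
    return np.array([2.0 * x[0], 4.0 * x[1]])

def steepest_descent(x0, alpha=0.1, iters=100):
    """Basic optimization iteration: x(k+1) = x(k) - alpha * grad F(x(k))."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = x - alpha * grad_F(x)
    return x

x_min = steepest_descent([2.0, -2.0])   # converges toward the minimum at (0, 0)
```

With α = 0.1 the iteration is stable here, since α is below 2 divided by the largest curvature (4) of this quadratic.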

Page 7:

Example

Page 8:

Plot

[Contour plot; both axes range from −2 to 2.]

Page 9:

LMS or Widrow–Hoff learning

• First, we introduce the ADALINE (ADAptive LInear NEuron) network

Page 10:

LMS or Widrow–Hoff learning, or Delta Rule

• The ADALINE network has the same basic structure as the perceptron network

Page 11:

Approximate Steepest Descent

Page 12:

Approximate Gradient Calculation

Page 13:

LMS Algorithm

This algorithm is inspired by the steepest descent algorithm.
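The LMS rule can be sketched as below. This is a minimal illustration: the 2α factor follows the common textbook form of the delta rule, and the toy data and function names are assumptions.

```python
import numpy as np

def lms_train(P, T, alpha=0.03, epochs=200):
    """Widrow-Hoff / delta rule: W(k+1) = W(k) + 2*alpha*e(k)*p(k)^T."""
    n_in, n_out = P.shape[1], T.shape[1]
    W = np.zeros((n_out, n_in))
    b = np.zeros(n_out)
    for _ in range(epochs):
        for p, t in zip(P, T):
            a = W @ p + b                       # linear (ADALINE) output
            e = t - a                           # error for this sample
            W += 2.0 * alpha * np.outer(e, p)   # approximate steepest descent step
            b += 2.0 * alpha * e
    return W, b

# Toy data (assumed): samples of the linear map t = 2*p1 - p2 + 1
rng = np.random.default_rng(1)
P = rng.uniform(-1.0, 1.0, size=(50, 2))
T = (2.0 * P[:, 0] - P[:, 1] + 1.0).reshape(-1, 1)
W, b = lms_train(P, T)
```

Unlike the perceptron rule, the update uses the linear output directly, so the weights converge to the least-mean-square solution rather than stopping at any separating boundary.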

Page 14:

Multiple-Neuron Case

Page 15:

Difference between perceptron learning and LMS learning

• The key difference is the DERIVATIVE:

• the linear activation function has a derivative,

• but the sign function (bipolar or unipolar) does not

Page 16:

Grossberg learning (associated learning)

• Sometimes known as instar and outstar training

• Updating rule:

• Where x_i could be the desired input values (instar training, example: clustering) or the desired output values (outstar training), depending on the network structure.

• Grossberg network (see Hagan for more details)

w_i(k+1) = w_i(k) + α (x_i(k) − w_i(k))
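A minimal sketch of the instar update above; the pattern values and α = 0.1 are assumptions for illustration.

```python
import numpy as np

def instar_update(w, x, alpha=0.1):
    """Grossberg instar rule: w_i(k+1) = w_i(k) + alpha*(x_i(k) - w_i(k))."""
    return w + alpha * (x - w)

# Repeated presentation of a pattern moves the weights toward that pattern,
# which is the clustering behaviour mentioned above (x is assumed toy data).
w = np.zeros(3)
x = np.array([1.0, -1.0, 0.5])
for _ in range(100):
    w = instar_update(w, x)
```

Each step shrinks the gap between w and x by the factor (1 − α), so w converges geometrically to the presented pattern.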

Page 17:

First order gradient method

Back propagation

Page 18:

Multilayer Perceptron

R – S1 – S2 – S3 Network

Page 19:

Example

Page 20:

Elementary Decision Boundaries

Page 21:

Elementary Decision Boundaries

Page 22:

Total Network

Page 23:

Function Approximation Example

Page 24:

Nominal Response

[Response plot: input axis from −2 to 2, output axis from −1 to 3.]

Page 25:

Parameter Variations

Page 26:

Multilayer Network

Page 27:

Performance Index

Page 28:

Chain Rule

Page 29:

Gradient Calculation

Page 30:

Steepest Descent

Page 31:

Jacobian Matrix

Page 32:

Backpropagation (Sensitivities)

Page 33:

Initialization (Last Layer)

Page 34:

Summary

Page 35:

• Back-propagation training algorithm

• Backprop adjusts the weights of the NN in order to minimize the network's total mean squared error.

Forward step: network activation

Backward step: error propagation
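The forward/backward steps above can be sketched for a small 1-S1-1 network with a logsig hidden layer and a linear output. This is an illustrative sketch: the architecture size, target function, learning rate, and function names are assumptions, not the slides' example.

```python
import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

def train_bp(P, T, S1=10, alpha=0.05, epochs=3000, seed=0):
    """Minimal 1-S1-1 backprop: logsig hidden layer, linear output,
    incremental steepest-descent updates on the squared error."""
    rng = np.random.default_rng(seed)
    W1 = rng.uniform(-0.5, 0.5, (S1, 1)); b1 = rng.uniform(-0.5, 0.5, (S1, 1))
    W2 = rng.uniform(-0.5, 0.5, (1, S1)); b2 = rng.uniform(-0.5, 0.5, (1, 1))
    for _ in range(epochs):
        for p, t in zip(P, T):
            # forward step: network activation
            a1 = logsig(W1 * p + b1)              # hidden activations, shape (S1, 1)
            a2 = (W2 @ a1 + b2).item()            # linear output (scalar)
            e = t - a2
            # backward step: propagate sensitivities from the last layer
            s2 = -2.0 * e                         # last layer: s2 = -2 * f'(n2) * e, f' = 1
            s1 = (a1 * (1.0 - a1)) * (W2.T * s2)  # logsig derivative is a*(1 - a)
            # steepest-descent weight updates
            W2 -= alpha * s2 * a1.T
            b2 -= alpha * s2
            W1 -= alpha * s1 * p
            b1 -= alpha * s1
    return W1, b1, W2, b2

# Toy target (assumed): approximate g(p) = 1 + sin(pi * p / 4) on [-2, 2]
P = np.linspace(-2.0, 2.0, 21)
T = 1.0 + np.sin(np.pi * P / 4.0)
W1, b1, W2, b2 = train_bp(P, T)
```

The sensitivities are initialized at the last layer and propagated backward through the transfer-function derivatives, which is why a differentiable activation is required.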

Page 36:

Example: Function Approximation

Page 37:

Network

Page 38:

Initial Conditions

Page 39:

Forward Propagation

Page 40:

Transfer Function Derivatives

Page 41:

Backpropagation

Page 42:

Weight Update

Page 43:

Choice of Architecture

Page 44:

Choice of Network Architecture

Page 45:

Convergence: global minimum (left), local minimum (right)

Page 46:

Generalization

Page 47:

Disadvantages of the BP algorithm

• Slow convergence speed

• Sensitivity to initial conditions

• Trapped in local minima

• Instability if learning rate is too large

• Note: despite the above disadvantages, it is widely used in the control community. There are numerous extensions that improve the BP algorithm.

Page 48:

Improved BP algorithms (first-order gradient methods)

1. BP with momentum

2. Delta-bar-delta

3. Decoupled momentum

4. RProp

5. Adaptive BP

6. Trinary BP

7. BP with adaptive gain

8. Extended BP
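As an illustration of the first item, BP with momentum reuses part of the previous step to smooth oscillations and tolerate larger learning rates. The update form below is the common momentum variant, and the toy quadratic is an assumption, not from the slides.

```python
import numpy as np

def momentum_descent(grad, x0, alpha=0.05, gamma=0.9, iters=200):
    """Gradient descent with momentum: part of the previous step is reused,
    smoothing oscillations: v(k) = gamma*v(k-1) - alpha*grad(x); x += v."""
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(iters):
        v = gamma * v - alpha * grad(x)
        x = x + v
    return x

# Quadratic with badly scaled curvature (assumed toy problem): plain steepest
# descent with alpha = 0.05 diverges along the steep axis (it needs alpha < 0.04),
# but the momentum term keeps the iteration stable.
grad = lambda x: np.array([2.0 * x[0], 50.0 * x[1]])
x_min = momentum_descent(grad, [5.0, 1.0])
```

This is the same low-pass-filtering idea behind several of the other variants listed, such as decoupled momentum.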