Uploaded by sreerag-kunnathu-sugathan
DESCRIPTION
Statistical training of neural networks: Boltzmann training, non-linear optimization techniques.
MODULE-IV
STATISTICAL METHODS IN ANN
Module 4
Statistical Methods: Boltzmann's Training - Cauchy
training - Artificial specific heat methods - applications
to general non-linear optimization problems
Statistical methods are used for
Training an ANN
Producing output from a trained network
Training Methods
Deterministic Methods
Statistical Training Methods
Deterministic Training Method
Follows a step-by-step procedure.
Weights are changed based on their current
values.
The change is also based on the desired output and the
actual output.
E.g.: Perceptron Training Algorithm,
Back Propagation Algorithm, etc.
Statistical Training Methods
Make pseudo-random changes in the weights.
Retain only those changes that result in
improvements.
GENERAL PROCEDURE (FOR STATISTICAL TRAINING METHODS)
Apply a set of inputs and compute the resulting
output.
Compare the result with the target and find the error.
The objective of training is to minimize this error.
Select a weight at random and adjust it by a small
random amount.
If the adjustment improves the objective, retain
the change.
Otherwise, return the weight to its previous
value.
Repeat the procedure until the network is
trained to the desired level.
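The general procedure above can be sketched as a short Python loop. This is a minimal illustration, not code from the slides: the single tanh unit, the squared-error objective, and the step size are assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility

def objective(weights, x, target):
    """Squared error of a single tanh unit (a stand-in for any network)."""
    output = np.tanh(x @ weights)
    return float(np.sum((output - target) ** 2))

def statistical_train(weights, x, target, step=0.1, iters=5000):
    """Perturb one randomly chosen weight at a time; keep only improvements."""
    best = objective(weights, x, target)
    for _ in range(iters):
        i = rng.integers(len(weights))    # select a weight at random
        delta = rng.uniform(-step, step)  # small random adjustment
        weights[i] += delta
        err = objective(weights, x, target)
        if err < best:
            best = err                    # improvement: retain the change
        else:
            weights[i] -= delta           # otherwise revert to previous value
    return weights, best
```

Because only improving changes are kept, the error never increases, which is exactly why this plain version can get stuck in a local minimum.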
The local minima problem
The objective function minimization can get trapped in a poor solution, i.e. a local minimum.
[Figure: objective function plotted against a weight, showing a shallow local minimum at point A and the deeper global minimum at point B]
If the objective function is at A and the random
weight changes are small, then the weight
adjustments will be rejected.
The superior weight setting at point B will never
be found, and the system will be trapped in the local
minimum at A instead of the global minimum at B.
If the random weight changes are large, both points
A and B are visited frequently, but so is every
other point.
The weights change so drastically that they will
never settle at the desired point.
Solution & Explanation
Statistical methods overcome the local minima problem by a particular weight-adjustment strategy.
Example:
Let the figure represent a ball on a surface inside a
box.
If the box is shaken violently, the ball moves
rapidly from one side to the other.
The probability of occupying any point on the
surface is equal for all points.
If the violence of the shaking is gradually reduced, the ball
begins to dwell near both points A and B.
If the shaking is reduced further, it settles into the deeper minimum at point B.
ANNs are trained in the same way, through
random weight adjustments.
At first, large random adjustments are made.
A weight change that improves the objective
function is retained.
The average step size is then gradually reduced, so the
search settles at the global minimum.
Annealing [ Boltzmann Law ]
Annealing: if a metal is heated above its melting point, its atoms are in violent random motion. The atoms always tend toward a minimum-energy state. As the metal is cooled gradually, at each temperature the atoms settle into the lowest energy state attainable at that temperature.
P(e) = exp(-e / kT)
where P(e) is the probability that the system is in a state with energy e, k is Boltzmann's constant, and T is the temperature.
Simulated Annealing [Boltzmann Training]
Define a variable T that represents an artificial
temperature. (Start with T at large value).
Apply a set of inputs to the network, and calculate
the outputs and the objective function.
Make a random weight change and recalculate the
network output.
Calculate the new objective function.
If the objective function is reduced, retain the
weight change.
If the weight change results in an increase in the
objective function, calculate the probability of
accepting the weight change:
P(c) = exp(-c / kT)
where P(c) is the probability of accepting a change of c in the objective function, k is Boltzmann's constant, and T is the temperature.
Select a random number r from a uniform
distribution between zero and one.
If P(c) is greater than r, retain the change; otherwise
return the weight to its previous value.
This allows the system to occasionally take a step in a
direction that worsens the objective function, and
hence to escape from local minima.
Repeat the weight change process over each of the
weights in the network, gradually reducing the
temperature T until an acceptably low value for
objective function is obtained.
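The annealing loop described in the steps above can be sketched as follows. This is an illustrative sketch, not the slides' code: Boltzmann's constant k is absorbed into T, and the cooling factor, step size, and inner-loop count are assumptions.

```python
import math
import random

random.seed(1)  # for reproducibility

def anneal(objective, weights, t0=10.0, t_min=0.01, cooling=0.95,
           step=0.5, iters_per_t=50):
    """Simulated annealing over a list of weights.

    A change that worsens the objective by c is accepted with
    probability exp(-c / T): the Boltzmann acceptance rule.
    """
    t = t0
    err = objective(weights)
    while t > t_min:
        for _ in range(iters_per_t):
            i = random.randrange(len(weights))
            delta = random.uniform(-step, step)
            weights[i] += delta
            new_err = objective(weights)
            c = new_err - err
            if c < 0 or random.random() < math.exp(-c / t):
                err = new_err            # accept the (possibly uphill) move
            else:
                weights[i] -= delta      # reject: restore the previous value
        t *= cooling                     # gradually reduce the temperature
    return weights, err
```

On a bumpy one-dimensional objective such as f(w) = w⁴ - 3w² + w, the uphill moves accepted at high temperature let the search escape shallow minima, and the shrinking temperature settles it near a deep one.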
How to select the weight change / artificial temperature for training
The size of the random weight change can be selected by various methods, e.g. by drawing it from the distribution
P(w) = exp(-w² / T²)
where P(w) is the probability of a weight change of size w and T is the artificial temperature.
To achieve the global minimum at the earliest, the cooling rate is usually expressed as the inverse-logarithmic schedule
T(t) = T0 / log(1 + t)
where T0 is the initial temperature and t is the artificial time (iteration count).
•The main disadvantage of Boltzmann training is this very
slow cooling rate and hence long computation times.
•A Boltzmann machine usually takes an impractically long
time to train.
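The slowness can be seen directly from the inverse-logarithmic schedule. A small sketch (the value of T0 and the sample times are arbitrary choices for illustration):

```python
import math

def boltzmann_schedule(t0, t):
    """Inverse-logarithmic cooling: T(t) = T0 / log(1 + t)."""
    return t0 / math.log(1 + t)

# Even after a million steps the temperature has fallen only by a
# factor of log(1 + 10**6), roughly 13.8, which is why Boltzmann
# training needs impractically many iterations to cool.
t0 = 100.0
for t in (10, 1000, 1_000_000):
    print(t, boltzmann_schedule(t0, t))
```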
Cauchy Training
Cauchy training is more rapid than
Boltzmann training.
Cauchy training substitutes the Cauchy distribution for
the Boltzmann distribution.
The Cauchy distribution has longer "tails", and hence a higher
probability of large step sizes.
The temperature-reduction rate is changed to
inverse linear. (For Boltzmann training it was inverse
logarithmic.)
The Cauchy distribution is
P(x) = T(t) / [T(t)² + x²]
where P(x) is the probability of a step of size x and T(t) is the artificial temperature at time t.
The inverse-linear relationship for temperature reduction reduces the training time:
T(t) = T0 / (1 + t)
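A Cauchy step can be drawn by inverse-transform sampling, x = T·tan(π(u − 1/2)) for u uniform on (0, 1). The sketch below (the temperature and sample count are illustrative assumptions) shows the heavy tails that give Cauchy training its occasional large jumps:

```python
import math
import random

random.seed(2)  # for reproducibility

def cauchy_step(temperature):
    """Sample a step from P(x) = T / (T**2 + x**2) by inverse transform."""
    u = random.random()
    return temperature * math.tan(math.pi * (u - 0.5))

def cauchy_schedule(t0, t):
    """Inverse-linear cooling: T(t) = T0 / (1 + t)."""
    return t0 / (1 + t)

steps = [cauchy_step(1.0) for _ in range(10_000)]
# Most steps are moderate, but the heavy tails produce occasional
# very large steps that can jump the network out of a local minimum.
```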