EE 7004
BACK PROPAGATION
AND
RESILIENT PROPAGATION
SUBMITTED BY
KAMALJYOTI GOGOI
MT/12/PSE/007
BACK PROPAGATION
Back propagation is a common method of training artificial neural networks so as to
minimize an objective function. It is a supervised learning method: it requires a dataset of the
desired output for many inputs, which makes up the training set. It is most useful for feed-forward
networks (networks that have no feedback connections, i.e. no cycles). The term is an
abbreviation for "backward propagation of errors". Back propagation requires that the activation
function used by the artificial neurons (or "nodes") be differentiable.
For better understanding of the back propagation algorithm it can be divided into two phases:
propagation and weight update.
Phase 1: Propagation
Each propagation involves the following steps:
1. Forward propagation of a training pattern's input through the neural network in order to
generate the propagation's output activations.
2. Backward propagation of the propagation's output activations through the neural network
using the training pattern's target in order to generate the deltas of all output and hidden
neurons.
Phase 2: Weight update
For each weight (synapse), follow these steps:
1. Multiply its output delta by its input activation to get the gradient of the weight.
2. Move the weight in the opposite direction of the gradient by subtracting a fraction of the
gradient from the weight.
This fraction, which influences the speed and quality of learning, is called the learning rate. The
sign of the gradient of a weight indicates where the error is increasing; this is why the weight
must be updated in the opposite direction.
Phases 1 and 2 are repeated until the performance of the network is satisfactory.
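The two phases above can be sketched for a tiny fully connected network. This is an illustrative sketch, not part of the original text: the 2-2-1 network size, the random seed, the learning rate, the omission of bias terms, and the logical-AND training set are all assumptions chosen for brevity.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
# A tiny 2-2-1 network; initial weights are random (no bias terms, for brevity).
w_hidden = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
w_output = [random.uniform(-1, 1) for _ in range(2)]
lr = 0.5  # the learning rate: the "fraction" subtracted in phase 2

def train_step(x, target):
    # Phase 1, forward: propagate the input to get the output activations.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(w_hidden[j], x)))
              for j in range(2)]
    out = sigmoid(sum(w * h for w, h in zip(w_output, hidden)))
    # Phase 1, backward: deltas of the output and hidden neurons.
    delta_out = (out - target) * out * (1 - out)
    delta_hid = [delta_out * w_output[j] * hidden[j] * (1 - hidden[j])
                 for j in range(2)]
    # Phase 2, weight update: gradient = delta * input activation,
    # then subtract lr * gradient (step opposite to the gradient).
    for j in range(2):
        w_output[j] -= lr * delta_out * hidden[j]
        for i in range(2):
            w_hidden[j][i] -= lr * delta_hid[j] * x[i]
    return (out - target) ** 2

# Repeat phases 1 and 2 over a small training set (here: logical AND).
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
errors = [sum(train_step(x, t) for x, t in data) for _ in range(2000)]
```

After repeated passes, the summed squared error at the end of training is lower than at the start, which is the "satisfactory performance" criterion in its simplest form.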
As the name implies, the errors propagate backwards from the output nodes to the inner nodes. Back
propagation calculates the gradient of the error of the network with respect to the network's modifiable
weights. This gradient is almost always used in a simple stochastic gradient descent algorithm to find
weights that minimize the error. Often the term "back propagation" is used in a more general sense, to
refer to the entire procedure encompassing both the calculation of the gradient and its use in stochastic
gradient descent. Back propagation usually allows quick convergence on satisfactory local minima for
error in the kind of networks to which it is suited.
Back propagation networks are necessarily multilayer perceptrons (usually with one input, one hidden,
and one output layer). In order for the hidden layer to serve any useful function, multilayer networks
must have non-linear activation functions for the multiple layers: a multilayer network using only linear
activation functions is equivalent to some single-layer, linear network. Non-linear activation functions
that are commonly used include the logistic function, the softmax function, and Gaussian functions.
The algorithm is also closely related to the Gauss–Newton algorithm.
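Both points above can be checked with a short sketch: the logistic function and its derivative (which backprop needs), and the fact that two purely linear layers collapse into one. The specific weights and input here are arbitrary illustrative values.

```python
import math

def logistic(x):
    # The logistic function: a common differentiable, non-linear activation.
    return 1.0 / (1.0 + math.exp(-x))

def logistic_derivative(x):
    # Backprop needs this derivative; conveniently, f'(x) = f(x) * (1 - f(x)).
    fx = logistic(x)
    return fx * (1.0 - fx)

# Why non-linearity matters: two purely linear layers with weights w1 and w2
# collapse into a single linear layer with weight w2 * w1, so a linear
# hidden layer adds no representational power.
w1, w2, x = 0.3, -2.0, 1.5
two_layer = w2 * (w1 * x)
one_layer = (w2 * w1) * x
```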
Fig: Back Propagation
Flowchart of Back Propagation Algorithm
RESILIENT PROPAGATION
Resilient propagation is one of the best general-purpose training methods for neural
networks. It is not the best training method in every case, but it is in most. It can be used
for feed-forward neural networks and simple recurrent neural networks. Resilient propagation
is a type of propagation training that is susceptible to the flat spot problem for certain
activation functions.
It typically outperforms back propagation by a considerable factor. Additionally, it
has no parameters that must be set. Back propagation requires that a learning rate and a
momentum value be specified, and finding optimal values for them can be difficult. This is
not necessary in the case of resilient propagation.
The resilient propagation algorithm is one of the most popular adaptive-learning-rate
training algorithms. It employs a sign-based scheme to eliminate the harmful influence of the
derivatives' magnitude on the weight updates, and it is eminently suitable for applications where
the gradient is numerically estimated or the error is noisy. It is easy to implement in hardware
and is not susceptible to numerical problems.
Its success has motivated the development of several variants that aim to improve the
convergence behaviour and effectiveness of the original method. Hybrid learning schemes have
been proposed that incorporate second-derivative information, either by approximating the
second derivative with one-dimensional secant steps or, in the Diagonal Estimation approach,
by directly computing the diagonal elements of the Hessian matrix. Approaches inspired by
global optimization theory have also been developed to equip it with annealing strategies, such
as Simulated Annealing and Restart-mode Simulated Annealing, in order to escape from
shallow local minima. More recently, the Improved algorithm, which applies a backtracking
strategy (i.e. it uses a heuristic to decide whether or not to take back a step along a weight
direction), has shown improved convergence speed when compared against existing variants as
well as other training methods.
The relevant literature shows that these learning schemes exhibit fast convergence in
empirical evaluations, but they usually require introducing, or even fine-tuning, additional
heuristics. For example, annealing schedules require heuristics for the acceptance probability
and the visiting distribution, whilst second-derivative methods employ heuristics in the various
approximations of the second derivative. Moreover, the literature shows a lack of theoretical
results underpinning these modifications, particularly the first-order methods. This is not
surprising, as heuristics make it difficult to guarantee convergence to a local minimiser of the
error function when an adaptive learning rate for each weight is used in calculating the weight
updates.
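The sign-based scheme described above can be sketched for a single weight. The update factors 1.2 and 0.5 and the step-size bounds are the commonly used defaults; the one-dimensional quadratic objective is an illustrative choice, and this sketch follows a variant that skips the update after a sign change rather than backtracking the weight (the original algorithm backtracks).

```python
ETA_PLUS, ETA_MINUS = 1.2, 0.5   # commonly used default update factors
STEP_MIN, STEP_MAX = 1e-6, 50.0  # bounds on the per-weight step size

def rprop_update(w, grad, prev_grad, step):
    # Only the SIGN of the gradient is used, never its magnitude.
    if grad * prev_grad > 0:        # same sign as last time: accelerate
        step = min(step * ETA_PLUS, STEP_MAX)
    elif grad * prev_grad < 0:      # sign flip: we overshot, slow down
        step = max(step * ETA_MINUS, STEP_MIN)
        grad = 0.0                  # skip this update (no backtracking here)
    if grad > 0:                    # step opposite to the gradient's sign
        w -= step
    elif grad < 0:
        w += step
    return w, grad, step

# Minimise f(w) = w**2, whose gradient is 2*w, starting from w = 5.0.
w, prev_grad, step = 5.0, 0.0, 0.1
for _ in range(100):
    grad = 2.0 * w
    w, prev_grad, step = rprop_update(w, grad, prev_grad, step)
```

Note that no learning rate appears anywhere: each weight carries its own step size, which grows while the gradient's sign is stable and shrinks when it flips, which is why derivative magnitude cannot harm the update.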
Fig: Resilient Propagation