EE 7004
BACK PROPAGATION
AND
RESILIENT PROPAGATION
SUBMITTED BY
KAMALJYOTI GOGOI
MT/12/PSE/007
BACK PROPAGATION
Back propagation is a common method of training artificial neural networks so as to
minimize an objective function. It is a supervised learning method: it requires a dataset of the
desired output for many inputs, which makes up the training set. It is most useful for feed-forward
networks (networks that have no feedback connections, i.e. no cycles). The term is an
abbreviation for "backward propagation of errors". Back propagation requires that the activation
function used by the artificial neurons (or "nodes") be differentiable.
For better understanding of the back propagation algorithm it can be divided into two phases:
propagation and weight update.
Phase 1: Propagation
Each propagation involves the following steps:
1. Forward propagation of a training pattern's input through the neural network in order to
generate the propagation's output activations.
2. Backward propagation of the propagation's output activations through the neural network
using the training pattern's target in order to generate the deltas of all output and hidden
neurons.
Phase 2: Weight update
For each weight (synapse), follow these steps:
1. Multiply its output delta by its input activation to get the gradient of the weight.
2. Move the weight in the opposite direction of the gradient by subtracting a fraction of the
gradient from the weight.
This fraction, which influences the speed and quality of learning, is called the learning rate. The
sign of the gradient of a weight indicates where the error is increasing; this is why the weight
must be updated in the opposite direction.
Phases 1 and 2 are repeated until the performance of the network is satisfactory.
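The two phases above can be sketched for a tiny fully connected network. This is an illustrative sketch, not part of the original text: the 2-2-1 network size, the random seed, the learning rate, the omission of bias terms, and the logical-AND training set are all assumptions chosen for brevity.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
# A tiny 2-2-1 network; initial weights are random (no bias terms, for brevity).
w_hidden = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
w_output = [random.uniform(-1, 1) for _ in range(2)]
lr = 0.5  # the learning rate: the "fraction" subtracted in phase 2

def train_step(x, target):
    # Phase 1, forward: propagate the input to get the output activations.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(w_hidden[j], x)))
              for j in range(2)]
    out = sigmoid(sum(w * h for w, h in zip(w_output, hidden)))
    # Phase 1, backward: deltas of the output and hidden neurons.
    delta_out = (out - target) * out * (1 - out)
    delta_hid = [delta_out * w_output[j] * hidden[j] * (1 - hidden[j])
                 for j in range(2)]
    # Phase 2, weight update: gradient = delta * input activation,
    # then subtract lr * gradient (step opposite to the gradient).
    for j in range(2):
        w_output[j] -= lr * delta_out * hidden[j]
        for i in range(2):
            w_hidden[j][i] -= lr * delta_hid[j] * x[i]
    return (out - target) ** 2

# Repeat phases 1 and 2 over a small training set (here: logical AND).
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
errors = [sum(train_step(x, t) for x, t in data) for _ in range(2000)]
```

After repeated passes, the summed squared error at the end of training is lower than at the start, which is the "satisfactory performance" criterion in its simplest form.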
As the name implies, the errors propagate backwards from the output nodes to the inner nodes. Back
propagation calculates the gradient of the error of the network with respect to the network's modifiable
weights. This gradient is almost always used in a simple stochastic gradient descent algorithm to find
weights that minimize the error. Often the term "back propagation" is used in a more general sense, to
refer to the entire procedure encompassing both the calculation of the gradient and its use in stochastic
gradient descent. Back propagation usually allows quick convergence on satisfactory local minima for
error in the kind of networks to which it is suited.
Back propagation networks are necessarily multilayer perceptrons (usually with one input, one hidden,
and one output layer). In order for the hidden layer to serve any useful function, multilayer networks
must have non-linear activation functions for the multiple layers: a multilayer network using only linear
activation functions is equivalent to some single-layer, linear network. Non-linear activation functions
that are commonly used include the logistic function, the softmax function, and Gaussian functions.
The algorithm is also closely related to the Gauss–Newton algorithm.
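Both points above can be checked with a short sketch: the logistic function and its derivative (which backprop needs), and the fact that two purely linear layers collapse into one. The specific weights and input here are arbitrary illustrative values.

```python
import math

def logistic(x):
    # The logistic function: a common differentiable, non-linear activation.
    return 1.0 / (1.0 + math.exp(-x))

def logistic_derivative(x):
    # Backprop needs this derivative; conveniently, f'(x) = f(x) * (1 - f(x)).
    fx = logistic(x)
    return fx * (1.0 - fx)

# Why non-linearity matters: two purely linear layers with weights w1 and w2
# collapse into a single linear layer with weight w2 * w1, so a linear
# hidden layer adds no representational power.
w1, w2, x = 0.3, -2.0, 1.5
two_layer = w2 * (w1 * x)
one_layer = (w2 * w1) * x
```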
Fig: Back Propagation
Flowchart of Back Propagation Algorithm
RESILIENT PROPAGATION
Resilient propagation is one of the best general-purpose training methods for neural
networks. It is not the best training method in every case, but it is in most. It can be used
for feed-forward neural networks and simple recurrent neural networks. Resilient propagation
is a type of propagation training that is susceptible to the flat spot problem for certain
activation functions.
It typically outperforms back propagation by a considerable factor. Additionally, it
has no parameters that must be set. Back propagation requires that a learning rate and a
momentum value be specified, and finding optimal values for them can be difficult. This is
not necessary in the case of resilient propagation.
The resilient propagation algorithm is one of the most popular adaptive-learning-rate
training algorithms. It employs a sign-based scheme to eliminate the harmful influence of the
derivatives' magnitude on the weight updates, and it is eminently suitable for applications where
the gradient is numerically estimated or the error is noisy. It is easy to implement in hardware
and is not susceptible to numerical problems.
Its success has motivated the development of several variants that aim to improve the
convergence behaviour and effectiveness of the original method. Hybrid learning schemes have
been proposed that incorporate second-derivative information, either by approximating the
second derivative with one-dimensional secant steps or, in the Diagonal Estimation approach,
by directly computing the diagonal elements of the Hessian matrix. Approaches inspired by
global optimization theory have also been developed to equip it with annealing strategies, such
as Simulated Annealing and Restart-mode Simulated Annealing, in order to escape from
shallow local minima. More recently, the Improved algorithm, which applies a backtracking
strategy (i.e. it uses a heuristic to decide whether or not to take back a step along a weight
direction), has shown improved convergence speed when compared against existing variants as
well as other training methods.
The relevant literature shows that these learning schemes exhibit fast convergence in
empirical evaluations, but they usually require introducing, or even fine-tuning, additional
heuristics. For example, annealing schedules require heuristics for the acceptance probability
and the visiting distribution, whilst second-derivative methods employ heuristics in the various
approximations of the second derivative. Moreover, the literature shows a lack of theoretical
results underpinning these modifications, particularly the first-order methods. This is not
surprising, as heuristics make it difficult to guarantee convergence to a local minimiser of the
error function when an adaptive learning rate for each weight is used in calculating the weight
updates.
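The sign-based scheme described above can be sketched for a single weight. The update factors 1.2 and 0.5 and the step-size bounds are the commonly used defaults; the one-dimensional quadratic objective is an illustrative choice, and this sketch follows a variant that skips the update after a sign change rather than backtracking the weight (the original algorithm backtracks).

```python
ETA_PLUS, ETA_MINUS = 1.2, 0.5   # commonly used default update factors
STEP_MIN, STEP_MAX = 1e-6, 50.0  # bounds on the per-weight step size

def rprop_update(w, grad, prev_grad, step):
    # Only the SIGN of the gradient is used, never its magnitude.
    if grad * prev_grad > 0:        # same sign as last time: accelerate
        step = min(step * ETA_PLUS, STEP_MAX)
    elif grad * prev_grad < 0:      # sign flip: we overshot, slow down
        step = max(step * ETA_MINUS, STEP_MIN)
        grad = 0.0                  # skip this update (no backtracking here)
    if grad > 0:                    # step opposite to the gradient's sign
        w -= step
    elif grad < 0:
        w += step
    return w, grad, step

# Minimise f(w) = w**2, whose gradient is 2*w, starting from w = 5.0.
w, prev_grad, step = 5.0, 0.0, 0.1
for _ in range(100):
    grad = 2.0 * w
    w, prev_grad, step = rprop_update(w, grad, prev_grad, step)
```

Note that no learning rate appears anywhere: each weight carries its own step size, which grows while the gradient's sign is stable and shrinks when it flips, which is why derivative magnitude cannot harm the update.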
Fig: Resilient Propagation