Overview of Back Propagation Algorithm Shuiwang Ji


Page 1

Overview of Back Propagation Algorithm

Shuiwang Ji

Page 2

A Sample Network

Page 3

Forward Operation

• The general feed-forward operation is:
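In the standard notation for a single-hidden-layer network (a reconstruction in my own symbols, not the slide's), the feed-forward operation computed by output unit k is:

z_k = g_k(x) = f( Σ_j w_kj · f( Σ_i w_ji x_i + w_j0 ) + w_k0 )

where x_i are the inputs, w_ji the input-to-hidden weights, w_kj the hidden-to-output weights, and f a nonlinear activation function.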

Page 4

Back Propagation Algorithm

• The hidden-to-output weights can be learned by minimizing the error

• The power of back-propagation is that it allows us to calculate an effective error for each hidden unit, and thus derive a learning rule for the input-to-hidden weights

• We consider the error function:

• The update rule is:
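In standard form (my notation; the slides most likely use the squared-error criterion and gradient descent), the error function and update rule are:

J(w) = (1/2) Σ_k (t_k − z_k)^2        and        Δw = −η ∂J/∂w

with targets t_k, network outputs z_k, and learning rate η.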

Page 5

Hidden-to-output Weights

The chain rule:

The sensitivity of unit k is:

and

Overall, the derivative is:
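Reconstructed in the usual notation, with net_k the weighted input to output unit k and y_j the activation of hidden unit j:

∂J/∂w_kj = (∂J/∂net_k)(∂net_k/∂w_kj)
δ_k ≡ −∂J/∂net_k = (t_k − z_k) f′(net_k)        and        ∂net_k/∂w_kj = y_j

so ∂J/∂w_kj = −δ_k y_j, and the hidden-to-output update is Δw_kj = η δ_k y_j.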

Page 6

Input-to-hidden Weights

The chain rule:

The real back propagation:

Overall the rule is:
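In the same reconstructed notation, with x_i an input component and net_j the weighted input to hidden unit j:

∂J/∂w_ji = (∂J/∂y_j)(∂y_j/∂net_j)(∂net_j/∂w_ji)
δ_j ≡ f′(net_j) Σ_k w_kj δ_k        (the back-propagation of the output sensitivities)
Δw_ji = η δ_j x_i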

Page 7

Back Propagation of Sensitivity

1. The sensitivity at a hidden unit is proportional to the weighted sum of the sensitivities at the output units

2. The output unit sensitivities are thus propagated “back” to the hidden units
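As an illustration of these two points, below is a minimal sketch of one stochastic-gradient step for a single-hidden-layer network with sigmoid units and squared error; the names, shapes, learning rate, and the omission of bias terms are my own choices for brevity, not details taken from the slides.

import numpy as np

def f(a):
    # Sigmoid activation; its derivative is f(a) * (1 - f(a)).
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(x, t, W1, W2, eta=0.1):
    # Forward pass.
    net_j = W1 @ x            # weighted inputs to hidden units
    y = f(net_j)              # hidden activations
    net_k = W2 @ y            # weighted inputs to output units
    z = f(net_k)              # network outputs

    # Output-unit sensitivities: delta_k = (t_k - z_k) * f'(net_k).
    delta_k = (t - z) * z * (1.0 - z)
    # Hidden-unit sensitivities: the weighted sum of the output sensitivities,
    # propagated "back" through the hidden-to-output weights.
    delta_j = y * (1.0 - y) * (W2.T @ delta_k)

    # Gradient-descent updates for both weight layers.
    W2 += eta * np.outer(delta_k, y)
    W1 += eta * np.outer(delta_j, x)
    return W1, W2

# Tiny usage example with random data.
rng = np.random.default_rng(0)
W1 = 0.1 * rng.standard_normal((3, 4))   # 4 inputs -> 3 hidden units
W2 = 0.1 * rng.standard_normal((2, 3))   # 3 hidden units -> 2 outputs
x, t = rng.standard_normal(4), np.array([0.0, 1.0])
W1, W2 = backprop_step(x, t, W1, W2)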

Page 8

Training Hierarchical Feed-forward Visual Recognition Models Using Transfer Learning from Pseudo-Tasks

ECCV'08, Kai Yu

Presented by Shuiwang Ji

Page 9

Transfer Learning

• Transfer learning, also known as multi-task learning, is a mechanism that improves generalization by leveraging shared domain-specific information contained in related tasks

• In the setting considered in this paper, all tasks share the same input space

Page 10

General Formulation

• The main task to be learnt has index m with training examples

• A neural network has a natural architecture to tackle this learning problem by minimizing:
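A common way to write the objective for the main task alone (my own notation, not the paper's) is

min_θ  Σ_n ℓ( f_m(x_n; θ), y_n^m ) + Ω(θ)

where f_m(·; θ) is the network output for task m, ℓ a per-example loss, and Ω(θ) a regularizer on the parameters.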

Page 11

General Formulation

• The network is learned by additionally introducing pseudo auxiliary tasks, each represented by a set of input-output pairs to be learnt:

• The regularization term then becomes a loss defined on these pseudo-tasks (one possible form is sketched after this list)

• A Bayesian perspective (skipped)
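One plausible form of this pseudo-task regularizer (a sketch in my own notation; the paper's exact expression is not reproduced here) is the empirical loss of the shared network on the K auxiliary tasks:

Ω(θ) = λ Σ_{k=1}^{K} Σ_n ℓ( g_k(x_n; θ), y_n^k )

where each auxiliary output g_k shares the hidden layers with the main task, so fitting the pseudo-tasks regularizes the shared representation.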

Page 12

CNN for Transfer Learning

• Input: 140x140 pixel images, including R/G/B channels and additionally two channels Dx and Dy, which are the horizontal and vertical gradients of gray intensities

• C1 layer: 16 filters of size 16 by 16

• P1 layer: max pooling over each 5 by 5 neighborhood

• C2 layer: 256 filters of size 6 by 6, with connections of sparsity 0.5 between the 16 dimensions of the P1 layer and the 256 dimensions of the C2 layer

• P2 layer: max pooling over each 5 by 5 neighborhood

• Output layer: full connections between the (256 by 4 by 4) P2 features and the outputs
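As a consistency check (assuming 'valid' convolutions and non-overlapping pooling, which the slide does not state explicitly), the quoted sizes line up:

140x140 input  →  C1 (16x16 filters): 140 − 16 + 1 = 125  →  P1 (5x5 pooling): 125 / 5 = 25
25x25          →  C2 (6x6 filters):   25 − 6 + 1 = 20     →  P2 (5x5 pooling): 20 / 5 = 4

which matches the 256 by 4 by 4 block of P2 features feeding the output layer.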

Page 13

Generating Pseudo Tasks

1. The pseudo-task is constructed by sampling a random 2D patch and using it as a template to form a local 2D filter that operates on every training image. The value assigned to an image under this task is taken to be the maximum over the result of this 2D convolution operation (a code sketch of this step follows the list)

2. Pseudo-tasks built directly from raw image patches in this way are brittle to scale, translation, and slight intensity variations
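A small sketch of the computation in point 1, using cross-correlation (template matching) for the 'convolution' step; the function and variable names below are hypothetical, not taken from the paper.

import numpy as np

def pseudo_task_value(image, patch):
    # Slide the sampled patch over every valid position of the image and
    # return the maximum filter response; this scalar is the value the
    # pseudo-task assigns to the image.
    ph, pw = patch.shape
    ih, iw = image.shape
    best = -np.inf
    for r in range(ih - ph + 1):
        for c in range(iw - pw + 1):
            best = max(best, float(np.sum(image[r:r + ph, c:c + pw] * patch)))
    return best

# Usage: sample a random 12x12 template from one image and evaluate it on
# another image to produce that image's pseudo-task target.
rng = np.random.default_rng(0)
source = rng.standard_normal((140, 140))
query = rng.standard_normal((140, 140))
r0, c0 = rng.integers(0, 140 - 12, size=2)
template = source[r0:r0 + 12, c0:c0 + 12]
print(pseudo_task_value(query, template))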

Page 14

Generating Pseudo Tasks

• Applying Gabor filters of 4 orientations and 16 scales results in 64 feature maps of size 104*104 for each image

• Max pooling is performed first within each non-overlapping 4*4 neighborhood and then within each band of two successive scales, resulting in 32 feature maps of size 26*26 for each image

• A set of K RBF filters of size 7*7 with 4 orientations is then sampled and used as the parameters of the pseudo-tasks, resulting in 8 feature maps of size 20*20

• Finally, max pooling is performed on the result across all the scales and within every non-overlapping 10*10 neighborhood, giving a 2*2 feature map which constitutes the value of the image under this pseudo-task

• This yields 4*K pseudo-tasks in total (K random patches, each operating on a different quadrant of the image)
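These sizes are again mutually consistent under non-overlapping pooling and 'valid' filtering (my assumption): 104 / 4 = 26 after the 4*4 pooling, 26 − 7 + 1 = 20 after the 7*7 RBF filters, and 20 / 10 = 2 after the final 10*10 pooling, giving the 2*2 map per pseudo-task.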

Page 15

Object Class Recognition and Localization Using Sparse Features with Limited Receptive Fields, IJCV, in press

Page 16

Results on Caltech-101

0.18 seconds for testing one image (the forward operation)

Page 17

Gender and Ethnicity Recognition

Page 18

First-layer Features

Page 19

Convergence Rate