Convolutional Restricted Boltzmann Machines for Feature Learning
Mohammad Norouzi
Advisor: Dr. Greg Mori
CS @ Simon Fraser University
27 Nov 2009


Page 1: Convolutional  Restricted Boltzmann Machines for Feature Learning

Convolutional Restricted Boltzmann Machines for Feature Learning

Mohammad Norouzi
Advisor: Dr. Greg Mori

CS @ Simon Fraser University
27 Nov 2009

Page 2: Convolutional  Restricted Boltzmann Machines for Feature Learning

CRBMs for Feature Learning

Mohammad Norouzi
Advisor: Dr. Greg Mori

CS @ Simon Fraser University
27 Nov 2009

Page 3: Convolutional  Restricted Boltzmann Machines for Feature Learning

Problems

• Human detection
• Handwritten digit classification

Page 4: Convolutional  Restricted Boltzmann Machines for Feature Learning

Sliding Window Approach


Page 5: Convolutional  Restricted Boltzmann Machines for Feature Learning

Sliding Window Approach (Cont’d)

[INRIA Person Dataset]

Decision Boundary

Page 6: Convolutional  Restricted Boltzmann Machines for Feature Learning

The success or failure of an object recognition algorithm hinges on the features used.

Input → Feature representation → Classifier → Label

• Our focus: learning the feature representation
• Labels: Human / Background (detection), or 0 / 1 / 2 / 3 / … (digits)

Page 7: Convolutional  Restricted Boltzmann Machines for Feature Learning

Local Feature Detector Hierarchies

[Figure: moving up the hierarchy, features become larger, more complicated, and less frequent]

Page 8: Convolutional  Restricted Boltzmann Machines for Feature Learning

Generative & Layerwise Learning

[Figure: a hierarchy of unknown features (each marked "?") learned generatively, layer by layer, with a CRBM]

Page 9: Convolutional  Restricted Boltzmann Machines for Feature Learning

Visual Features: Filtering

Filter kernels (features) W_1, W_2, W_3:

 1  0 -1      -1  0  1       0 -1 -2
 2  0 -2      -2  0  2       1  0 -1
 1  0 -1      -1  0  1       2  1  0

Filter responses of the image V: Filter(V, W_1), Filter(V, W_2), Filter(V, W_3)
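As a rough illustration of what a Filter(V, W) operation computes, the sketch below slides a 3x3 kernel over a small image and records one response per position. It assumes valid-mode cross-correlation (no padding, no kernel flip); the slides do not say which convention Filter uses, and the image `V` and the helper name `filter2d` are made up for the example.

```python
def filter2d(image, kernel):
    """Valid-mode 2-D filtering: slide the kernel over the image without padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# One of the 3x3 kernels shown on the slide (it responds to horizontal intensity changes).
W1 = [[1, 0, -1],
      [2, 0, -2],
      [1, 0, -1]]

# Hypothetical 4x5 image with a vertical edge: bright on the left, dark on the right.
V = [[9, 9, 9, 0, 0],
     [9, 9, 9, 0, 0],
     [9, 9, 9, 0, 0],
     [9, 9, 9, 0, 0]]

response = filter2d(V, W1)
print(response)  # [[0, 36, 36], [0, 36, 36]]
```

The response is zero over the flat region and large wherever the kernel window straddles the edge, which is the sense in which a kernel acts as a feature detector.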

Page 10: Convolutional  Restricted Boltzmann Machines for Feature Learning

Our approach to feature learning is generative (CRBM model)

[Figure: the image V is connected through filters W_1, W_2, W_3 to binary hidden variables H_1, H_2, H_3]

Page 11: Convolutional  Restricted Boltzmann Machines for Feature Learning

Related Work


Page 12: Convolutional  Restricted Boltzmann Machines for Feature Learning

Related Work

• Convolutional Neural Network (CNN): discriminative
  – Filtering layers are bundled with a classifier, and all the layers are learned together using error backpropagation.
  – Does not perform well on natural images
  [LeCun et al. 98] [Ranzato et al. CVPR'07]

• Biologically plausible models: no learning
  – Hand-crafted first layer vs. randomly selected prototypes for second layer.
  [Serre et al., PAMI'07] [Mutch and Lowe, CVPR'06]

Page 13: Convolutional  Restricted Boltzmann Machines for Feature Learning

Related Work (cont’d)

• Deep Belief Net: generative & unsupervised
  – A two layer partially observed MRF, called RBM, is the building block
  – Learning is performed unsupervised and layer-by-layer from bottom layer upwards
  [Hinton et al., NC'2006]

• Our contributions: We incorporate spatial locality into RBMs and adapt the learning algorithm accordingly

• We add more complicated components such as pooling and sparsity into deep belief nets

Page 14: Convolutional  Restricted Boltzmann Machines for Feature Learning

Why Generative & Unsupervised

• Discriminative learning of deep and large neural networks has not been successful
  – Requires large training sets
  – Large models overfit easily
  – First layer gradients are relatively small

• Alternative hybrid approach
  – Learn a large set of first layer features generatively
  – Switch to a discriminative model to select the discriminative features from those that are learned
  – Discriminative fine-tuning is helpful

Page 15: Convolutional  Restricted Boltzmann Machines for Feature Learning

Details


Page 16: Convolutional  Restricted Boltzmann Machines for Feature Learning

CRBM

• Image is the visible layer and hidden layer is related to filter responses

• An energy based probabilistic model

$P(V; W) = \frac{1}{Z} \sum_{H} \exp(-E(V, H; W))$

$E(V, H; W) = \sum_{k} E(V, H_k; W_k)$

$E(V, H_k; W_k) = -H_k \bullet \mathrm{Filter}(V, W_k)$

where $\bullet$ is the dot product of vectorized matrices.
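The energy on this slide can be evaluated directly for a toy configuration. The sketch below assumes that Filter is valid-mode cross-correlation and that the energy carries a negative sign, so that hidden units sitting on strong filter responses lower the energy; bias terms are left out, as on the slide. The names `filter2d` and `crbm_energy` and the toy values are illustrative only.

```python
def filter2d(image, kernel):
    """Valid-mode 2-D filtering (cross-correlation, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b]
                for a in range(kh) for b in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


def crbm_energy(V, H, W):
    """E(V, H; W) = sum_k -(H_k . Filter(V, W_k)), a dot product of vectorized matrices."""
    energy = 0.0
    for H_k, W_k in zip(H, W):
        R_k = filter2d(V, W_k)
        energy -= sum(h * r
                      for h_row, r_row in zip(H_k, R_k)
                      for h, r in zip(h_row, r_row))
    return energy


# Toy 3x3 binary image, two 2x2 filters, and one binary hidden map per filter.
V = [[1, 0, 1],
     [0, 1, 0],
     [1, 0, 1]]
W = [[[1, 0], [0, 1]],   # main-diagonal detector
     [[0, 1], [1, 0]]]   # anti-diagonal detector
H = [[[1, 0], [0, 1]],
     [[0, 0], [1, 0]]]

print(crbm_energy(V, H, W))  # -6.0
```

Each active hidden unit contributes the negative of its filter response, so configurations whose hidden units agree with the image get lower energy and, through exp(-E), higher probability.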

Page 17: Convolutional  Restricted Boltzmann Machines for Feature Learning

Training CRBMs

• Maximum likelihood learning of CRBMs is difficult
• Contrastive Divergence (CD) learning is applicable

• For CD learning we need to compute the conditionals $P(H | V)$ and $P(V | H)$.

[Figure labels: data, sample]

Page 18: Convolutional  Restricted Boltzmann Machines for Feature Learning

CRBM (Backward)

• Nearby hidden variables cooperate in reconstruction

• Conditional probabilities take the form

$P(H_k | V) = \sigma(\mathrm{Filter}(V, W_k))$

$P(V | H) = \sigma\left(\sum_{k} \mathrm{Filter}(H_k, W_k^{*})\right)$

$\sigma(x) = \frac{1}{1 + \exp(-x)}$
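The two conditionals can be sketched as follows. This is a minimal illustration under several assumptions not stated on the slide: visible units are binary, biases are omitted, the forward Filter is valid-mode cross-correlation, and the backward filter is the 180-degree-flipped kernel applied in full mode. Here the backward pass is written as a scatter of each hidden unit's filter onto the pixels it was computed from, which is equivalent to that flipped full-mode filtering. All function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def filter2d(image, kernel):
    """Valid-mode 2-D filtering (cross-correlation, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b]
                for a in range(kh) for b in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


def backproject(H_k, W_k):
    """Add each hidden unit's filter back onto the visible units it looked at."""
    kh, kw = len(W_k), len(W_k[0])
    hh, hw = len(H_k), len(H_k[0])
    out = [[0.0] * (hw + kw - 1) for _ in range(hh + kh - 1)]
    for i in range(hh):
        for j in range(hw):
            for a in range(kh):
                for b in range(kw):
                    out[i + a][j + b] += H_k[i][j] * W_k[a][b]
    return out


def p_h_given_v(V, W):
    """One map of activation probabilities per filter."""
    return [[[sigmoid(r) for r in row] for row in filter2d(V, W_k)] for W_k in W]


def p_v_given_h(H, W):
    """Sum the back-projections of all hidden maps, then squash."""
    total = None
    for H_k, W_k in zip(H, W):
        back = backproject(H_k, W_k)
        if total is None:
            total = back
        else:
            total = [[t + x for t, x in zip(t_row, b_row)]
                     for t_row, b_row in zip(total, back)]
    return [[sigmoid(t) for t in row] for row in total]


V = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
W = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
H = [[[1, 0], [0, 1]], [[0, 0], [1, 0]]]

p_h = p_h_given_v(V, W)   # two 2x2 maps of probabilities
p_v = p_v_given_h(H, W)   # one 3x3 map of probabilities
```

In this toy case the centre pixel receives input from three active hidden units, so its reconstruction probability is the highest; that overlap is what "nearby hidden variables cooperate in reconstruction" refers to.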

Page 19: Convolutional  Restricted Boltzmann Machines for Feature Learning

Learning the Hierarchy

• The structure is trained bottom up and layerwise
• The CRBM model for training filtering layers
• Filtering layers are followed by down-sampling layers

CRBM → Pooling → CRBM → Pooling → Classifier

• CRBM: filtering, non-linearity
• Pooling: reduce the dimensionality
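A down-sampling step of this kind might look like the sketch below. It assumes non-overlapping max pooling over square blocks, which is one common choice; the slide itself only says that pooling reduces the dimensionality, and the function name `max_pool` and the sample values are made up.

```python
def max_pool(response, size):
    """Non-overlapping max pooling: keep the largest value in each size x size block."""
    out_h = len(response) // size
    out_w = len(response[0]) // size
    return [
        [
            max(response[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 4x4 filter response map.
R = [[1, 3, 0, 2],
     [4, 2, 1, 0],
     [0, 0, 5, 6],
     [1, 2, 7, 8]]

pooled = max_pool(R, 2)
print(pooled)  # [[4, 2], [2, 8]]
```

Pooling with a block size of 2 quarters the number of units passed to the next CRBM while keeping the strongest response in each neighbourhood.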

Page 20: Convolutional  Restricted Boltzmann Machines for Feature Learning

[Figure: Input → 1st Filters → Responses → 2nd Filters → Responses]

Page 21: Convolutional  Restricted Boltzmann Machines for Feature Learning

Experiments


Page 22: Convolutional  Restricted Boltzmann Machines for Feature Learning

Evaluation

MNIST digit dataset
• Training set: 60,000 images of digits of size 28x28
• Test set: 10,000 images

INRIA person dataset
• Training set: 2416 person windows of size 128 x 64 pixels and 4.5x10^6 negative windows
• Test set: 1132 positive and 2x10^6 negative windows

Page 23: Convolutional  Restricted Boltzmann Machines for Feature Learning

First layer filters

• Gray-scale images of INRIA positive set
• 15 filters of 7x7

• MNIST unlabeled digits
• 15 filters of 5x5

Page 24: Convolutional  Restricted Boltzmann Machines for Feature Learning

Second Layer Features (MNIST)

• Hard to visualize the filters
• We show the patches to which the filters respond most strongly:

Page 25: Convolutional  Restricted Boltzmann Machines for Feature Learning

Second Layer Features (INRIA)


Page 26: Convolutional  Restricted Boltzmann Machines for Feature Learning

MNIST Results

• MNIST error rate when model is trained on the full training set


Page 27: Convolutional  Restricted Boltzmann Machines for Feature Learning

Results


False Positive

Page 28: Convolutional  Restricted Boltzmann Machines for Feature Learning

1st


Page 29: Convolutional  Restricted Boltzmann Machines for Feature Learning

2nd


Page 30: Convolutional  Restricted Boltzmann Machines for Feature Learning

3rd


Page 31: Convolutional  Restricted Boltzmann Machines for Feature Learning

4th


Page 32: Convolutional  Restricted Boltzmann Machines for Feature Learning

5th


Page 33: Convolutional  Restricted Boltzmann Machines for Feature Learning

INRIA Results

• Adding our large-scale features significantly improves performance of the baseline (HOG)


Page 34: Convolutional  Restricted Boltzmann Machines for Feature Learning

Conclusion

• We extended the RBM model to Convolutional RBM, useful for domains with spatial locality

• We exploited CRBMs to train local hierarchical feature detectors one layer at a time and generatively

• This method obtained results comparable to state-of-the-art in digit classification and human detection


Page 35: Convolutional  Restricted Boltzmann Machines for Feature Learning

Thank You


Page 36: Convolutional  Restricted Boltzmann Machines for Feature Learning

Hierarchical Feature Detector

[Figure: a grid of unknown features, each marked "?"]

Page 37: Convolutional  Restricted Boltzmann Machines for Feature Learning

Contrastive Divergence Learning

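$\frac{\partial E(V, H; \theta)}{\partial W_k} = -\mathrm{Filter}(V, H_k)$

$W_k = W_k + \eta \left( \left\langle \mathrm{Filter}(V^{(0)}, H_k^{(0)}) \right\rangle_{\text{data}} - \left\langle \mathrm{Filter}(V^{(1)}, H_k^{(1)}) \right\rangle_{\text{data}} \right)$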
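One step of this update for a single filter and a single image could be sketched as below. The sketch makes several assumptions beyond the slide: binary visible units, no bias terms, valid-mode cross-correlation for Filter, a sampled hidden state in the positive phase, and probabilities (rather than samples) for the reconstruction and the second hidden pass. The name `cd1_update`, the learning rate, and the toy data are all hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def filter2d(image, kernel):
    """Valid-mode 2-D filtering (cross-correlation, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b]
                for a in range(kh) for b in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


def backproject(H_k, W_k):
    """Add each hidden unit's filter back onto the visible units it looked at."""
    kh, kw = len(W_k), len(W_k[0])
    out = [[0.0] * (len(H_k[0]) + kw - 1) for _ in range(len(H_k) + kh - 1)]
    for i, row in enumerate(H_k):
        for j, h in enumerate(row):
            for a in range(kh):
                for b in range(kw):
                    out[i + a][j + b] += h * W_k[a][b]
    return out


def cd1_update(V0, W_k, eta, rng):
    """One CD-1 step: W_k + eta * (Filter(V0, H0) - Filter(V1, H1))."""
    # Positive phase: sample binary hidden units given the data.
    P0 = [[sigmoid(r) for r in row] for row in filter2d(V0, W_k)]
    H0 = [[1.0 if rng.random() < p else 0.0 for p in row] for row in P0]
    # Negative phase: reconstruct the image, then recompute hidden probabilities.
    V1 = [[sigmoid(x) for x in row] for row in backproject(H0, W_k)]
    H1 = [[sigmoid(r) for r in row] for row in filter2d(V1, W_k)]
    # Filtering the image with a hidden map yields a statistic shaped like the filter.
    pos = filter2d(V0, H0)
    neg = filter2d(V1, H1)
    return [[w + eta * (p - n) for w, p, n in zip(w_row, p_row, n_row)]
            for w_row, p_row, n_row in zip(W_k, pos, neg)]


V0 = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
W_k = [[0.5, -0.5], [-0.5, 0.5]]
W_new = cd1_update(V0, W_k, eta=0.1, rng=random.Random(0))
```

Because the hidden map is smaller than the image by the filter size minus one, Filter(V, H_k) has exactly the shape of W_k, which is why it can serve as the weight gradient.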

Page 38: Convolutional  Restricted Boltzmann Machines for Feature Learning

Training CRBMs (Cont'd)

• The problem of reconstructing the border region becomes severe when the number of Gibbs sampling steps > 1.
  – Partition visible units into middle and border regions

• Instead of maximizing the likelihood, we (approximately) maximize $p(v_m | v_b)$, where $v_m$ and $v_b$ are the middle and border visible units
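The slide does not say how the middle and border regions are defined. One plausible reading, assumed here, is that border pixels are those connected to fewer hidden units than an interior pixel, because valid-mode filtering covers them with fewer filter positions. The sketch counts connections for one filter and marks as "middle" the pixels covered by every possible filter offset; the function names are hypothetical.

```python
def hidden_connections(n, w):
    """For an n x n image and one w x w filter (valid mode), count how many
    hidden units depend on each visible unit."""
    counts = [[0] * n for _ in range(n)]
    for i in range(n - w + 1):          # hidden unit row
        for j in range(n - w + 1):      # hidden unit column
            for a in range(w):
                for b in range(w):
                    counts[i + a][j + b] += 1
    return counts


def middle_mask(n, w):
    """True for visible units that are seen at all w * w filter offsets."""
    full = w * w
    return [[c == full for c in row] for row in hidden_connections(n, w)]


counts = hidden_connections(6, 3)
mask = middle_mask(6, 3)
num_middle = sum(cell for row in mask for cell in row)
print(counts[0][0], counts[2][2], num_middle)  # 1 9 4
```

A corner pixel is seen by a single hidden unit while an interior pixel is seen by nine, so the border has far less evidence available for reconstruction.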

Page 39: Convolutional  Restricted Boltzmann Machines for Feature Learning

Enforcing Feature Sparsity

• The CRBM's representation is K (number of filters) times overcomplete

• After a few CD learning iterations, V is perfectly reconstructed

• Enforce sparsity to tackle this problem
  – Hidden bias terms were frozen at large negative values

• Having a single non-sparse hidden unit improves the learned features
  – Might be related to the ergodicity condition
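The overcompleteness claim can be checked with simple arithmetic. The sketch below uses the MNIST first-layer setting mentioned elsewhere in these slides (28x28 images, 15 filters of 5x5) and assumes valid-mode filtering, so each hidden map is slightly smaller than the image; the helper name is made up.

```python
def crbm_hidden_units(image_size, filter_size, num_filters):
    """Number of hidden units when every filter is applied in valid mode to a square image."""
    map_size = image_size - filter_size + 1
    return num_filters * map_size * map_size


visible = 28 * 28                                   # 784 visible units
hidden = crbm_hidden_units(image_size=28, filter_size=5, num_filters=15)
ratio = hidden / visible

print(hidden)           # 8640
print(round(ratio, 2))  # 11.02
```

Under this assumption the ratio comes out a little below K = 15 because each 24x24 hidden map is smaller than the image. It is still large enough that, without a sparsity constraint, the hidden layer can reproduce the input without learning useful structure.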

Page 40: Convolutional  Restricted Boltzmann Machines for Feature Learning

Probabilistic Meaning of Max

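[Figure: visible units v (1 to 6), hidden units h (1 to 4), and pooled hidden units h' (1 to 2); each h' unit takes the max over two neighbouring h units]

$E(v, h) = -\left( h_1 w^T v_{1:3} + h_2 w^T v_{2:4} + h_3 w^T v_{3:5} + h_4 w^T v_{4:6} \right)$

$E(v, h') = -\left( \max(h'_1 w^T v_{1:3},\ h'_1 w^T v_{2:4}) + \max(h'_2 w^T v_{3:5},\ h'_2 w^T v_{4:6}) \right)$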

Page 41: Convolutional  Restricted Boltzmann Machines for Feature Learning

The Classifier Layer

• We used SVM as our final classifier
  – RBF kernel for MNIST
  – Linear kernel for INRIA
  – For INRIA we combined our 4th layer outputs and HOG features

• We experimentally observed that relaxing the sparsity of CRBM's hidden units yields better results
  – This lets the discriminative model set the thresholds itself

Page 42: Convolutional  Restricted Boltzmann Machines for Feature Learning

Why are HOG features added?

• Because part-like features are very sparse

• Having a template of the human figure helps a lot

Page 43: Convolutional  Restricted Boltzmann Machines for Feature Learning

RBM

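• Two layer pairwise MRF with a full set of hidden-visible connections

• RBM is an energy based model

$p(v, h; \theta) = \frac{1}{Z(\theta)} \exp(-E(v, h; \theta))$

$E(v, h; \theta) = -\sum_{i,j} v_i w_{ij} h_j - \sum_{i} b_i v_i - \sum_{j} c_j h_j + \frac{1}{2} \sum_{i} v_i^2$

• Hidden random variables are binary, visible variables can be binary or continuous

• Inference is straightforward: $p(h | v)$ and $p(v | h)$
• Contrastive Divergence learning for training

[Figure: hidden layer h and visible layer v connected by weights w]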
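For comparison with the convolutional case, the fully connected RBM energy and the hidden conditional can be written out for a tiny model. The sketch assumes the standard sign convention (negative interaction and bias terms), assigns bias `b` to the visible units and `c` to the hidden units, and includes the quadratic term only when the visible units are treated as continuous. The function names and toy values are illustrative.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rbm_energy(v, h, w, b, c, continuous_visible=False):
    """E(v, h) = -sum_ij v_i w_ij h_j - sum_i b_i v_i - sum_j c_j h_j (+ 0.5 * sum_i v_i^2)."""
    interaction = sum(v[i] * w[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    visible_bias = sum(b_i * v_i for b_i, v_i in zip(b, v))
    hidden_bias = sum(c_j * h_j for c_j, h_j in zip(c, h))
    energy = -interaction - visible_bias - hidden_bias
    if continuous_visible:
        energy += 0.5 * sum(v_i * v_i for v_i in v)
    return energy


def p_h_given_v(v, w, c):
    """Hidden units are conditionally independent given v."""
    return [sigmoid(c[j] + sum(v[i] * w[i][j] for i in range(len(v))))
            for j in range(len(c))]


v = [1, 0]          # two visible units
h = [1]             # one hidden unit
w = [[2.0], [3.0]]  # w[i][j] connects visible i to hidden j
b = [0.5, 0.5]
c = [-1.0]

print(rbm_energy(v, h, w, b, c))                           # -1.5
print(rbm_energy(v, h, w, b, c, continuous_visible=True))  # -1.0
```

Because there are no hidden-to-hidden connections, each hidden probability depends only on v. That conditional independence is what makes inference straightforward, and the CRBM keeps it while sharing weights across image positions.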

Page 44: Convolutional  Restricted Boltzmann Machines for Feature Learning

Why Unsupervised Bottom-Up

• Discriminative learning of deep structure has not been successful
  – Requires large training sets
  – Large models overfit easily
  – First layer gradients are relatively small

• Alternative hybrid approach
  – Learn a large set of first layer features generatively
  – Later, switch to a discriminative model to select the discriminative features from those learned
  – Fine-tune the features using

Page 45: Convolutional  Restricted Boltzmann Machines for Feature Learning

INRIA Results (Cont'd)

• Miss rate at different FPPW rates

• FPPI is a better indicator of performance
• More experiments on size of features and number of layers are desired
