Deep Learning for Vision: Tricks of the Trade Marc'Aurelio Ranzato Facebook, AI Group BAVM Friday, 4 October 2013 www.cs.toronto.edu/~ranzato


Deep Learning for Vision:

Tricks of the Trade

Marc'Aurelio Ranzato

Facebook, AI Group

BAVM Friday, 4 October 2013

www.cs.toronto.edu/~ranzato


Ideal Features

Ideal Feature Extractor

- window, right
- chair, left
- monitor, top of shelf
- carpet, bottom
- drums, corner
- …
- pillows on couch

Q.: What objects are in the image? Where is the lamp? What is on the couch? ...


The Manifold of Natural Images


Ideal Feature Extraction

[Figure: an image is a point in pixel space (axes Pixel 1, Pixel 2, ..., Pixel n); the Ideal Feature Extractor maps it to a space with axes Expression and Pose]


Learning Non-Linear Features

Given a dictionary of simple non-linear functions: $g_1, \ldots, g_n$

Proposal #1: linear combination

$f(x) \approx \sum_j g_j(x)$

Proposal #2: composition

$f(x) \approx g_1(g_2(\ldots g_n(x) \ldots))$


Learning Non-Linear Features

Given a dictionary of simple non-linear functions: $g_1, \ldots, g_n$

Proposal #1: linear combination (SHALLOW)

$f(x) \approx \sum_j g_j(x)$

Kernel learning, Boosting, ...

Proposal #2: composition (DEEP)

$f(x) \approx g_1(g_2(\ldots g_n(x) \ldots))$

Deep learning, Scattering networks (wavelet cascade), S.C. Zhu & D. Mumford “grammar”
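The two proposals can be contrasted in a few lines of code. This is only an illustrative sketch: the choice of a scalar tanh unit for each $g_j$ and the parameter values are assumptions, not something taken from the slides.

```python
import math

def g(x, w, b):
    """One simple non-linear function: a scalar tanh unit (illustrative choice)."""
    return math.tanh(w * x + b)

def shallow(x, params):
    """Proposal #1: f(x) ~ sum_j g_j(x)."""
    return sum(g(x, w, b) for w, b in params)

def deep(x, params):
    """Proposal #2: f(x) ~ g_1(g_2(... g_n(x) ...))."""
    out = x
    for w, b in reversed(params):  # g_n is applied first, g_1 last
        out = g(out, w, b)
    return out

# Hypothetical parameters (w_j, b_j) for g_1, g_2, g_3.
params = [(1.0, 0.0), (2.0, -0.5), (0.5, 0.1)]
print(shallow(0.3, params), deep(0.3, params))
```

Both functions use the same dictionary of units; only the way the units are combined differs.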


Linear Combination

[Figure: input image → template matchers → sum (+) → prediction of class]

BAD: it may require an exponential number of templates!!!

Composition

[Figure: input image → low-level parts → mid-level parts → high-level parts → prediction of class]

GOOD: (exponentially) more efficient

- reuse intermediate parts
- distributed representations

Lee et al. “Convolutional DBN's ...” ICML 2009, Zeiler & Fergus

A Potential Problem with Deep Learning

Optimization is difficult: non-convex, non-linear system

[Figure: pipeline of four stages, labeled 1, 2, 3, 4]

A Potential Problem with Deep Learning

Optimization is difficult: non-convex, non-linear system

Solution #1: freeze the first N-1 layers (engineer the features)
It makes it shallow!
- How to design features of features?
- How to design features for new imagery?

[Figure: pipeline SIFT → k-Means → Pooling → Classifier; the last stage is labeled 4]

A Potential Problem with Deep Learning

Optimization is difficult: non-convex, non-linear system

Solution #2: live with it!
It will converge to a local minimum.
It is much more powerful!!

[Figure: pipeline of four stages, labeled 1, 2, 3, 4]

Given lots of data, engineer less and learn more!!

Just need to know a few tricks of the trade...

Deep Learning in Practice

Optimization is easy, need to know a few tricks of the trade.

[Figure: pipeline of four stages, labeled 1, 2, 3, 4]

Q: What's the feature extractor? And what's the classifier?

A: No distinction, end-to-end learning!

Deep Learning in Practice

It works very well in practice:


KEY IDEAS: WHY DEEP LEARNING

We need a non-linear system

We need to learn it from data

Build feature hierarchies
- Distributed representations
- Compositionality

End-to-end learning

What Is Deep Learning?


Buzz Words

It's Contrastive Divergence

It's a Convolutional Net

It's just old Neural Nets

It's Feature Learning

It's a Deep Belief Net

It's Unsupervised Learning

(My) Definition

A Deep Learning method is: a method which makes predictions by using a sequence of non-linear processing stages. The resulting intermediate representations can be interpreted as feature hierarchies and the whole system is jointly learned from data.

Some deep learning methods are probabilistic, others are loss-based; some are supervised, others unsupervised...

It's a large family!

THE SPACE OF MACHINE LEARNING METHODS

[Figure, 1957 (Rosenblatt): Perceptron]

[Figure, 80s (back-propagation & compute power): Perceptron, Neural Net, Autoencoder Neural Net]

[Figure, 90s (LeCun's CNNs): Perceptron, Neural Net, Autoencoder Neural Net, Convolutional Neural Net, Recurrent Neural Net, Sparse Coding, GMM]

[Figure, 00s (SVMs): Perceptron, Neural Net, Autoencoder Neural Net, Convolutional Neural Net, Recurrent Neural Net, Sparse Coding, SVM, Boosting, GMM, Restricted BM]

[Figure, 2006 (Hinton's DBN): Perceptron, Neural Net, Autoencoder Neural Net, Convolutional Neural Net, Recurrent Neural Net, Sparse Coding, SVM, Boosting, GMM, BayesNP, Restricted BM, Deep Belief Net]

[Figure, 2009 (ASR: data + GPU): Perceptron, Neural Net, Autoencoder Neural Net, Convolutional Neural Net, Recurrent Neural Net, Sparse Coding, SVM, Boosting, GMM, BayesNP, ΣΠ, Restricted BM, Deep Belief Net, Deep (sparse/denoising) Autoencoder]

[Figure, 2012 (CNNs: data + GPU): Perceptron, Neural Net, Autoencoder Neural Net, Convolutional Neural Net, Recurrent Neural Net, Sparse Coding, SVM, Boosting, GMM, BayesNP, ΣΠ, Restricted BM, Deep Belief Net, Deep (sparse/denoising) Autoencoder]

[Figure, the space of machine learning methods: Perceptron, Neural Net, Boosting, SVM, GMM, ΣΠ, BayesNP, Convolutional Neural Net, Recurrent Neural Net, Autoencoder Neural Net, Sparse Coding, Restricted BM, Deep Belief Net, Deep (sparse/denoising) Autoencoder]

[Figure: TIME axis with Convolutional Neural Net 1988, Convolutional Neural Net 1998, Convolutional Neural Net 2012]

Q.: Did we make any progress since then?

A.: The main reasons for the breakthrough are data and GPUs, but we have also made networks deeper and more non-linear.

ConvNets: History

- Fukushima 1980: designed a network with the same basic structure but did not train it by backpropagation.
- LeCun from late 80s: figured out backpropagation for CNNs, popularized and deployed CNNs for OCR applications and others.
- Poggio from 1999: same basic structure, but learning is restricted to the top layer (k-means at the second stage).
- LeCun from 2006: unsupervised feature learning.
- DiCarlo from 2008: large-scale experiments, normalization layer.
- LeCun from 2009: harsher non-linearities, normalization layer, learning unsupervised and supervised.
- Mallat from 2011: provides a theory behind the architecture.
- Hinton 2012: use bigger nets, GPUs, more data.

LeCun et al. “Gradient-based learning applied to document recognition” IEEE 1998

ConvNets: till 2012

[Figure: loss as a function of a parameter]

Common wisdom: training does not work because we “get stuck in local minima”

ConvNets: today

[Figure: loss as a function of a parameter]

Local minima are all similar, there are long plateaus, and it can take a long time to break symmetries:

- input/output invariant to permutations
- breaking ties between parameters
- saturating units [Figure: unit output, saturating at 1, as a function of $W^T X$]
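The permutation symmetry mentioned on this slide is easy to check numerically. The sketch below assumes a tiny one-hidden-layer tanh network with made-up weights; it only illustrates that reordering hidden units (together with their outgoing weights) leaves the input/output mapping unchanged, so every minimum has many equivalent copies.

```python
import math

def mlp(x, W1, w2):
    """Tiny one-hidden-layer net: y = sum_k w2[k] * tanh(W1[k] . x)."""
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in W1]
    return sum(v * h for v, h in zip(w2, hidden))

# Hypothetical input and weights, chosen only for illustration.
x = [0.5, -1.0]
W1 = [[0.1, 0.2], [-0.3, 0.4], [0.7, -0.6]]
w2 = [1.0, -2.0, 0.5]

# Permute the hidden units and their outgoing weights consistently.
perm = [2, 0, 1]
W1_perm = [W1[k] for k in perm]
w2_perm = [w2[k] for k in perm]

# The two parameter settings differ but compute the same function
# (up to floating-point summation order).
same = abs(mlp(x, W1, w2) - mlp(x, W1_perm, w2_perm)) < 1e-12
print(same)
```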


Like walking on a ridge between valleys


ConvNets: today

[Figure: loss as a function of a parameter]

Local minima are all similar, there are long plateaus, and it can take a long time to break symmetries.

Optimization is not the real problem when:
– the dataset is large
– units do not saturate too much
– there is a normalization layer

ConvNets: today

[Figure: loss as a function of a parameter]

Today's belief is that the challenge is about:
– generalization
  How many training samples to fit 1B parameters?
  How many parameters/samples to model spaces with 1M dim.?
– scalability

[Figure, methods arranged along a SHALLOW to DEEP axis: Perceptron, Neural Net, Boosting, SVM, GMM, ΣΠ, BayesNP, Convolutional Neural Net, Recurrent Neural Net, Autoencoder Neural Net, Sparse Coding, Restricted BM, Deep Belief Net, Deep (sparse/denoising) Autoencoder]

[Figure, methods arranged along two axes, SHALLOW to DEEP and UNSUPERVISED to SUPERVISED: Perceptron, Neural Net, Boosting, SVM, GMM, ΣΠ, BayesNP, Convolutional Neural Net, Recurrent Neural Net, Autoencoder Neural Net, Sparse Coding, Restricted BM, Deep Belief Net, Deep (sparse/denoising) Autoencoder]

Deep Learning is a very rich family!

I am going to focus on a few methods...

[Figure, methods arranged along two axes, SHALLOW to DEEP and UNSUPERVISED to SUPERVISED: Perceptron, Neural Net, Boosting, SVM, GMM, ΣΠ, BayesNP, CNN, Recurrent Neural Net, Autoencoder Neural Net, Sparse Coding, Restricted BM, DBN, Deep (sparse/denoising) Autoencoder]

Deep Gated MRF

Layer 1: pair-wise MRF over pixels $x_p$, $x_q$

$E(x, h^c, h^m) = \frac{1}{2} x' \Sigma^{-1} x$

$p(x, h^c, h^m) \propto e^{-E(x, h^c, h^m)}$

Ranzato et al. “Modeling natural images with gated MRFs” PAMI 2013

Deep Gated MRF

Layer 1: pair-wise MRF over pixels $x_p$, $x_q$

$E(x, h^c, h^m) = \frac{1}{2} x' C C' x$

[Figure: pixels $x_p$, $x_q$ connected through factors; figure label: F]

Ranzato et al. “Modeling natural images with gated MRFs” PAMI 2013

Deep Gated MRF

Layer 1: gated MRF

$E(x, h^c, h^m) = \frac{1}{2} x' C [\mathrm{diag}(h^c)] C' x$

[Figure: pixels $x_p$, $x_q$ and hidden units $h^c_k$ connected through weights $C$; figure labels: F]

Ranzato et al. “Modeling natural images with gated MRFs” PAMI 2013

Deep Gated MRF

Ranzato et al. “Modeling natural images with gated MRFs” PAMI 2013

Layer 1: gated MRF

$E(x, h^c, h^m) = \frac{1}{2} x' C [\mathrm{diag}(h^c)] C' x + \frac{1}{2} x' x - x' W h^m$

$p(x) \propto \int_{h^c} \int_{h^m} e^{-E(x, h^c, h^m)}$

[Figure: pixels $x_p$, $x_q$, hidden units $h^c_k$ and $h^m_j$, weights $C$ and $W$; figure labels: F, M, N]

Deep Gated MRF

Ranzato et al. “Modeling natural images with gated MRFs” PAMI 2013

Layer 1:

$E(x, h^c, h^m) = \frac{1}{2} x' C [\mathrm{diag}(h^c)] C' x + \frac{1}{2} x' x - x' W h^m$

$p(x) \propto \int_{h^c} \int_{h^m} e^{-E(x, h^c, h^m)}$

Inference of latent variables: just a forward pass

Training: requires approximations (here we used MCMC methods)
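As a rough illustration of the layer-1 energy, the sketch below evaluates $E(x, h^c, h^m) = \frac{1}{2} x' C [\mathrm{diag}(h^c)] C' x + \frac{1}{2} x' x - x' W h^m$ on plain Python lists. The plus sign before the $\frac{1}{2} x' x$ term is read off the slide layout, and the matrix shapes, function name, and toy values are assumptions made for this example only; this is not the authors' implementation.

```python
def gated_mrf_energy(x, C, W, h_c, h_m):
    """Evaluate E = 1/2 x'C diag(h_c) C'x + 1/2 x'x - x'W h_m.

    Assumed shapes: x has D pixels, C is D x K, W is D x J,
    h_c has K entries and h_m has J entries.
    """
    n_pix = len(x)
    # Filter responses C'x, one per covariance hidden unit.
    responses = [sum(C[d][k] * x[d] for d in range(n_pix))
                 for k in range(len(h_c))]
    covariance_term = 0.5 * sum(h * r * r for h, r in zip(h_c, responses))
    quadratic_term = 0.5 * sum(v * v for v in x)
    # Mean term x'W h_m.
    mean_term = sum(x[d] * sum(W[d][j] * h_m[j] for j in range(len(h_m)))
                    for d in range(n_pix))
    return covariance_term + quadratic_term - mean_term

# Toy values: 2 pixels, 2 covariance hiddens, 1 mean hidden.
x = [1.0, 2.0]
C = [[1.0, 0.0], [0.0, 1.0]]
W = [[1.0], [0.0]]
energy = gated_mrf_energy(x, C, W, h_c=[1.0, 0.0], h_m=[1.0])
print(energy)  # 0.5 + 2.5 - 1.0 = 2.0
```

With the hidden units fixed, the energy is quadratic in $x$, which is why $h^c$ can be read as gating the pair-wise interactions between pixels.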


Deep Gated MRF

Ranzato et al. “Modeling natural images with gated MRFs” PAMI 2013

[Figure: stacked model: input → Layer 1 → h2 → Layer 2]

Deep Gated MRF

Ranzato et al. “Modeling natural images with gated MRFs” PAMI 2013

[Figure: stacked model: input → Layer 1 → h2 → Layer 2 → h3 → Layer 3]

Sampling High-Resolution Images

- Gaussian model; marginal wavelet (from Simoncelli 2005)
- Pair-wise MRF; FoE (from Schmidt, Gao, Roth CVPR 2010)


gMRF: 1 layer

Ranzato et al. PAMI 2013



gMRF: 3 layers

Ranzato et al. PAMI 2013


Sampling After Training on Face Images

[Figure columns: Original, Input, 1st layer, 2nd layer, 3rd layer, 4th layer, 10 times]

unconstrained samples

conditional (on the left part of the face) samples

Ranzato et al. PAMI 2013

Expression Recognition Under Occlusion

Ranzato et al. PAMI 2013


Pros
- Feature extraction is fast
- Unprecedented generation quality
- Advances models of natural images
- Trains without labeled data

Cons
- Training is inefficient: slow, tricky
- Sampling scales badly with dimensionality
- What's the use case of generative models?

Conclusion
If generation is not required, other feature learning methods are more efficient (e.g., sparse auto-encoders).
What's the use case of generative models?

[Figure, methods arranged along two axes, SHALLOW to DEEP and UNSUPERVISED to SUPERVISED, with SPARSE CODING highlighted: Perceptron, Neural Net, Boosting, SVM, GMM, ΣΠ, BayesNP, CNN, Recurrent Neural Net, Autoencoder Neural Net, SPARSE CODING, Restricted BM, DBN, Deep (sparse/denoising) Autoencoder]

CONV NETS: TYPICAL ARCHITECTURE

One stage (zoom): Convol. → LCN → Pooling

Whole system: Input Image → 1st stage → 2nd stage → 3rd stage → Fully Conn. Layers → Class Labels
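A single stage (convolution, contrast normalization, pooling) might be sketched as follows for one feature map. This is a simplified illustration under stated assumptions: the normalization here is global over the whole map, a stand-in for true local contrast normalization (LCN), which uses local neighborhoods; the kernel, image, and function names are hypothetical.

```python
import math

def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation of a 2D list with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def contrast_normalize(fmap, eps=1e-8):
    """Simplified stand-in for LCN: zero mean, unit variance over the whole map."""
    vals = [v for row in fmap for v in row]
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return [[(v - mean) / (std + eps) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [[max(fmap[i * size + a][j * size + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)]

def stage(image, kernel):
    """One stage: convolution -> normalization -> pooling."""
    return max_pool(contrast_normalize(conv2d_valid(image, kernel)))

image = [[float((i * j) % 5) for j in range(6)] for i in range(6)]
edge_kernel = [[1.0, -1.0], [1.0, -1.0]]  # hypothetical vertical-edge filter
features = stage(image, edge_kernel)
print(len(features), len(features[0]))  # 6x6 input -> 5x5 conv -> 2x2 pooled
```

Stacking several such stages and feeding the result to fully connected layers gives the whole-system layout listed on the slide.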


CONV NETS: TYPICAL ARCHITECTURE

One stage (zoom): Convol. → LCN → Pooling

Conceptually similar to:

SIFT → K-Means → Pyramid Pooling → SVM
Lazebnik et al. “...Spatial Pyramid Matching...” CVPR 2006

SIFT → Fisher Vect. → Pooling → SVM
Sanchez et al. “Image classification with F.V.: Theory and practice” IJCV 2012

CONV NETS: EXAMPLES

- OCR / House number & Traffic sign classification

Ciresan et al. “MCDNN for image classification” CVPR 2012
Wan et al. “Regularization of neural networks using dropconnect” ICML 2013

CONV NETS: EXAMPLES

- Texture classification

Sifre et al. “Rotation, scaling and deformation invariant scattering...” CVPR 2013


CONV NETS: EXAMPLES

- Pedestrian detection

Sermanet et al. “Pedestrian detection with unsupervised multi-stage..” CVPR 2013


CONV NETS: EXAMPLES

- Scene Parsing

Farabet et al. “Learning hierarchical features for scene labeling” PAMI 2013

CONV NETS: EXAMPLES

- Segmentation of 3D volumetric images

Ciresan et al. “DNN segment neuronal membranes...” NIPS 2012
Turaga et al. “Maximin learning of image segmentation” NIPS 2009

CONV NETS: EXAMPLES

- Action recognition from videos

Taylor et al. “Convolutional learning of spatio-temporal features” ECCV 2010


CONV NETS: EXAMPLES

- Robotics

Sermanet et al. “Mapping and planning ...with long range perception” IROS 2008


CONV NETS: EXAMPLES

- Denoising

Burger et al. “Can plain NNs compete with BM3D?” CVPR 2012

[Figure panels: original, noised, denoised]

CONV NETS: EXAMPLES

- Dimensionality reduction / learning embeddings

Hadsell et al. “Dimensionality reduction by learning an invariant mapping” CVPR 2006


CONV NETS: EXAMPLES

- Image classification

Krizhevsky et al. “ImageNet Classification with deep CNNs” NIPS 2012

[Figure: image → Object Recognizer → “railcar”]

CONV NETS: EXAMPLES

- Deployed in commercial systems (Google & Baidu, spring 2013)


How To Use ConvNets...(properly)


CHOOSING THE ARCHITECTURE

Task dependent

Cross-validation

[Convolution → LCN → pooling]* + fully connected layer

The more data, the more layers and the more kernels
- Look at the number of parameters at each layer
- Look at the number of flops at each layer

Computational cost

Be creative :)
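Counting parameters and operations per layer can be done with simple arithmetic. The helpers below are a sketch with hypothetical names; they count multiply-accumulates (one multiply-accumulate is roughly two flops), assume one bias per output unit or feature map, and the layer sizes in the example are made up.

```python
def conv_layer_cost(in_channels, out_channels, kernel_size, out_h, out_w):
    """Parameters and multiply-accumulates of a conv layer with square kernels."""
    weights_per_map = in_channels * kernel_size * kernel_size
    params = out_channels * (weights_per_map + 1)        # +1 for the bias
    macs = out_channels * weights_per_map * out_h * out_w
    return params, macs

def fc_layer_cost(n_in, n_out):
    """Parameters and multiply-accumulates of a fully connected layer."""
    params = n_out * (n_in + 1)                          # +1 for the bias
    macs = n_in * n_out
    return params, macs

# Hypothetical layers: a 5x5 conv from 3 to 16 maps on a 28x28 output grid,
# and a 1024 -> 10 fully connected layer.
conv_params, conv_macs = conv_layer_cost(3, 16, 5, 28, 28)
fc_params, fc_macs = fc_layer_cost(1024, 10)
print(conv_params, conv_macs)  # 1216 940800
print(fc_params, fc_macs)      # 10250 10240
```

In this made-up example the convolutional layer has few parameters but dominates the computation, while the fully connected layer has most of the parameters, which is the kind of imbalance the per-layer counts are meant to expose.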


HOW TO OPTIMIZE

SGD (with momentum) usually works very well

Pick learning rate by running on a subset of the data
Bottou “Stochastic Gradient Tricks” Neural Networks 2012
- Start with large learning rate and divide by 2 until loss does not diverge
- Decay learning rate by a factor of ~100 or more by the end of training

Use non-linearity

Initialize parameters so that each feature across layers has similar variance. Avoid units in saturation.
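The learning-rate recipe above can be sketched on a toy problem. Everything here is an assumption made for illustration: the quadratic loss stands in for a network's loss, the gradient is exact rather than stochastic, and the geometric decay schedule is just one way to shrink the rate by a factor of 100 by the end of training.

```python
def loss_and_grad(w, curvatures=(1.0, 10.0)):
    """Toy quadratic loss 0.5 * sum(c_i * w_i^2) and its gradient."""
    loss = 0.5 * sum(c * wi * wi for c, wi in zip(curvatures, w))
    grad = [c * wi for c, wi in zip(curvatures, w)]
    return loss, grad

def sgd_momentum(w0, lr, steps, momentum=0.9, final_lr_ratio=0.01):
    """Momentum updates; the rate decays geometrically to lr * final_lr_ratio."""
    w, v = list(w0), [0.0] * len(w0)
    decay = final_lr_ratio ** (1.0 / max(steps - 1, 1))
    for t in range(steps):
        _, grad = loss_and_grad(w)
        step = lr * decay ** t
        v = [momentum * vi - step * gi for vi, gi in zip(v, grad)]
        w = [wi + vi for wi, vi in zip(w, v)]
    return w

def pick_learning_rate(w0, lr=10.0, trial_steps=20, momentum=0.9):
    """Start large and halve until a short trial run lowers the loss."""
    start_loss, _ = loss_and_grad(w0)
    while True:
        trial = sgd_momentum(w0, lr, trial_steps, momentum, final_lr_ratio=1.0)
        if loss_and_grad(trial)[0] < start_loss:
            return lr
        lr /= 2.0

w0 = [1.0, 1.0]
lr = pick_learning_rate(w0)
w_final = sgd_momentum(w0, lr, steps=200)
print(lr, loss_and_grad(w_final)[0])
```

On real data the trial runs would use a small subset of the training set, as the slide suggests, so that the search stays cheap.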


HOW TO IMPROVE GENERALIZATION

Weight sharing (greatly reduce the number of parameters)

Data augmentation (e.g., jittering, noise injection, etc.)

Dropout
Hinton et al. “Improving NNs by preventing co-adaptation of feature detectors” arXiv 2012

Weight decay (L2, L1)

Sparsity in the hidden units

Multi-task (unsupervised learning)
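Two of these regularizers, dropout and weight decay, fit in a few lines. The sketch below is an illustration with hypothetical function names and default values: dropout zeroes each activation with probability `p_drop` during training and rescales activations at test time, and the update adds L2 and L1 penalty gradients to the data gradient.

```python
import random

def dropout(activations, p_drop=0.5, train=True, rng=random):
    """Training: zero each unit with probability p_drop. Test: scale by 1 - p_drop."""
    if train:
        return [0.0 if rng.random() < p_drop else a for a in activations]
    return [a * (1.0 - p_drop) for a in activations]

def sgd_step_with_weight_decay(w, grad, lr=0.1, l2=1e-4, l1=0.0):
    """One gradient step on loss + (l2 / 2) * ||w||^2 + l1 * ||w||_1."""
    def sign(v):
        return (v > 0) - (v < 0)
    return [wi - lr * (gi + l2 * wi + l1 * sign(wi)) for wi, gi in zip(w, grad)]

rng = random.Random(0)  # fixed seed so the example is repeatable
hidden = [0.2, 1.5, -0.7, 0.9]
print(dropout(hidden, rng=rng))        # some entries zeroed at random
print(dropout(hidden, train=False))    # all entries scaled by 0.5
print(sgd_step_with_weight_decay([1.0, -2.0], [0.0, 0.0], lr=0.1, l2=0.5))
```

With a zero data gradient, the last call shows weight decay on its own: each weight shrinks toward zero in proportion to its size.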


OTHER THINGS GOOD TO KNOW

Check gradients numerically by finite differences

Visualize features (feature maps need to be uncorrelated and have high variance).

[Figure: activation matrix with axes samples and hidden unit]

Good training: hidden units are sparse across samples and across features.
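A numerical gradient check compares the gradient computed by the code with a centered finite-difference estimate. The sketch below uses a small hand-written loss as a stand-in for a network's loss; the loss, the step size `eps`, and the function names are assumptions for illustration.

```python
import math

def loss(w):
    """Toy differentiable loss (hypothetical stand-in for a network's loss)."""
    return math.tanh(w[0] * w[1]) + 0.5 * w[1] ** 2

def analytic_grad(w):
    """Gradient of `loss` derived by hand (what BPROP would return)."""
    s = 1.0 - math.tanh(w[0] * w[1]) ** 2
    return [s * w[1], s * w[0] + w[1]]

def numerical_grad(f, w, eps=1e-5):
    """Centered finite differences, one coordinate at a time."""
    grad = []
    for i in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        grad.append((f(w_plus) - f(w_minus)) / (2.0 * eps))
    return grad

def max_relative_error(a, b):
    """Largest relative difference between two gradient vectors."""
    return max(abs(x - y) / max(abs(x) + abs(y), 1e-12) for x, y in zip(a, b))

w = [0.3, -0.7]
error = max_relative_error(analytic_grad(w), numerical_grad(loss, w))
print(error < 1e-6)  # a large relative error would point to a bug in the gradient code
```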


OTHER THINGS GOOD TO KNOW

Check gradients numerically by finite differences

Visualize features (feature maps need to be uncorrelated and have high variance).

[Figure: activation matrix with axes samples and hidden unit]

Bad training: many hidden units ignore the input and/or exhibit strong correlations.
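Besides looking at the activation matrix, the two failure modes can be measured directly. The sketch below assumes activations are stored as a list of rows (one row per sample, one column per hidden unit); the function names and the variance threshold are hypothetical.

```python
import math

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def correlation(u, v):
    """Pearson correlation; returns 0.0 if either unit is constant."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)
    denom = math.sqrt(variance(u) * variance(v))
    return cov / denom if denom > 0 else 0.0

def diagnose(acts, var_tol=1e-8):
    """Find units that ignore the input and the strongest pairwise correlation."""
    n_units = len(acts[0])
    cols = [[row[k] for row in acts] for k in range(n_units)]
    dead = [k for k, c in enumerate(cols) if variance(c) < var_tol]
    corrs = [abs(correlation(cols[i], cols[j]))
             for i in range(n_units) for j in range(i + 1, n_units)]
    return {"dead_units": dead,
            "max_abs_correlation": max(corrs) if corrs else 0.0}

# Toy activations: unit 1 is a scaled copy of unit 0, unit 2 is constant.
acts = [[1.0, 2.0, 0.5],
        [2.0, 4.0, 0.5],
        [3.0, 6.0, 0.5]]
report = diagnose(acts)
print(report)
```

In this toy example the report flags unit 2 as ignoring the input and shows a correlation close to 1 between units 0 and 1, the two symptoms of bad training named on the slide.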


OTHER THINGS GOOD TO KNOW

Check gradients numerically by finite differences

Visualize features (feature maps need to be uncorrelated and have high variance).

Visualize parameters

Good training: learned filters exhibit structure and are uncorrelated.

[Figure: four sets of learned filters: GOOD; BAD (too noisy); BAD (too correlated); BAD (lack structure)]

OTHER THINGS GOOD TO KNOW

Check gradients numerically by finite differences

Visualize features (feature maps need to be uncorrelated and have high variance).

Visualize parameters

Measure error on both training and validation set.

Test on a small subset of the data and check the error → 0.


WHAT IF IT DOES NOT WORK?

Training diverges:
- Learning rate may be too large → decrease learning rate
- BPROP is buggy → numerical gradient checking

Parameters collapse / loss is minimized but accuracy is low
- Check loss function:
  Is it appropriate for the task you want to solve?
  Does it have degenerate solutions?

Network is underperforming
- Compute flops and nr. params. → if too small, make net larger
- Visualize hidden units/params → fix optimization

Network is too slow
- Compute flops and nr. params. → GPU, distrib. framework, make net smaller

SUMMARY

Deep Learning = Learning Hierarchical representations. Leverage compositionality to gain efficiency.

Unsupervised learning: active research topic.

Supervised learning: most successful set up today.

Optimization
- Don't we get stuck in local minima? No, they are all the same!
- In large scale applications, local minima are even less of an issue.

Scaling
- GPUs
- Distributed framework (Google)
- Better optimization techniques

Generalization on small datasets (curse of dimensionality):
- Input distortions
- weight decay
- dropout

THANK YOU!

Ranzato

NOTE: IJCV Special Issue on Deep Learning.

Deadline: 9 Feb. 2014.


SOFTWARE

Torch7: learning library that supports neural net training

– http://www.torch.ch
– http://code.cogbits.com/wiki/doku.php (tutorial with demos by C. Farabet)

Python-based learning library (U. Montreal)

– http://deeplearning.net/software/theano/ (does automatic differentiation)

C++ code for ConvNets (Sermanet)

– http://eblearn.sourceforge.net/

Efficient CUDA kernels for ConvNets (Krizhevsky)

– code.google.com/p/cuda-convnet

More references at the CVPR 2013 tutorial on deep learning:

http://www.cs.toronto.edu/~ranzato/publications/ranzato_cvpr13.pdf