
Page 1

1

Deep Learning

Jiseob Kim ( [email protected])

Artificial Intelligence Class of 2016 Spring, Dept. of Computer Science and Engineering

Seoul National University

Page 2

History of Neural Network Research

1986: Neural networks and back propagation
• Aimed to solve general learning problems; tied with biological systems
• But they were largely given up: hard to train, insufficient computational resources, small training sets, did not work well

Up to 2006: shallow, task-specific methods
• SVM, Boosting, Decision trees, KNN, …
• Loose tie with biological systems; shallow models
• Specific methods for specific tasks: hand-crafted features (GMM-HMM, SIFT, LBP, HOG), Kruger et al., TPAMI'13

2006: Deep belief nets (Science)
• Unsupervised, layer-wise pre-training
• Better designs for modeling and training (normalization, nonlinearity, dropout)
• Feature learning
• New computer architectures: GPUs, multi-core systems
• Large-scale databases

2011-2012: Deep learning results in speech and vision
• How many computers to identify a cat? 16,000 CPU cores
• ImageNet: object recognition over 1,000,000 images and 1,000 categories (2 GPUs)

  Rank  Name         Error rate  Description
  1     U. Toronto   0.15315     Deep convolutional net
  2     U. Tokyo     0.26172     Hand-crafted features and learning models; bottleneck
  3     U. Oxford    0.26979
  4     Xerox/INRIA  0.27058

Slides from Wanli Ouyang ([email protected])

Page 3

Contents

Neural Networks

Multilayer Perceptron Structure

Learning Algorithm based on Back Propagation

Deep Belief Network

Restricted Boltzmann Machines

Deep Learning (Deep Belief Network)

Convolutional Neural Networks (CNN)

CNN Structure and Learning

Applications

Page 4

NEURAL NETWORKS

Slides by Jiseob Kim ([email protected])

Page 5

Classification Problem

Problem of finding a label y given data x
• ex1) x: face image, y: person's name
• ex2) x: blood sugar, blood pressure, …, y: diagnosis of diabetes
• ex3) x: voice, y: sentence corresponding to the voice

x: D-dimensional vector, y: integer (discrete)

Famous pattern recognition algorithms
• Support Vector Machine
• Decision Tree
• K-Nearest Neighbor
• Multi-Layer Perceptron (Artificial Neural Network)

Page 6

Perceptron (1/2)

Page 7

Perceptron (2/2)

(Figure: a perceptron with inputs x1, x2, a constant input 1, weights w1, w2, and bias b; the decision boundary in the (x1, x2) plane is w1*x1 + w2*x2 + b = 0, with one class where the sum is > 0 and the other where it is < 0.)

Page 8

Parameter Learning in Perceptron

start: The weight vector w is generated randomly.
test: A vector x ∈ P ∪ N is selected randomly.
  If x ∈ P and w · x > 0, go to test.
  If x ∈ P and w · x ≤ 0, go to add.
  If x ∈ N and w · x < 0, go to test.
  If x ∈ N and w · x ≥ 0, go to subtract.
add: Set w = w + x, go to test.
subtract: Set w = w - x, go to test.
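A minimal sketch of this learning loop in Python; the array shapes, the random-selection scheme, and the fixed iteration cap are assumptions for illustration only:

```python
import numpy as np

def train_perceptron(P, N, max_iters=1000, seed=0):
    """Perceptron learning as described above: P and N are 2-D arrays whose
    rows are (bias-augmented) positive and negative example vectors."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=P.shape[1])                 # start: random weight vector
    data = [(x, +1) for x in P] + [(x, -1) for x in N]
    for _ in range(max_iters):
        x, label = data[rng.integers(len(data))]    # test: pick a random vector
        if label == +1 and w @ x <= 0:
            w = w + x                               # add
        elif label == -1 and w @ x >= 0:
            w = w - x                               # subtract
    return w
```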

Page 9

Sigmoid Unit

Classic perceptron vs. sigmoid unit

The sigmoid function σ is differentiable:

∂σ(x)/∂x = σ(x)(1 - σ(x))

Page 10

Learning Algorithm of Sigmoid Unit

Loss Function: E = (d - f)^2, where d is the target and f is the unit output

Gradient Descent Update:

f(s) = 1 / (1 + e^(-s)),  f'(s) = f(s)(1 - f(s))

∂E/∂W = -2(d - f) f(1 - f) X

W ← W + c (d - f) f(1 - f) X
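A small sketch of one gradient-descent step for a single sigmoid unit, following the update rule above (function and variable names are illustrative; the factor 2 is absorbed into the rate c):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def sigmoid_unit_update(w, x, d, c=0.1):
    """One gradient-descent step on example (x, d), minimizing (d - f)^2."""
    f = sigmoid(w @ x)                         # unit output
    return w + c * (d - f) * f * (1 - f) * x   # W <- W + c (d-f) f(1-f) X
```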

Page 11

Need for Multiple Units and Multiple Layers

Multiple boundaries are needed (e.g., the XOR problem) → multiple units

More complex regions are needed (e.g., polygons) → multiple layers

Page 12

Structure of Multilayer Perceptron (MLP; Artificial Neural Network)

(Figure: a multilayer perceptron connecting an Input layer to an Output layer through hidden layers.)

Page 13

Learning Parameters of MLP

Loss Function

We have the same loss function, E = (d - f)^2, where d is the target and f is the unit output.

But there are now many more parameters (a weight for each layer and each unit).

To use Gradient Descent, we need to calculate the gradient for all of these parameters.

Recursive Computation of Gradients

Computation of loss-gradient of the top-layer weights is the same as before

Using the chain rule, we can compute the loss-gradient of lower-layer weights recursively (Back Propagation)

Page 14

Back Propagation Learning Algorithm (1/3)

Gradients of top-layer weights and update rule

E = (d - f)^2

∂E/∂W = -2(d - f) ∂f/∂s · X = -2(d - f) f(1 - f) X

Gradient Descent update rule: W ← W + c (d - f) f(1 - f) X

Store the intermediate value delta for later use with the chain rule:

δ = ∂E/∂s = (d - f) ∂f/∂s = (d - f) f(1 - f)

Page 15

Back Propagation Learning Algorithm (2/3)

Gradients of lower-layer weights

Weighted sum: s_i^(j) = W_i^(j) · X^(j-1)

By the chain rule,

∂E/∂W_i^(j) = (∂E/∂s_i^(j)) · (∂s_i^(j)/∂W_i^(j)) = δ_i^(j) X^(j-1)

where the local gradient δ_i^(j) = ∂E/∂s_i^(j).

Gradient Descent update rule for lower-layer weights:

W_i^(j) ← W_i^(j) + c δ_i^(j) X^(j-1)

Page 16

Back Propagation Learning Algorithm (3/3)

Applying the chain rule gives a recursive relation between the deltas of adjacent layers:

δ_i^(j) = f_i^(j) (1 - f_i^(j)) Σ_{l=1}^{m} δ_l^(j+1) w_{li}^(j+1)

Algorithm: Back Propagation

1. Randomly initialize the weight parameters
2. Calculate the activations of all units (with input data)
3. Calculate the top-layer delta
4. Back-propagate the deltas from top to bottom
5. Calculate the actual gradients of all units using the deltas
6. Update the weights using the Gradient Descent rule
7. Repeat 2-6 until convergence
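A minimal sketch of steps 2-6 for a two-layer sigmoid MLP on a single example with squared-error loss; the layer shapes, learning rate, and function names are assumptions for illustration, not from the slides:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def backprop_step(W1, W2, x, d, c=0.1):
    """One back-propagation step for a two-layer sigmoid MLP minimizing
    (d - f)^2; assumed shapes: W1 is (hidden x input), W2 is (output x hidden)."""
    # step 2: forward pass, activations of all units
    h = sigmoid(W1 @ x)                  # hidden activations
    f = sigmoid(W2 @ h)                  # network output
    # step 3: top-layer delta
    delta2 = (d - f) * f * (1 - f)
    # step 4: back-propagate delta to the hidden layer
    delta1 = h * (1 - h) * (W2.T @ delta2)
    # steps 5-6: gradients from the deltas, then gradient-descent updates
    W2 += c * np.outer(delta2, h)
    W1 += c * np.outer(delta1, x)
    return W1, W2, f
```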

Page 17

Applications

Almost All Classification Problems

Face Recognition

Object Recognition

Voice Recognition

Spam mail Detection

Disease Detection

etc.

Page 18

Limitations and Breakthrough

Limitations: Back Propagation barely changes lower-layer parameters (vanishing gradient).

Therefore, Deep Networks cannot be fully (effectively) trained with Back Propagation

Breakthrough Deep Belief Networks (Unsupervised Pre-training)

Convolutional Neural Networks (Reducing Redundant Parameters)

Rectified Linear Unit (Constant Gradient Propagation)

(Figure: back-propagation through a deep network from input x to output y'; the error computed against the target y shrinks as it is propagated back toward the input.)

Page 19

DEEP BELIEF NETWORKS

Slides by Jiseob Kim ([email protected])

Page 20

Motivation

Idea:

Greedy Layer-wise training

Pre-training + Fine tuning

Contrastive Divergence

Page 21

Restricted Boltzmann Machine (RBM)

Energy-Based Model

Energy function: E(x, h) = b'x + c'h + h'Wx

Joint probability: P(x, h) = e^(-E(x,h)) / Σ_{x,h} e^(-E(x,h))

Marginal probability (likelihood): P(x) = Σ_h e^(-E(x,h)) / Σ_{x,h} e^(-E(x,h))

Conditional probability (conditionally independent across units):

P(h|x) = Π_i P(h_i|x),  P(x|h) = Π_j P(x_j|h)

P(xj = 1|h) = σ(bj + W'•j · h)

P(hi = 1|x) = σ(ci + Wi· · x)

(Figure: bipartite graph with visible units x1, x2, x3 and hidden units h1, …, h5, connected by weights W.)

Remarks:
• Conditional independence given the other layer
• The conditional probability has the same form as a neural network layer
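The conditionals above factorize over units, so sampling one layer given the other is a single matrix product followed by independent Bernoulli draws. A minimal sketch, assuming W has shape (hidden x visible) with b and c the visible and hidden biases:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def p_h_given_x(x, c, W):
    """P(h_i = 1 | x) = sigma(c_i + W_i. · x), independently for each hidden unit."""
    return sigmoid(c + W @ x)

def p_x_given_h(h, b, W):
    """P(x_j = 1 | h) = sigma(b_j + W'_.j · h), independently for each visible unit."""
    return sigmoid(b + W.T @ h)

def sample_bernoulli(p, rng):
    """Draw binary unit states from their independent activation probabilities."""
    return (rng.random(p.shape) < p).astype(float)
```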

Page 22

Unsupervised Learning of RBM

Maximum Likelihood; use Gradient Descent.

L(X; θ) = Π_{x ∈ X} P(x; θ),  P(x; θ) = Σ_h e^(-E(x,h)) / Σ_{x,h} e^(-E(x,h))

∂ log L(X; θ)/∂w_ij = <x_i h_j>_data - <x_i h_j>_model
                    = <x_i h_j>_0 - <x_i h_j>_∞
                    ≈ <x_i h_j>_0 - <x_i h_j>_1

where <·>_0 is the expectation under the distribution of the dataset and <·>_∞ under the distribution of the model.

(Figure: alternating Gibbs sampling between the visible and hidden layers at t = 0, 1, 2, …, ∞; <v_i h_j>^0 is measured on the data and <v_i h_j>^∞ on a "fantasy" sample from the model.)

Page 23

Contrastive Divergence (CD) Learning of RBM parameters

k-Contrastive Divergence Trick

From the previous slide, getting the model-distribution term requires running many Gibbs sampling steps, and this is for a single parameter update.

Therefore, we take the sample after only k steps; in practice, k = 1 is sufficient.

(Figure: the Gibbs chain at t = 0, 1, 2, …, ∞; the state after k steps is taken as a sample of the model distribution.)
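A sketch of one CD-1 parameter update following this trick; mini-batch handling, the learning rate, and the variable names are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(W, b, c, x0, lr=0.01, rng=np.random.default_rng(0)):
    """One CD-1 update of RBM parameters (W, visible bias b, hidden bias c)
    on a batch of binary visible vectors x0 with shape (batch, n_visible)."""
    # positive phase: hidden activations for the data
    ph0 = sigmoid(x0 @ W.T + c)                       # P(h=1|x0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # one Gibbs step: reconstruct x, then hidden probabilities again
    px1 = sigmoid(h0 @ W + b)
    x1 = (rng.random(px1.shape) < px1).astype(float)
    ph1 = sigmoid(x1 @ W.T + c)
    # CD-1 estimate of <x h>_0 - <x h>_1, averaged over the batch
    W += lr * (ph0.T @ x0 - ph1.T @ x1) / x0.shape[0]
    b += lr * (x0 - x1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c
```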

Page 24

Effect of Unsupervised Training

Unsupervised Training makes RBM successfully catch the essential patterns

RBM trained on MNIST hand-written digit data:

Each cell shows the pattern each hidden node encodes

Page 25

Deep Belief Network (DBN)

Deep Belief Network (Deep Bayesian Network)

Bayesian Network that has similar structure to Neural Network

Generative model

Also, can be used as classifier (with additional classifier at top layer)

Resolves gradient vanishing by Pre-training

There are two modes (Classifier & Auto-Encoder), but we only consider Classifier here

Page 26

Learning Algorithm of DBN

DBN as a stack of RBMs

1. Regard each layer as RBM

2. Layer-wise Pre-train each RBM in Unsupervised way

3. Attach the classifier and Fine-tune the whole Network in Supervised way

(Figure: a DBN built as a stack of RBMs x↔h1, h1↔h2, h2↔h3, each with weights W, pre-trained bottom-up and topped with a classifier.)
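A sketch of the unsupervised part of these three steps, stacking RBMs greedily; it relies on the hypothetical cd1_update and sigmoid helpers sketched earlier, and the layer sizes, epochs, and learning rate are placeholders:

```python
import numpy as np

def pretrain_dbn(X, layer_sizes, epochs=10, lr=0.01, seed=0):
    """Greedy layer-wise pre-training: train an RBM on the data, use its
    hidden probabilities as the data for the next RBM, and so on."""
    rng = np.random.default_rng(seed)
    data, params = X, []
    n_vis = X.shape[1]
    for n_hid in layer_sizes:
        W = 0.01 * rng.standard_normal((n_hid, n_vis))
        b, c = np.zeros(n_vis), np.zeros(n_hid)
        for _ in range(epochs):                     # unsupervised RBM training
            W, b, c = cd1_update(W, b, c, data, lr=lr, rng=rng)
        params.append((W, b, c))
        data = sigmoid(data @ W.T + c)              # feed activations upward
        n_vis = n_hid
    return params  # afterwards: attach a classifier and fine-tune with BP
```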

Page 27

Viewing Learning as Wake-Sleep Algorithm

Page 28

Effect of Unsupervised Pre-Training in DBN (1/2)

28

Erhan et al., AISTATS 2009

Page 29

Effect of Unsupervised Pre-Training in DBN (2/2)

29

(Figure: error curves with pre-training vs. without pre-training.)

Page 30

Internal Representation of DBN

30

Page 31

Representation of Higher Layers

Higher layers have more abstract representations

Interpolating between different images is not desirable in lower layers, but natural in higher layers

Bengio et al., ICML 2013

Page 32

Inference Algorithm of DBN

As DBN is a generative model, we can also regenerate the data

From the top layer to the bottom, conduct Gibbs sampling to generate the data samples

(Figure: generated data, showing occluded inputs and their regenerated versions.)

Lee, Ng et al., ICML 2009
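A sketch of this top-down generation, assuming the stack of per-layer RBM parameters produced by the pre-training sketch above; the number of Gibbs steps is arbitrary:

```python
import numpy as np

def generate_from_dbn(params, n_gibbs=200, seed=0):
    """Run Gibbs sampling in the top-level RBM, then propagate a sample
    down through the directed sigmoid layers (uses the sigmoid helper above)."""
    rng = np.random.default_rng(seed)
    W, b, c = params[-1]                            # top-level RBM
    h = (rng.random(c.shape) < 0.5).astype(float)
    for _ in range(n_gibbs):                        # Gibbs sampling at the top
        x = (rng.random(b.shape) < sigmoid(h @ W + b)).astype(float)
        h = (rng.random(c.shape) < sigmoid(x @ W.T + c)).astype(float)
    sample = x
    for W, b, c in reversed(params[:-1]):           # top-down ancestral pass
        sample = (rng.random(b.shape) < sigmoid(sample @ W + b)).astype(float)
    return sample
```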

Page 33

Applications

Nowadays, CNN outperforms DBN for Image or Speech data

However, if there is no topological information, DBN is still a good choice

Also, if the generative model is needed, DBN is used

Generate face patches (Tang, Srivastava, Salakhutdinov, NIPS 2014)

Page 34

CONVOLUTIONAL NEURAL NETWORKS

Slides by Jiseob Kim ([email protected])

Page 35

Motivation

Idea:

Fully connected structure has too many parameters to learn

Efficient to learn local patterns when there are geometrical, topological structure between features such as image data or voice data (spectrograms)

(Figure: Image 1 and Image 2 with the same local pattern; the DBN sees them as different data, while the CNN sees them as the same data.)

Page 36

Structure of Convolutional Neural Network (CNN)

Higher features are formed by repeated Convolution and Pooling (subsampling)

Convolution obtains certain Feature from local area

Pooling reduces Dimension, while obtaining Translation-invariant Feature

http://parse.ele.tue.nl/education/cluster2

Page 37

Convolution Layer

The Kernel Detects pattern:

The Resulting value Indicates:

How much the pattern matches at each region

1 0 1

0 1 0

1 0 1
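A direct sketch of this operation (implemented, as in most CNN libraries, as cross-correlation) using the 3x3 kernel from the slide:

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """'Valid' 2-D convolution (cross-correlation form): slide the kernel over
    the image and take a weighted sum at each position; high output values
    indicate how strongly the pattern matches that region."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# The 3x3 kernel from the slide responds to an X-shaped (diagonal-cross) pattern.
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
```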

Page 38

Max-Pooling Layer

The Pooling Layer summarizes the results of Convolution Layer

e.g.) 10x10 result is summarized into 1 cell

The Result of Pooling Layer is Translation-invariant
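A sketch of non-overlapping max-pooling over a single feature map; the pooling size is a parameter (the slide's 10x10 example corresponds to size=10):

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Summarize each (size x size) block of the feature map by its maximum,
    giving a smaller, translation-tolerant map."""
    H, W = feature_map.shape
    H2, W2 = H // size, W // size
    blocks = feature_map[:H2 * size, :W2 * size].reshape(H2, size, W2, size)
    return blocks.max(axis=(1, 3))
```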

Page 39

Remarks

(Figure: feature visualizations from lower to higher layers.)

• Higher layer catches more specific, abstract patterns

• Lower layer catches more general patterns

Page 40

Parameter Learning of CNN

CNN is just another Neural Network with sparse connections

Learning Algorithm:

Back Propagation on Convolution Layers and Fully-Connected Layers

Back Propagation
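To make "just another neural network with sparse connections" concrete, here is a sketch that builds the dense weight matrix equivalent to a 1-D valid convolution; the kernel and input values are made up for illustration:

```python
import numpy as np

def conv1d_as_sparse_matrix(kernel, n_in):
    """The fully connected layer equivalent to a 1-D 'valid' convolution:
    each row holds the same kernel shifted by one position, all other
    entries are zero - i.e. sparse connections with shared weights."""
    k = len(kernel)
    n_out = n_in - k + 1
    W = np.zeros((n_out, n_in))
    for i in range(n_out):
        W[i, i:i+k] = kernel
    return W

x = np.arange(6, dtype=float)
kernel = np.array([1.0, -1.0, 2.0])
W = conv1d_as_sparse_matrix(kernel, len(x))
# Matrix-vector product reproduces the convolution (cross-correlation) result.
assert np.allclose(W @ x, np.convolve(x, kernel[::-1], mode="valid"))
```

Because the convolution is just this sparse, weight-tied layer, ordinary back propagation applies to it directly.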

Page 41

Applications (Image Classification) (1/2)

ImageNet Competition Ranking (1000 classes, 1 million images)

From Kyunghyun Cho’s dnn tutorial

ALL CNN!!

Page 42

Applications (Image Classification) (2/2)

Krizhevsky et al.: the winner of ImageNet 2012 Competition

1000-class problem, top-5 test error rate of 15.3%

Fully Connected

Page 43

Application (Speech Recognition)

Input: Spectrogram of Speech

Convolutional Neural Network

CNN outperforms all previous methods that use GMMs on MFCC features

Page 44

APPENDIX

Slides from Wanli Ouyang ([email protected])

Page 45

45

Good learning resources

Webpages: Geoffrey E. Hinton’s readings (with source code available for DBN) http://ww

w.cs.toronto.edu/~hinton/csc2515/deeprefs.html Notes on Deep Belief Networks http://www.quantumg.net/dbns.php MLSS Tutorial, October 2010, ANU Canberra, Marcus Frean http://videolectur

es.net/mlss2010au_frean_deepbeliefnets/ Deep Learning Tutorials http://deeplearning.net/tutorial/ Hinton’s Tutorial, http://videolectures.net/mlss09uk_hinton_dbn/ Fergus’s Tutorial, http://cs.nyu.edu/~fergus/presentations/nips2013_final.pdf CUHK MMlab project : http://mmlab.ie.cuhk.edu.hk/project_deep_learning.h

tml People:

Geoffrey E. Hinton’s http://www.cs.toronto.edu/~hinton Andrew Ng http://www.cs.stanford.edu/people/ang/index.html Ruslan Salakhutdinov http://www.utstat.toronto.edu/~rsalakhu/ Yee-Whye Teh http://www.gatsby.ucl.ac.uk/~ywteh/ Yoshua Bengio www.iro.umontreal.ca/~bengioy Yann LeCun http://yann.lecun.com/ Marcus Frean http://ecs.victoria.ac.nz/Main/MarcusFrean Rob Fergus http://cs.nyu.edu/~fergus/pmwiki/pmwiki.php

Acknowledgement Many materials in this ppt are from these papers, tutorials, etc (especially

Hinton and Frean’s). Sorry for not listing them in full detail.

Dumitru Erhan, Aaron Courville, Yoshua Bengio. Understanding Representations Learned in Deep Architectures. Technical Report.

Page 46

46

Graphical model for Statistics

Conditional independence between random variables

Given C, A and B are independent: P(A, B|C) = P(A|C)P(B|C)

P(A,B,C) =P(A, B|C) P(C) =P(A|C)P(B|C)P(C)

Any two nodes are conditionally independent given the values of their parents.

http://www.eecs.qmul.ac.uk/~norman/BBNs/Independence_and_conditional_independence.htm

(Figure: a parent node C = "Smoker?" with children "Has lung cancer" and "Has bronchitis" (A and B).)
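A tiny numeric check of P(A,B|C) = P(A|C)P(B|C) for this example; all probability values below are made up for illustration:

```python
import numpy as np

# C = smoker?, A = lung cancer, B = bronchitis (hypothetical probabilities).
p_C = np.array([0.7, 0.3])             # P(C=0), P(C=1)
p_A_given_C = np.array([[0.99, 0.01],  # rows: C value, cols: A value
                        [0.90, 0.10]])
p_B_given_C = np.array([[0.95, 0.05],
                        [0.70, 0.30]])

# Joint P(A,B,C) = P(A|C) P(B|C) P(C), laid out as [C, A, B]
joint = p_A_given_C[:, :, None] * p_B_given_C[:, None, :] * p_C[:, None, None]
# Conditioning on C recovers the product of the two conditionals
p_AB_given_C = joint / joint.sum(axis=(1, 2), keepdims=True)
assert np.allclose(p_AB_given_C,
                   p_A_given_C[:, :, None] * p_B_given_C[:, None, :])
```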

Page 47

47

Directed and undirected graphical model

Directed graphical model: P(A,B,C) = P(A|C)P(B|C)P(C). Any two nodes are conditionally independent given the values of their parents.

Example with four nodes: P(A,B,C,D) = P(D|A,B)P(B|C)P(A|C)P(C)

Undirected graphical model: P(A,B,C) = P(B,C)P(A,C). Also called a Markov Random Field (MRF).

(Figure: small directed graphs over A, B, C (and D) and the corresponding undirected graphs.)

Page 48

48

Modeling undirected model

Probability: P(x; θ) = f(x; θ) / Z(θ), where the partition function Z(θ) = Σ_x f(x; θ) ensures Σ_x P(x; θ) = 1.

Example, P(A,B,C) = P(B,C)P(A,C) with C = "Is smoker?", B = "Is healthy", A = "Has lung cancer":

P(A,B,C) = exp(w1·BC + w2·AC) / Z(w1, w2),  Z(w1, w2) = Σ_{A,B,C} exp(w1·BC + w2·AC)

Page 49

49

More directed and undirected models

(Figures: a hidden Markov model with hidden states h1, h2, h3 and observations y1, y2, y3, and an MRF in 2D over the grid of nodes A, B, C, D, E, F, G, H, I.)

Page 50

50

More directed and undirected models

Directed model over A, B, C, D:

P(A,B,C,D) = P(A)P(B)P(C|B)P(D|A,B,C)

Hidden Markov model over hidden states h1, h2, h3 and observations y1, y2, y3:

P(y1, y2, y3, h1, h2, h3) = P(h1)P(h2|h1)P(h3|h2)P(y1|h1)P(y2|h2)P(y3|h3)

Page 51

51

More directed and undirected models

(Figure: (a) an HMM; (b) an RBM over v and h with weights W; (c) a DBN with layers x, h1, h2, h3 and weights W0, W1, W2; and the tutorial authors' own deep model over x, h1, h2, h3 with weights W1, W2.)

Page 52

52

Extended reading on graphical model

Zoubin Ghahramani ‘s video lecture on graphical models: http://videolectures.net/mlss07_ghahramani_grafm/

Page 53

Product of Experts

53

P(x; θ) = Π_m f_m(x; θ_m) / Z(θ) = exp(-Σ_m E_m(x; θ_m)) / Z(θ)

Energy function: E_m(x; θ_m) = -log f_m(x; θ_m)

Z(θ): partition function

Example, an MRF in 2D over the grid A, B, C, …, I:

E(x; w) = w1·AB + w2·BC + w3·AD + w4·BE + w5·CF + …

Page 54

Product of Experts

54


Page 55

55

Products of experts versus Mixture model

Products of experts: an "and" operation
• Sharper than a mixture
• Each expert can constrain a different subset of dimensions
• P(x; θ) = Π_m f_m(x; θ_m) / Σ_x Π_m f_m(x; θ_m)

Mixture model (e.g., Gaussian mixture model): an "or" operation
• A weighted sum of many density functions

Page 56

56

Outline

Basic background on statistical learning and Graphical model

Contrastive divergence and Restricted Boltzmann machine
• Product of experts
• Contrastive divergence
• Restricted Boltzmann Machine

Deep belief net

Page 57

57

Contrastive Divergence (CD)

Probability: P(x; θ) = f(x; θ) / Z(θ), with partition function Z(θ) = Σ_x f(x; θ)

Maximum Likelihood and gradient descent:

θ* = argmax_θ L(X; θ),  L(X; θ) = (1/K) Σ_{k=1}^{K} log P(x^(k); θ)

∂L(X; θ)/∂θ = (1/K) Σ_{k=1}^{K} ∂ log f(x^(k); θ)/∂θ - ∫ p(x; θ) ∂ log f(x; θ)/∂θ dx

(an expectation under the data distribution minus an expectation under the model distribution)

Gradient update: θ^(t+1) = θ^t + η ∂L(X; θ^t)/∂θ, or solve ∂L(X; θ)/∂θ = 0.

Page 58

58

Contrastive Divergence (CD)

Gradient of Likelihood:

∂L(X; θ)/∂θ = (1/K) Σ_{k=1}^{K} ∂ log f(x^(k); θ)/∂θ - ∫ p(x; θ) ∂ log f(x; θ)/∂θ dx

• The first (data) term is tractable and easy to compute.
• The second (model) term is intractable: it needs samples from p(z1, z2, …, zM), obtained by Gibbs sampling.
• Fast contrastive divergence runs only T = 1 Gibbs step: an approximate but fast gradient instead of the accurate but slow one.

Update: θ^(t+1) = θ^t + η ∂L(X; θ^t)/∂θ

Page 59

59

Gibbs Sampling for graphical model

(Figure: the bipartite RBM graph over visible units x1, x2, x3 and hidden units h1, …, h5.)

More information on Gibbs sampling:

Pattern recognition and machine learning(PRML)

Page 60

60

Convergence of Contrastive divergence (CD)

The fixed points of ML are not fixed points of CD and vice versa; CD is a biased learning algorithm, but the bias is typically very small. CD can be used for getting close to the ML solution, and ML learning can then be used for fine-tuning.

It is not clear whether CD learning converges (to a stable fixed point). As of 2005, a proof was not available.

Further theoretical results? Please inform us

M. A. Carreira-Perpignan and G. E. Hinton. On Contrastive Divergence Learning. Artificial Intelligence and Statistics, 2005

Page 61

61

Outline

Basic background on statistical learning and Graphical model

Contrastive divergence and Restricted Boltzmann machine
• Product of experts
• Contrastive divergence
• Restricted Boltzmann Machine

Deep belief net

Page 62

62

Boltzmann Machine

Undirected graphical model, with hidden nodes.

E(x; θ) = Σ_{i,j} w_ij x_i x_j + Σ_i θ_i x_i,  θ = {w_ij, θ_i}

P(x; θ) = f(x; θ) / Z(θ) = e^(-E(x;θ)) / Σ_x e^(-E(x;θ))

With hidden nodes h, the Boltzmann machine energy is

E(x, h) = b'x + c'h + h'Wx + x'Ux + h'Vh

Page 63

Restricted Boltzmann Machine (RBM)

Undirected, loopy, layered (bipartite) model.

E(x, h) = b'x + c'h + h'Wx
(compare the full Boltzmann machine: E(x, h) = b'x + c'h + h'Wx + x'Ux + h'Vh)

P(x, h) = e^(-E(x,h)) / Σ_{x,h} e^(-E(x,h))  (the denominator is the partition function)

P(x) = Σ_h e^(-E(x,h)) / Σ_{x,h} e^(-E(x,h))

P(h|x) = Π_i P(h_i|x),  P(x|h) = Π_j P(x_j|h)

P(xj = 1|h) = σ(bj + W'•j · h)

P(hi = 1|x) = σ(ci + Wi· · x)

(Figure: bipartite graph over visible units x1, x2, x3 and hidden units h1, …, h5 with weights W.)

Read the manuscript for details.

Page 64

64

Restricted Boltzmann Machine (RBM)

E(x, h) = b'x + c'h + h'Wx,  x = [x1, x2, …]^T,  h = [h1, h2, …]^T

P(x; θ) = Σ_h e^(-E(x,h)) / Σ_{x,h} e^(-E(x,h)) = f(x; θ) / Z(θ)

Parameter learning: maximum log-likelihood

θ* = argmax_θ Π_{k=1}^{K} P(x^(k); θ) = argmin_θ -(1/K) Σ_{k=1}^{K} log P(x^(k); θ)

Geoffrey E. Hinton, “Training Products of Experts by Minimizing Contrastive Divergence.” Neural Computation 14, 1771–1800 (2002)

Page 65

65

CD for RBM

CD for RBM, very fast!

P(x; θ) = Σ_h e^(-(b'x + c'h + h'Wx)) / Σ_{x,h} e^(-(b'x + c'h + h'Wx))

The log-likelihood gradient with respect to w_ij is a difference of two expectations,

∂ log L(X; θ)/∂w_ij = <x_i h_j>_0 - <x_i h_j>_∞ ≈ <x_i h_j>_0 - <x_i h_j>_1,

where <·>_0 is computed on the data and <·>_∞ under the model; CD approximates the model term with one Gibbs step, giving the update

w_ij ← w_ij + η (<x_i h_j>_0 - <x_i h_j>_1),

with the conditionals P(xj = 1|h) = σ(bj + W'•j · h) and P(hi = 1|x) = σ(ci + Wi · x).

Page 66

66

CD for RBM

(Figure: alternating Gibbs sampling in a small RBM with visible units x1, x2 and hidden units h1, h2 — sample h from P(hi = 1|x) = σ(ci + Wi · x), then x from P(xj = 1|h) = σ(bj + W'•j · h), then h again; the products x_i h_j at t = 0 and after one step give the two terms of the CD update for w_ij.)

Page 67

67

RBM for classification

y: classification label

Hugo Larochelle and Yoshua Bengio, Classification using Discriminative Restricted Boltzmann Machines, ICML 2008.

Page 68

68

RBM itself has many applications

• Multiclass classification
• Collaborative filtering
• Motion capture modeling
• Information retrieval
• Modeling natural images
• Segmentation

References:
• Y. Li, D. Tarlow, R. Zemel, Exploring compositional high order pattern potentials for structured output learning, CVPR 2013.
• V. Mnih, H. Larochelle, G. E. Hinton, Conditional Restricted Boltzmann Machines for Structured Output Prediction, UAI 2011.
• Larochelle, H., & Bengio, Y., Classification using discriminative restricted Boltzmann machines, ICML 2008.
• Salakhutdinov, R., Mnih, A., & Hinton, G. E., Restricted Boltzmann machines for collaborative filtering, ICML 2007.
• Salakhutdinov, R., & Hinton, G. E., Replicated softmax: an undirected topic model, NIPS 2009.
• Osindero, S., & Hinton, G. E., Modeling image patches with a directed hierarchy of Markov random fields, NIPS 2008.

Page 69

69

Outline

Basic background on statistical learning and Graphical model

Contrastive divergence and Restricted Boltzmann machine

Deep belief net (DBN)
• Why deep learning?
• Learning and inference
• Applications

Page 70

70

Belief Nets

A belief net is a directed acyclic graph composed of random variables.

(Figure: hidden causes at the top with directed links to visible effects at the bottom; every node is a random variable.)

Page 71

71

Deep Belief Net

A belief net that is deep; a generative model:

P(x, h1, …, hl) = p(x|h1) p(h1|h2) … p(hl-2|hl-1) p(hl-1, hl)

Used for unsupervised training of multi-layer deep models.

(Figure: a DBN with layers x, h1, h2, h3.)

P(x,h1,h2,h3) = p(x|h1) p(h1|h2) p(h2,h3)

Pixels=>edges=> local shapes=> object parts

Page 72

72

Why Deep learning?

The mammal brain is organized in a deep architecture with a given input percept represented at multiple levels of abstraction, each level corresponding to a different area of cortex.

An architecture with insufficient depth can require many more computational elements, potentially exponentially more (with respect to input size), than architectures whose depth is matched to the task.

Since the number of computational elements one can afford depends on the number of training examples available to tune or select them, the consequences are not just computational but also statistical: poor generalization may be expected when using an insufficiently deep architecture for representing some functions.

T. Serre, etc., “A quantitative theory of immediate visual recognition,” Progress in Brain Research, Computational Neuroscience: Theoretical Insights into Brain Function, vol. 165, pp. 33–56, 2007. Yoshua Bengio, “Learning Deep Architectures for AI,” Foundations and Trends in Machine Learning, 2009.

Pixels=>edges=> local shapes=> object parts

Page 73

73

• Linear regression, logistic regression: depth 1
• Kernel SVM: depth 2
• Decision tree: depth 2
• Boosting: depth 2

The basic conclusion these results suggest is that when a function can be compactly represented by a deep architecture, it might need a very large architecture to be represented by an insufficiently deep one. (Example: logic gates, multi-layer NN with linear threshold units and positive weights.)

Yoshua Bengio, “Learning Deep Architectures for AI,” Foundations and Trends in Machine Learning, 2009.

Why Deep learning?

Page 74

74

Example: sum product network (SPN)

(Figure: a sum-product network over X1, …, X5; the shallow representation needs N·2^(N-1) parameters, the deep one only O(N).)

Page 75

75

Depth of existing approaches

• Boosting (2 layers): layer 1, base learners; layer 2, a vote or linear combination of layer 1
• Decision tree, LLE, KNN, Kernel SVM (2 layers): layer 1, matching degree to a set of local templates, e.g. Σ_i α_i K(x, x_i) + b; layer 2, combining these degrees
• Brain: 5-10 layers

Page 76

76

Why decision tree has depth 2?

A decision tree is a local estimator: it relies on a partition of the input space and uses separate parameters for each region, with each region associated with a leaf.

It needs as many training samples as there are variations of interest in the target function, so it is not good for highly varying functions.

The number of training samples needed is exponential in the number of dimensions in order to achieve a fixed error rate.

Page 77

77

Deep Belief Net

Inference problem: Infer the states of the unobserved variables.

Learning problem: Adjust the interactions between variables to make the network more likely to generate the observed data

(Figure: a DBN with layers x, h1, h2, h3.)

P(x,h1,h2,h3) = p(x|h1) p(h1|h2) p(h2,h3)

Page 78

78

Deep Belief Net

Inference problem (the problem of explaining away):

(Figure: explaining away — a node with two parents A and B sharing child C, and the corresponding visible unit x1 with hidden causes h11, h12 in the first DBN layer.)

P(A,B|C) = P(A|C)P(B|C)

P(h11, h12 | x1) ≠ P(h11| x1) P(h12 | x1)

An example from the manuscript. Solution: complementary priors.

Page 79

79

Deep Belief Net

Inference problem (the problem of explaining away). Solution: complementary priors.

(Figure: a deep stack x, h1, h2, h3, h4 with layer sizes 2000, 1000, 500, and 30.)

Page 80

80

Deep Belief Net

Explaining away problem of inference (see the manuscript). Solution: complementary priors (see the manuscript).

Learning problem: greedy layer-by-layer RBM training (optimizing a lower bound) and fine-tuning; contrastive divergence for RBM training.

P(hi = 1|x) = σ(ci + Wi · x)

(Figure: a DBN over x, h1, h2, h3 trained as a stack of RBMs x↔h1, h1↔h2, h2↔h3.)

Page 81

81

Deep Belief Net

Why does greedy layerwise learning work? It optimizes a lower bound:

log P(x) = log Σ_{h1} P(x, h1)
        ≥ Σ_{h1} Q(h1|x) [ log P(h1) + log P(x|h1) - log Q(h1|x) ]    (1)

When we fix the parameters for layer 1 and optimize the parameters for layer 2, we are optimizing P(h1) in (1).

(Figure: a DBN over x, h1, h2, h3 trained layer by layer.)

Page 82

82

Deep Belief Net and RBM

An RBM can be considered as a DBN with infinitely many layers.

(Figure: an RBM with weights W unrolled into an infinite directed belief net with tied weights W and W^T alternating over layers x0, h0, x1, h1, x2, ….)

Page 83

83

Pretrain, fine-tune and inference – (autoencoder)

(BP)

Page 84

84

Pretrain, fine-tune and inference - 2

y: identity or rotation degree

Pretraining Fine-tuning

Page 85

85

How many layers should we use?

There might be no universally right depth

• Bengio suggests that several layers are better than one.
• Results are robust against changes in the size of a layer, but the top layer should be big.
• Depth is a parameter; it depends on your task.
• With enough narrow layers, we can model any distribution over binary vectors [1].

Copied from http://videolectures.net/mlss09uk_hinton_dbn/

[1] Sutskever, I. and Hinton, G. E., Deep Narrow Sigmoid Belief Networks are Universal Approximators. Neural Computation, 2007

Page 86

86

Effect of Unsupervised Pre-training

Erhan et al., AISTATS 2009

Page 87

87

Effect of Depth

(Figure: error vs. depth, with and without pre-training.)

Page 88

88

Why unsupervised pre-training makes sense

(Figure: two generative stories relating hidden "stuff", images, and labels, one with a high-bandwidth and one with a low-bandwidth pathway.)

If image-label pairs were generated the first way, with the label depending directly on the image, it would make sense to try to go straight from images to labels. For example, do the pixels have even parity?

If image-label pairs are generated the second way, with hidden stuff causing the image through a high-bandwidth pathway and the label through a low-bandwidth one, it makes sense to first learn to recover the stuff that caused the image by inverting the high-bandwidth pathway.

Page 89

89

Beyond layer-wise pretraining

Layer-wise pretraining is efficient but not optimal. It is possible to train the parameters of all layers using a wake-sleep algorithm:

• Bottom-up, in a layer-wise manner
• Top-down, refitting the earlier models

Page 90

90

Fine-tuning with a contrastive version of the “wake-sleep” algorithm

After learning many layers of features, we can fine-tune the features to improve generation.

1. Do a stochastic bottom-up pass; adjust the top-down weights to be good at reconstructing the feature activities in the layer below.
2. Do a few iterations of sampling in the top-level RBM; adjust the weights in the top-level RBM.
3. Do a stochastic top-down pass; adjust the bottom-up weights to be good at reconstructing the feature activities in the layer above.

Page 91

91

Include lateral connections

An RBM has no connections within a layer. This can be generalized.

Lateral connections in the first layer [1]: sampling from P(h|x) is still easy, but sampling from P(x|h) becomes more difficult.

Lateral connections at multiple layers [2]: generate more realistic images; CD is still applicable, with small modifications.

[1]B. A. Olshausen and D. J. Field, “Sparse coding with an overcomplete basis set: a strategy employed by V1?,” Vision Research, vol. 37, pp. 3311–3325, December 1997.[2]S. Osindero and G. E. Hinton, “Modeling image patches with a directed hierarchy of Markov random field,” in NIPS, 2007.

Page 92

92

Without lateral connection

Page 93

93

With lateral connection

Page 94

94

My data is real valued …

Make it [0 1] linearly: x = ax + b

Use another distribution

Page 95

95

My data has temporal dependency …

Static:

Temporal

Page 96

96

Consider DBN as…

A statistical model that is used for unsupervised training of fully connected deep model

A directed graphical model that is approximated by fast learning and inference algorithms

A directed graphical model that is fine tuned using mature neural network learning approach -- BP.

Page 97

97

Outline

Basic background on statistical learning and Graphical model

Contrastive divergence and Restricted Boltzmann machine

Deep belief net (DBN)
• Why DBN?
• Learning and inference

Applications

Page 98

98

Applications of deep learning

• Hand-written digit recognition
• Dimensionality reduction
• Information retrieval
• Segmentation
• Denoising
• Phone recognition
• Object recognition
• Object detection
• …

References:
• Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation.
• Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science.
• Welling, M., et al. (2004). Exponential Family Harmoniums with an Application to Information Retrieval. NIPS.
• Mohamed, A. R., et al. (2009). Deep Belief Networks for phone recognition. NIPS workshop on deep learning for speech recognition.
• Nair, V. and Hinton, G. E. (2009). 3-D object recognition with deep belief nets. NIPS.

Page 99

99

Object recognition

NORB error rates: logistic regression 19.6%, kNN (k=1) 18.4%, Gaussian kernel SVM 11.6%, convolutional neural net 6.0%, convolutional net + SVM hybrid 5.9%, DBN 6.5%.

With extra unlabeled data (and the same amount of labeled data as before), the DBN achieves 5.2%.

Page 100

100

Learning to extract the orientation of a face patch (Salakhutdinov & Hinton, NIPS 2007)

Page 101

101

The training and test sets

Training set: 11,000 unlabeled cases; 100, 500, or 1,000 labeled cases.

Test set: face patches from new people.

Page 102

102

The root mean squared error in the orientation when combining GPs with deep belief nets:

               GP on        GP on top-level   GP on top-level features
               the pixels   features          with fine-tuning
  100 labels      22.2          17.9              15.2
  500 labels      17.2          12.7               7.2
 1000 labels      16.3          11.2               6.4

Conclusion: the deep features are much better than the pixels, and fine-tuning helps a lot.

Page 103

103

Deep Autoencoders (Hinton & Salakhutdinov, 2006)

They always looked like a really nice way to do non-linear dimensionality reduction, but it is very difficult to optimize deep autoencoders using backpropagation.

We now have a much better way to optimize them:
• First train a stack of 4 RBMs
• Then "unroll" them
• Then fine-tune with backprop

(Figure: encoder 28x28 → 1000 → 500 → 250 → 30 linear units, unrolled into a decoder 30 → 250 → 500 → 1000 → 28x28 whose weights are the transposes of the encoder weights W1…W4.)

Page 104

104

Deep Autoencoders (Hinton & Salakhutdinov, 2006)

(Figure: real data vs. 30-D deep autoencoder vs. 30-D PCA reconstructions.)

Page 105

105

A comparison of methods for compressing digit images to 30 real numbers.

(Figure: real data vs. 30-D deep autoencoder vs. 30-D logistic PCA vs. 30-D PCA reconstructions.)

Page 106

106

Representation of DBN

Page 107

107

Summary

A deep belief net (DBN) is a network with deep layers, which provides strong representation power; it is a generative model; it can be learned layer-wise as RBMs using Contrastive Divergence; and it has many applications, with more yet to be found.

Generative models explicitly or implicitly model the distribution of inputs and outputs.

Discriminative models model the posterior probabilities directly.

Page 108

108

DBN VS SVM

A very controversial topic.

Model: DBN is generative, SVM is discriminative; but the fine-tuning of a DBN is discriminative.

Application: SVM is widely applied; researchers are expanding the application area of DBN.

Learning: DBN training is non-convex and slow; SVM training is convex and fast (in the linear case).

Which one is better? Time will tell. You can contribute.

Hinton: The superior classification performance of discriminative learning methods holds only for domains in which it is not possible to learn a good generative model. This set of domains is being eroded by Moore’s law.