Learning the human world with Deep Belief Networks · amnih/cifar/talks/louradour_talk.pdf


Learning the human world with Deep Belief Networks

Jérôme Louradour
with Yoshua Bengio, Olivier Breuleux, Daniel Cernea, Aaron Courville, Olivier Delalleau, Dumitru Erhan, Pascal Lamblin, Marina Sokolava, ...

Learning the world with deep belief networks Jérôme Louradour 1/14


Motivations
The 'baby AI' project: General presentation

Towards the goal of artificial intelligence...
  • Make the machine learn with minimal "engineering intervention" (hardcoded rules, task-specific heuristics, ...)

How can we hope to perform well?
  • Feed well-established algorithms with cheap data (TV, video)
  • "cheap" = unlabeled, simulated, ...

Why focus on Deep Belief Networks?
  1. Exploit huge amounts of unlabeled data... to generalize well with few labeled data (specific tasks)
  2. Gradual learning: first simple concepts, then more and more abstract ones
  3. Multi-modality (image, text, audio)


The 'baby AI' project: Scientific goals

Semi-supervised learning
  • Master unsupervised learning.
Gradual learning
  • As for children, the learning process must be more efficient with a good curriculum (show simple examples first, then more complicated ones).
Multi-modality
  • "Multi-path" DBN + encourage RBMs to be mutually predictive.
Dynamic aspect
  • Temporal RBM (James Bergstra).
etc.


The 'baby AI' project: A first step

Topic            Question given to the computer                                   Answer
Color            There is a small triangle. What color is it?                     Green
Shape            What is the shape of the green object?                           Triangle
Location         Is the blue square at the top or at the bottom?                  At the top
Size             There is a triangle on the right. Is it rather small or bigger?  Small
Size (relative)  Is the square smaller or bigger than the triangle?               Bigger


The 'baby AI' project: A first step: preliminary results

Topic      Chance error rate (%)   Baseline error rate (%)
                                   1 object   2 objects   3 objects (relative attributes)
color      25                      10         40          55
shape      66                      50         55          60
size       50                      0.5        25          30
location   50                      5          35          40

Results easily degrade when adding sources of variability.
  • Image part: shapes are very hard to capture (translation + rotation).
  • Textual part: no problem understanding the topic. But with several objects, it is hard to guess the object of interest.

First attempts with DBN: not much better than a shallow architecture...
So we came back to basics.


On strategies to learn RBMs from unlabeled data
Generative learning of RBM

Real criterion = empirical likelihood (on unlabeled data)
Practical limitation: complexity $O(2^{\min(N_h, N_v)})$
  → only for small models (little capacity)
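To make the exponential cost concrete, here is a minimal sketch that computes the exact empirical log-likelihood by brute force. It assumes a binary RBM with the standard energy E(v, h) = -b·v - c·h - vᵀWh; that parametrization and the names `W`, `b`, `c` and all function names are illustrative assumptions, not taken from the slides. The partition function is obtained by enumerating the smaller of the two layers, which gives the 2^min(N_h, N_v) terms.

```python
import itertools
import numpy as np

def free_energy(v, W, b, c):
    """Free energy of visible vector(s) v, with the binary hidden units summed out."""
    return -(v @ b) - np.sum(np.logaddexp(0.0, c + v @ W), axis=-1)

def exact_log_partition(W, b, c):
    """log Z obtained by enumerating the smaller layer: 2**min(N_h, N_v) terms."""
    n_v, n_h = W.shape
    if n_h < n_v:
        # The assumed energy is symmetric in the two layers, so swap their roles.
        W, b, c = W.T, c, b
    configs = np.array(list(itertools.product([0.0, 1.0], repeat=W.shape[0])))
    neg_fe = -free_energy(configs, W, b, c)
    m = neg_fe.max()  # log-sum-exp trick for numerical stability
    return m + np.log(np.exp(neg_fe - m).sum())

def mean_log_likelihood(X, W, b, c):
    """Empirical log-likelihood of the rows of X (one binary example per row)."""
    return np.mean(-free_energy(X, W, b, c)) - exact_log_partition(W, b, c)
```

With 20 units in the smaller layer this already enumerates about a million configurations per evaluation, which is why the exact criterion is only usable for small models.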


Learning by contrastive divergence

Definition: Free Energy

$$FE(x) = -\log \sum_h e^{-E(x,h)} = -\log p_\theta(x) - \log Z(\theta)$$

so that, with $Z(\theta) = \sum_v e^{-FE(v)}$,

$$p_\theta(x) = \frac{e^{-FE(x)}}{\sum_v e^{-FE(v)}}$$
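As a worked example of this definition, the sum over hidden configurations has a closed form when the hidden units are binary. The energy function below is the standard binary RBM energy and is an assumption here (the slides do not spell out $E$); the symbols $W$, $b$, $c$ for weights, visible biases and hidden biases are likewise illustrative.

```latex
\begin{align*}
E(x,h) &= -b^\top x - c^\top h - x^\top W h, \qquad h \in \{0,1\}^{N_h} \\
\sum_h e^{-E(x,h)}
  &= e^{b^\top x} \prod_{j=1}^{N_h} \sum_{h_j \in \{0,1\}} e^{h_j \left(c_j + (W^\top x)_j\right)}
   = e^{b^\top x} \prod_{j=1}^{N_h} \left(1 + e^{c_j + (W^\top x)_j}\right) \\
FE(x) &= -b^\top x - \sum_{j=1}^{N_h} \log\left(1 + e^{c_j + (W^\top x)_j}\right)
\end{align*}
```

Under this assumption $FE(x)$ costs only $O(N_h N_v)$ to evaluate; the intractable part of the likelihood is the normalization over all visible configurations $v$.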


Learning by contrastive divergence

Best gradient descent to maximize data likelihood:

$$-\nabla_\theta \log p_\theta(x_i) = \nabla_\theta FE(x_i) - \sum_v p_\theta(v)\, \nabla_\theta FE(v) \qquad (1)$$

Approximation: for each $x_i$, sample an $\tilde{x}$ ($n$ Gibbs steps):

$$p_\theta(\tilde{x}) \approx P(\tilde{x} \mid x_i) \qquad (H1)$$

Given the analytic expression of $\nabla_\theta FE$, we update according to

$$(1) \overset{(H1)}{=} \nabla_\theta FE(x_i) - \nabla_\theta FE(\tilde{x})$$
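The update rule above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: it assumes a binary RBM with the standard energy (weights `W` of shape N_v × N_h, visible biases `b`, hidden biases `c`), and every function name, the learning rate `lr`, and the default `n_steps` are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def grad_free_energy(v, W, b, c):
    """Analytic gradient of FE(v) with respect to (W, b, c), with v held fixed."""
    q = sigmoid(c + v @ W)          # p(h_j = 1 | v) for each hidden unit
    return -np.outer(v, q), -v, -q

def gibbs_sample(v, W, b, c, n_steps, rng):
    """Run n_steps of alternating Gibbs sampling starting from the visible vector v."""
    for _ in range(n_steps):
        h = (rng.random(W.shape[1]) < sigmoid(c + v @ W)).astype(float)
        v = (rng.random(W.shape[0]) < sigmoid(b + W @ h)).astype(float)
    return v

def cd_update(x_i, W, b, c, lr=0.1, n_steps=1, rng=None):
    """One contrastive-divergence step: theta <- theta - lr * (grad FE(x_i) - grad FE(x_tilde))."""
    rng = np.random.default_rng() if rng is None else rng
    x_tilde = gibbs_sample(x_i, W, b, c, n_steps, rng)
    g_pos = grad_free_energy(x_i, W, b, c)
    g_neg = grad_free_energy(x_tilde, W, b, c)
    return tuple(p - lr * (gp - gn) for p, gp, gn in zip((W, b, c), g_pos, g_neg))
```

A call such as `W, b, c = cd_update(x, W, b, c, n_steps=1)` performs one stochastic update for a single binary example `x`. The sample `x_tilde` is treated as a constant when the gradient is taken.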


Learning by contrastive divergence

Given the analytic expression of $\nabla_\theta FE$, we update parameters with

$$\nabla_\theta FE(x_i) - \nabla_\theta FE(\tilde{x})$$

where $\tilde{x}$ is the sample drawn starting from $x_i$. This tends to:
  • decrease FE on real data
  • increase FE on data sampled by the RBM


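Learning by contrastive divergence: The trap

Given the analytic expression of $\nabla_\theta FE$, we update parameters with

$$\nabla_\theta FE\big|_{x_i} - \nabla_\theta FE\big|_{\tilde{x}(\theta)}$$

and not with $\nabla_\theta\big(FE(x_i)\big) - \nabla_\theta\big(FE(\tilde{x})\big)$ (struck out on the slide).

... estimate of:

$$E\Big[\nabla_\theta FE(x) - \sum_{\tilde{x}} p(\tilde{x} \mid \theta)\, \nabla_\theta FE(\tilde{x})\Big] \qquad (2)$$

  1. $FE(x) - FE(\tilde{x})$ is not the optimized function.
  2. Nothing guarantees that (2) is the gradient of a scalar function...

So how to choose the best hyper-parameters?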


How to choose the best hyper-parameters? (1/2)

Visualizing generated samples
  • Gives insight into the learned representation.
  • Gives an idea of the weaknesses of the models.

Topic      Chance error rate (%)   Error rate (%)
color      25                      7
shape      66                      47
size       50                      0
location   50                      4


How to choose the best hyper-parameters? (2/2)

Monitoring the reconstruction error
  • RBM as autoassociator (mean-field approximation)
  • Remark: we can also minimize the reconstruction error directly by stochastic gradient descent.
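One way to monitor this quantity is sketched below. The slide does not say which error measure is used, so the cross-entropy here is an assumption (squared error would be an equally plausible choice), as are the binary-unit parametrization and the names `W`, `b`, `c`, `reconstruct` and `reconstruction_error`.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def reconstruct(X, W, b, c):
    """Mean-field pass: propagate hidden probabilities instead of binary samples."""
    q = sigmoid(c + X @ W)        # p(h = 1 | x) for each example (one example per row)
    return sigmoid(b + q @ W.T)   # expected visible units given those hidden means

def reconstruction_error(X, W, b, c, eps=1e-12):
    """Mean cross-entropy between binary inputs X and their reconstructions."""
    X_hat = reconstruct(X, W, b, c)
    ce = -(X * np.log(X_hat + eps) + (1.0 - X) * np.log(1.0 - X_hat + eps))
    return ce.sum(axis=1).mean()
```

Because the mean-field pass is deterministic, this value can be tracked on a held-out batch after each epoch without the noise that Gibbs sampling would introduce.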


Helping the RBM to work better

As for neural networks, there is redundancy in trained models.

[Figure: fully connected RBMs vs. convolution RBMs]

Goal: while teaching the RBM's units to do something good, make them do different things.


Helping the RBM to work better

Different heuristics
Example for binomial units, with $q_k = p(h_k = 1 \mid x)$:

$$C(\theta) = -\log E[p_\theta(x)] + \lambda\, C\big(\{q_k q_l\}_{k \neq l}\big)$$
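The penalty applied to the pairwise products $q_k q_l$ is not specified on the slide. As one hypothetical instance, the sketch below sums $q_k q_l$ over all pairs of distinct hidden units and averages over examples. That choice, the binary-unit parametrization, and the names `W`, `c` and `pairwise_activation_penalty` are assumptions made only for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def pairwise_activation_penalty(X, W, c):
    """Average over examples of sum_{k != l} q_k * q_l, with q_k = p(h_k = 1 | x)."""
    Q = sigmoid(c + X @ W)                          # shape (n_examples, N_h)
    # sum_{k != l} q_k q_l = (sum_k q_k)^2 - sum_k q_k^2, which avoids an explicit double loop
    per_example = Q.sum(axis=1) ** 2 - (Q ** 2).sum(axis=1)
    return per_example.mean()
```

This term would be added to the negative log-likelihood with a weight such as the λ above. It is small when few hidden units are strongly active on the same input, which is one crude way of pushing units to "do different things".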


Future work

Lots of things to try before making DBN learn the human world with TV...
