Boltzmann Machines and their Extensions
S. M. Ali Eslami, Nicolas Heess, John Winn
March 2013, Heriot-Watt University


Goal
Define a probability distribution on images like this:

What can one do with an ideal shape model?
Segmentation

Weizmann horse dataset
Sample training images (327 images)

What can one do with an ideal shape model?
Image

What can one do with an ideal shape model?
Computer graphics

Energy based models
Gibbs distribution
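A Gibbs distribution assigns each configuration x a probability proportional to exp(-E(x)), normalised by the partition function Z. The sketch below illustrates this on a toy model with three binary pixels. The pairwise energy form, the parameter values and all function names are assumptions made for illustration; none of them come from the talk.

```python
import itertools
import math

def energy(x, w, b):
    """Toy pairwise energy (assumed form): E(x) = -sum_i b_i x_i - sum_{i<j} w_ij x_i x_j."""
    n = len(x)
    unary = sum(b[i] * x[i] for i in range(n))
    pairwise = sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))
    return -(unary + pairwise)

def gibbs_distribution(w, b):
    """Return {state: p(state)} with p(x) = exp(-E(x)) / Z, by brute-force enumeration."""
    states = list(itertools.product([0, 1], repeat=len(b)))
    unnormalised = [math.exp(-energy(x, w, b)) for x in states]
    z = sum(unnormalised)  # partition function
    return {x: u / z for x, u in zip(states, unnormalised)}

# Hypothetical parameters: neighbouring pixels prefer to be on together.
w = [[0.0, 1.5, 0.0],
     [0.0, 0.0, 1.5],
     [0.0, 0.0, 0.0]]
b = [-0.5, -0.5, -0.5]
p = gibbs_distribution(w, b)
```

Enumeration is only feasible for a handful of variables, since the number of states doubles with every pixel. For image-sized models Z is intractable, which is why sampling methods are needed.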

Shallow architectures
Mean

Shallow architectures
MRF

Existing shape models
Most commonly used architectures: MRF, Mean
[Figure: a sample from each model]

What is a strong model of shape?
We define a strong model of object shape as one which meets two requirements:
Realism: generates samples that look realistic.
Generalization: can generate samples that differ from training images.
[Figure: training images, real distribution, learned distribution]

Shallow architectures
HOP-MRF

Shallow architectures
RBM

Shallow architectures
Restricted Boltzmann Machines
The effect of the latent variables can be appreciated by considering the marginal distribution over the visible units:
[equation]

Shallow architectures
Restricted Boltzmann Machines
In fact, the hidden units can be summed out analytically. The energy of this marginal distribution is given by:
[equation]
where
[equation]
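The slide formulas are absent from this transcript. For reference, the standard expressions for a binary RBM are sketched below. The symbols (weight matrix W, visible biases b, hidden biases c, marginal energy F) are assumed here and may differ from the notation used in the talk.

```latex
\begin{align}
E(\mathbf{v}, \mathbf{h}) &= -\mathbf{b}^\top \mathbf{v} - \mathbf{c}^\top \mathbf{h} - \mathbf{v}^\top W \mathbf{h}, \\
p(\mathbf{v}) &= \frac{1}{Z} \sum_{\mathbf{h}} \exp\bigl(-E(\mathbf{v}, \mathbf{h})\bigr)
             = \frac{1}{Z} \exp\bigl(-F(\mathbf{v})\bigr), \\
F(\mathbf{v}) &= -\mathbf{b}^\top \mathbf{v} - \sum_j \log\Bigl(1 + \exp\bigl(c_j + \sum_i W_{ij} v_i\bigr)\Bigr).
\end{align}
```

The sum over hidden configurations factorises across hidden units because the energy contains no hidden-to-hidden terms, which is what makes the analytic marginalisation possible.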

Shallow architectures
Restricted Boltzmann Machines
All hidden units are conditionally independent given the visible units and vice versa.

RBM inference
Block-Gibbs MCMC
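Because the units within a layer are conditionally independent given the other layer, a whole layer can be sampled in one block. The following is a minimal sketch of block-Gibbs sampling for a binary RBM, assuming sigmoid conditionals with weights W (visible by hidden), visible biases b and hidden biases c. The parameter values and function names are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_hidden(v, W, c, rng):
    """Sample every hidden unit at once: p(h_j = 1 | v) = sigmoid(c_j + sum_i W[i][j] v_i)."""
    probs = [sigmoid(c[j] + sum(W[i][j] * v[i] for i in range(len(v))))
             for j in range(len(c))]
    return [1 if rng.random() < p else 0 for p in probs], probs

def sample_visible(h, W, b, rng):
    """Sample every visible unit at once: p(v_i = 1 | h) = sigmoid(b_i + sum_j W[i][j] h_j)."""
    probs = [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
             for i in range(len(b))]
    return [1 if rng.random() < p else 0 for p in probs], probs

def block_gibbs(v0, W, b, c, n_steps, rng):
    """Alternate between the hidden and visible blocks; return the final visible state."""
    v = list(v0)
    for _ in range(n_steps):
        h, _ = sample_hidden(v, W, c, rng)
        v, _ = sample_visible(h, W, b, rng)
    return v

# Hypothetical RBM with 4 visible and 2 hidden units.
rng = random.Random(0)
W = [[2.0, -1.0], [2.0, -1.0], [-1.0, 2.0], [-1.0, 2.0]]
b = [0.0, 0.0, 0.0, 0.0]
c = [-1.0, -1.0]
sample = block_gibbs([1, 1, 0, 0], W, b, c, n_steps=50, rng=rng)
```

Each sweep costs two matrix-vector products regardless of how many units a layer has, which is the practical appeal of the bipartite structure.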

RBM learning
Stochastic gradient descent
Maximize the log probability of the training data with respect to the model parameters.
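For a binary RBM, the gradient of the log probability with respect to a weight has a well-known two-term form. It is sketched here with assumed symbols: W_ij is the weight between visible unit i and hidden unit j, and ε is the learning rate.

```latex
\begin{align}
\frac{\partial \log p(\mathbf{v})}{\partial W_{ij}}
  &= \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}, \\
\Delta W_{ij}
  &= \epsilon \bigl( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}} \bigr).
\end{align}
```

The first expectation is taken with the visible units clamped to training data. The second is taken under the model's own distribution.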

RBM learning
Contrastive divergence
Getting an unbiased sample of the second term, however, is very difficult. It can be done by starting at any random state of the visible units and performing Gibbs sampling for a very long time. Instead, contrastive divergence is used.
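A common form of contrastive divergence, CD-1, replaces the long Gibbs chain with a single block-Gibbs step started at a training vector. The sketch below assumes a binary RBM and uses hidden probabilities rather than sampled states when accumulating statistics, which is a widespread but optional choice. The toy data, learning rate and function names are hypothetical, and this is not necessarily the exact procedure used in the talk.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(v, W, c):
    return [sigmoid(c[j] + sum(W[i][j] * v[i] for i in range(len(v))))
            for j in range(len(c))]

def visible_probs(h, W, b):
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]

def bernoulli(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]

def cd1_update(v_data, W, b, c, lr, rng):
    """One CD-1 step on a single training vector; updates W, b and c in place."""
    # Positive phase: hidden probabilities with the visible units clamped to data.
    ph_data = hidden_probs(v_data, W, c)
    h_sample = bernoulli(ph_data, rng)
    # Negative phase: a single block-Gibbs step away from the data.
    v_model = bernoulli(visible_probs(h_sample, W, b), rng)
    ph_model = hidden_probs(v_model, W, c)
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v_data[i] * ph_data[j] - v_model[i] * ph_model[j])
        b[i] += lr * (v_data[i] - v_model[i])
    for j in range(len(c)):
        c[j] += lr * (ph_data[j] - ph_model[j])
    return v_model

# Hypothetical toy data: two complementary 4-pixel patterns.
rng = random.Random(0)
data = [[1, 1, 0, 0], [0, 0, 1, 1]]
W = [[rng.gauss(0.0, 0.01) for _ in range(2)] for _ in range(4)]
b = [0.0] * 4
c = [0.0] * 2
for epoch in range(1000):
    for v in data:
        cd1_update(v, W, b, c, lr=0.1, rng=rng)
```

Running more Gibbs steps per update (CD-k) trades computation for a less biased estimate of the model expectation.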

RBM inference
Block-Gibbs MCMC

RBM learning
Contrastive divergence
Contrastive divergence only crudely approximates the gradient of the log probability of the training data. It more closely approximates the gradient of another objective function, called the Contrastive Divergence, but it ignores one tricky term in that objective, so it does not even follow that gradient. Sutskever and Tieleman have shown that it does not follow the gradient of any function.
Nevertheless, it works well enough to achieve success in many significant applications.

Deep architectures
DBM

Deep architectures
Deep Boltzmann Machines

Deep architectures
Deep Boltzmann Machines
Conditional distributions remain factorised due to layering.
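The factorisation can be illustrated for a three-layer binary DBM with layers v, h1 and h2. Each unit in h1 receives input from both neighbouring layers, yet the units of h1 remain independent of one another given v and h2. The sketch below assumes sigmoid conditionals; the weight shapes, parameter values and function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def h1_given_neighbours(v, h2, W1, W2, c1):
    """p(h1_j = 1 | v, h2) = sigmoid(c1_j + sum_i W1[i][j] v_i + sum_k W2[j][k] h2_k)."""
    return [
        sigmoid(c1[j]
                + sum(W1[i][j] * v[i] for i in range(len(v)))
                + sum(W2[j][k] * h2[k] for k in range(len(h2))))
        for j in range(len(c1))
    ]

def v_given_h1(h1, W1, b):
    """p(v_i = 1 | h1): the visible layer only sees the layer directly above it."""
    return [sigmoid(b[i] + sum(W1[i][j] * h1[j] for j in range(len(h1))))
            for i in range(len(b))]

def h2_given_h1(h1, W2, c2):
    """p(h2_k = 1 | h1): the top layer only sees the layer directly below it."""
    return [sigmoid(c2[k] + sum(W2[j][k] * h1[j] for j in range(len(h1))))
            for k in range(len(c2))]

# Hypothetical DBM with 3 visible, 2 middle and 1 top unit.
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]  # visible x h1
W2 = [[2.0], [-2.0]]                         # h1 x h2
b, c1, c2 = [0.0, 0.0, 0.0], [0.0, 0.0], [0.0]
p_h1 = h1_given_neighbours([1, 0, 1], [1], W1, W2, c1)
```

In a block-Gibbs sweep, v and h2 can therefore be sampled together given h1, followed by h1 given both of its neighbours.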

Shallow and Deep architectures
Modeling high-order and long-range interactions
MRF, RBM, DBM

Deep Boltzmann Machines
Probabilistic, Generative, Powerful
Typically trained with many examples.
We only have datasets with few training examples.
[Figure: DBM]

From the DBM to the ShapeBM
Restricted connectivity and sharing of weights
[Figure: DBM and ShapeBM architectures]
Limited training data, therefore reduce the number of parameters:
Restrict connectivity,
Tie parameters,
Restrict capacity.

Shape Boltzmann Machine
Architecture in 2D
Top hidden units capture object pose.
Given the top units, middle hidden units capture local (part) variability.
Overlap helps prevent discontinuities at patch boundaries.

ShapeBM inference
Block-Gibbs MCMC
[Figure: image, reconstruction, sample 1, ..., sample n]
Fast: ~500 samples per second

ShapeBM learning
Stochastic gradient descent
Maximize the log probability of the training data with respect to the model parameters.
Pre-training: greedy, layer-by-layer, bottom-up; persistent CD MCMC approximation to the gradients.
Joint training: variational + persistent chain approximations to the gradients; separates learning of local and global shape properties.

~2-6 hours on the small datasets that we consider

Results

Sampled shapes
Evaluating the Realism criterion
Weizmann horses, 327 images, 2000+100 hidden units
Data
FA: incorrect generalization
RBM: failure to learn variability
ShapeBM: natural shapes, variety of poses, sharply defined details, correct number of legs (!)
This is great, but has it just overfit?

Sampled shapes
Evaluating the Generalization criterion
Weizmann horses, 327 images, 2000+100 hidden units
[Figure: sample from the ShapeBM, closest image in training dataset, difference between the two images]
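The comparison behind this figure can be sketched as a nearest-neighbour lookup followed by a pixel-wise difference. The talk does not state which distance was used; Hamming distance on binary silhouettes is assumed here, and the tiny 3x3 images and function names are hypothetical.

```python
def hamming(a, b):
    """Number of pixels at which two equally sized binary images differ."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))

def closest_training_image(sample, training_images):
    """Return (index, image) of the training image nearest to the sample."""
    index = min(range(len(training_images)),
                key=lambda n: hamming(sample, training_images[n]))
    return index, training_images[index]

def difference(a, b):
    """Pixel-wise XOR: 1 wherever the two images disagree."""
    return [[pa ^ pb for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

# Hypothetical 3x3 silhouettes standing in for training images and a model sample.
training = [
    [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
    [[1, 1, 1], [1, 0, 0], [1, 0, 0]],
]
sample = [[0, 1, 0], [1, 1, 1], [0, 1, 1]]
idx, nearest = closest_training_image(sample, training)
diff = difference(sample, nearest)
```

A non-empty difference image for every sample is evidence that the model is not simply reproducing its training set.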

Interactive GUI
Evaluating Realism and Generalization
Weizmann horses, 327 images, 2000+100 hidden units

Further results
Sampling and completion
Caltech motorbikes, 798 images, 1200+50 hidden units
[Figure: training images, ShapeBM samples, sample generalization, shape completion]

Constrained shape completion
Evaluating Realism and Generalization
Weizmann horses, 327 images, 2000+100 hidden units
[Figure: ShapeBM, NN]

Further results
Constrained completion
Caltech motorbikes, 798 images, 1200+50 hidden units
[Figure: ShapeBM, NN]

Imputation scores
Quantitative comparison
Weizmann horses, 327 images, 2000+100 hidden units
Collect 25 unseen horse silhouettes,
Divide each into 9 segments,
Estimate the conditional log probability of a segment under the model given the rest of the image,
Average over images and segments.
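The four steps above can be sketched as follows. The talk does not say how the 9 segments are laid out, so a 3x3 grid is assumed. The scoring function shown is the simplest possible one, an independent-Bernoulli "mean" model for which the rest of the image carries no information; a ShapeBM would need an MCMC-based estimate in its place. All names and values are hypothetical.

```python
import math

def grid_segments(height, width, rows=3, cols=3):
    """Split pixel coordinates into rows*cols rectangular segments (3x3 grid assumed)."""
    segments = []
    for r in range(rows):
        for c in range(cols):
            segments.append([
                (y, x)
                for y in range(r * height // rows, (r + 1) * height // rows)
                for x in range(c * width // cols, (c + 1) * width // cols)
            ])
    return segments

def mean_model_segment_logprob(image, segment, pixel_means):
    """log p(segment | rest) for independent Bernoulli pixels; means must lie in (0, 1)."""
    total = 0.0
    for y, x in segment:
        p = pixel_means[y][x]
        total += math.log(p if image[y][x] == 1 else 1.0 - p)
    return total

def imputation_score(images, segment_logprob, rows=3, cols=3):
    """Average the conditional log probability over all images and segments."""
    scores = []
    for image in images:
        for segment in grid_segments(len(image), len(image[0]), rows, cols):
            scores.append(segment_logprob(image, segment))
    return sum(scores) / len(scores)

# Hypothetical 6x6 test image and an uninformative mean model.
pixel_means = [[0.5] * 6 for _ in range(6)]
test_images = [[[1, 0, 1, 0, 1, 0] for _ in range(6)]]
score = imputation_score(
    test_images,
    lambda img, seg: mean_model_segment_logprob(img, seg, pixel_means))
```

Higher (less negative) scores mean the model assigns more probability to the held-out segment, so a model that captures shape structure should beat the mean model on this measure.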

         Mean      RBM       FA        ShapeBM
Score    -50.72    -47.00    -40.82    -28.85

Multiple object categories
Simultaneous detection and completion
Caltech-101 objects, 531 images, 2000+400 hidden units
Train jointly on 4 categories without knowledge of class:
[Figure: shape completion, sampled shapes]

What does h2 do?
Weizmann horses: pose information
Multiple categories: class label information
[Figure: accuracy against number of training images]

What does the overlap do?

Summary
Shape models are essential in applications such as segmentation, detection, in-painting and graphics.

The ShapeBM characterizes a strong model of shape:
Samples are realistic,
Samples generalize from training data.

The ShapeBM learns distributions that are qualitatively and quantitatively better than other models for this task.

Questions
MATLAB GUI available at http://arkitus.com/Ali/