Inductive Principles for Restricted Boltzmann Machine Learning
Benjamin Marlin, Kevin Swersky, Bo Chen and Nando de Freitas
Department of Computer Science, University of British Columbia

Source: asamir/cifar/cifarss-marlin.pdf · 2010-08-14


Page 1:

Inductive Principles for Restricted Boltzmann Machine Learning

Joint work with Kevin Swersky, Bo Chen and Nando de Freitas

Benjamin Marlin
Department of Computer Science
University of British Columbia

Page 2:

Introduction: Maximum Likelihood

• Maximum likelihood estimation is statistically consistent and efficient, but it is not computationally tractable for many models of interest, such as RBMs, MRFs, and CRFs, because of the partition function.

[Diagram: estimators positioned against three desiderata: Consistent, Efficient, Tractable]

Page 3:

Introduction: Alternative Estimators

• Recent work has proposed many new estimators that trade consistency and efficiency for computational tractability, including RM, SM, GSM, MPF, NCE, and NLCE.

[Diagram: the Consistent / Efficient / Tractable picture, populated with the alternative estimators]

Page 4:

Introduction: Alternative Estimators

• Our main interest is uncovering the relationships between these estimators and studying their theoretical and empirical properties.

[Diagram: Consistent / Efficient / Tractable]

Page 5:

Outline:

• Boltzmann Machines and RBMs
• Inductive Principles
  • Maximum Likelihood
  • Contrastive Divergence
  • Pseudo-Likelihood
  • Ratio Matching
  • Generalized Score Matching
  • Minimum Probability Flow
• Experiments
• Demo

Page 6:

Introduction: Restricted Boltzmann Machines

[Diagram: bipartite graph with K hidden units (H1 H2 H3 H4) in one layer and D visible units (X1 X2 X3) in the other]

• A Restricted Boltzmann Machine (RBM) is a Boltzmann Machine with a bipartite graph structure.

• Typically one layer consists of fully observed variables (the visible layer), while the other consists of latent variables (the hidden layer).

Page 7:

Introduction: Restricted Boltzmann Machines

• The joint probability of the visible and hidden variables is defined through a bilinear energy function.
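The slide's equation is an image and is not reproduced in this transcript. Below is a minimal NumPy sketch of the standard bilinear parameterization for a binary RBM; the names `W`, `b`, `c`, `energy` and the toy sizes are assumptions for illustration, not taken from the slides.

```python
import numpy as np

# Assumed standard parameterization: E(v, h) = -v^T W h - b^T v - c^T h,
# with p(v, h) proportional to exp(-E(v, h)).
rng = np.random.default_rng(0)
D, K = 3, 4                              # visible / hidden units, as in the figure
W = rng.normal(scale=0.1, size=(D, K))   # pairwise weights
b = np.zeros(D)                          # visible biases
c = np.zeros(K)                          # hidden biases

def energy(v, h):
    """Bilinear energy of a joint configuration (v, h)."""
    return -(v @ W @ h + b @ v + c @ h)

v = np.array([1.0, 0.0, 1.0])
h = np.array([0.0, 1.0, 1.0, 0.0])
print(energy(v, h))  # a single scalar; lower energy means higher probability
```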

Page 8:

Introduction: Restricted Boltzmann Machines

• The bipartite graph structure gives the RBM a special property: the visible variables are conditionally independent given the hidden variables and vice versa.
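Concretely, both conditionals factorize into independent logistic units. A sketch under the same assumed bilinear parameterization (`W`, `b`, `c` as in the standard setup; the helper names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 3, 4
W = rng.normal(scale=0.1, size=(D, K))
b = np.zeros(D)
c = np.zeros(K)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v):
    # p(h_k = 1 | v) = sigmoid(c_k + v^T W[:, k]); factorizes over k
    return sigmoid(c + v @ W)

def p_v_given_h(h):
    # p(x_d = 1 | h) = sigmoid(b_d + W[d, :] h); factorizes over d
    return sigmoid(b + W @ h)

v = np.array([1.0, 0.0, 1.0])
print(p_h_given_v(v))  # K independent Bernoulli probabilities
```

This factorization is what makes block Gibbs sampling cheap: a whole layer can be resampled in one matrix multiply.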

Page 9:

Introduction: Restricted Boltzmann Machines

• The marginal probability of the visible vector is obtained by summing out over all joint states of the hidden variables.
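For binary hidden units this 2^K-term sum collapses analytically into the free-energy form p(x) ∝ exp(−F(x)). A sketch of the standard closed form (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 3, 4
W = rng.normal(scale=0.1, size=(D, K))
b = np.zeros(D)
c = np.zeros(K)

def free_energy(v):
    # F(v) = -log sum_h exp(-E(v, h))
    #      = -b^T v - sum_k log(1 + exp(c_k + v^T W[:, k]))
    # logaddexp(0, x) computes log(1 + exp(x)) stably.
    return -(b @ v) - np.logaddexp(0.0, c + v @ W).sum()

v = np.array([1.0, 0.0, 1.0])
print(free_energy(v))  # p(v) = exp(-free_energy(v)) / Z
```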

Page 10:

Introduction: Restricted Boltzmann Machines

• This construction eliminates the latent (hidden) variables, leaving a distribution defined solely in terms of the visible variables.

• However, computing the normalizing constant (partition function) still has exponential complexity in D.
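To make the intractability concrete, here is the brute-force computation for a toy model: the sum ranges over all 2^D visible states, which is feasible only for tiny D (sketch; `free_energy` is the analytic marginal from the preceding construction):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
D, K = 3, 4
W = rng.normal(scale=0.1, size=(D, K))
b = np.zeros(D)
c = np.zeros(K)

def free_energy(v):
    return -(b @ v) - np.logaddexp(0.0, c + v @ W).sum()

def log_Z():
    # log Z = log sum over all 2^D visible states of exp(-F(x)).
    # 2^D terms: exponential in D, hence intractable for realistic D.
    fs = np.array([-free_energy(np.array(x, float))
                   for x in product([0, 1], repeat=D)])
    return np.logaddexp.reduce(fs)

v = np.array([1.0, 0.0, 1.0])
print(np.exp(-free_energy(v) - log_Z()))  # a properly normalized p(v)
```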

Page 11:

Outline:

• Boltzmann Machines and RBMs
• Inductive Principles
  • Maximum Likelihood
  • Contrastive Divergence
  • Pseudo-Likelihood
  • Ratio Matching
  • Generalized Score Matching
• Experiments
• Demo

Page 12:

Stochastic Maximum Likelihood

• Exact maximum likelihood learning is intractable in an RBM. Stochastic ML estimation can instead be applied, usually using a simple block Gibbs sampler.

Page 13:

Stochastic Maximum Likelihood

• Exact maximum likelihood learning is intractable in an RBM. Stochastic ML estimation can instead be applied, usually using a simple block Gibbs sampler.

[Diagram: persistent Gibbs chain alternating sampling steps with weight updates]

• L. Younes. Parametric inference for imperfectly observed Gibbsian fields. Probability Theory and Related Fields, 82(4):625–645, 1989.
• T. Tieleman. Training restricted Boltzmann machines using approximations to the likelihood gradient. ICML 25, 2008.
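The procedure above (sample the persistent chain, update the weights, keep the chain alive) can be written in a few lines of NumPy. This is an illustrative PCD-style sketch on synthetic data, not the authors' code; all names and sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, lr = 3, 4, 0.05
W = rng.normal(scale=0.1, size=(D, K)); b = np.zeros(D); c = np.zeros(K)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

data = rng.integers(0, 2, size=(20, D)).astype(float)  # toy binary dataset
v_chain = data[:5].copy()                              # persistent fantasy particles

for step in range(100):
    batch = data[rng.integers(0, len(data), size=5)]
    # Positive phase: exact, thanks to the factorized conditional.
    ph_data = sigmoid(c + batch @ W)
    # Negative phase: one block Gibbs sweep on the *persistent* chain.
    ph = sigmoid(c + v_chain @ W)
    h = (rng.random(ph.shape) < ph).astype(float)
    pv = sigmoid(b + h @ W.T)
    v_chain = (rng.random(pv.shape) < pv).astype(float)
    ph_model = sigmoid(c + v_chain @ W)
    # Stochastic gradient ascent on the log-likelihood.
    W += lr * (batch.T @ ph_data - v_chain.T @ ph_model) / 5
    b += lr * (batch - v_chain).mean(axis=0)
    c += lr * (ph_data - ph_model).mean(axis=0)
```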

Page 14:

Contrastive Divergence

• The contrastive divergence principle results in a gradient that looks identical to stochastic maximum likelihood. The difference is that CD samples from the T-step Gibbs distribution.

Page 15:

Contrastive Divergence

• The contrastive divergence principle results in a gradient that looks identical to stochastic maximum likelihood. The difference is that CD samples from the T-step Gibbs distribution.

[Diagram: Gibbs chain run for T steps; weights updated, then the chain is reset to the data]
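The only change relative to stochastic ML is the chain initialization: CD restarts the Gibbs chain at the data on every update and runs it for T steps. An illustrative sketch with T = 1 and toy data (names are assumptions, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, lr, T = 3, 4, 0.05, 1
W = rng.normal(scale=0.1, size=(D, K)); b = np.zeros(D); c = np.zeros(K)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
data = rng.integers(0, 2, size=(20, D)).astype(float)

for step in range(100):
    batch = data[rng.integers(0, len(data), size=5)]
    ph_data = sigmoid(c + batch @ W)
    v = batch.copy()                      # reset the chain to the data
    for _ in range(T):                    # T-step block Gibbs
        h = (rng.random((5, K)) < sigmoid(c + v @ W)).astype(float)
        v = (rng.random((5, D)) < sigmoid(b + h @ W.T)).astype(float)
    ph_model = sigmoid(c + v @ W)
    W += lr * (batch.T @ ph_data - v.T @ ph_model) / 5
    b += lr * (batch - v).mean(axis=0)
    c += lr * (ph_data - ph_model).mean(axis=0)
```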

Page 16:

Pseudo-Likelihood

• The principle of maximum pseudo-likelihood is based on optimizing a product of one-dimensional conditional densities under a log loss.

Page 17:

Pseudo-Likelihood

• The principle of maximum pseudo-likelihood is based on optimizing a product of one-dimensional conditional densities under a log loss.
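For a binary RBM each one-dimensional conditional needs only the two completions of the chosen bit, so the pseudo-likelihood avoids the partition function entirely. A sketch (the free-energy form is the standard analytic marginal; the helper names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 3, 4
W = rng.normal(scale=0.1, size=(D, K)); b = np.zeros(D); c = np.zeros(K)

def free_energy(v):
    return -(b @ v) - np.logaddexp(0.0, c + v @ W).sum()

def pseudo_log_likelihood(x):
    # sum_d log p(x_d | x_{-d}); each conditional normalizes over only the
    # two completions x_d = 0 and x_d = 1, so Z cancels out.
    pll = 0.0
    for d in range(D):
        x0, x1 = x.copy(), x.copy()
        x0[d], x1[d] = 0.0, 1.0
        log_norm = np.logaddexp(-free_energy(x0), -free_energy(x1))
        pll += -free_energy(x) - log_norm
    return pll

x = np.array([1.0, 0.0, 1.0])
print(pseudo_log_likelihood(x))  # <= 0; maximized during learning
```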

Page 18:

Ratio Matching

• The ratio matching principle is very similar to pseudo-likelihood, but is based on minimizing a squared difference between one-dimensional conditional distributions.

Aapo Hyvärinen. Some extensions of score matching. Computational Statistics & Data Analysis, 51(5):2499–2512, 2007.
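Since p(x) ∝ exp(−F(x)), the one-bit-flip probability ratios reduce to free-energy differences, and the criterion becomes a sum of squared logistic terms. A sketch of one common statement of the binary-data objective (hedged: this is my reading of Hyvärinen's form, written with an assumed `free_energy` helper):

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 3, 4
W = rng.normal(scale=0.1, size=(D, K)); b = np.zeros(D); c = np.zeros(K)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def free_energy(v):
    return -(b @ v) - np.logaddexp(0.0, c + v @ W).sum()

def ratio_matching_loss(x):
    # One term per dimension, comparing x against its one-bit flip:
    # P(flip) / (P(x) + P(flip)) = sigmoid(F(x) - F(flip)).
    # Minimizing the squared terms drives the data's free energy below
    # that of its neighbors, with no partition function involved.
    loss = 0.0
    for d in range(D):
        x_flip = x.copy()
        x_flip[d] = 1.0 - x_flip[d]
        loss += sigmoid(free_energy(x) - free_energy(x_flip)) ** 2
    return loss

x = np.array([1.0, 0.0, 1.0])
print(ratio_matching_loss(x))
```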

Page 19:

Generalized Score Matching

• The generalized score matching principle is similar to ratio matching, except that the difference between inverse one-dimensional conditional distributions is minimized.

Siwei Lyu. Interpretation and generalization of score matching. In Uncertainty in Artificial Intelligence 25, 2009.

Page 20:

Minimum Probability Flow

• Minimize the flow of probability from data states to non-data states (as we’ve just seen!).

Jascha Sohl-Dickstein, Peter Battaglino, Michael R. DeWeese. Minimum Probability Flow Learning.
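With the one-bit-flip neighborhood typically used for binary RBMs, the MPF objective sums exponentiated half differences of free energies between each data state and its neighbors. An illustrative sketch (free-energy form and helper names assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 3, 4
W = rng.normal(scale=0.1, size=(D, K)); b = np.zeros(D); c = np.zeros(K)

def free_energy(v):
    return -(b @ v) - np.logaddexp(0.0, c + v @ W).sum()

def mpf_loss(x):
    # Probability "flows" from the data state x toward each one-bit-flip
    # neighbor at a rate exp((F(x) - F(neighbor)) / 2); minimizing the sum
    # suppresses this flow by keeping the data's free energy low relative
    # to its neighbors.
    loss = 0.0
    for d in range(D):
        x_flip = x.copy()
        x_flip[d] = 1.0 - x_flip[d]
        loss += np.exp((free_energy(x) - free_energy(x_flip)) / 2.0)
    return loss

x = np.array([1.0, 0.0, 1.0])
print(mpf_loss(x))
```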

Page 21:

Comparison: Gradients

Page 22:

Comparison: Weighting Functions

Page 23:

Comparison: Weighting Functions

What about MPF?

Page 24:

Comparison: A Manifold of Estimators?

Dimensions:

1. Neighborhood structure around data configurations.

2. Form of loss function on the probability ratio.
  • Smooth
  • Monotonically decreasing
  • Bounded below
  • Others?

Covers: PL, GLP, NLCE, RM, GSM, MPF

Limitations: No good for missing data/explicit latent variables.

Page 25:

Outline:

• Boltzmann Machines and RBMs
• Inductive Principles
  • Maximum Likelihood
  • Contrastive Divergence
  • Pseudo-Likelihood
  • Ratio Matching
  • Generalized Score Matching
• Experiments
• Demo

Page 26:

Experiments:

Data Sets:
• MNIST handwritten digits
• 20 Newsgroups
• CalTech 101 Silhouettes

Evaluation Criteria:
• Log likelihood (using AIS estimator)
• Classification error
• Reconstruction error
• De-noising
• Novelty detection

Page 27:

Experiments: Log Likelihood

Page 28:

Experiments: Classification Error

Page 29:

Experiments: De-noising

Page 30:

Experiments: Novelty Detection

Page 31:

Experiments: Learned Weights on MNIST