Advanced Machine Learning / Deep Belief Networks
Daniel Ulbricht

Source: 319655500.online.de/du/upload/DeepBeliefLearning.pdf


Page 1:

Advanced Machine Learning / Deep Belief Networks
Daniel Ulbricht

Page 2:

Agenda

● Short history of machine learning
● What is Deep Learning?
● Deriving learning in general
● Energy-Based Models / Restricted Boltzmann Machines
● Bringing things together

● You will implement a core learning algorithm

Page 3:

History: 1st wave

"The Perceptron"Frank Rosenblatt explained the Perceptron in the year 1958

● Simple problems like XOR could be solved

Page 4:

History: 2nd wave

"Backpropagation"

Developed by Paul Werbos in 1974
● Complex non-linear problems could now be solved

Page 5:

History: 3rd wave

"Deep Belief Networks"

The "magic" behind will be explained in this talk

Developed mainly 2006 by Geoff Hinton

Page 6:

History: Deep Learning

● An automatic way to learn representations (descriptors) from given data

● An attempt to learn multiple levels of representation of increasing complexity

Borrowed from Andrew Ng

Page 7:

History: Deep Learning

● Backpropagation is already an attempt to perform Deep Learning

● But there are some problems:
  ○ The gradient gets progressively diluted (decreasing update strength in the lower layers)
  ○ Initialization of the weights
  ○ How to label all the given data

Page 8:

Machine Learning in General

Input -> Weights W -> Output

Goal: find the weights W that maximize the probability of certain outputs given some input vectors.

Maximize: P(output | input, W)

Page 9:

Machine Learning in General

Learning can be performed using:
● Gradient ascent on: log P
● Gradient descent on: -log P

From optimization theory we know many downhill optimization algorithms:
● (Stochastic) Gradient Descent
● Conjugate Gradient
● Dogleg
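As a tiny illustration (my own toy example, not from the talk): gradient descent on -log P for a single Bernoulli parameter recovers the maximum-likelihood estimate.

```python
import numpy as np

# Toy example: fit a Bernoulli parameter p by gradient descent
# on the negative log-likelihood -log P(data | p).
data = np.array([1, 1, 1, 0, 1, 0, 1, 1])  # made-up binary observations
p = 0.5          # initial guess
lr = 0.1         # learning rate

for _ in range(500):
    # -log P = -sum(x*log p + (1-x)*log(1-p)); gradient w.r.t. p:
    grad = -(data.sum() / p - (len(data) - data.sum()) / (1 - p))
    p -= lr * grad / len(data)   # downhill step on -log P

print(round(p, 3))  # converges to the sample mean 6/8 = 0.75
```

Gradient descent on -log P and gradient ascent on log P are the same update with opposite sign conventions.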

Page 10:

Maximize the log-likelihood of the system

Log-likelihood of the data:

(1/N) sum_n log P(v_n) = (1/N) sum_n log sum_h exp(-E(v_n, h)) - log Z

Average log-likelihood per pattern minus the log of the normalization term.

Page 11:

Rules for Gradient Computation

Sum Rule: d/dw sum_i f_i(w) = sum_i d f_i/dw

Product Rule: d/dw [ f(w) g(w) ] = (df/dw) g + f (dg/dw)

1: d/dw log f(w) = (1/f(w)) df/dw

2: d/dw exp(f(w)) = exp(f(w)) df/dw

Page 12:

Gradient of first part

Gradient of first part:

d/dw log sum_h exp(-E(v,h))
  = (1 / sum_h exp(-E(v,h))) * d/dw sum_h exp(-E(v,h))            (Rule 1)
  = (1 / sum_h exp(-E(v,h))) * sum_h d/dw exp(-E(v,h))            (Sum Rule)
  = sum_h [ exp(-E(v,h)) / sum_h' exp(-E(v,h')) ] * (-dE(v,h)/dw) (Rule 2, reorder)
  = - sum_h P(h|v) dE(v,h)/dw

Sum over the posterior ;-)

Page 13:

Gradient of second part

Gradient of second part:

d/dw log Z = d/dw log sum_{v,h} exp(-E(v,h)) = - sum_{v,h} P(v,h) dE(v,h)/dw

Sum over the joint.

Page 14:

Full Gradient

Full Gradient:

d log P(v)/dw = - sum_h P(h|v) dE(v,h)/dw + sum_{v,h} P(v,h) dE(v,h)/dw

Two averages around the same term. Therefore we can write:

d log P(v)/dw = - < dE/dw >_{P(h|v)} + < dE/dw >_{P(v,h)}

First term: Hebbian / positive phase. Second term: anti-Hebbian / negative phase.

Page 15:

Gradient in Sigmoid Belief Nets

Apply this knowledge to normal sigmoid nets, as used in backpropagation.

The joint is automatically normalized: Z = 1.

This makes the second gradient term zero:

d log Z / dw = 0

Full gradient: only the posterior term remains, which gives the well-known delta rule:

delta w_ij ~ (t_j - y_j) x_i
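As a toy illustration of the delta rule (my own example, not from the slides): a single sigmoid unit learns the AND function with the update delta w = eps * (t - y) * x.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One sigmoid unit trained with the delta rule on made-up data (AND function).
rng = np.random.default_rng(0)
x = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([0., 0., 0., 1.])           # AND targets (linearly separable)
w = rng.normal(0, 0.1, 2)
b = 0.0
lr = 0.5

for _ in range(2000):
    y = sigmoid(x @ w + b)
    w += lr * x.T @ (t - y)              # delta rule: eps * (target - output) * input
    b += lr * (t - y).sum()

print(np.round(sigmoid(x @ w + b)))      # outputs round to the AND targets
```

Unlike the perceptron, the unit cannot learn XOR: the delta rule still only finds linear decision boundaries for a single unit.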

Page 16:

Energy Based Models (EBM)

Energy-based probabilistic models define a probability distribution as follows:

P(x) = exp(-E(x)) / Z,   with the partition function (normalization term) Z = sum_x exp(-E(x))

High probability -> low energy
Low probability -> high energy
-> Minimize the energy
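As a tiny numeric illustration (the energies are made up, not from the talk), the definition above turns any energy assignment into a normalized distribution:

```python
import numpy as np

# Energy-based model over 4 discrete states: P(x) = exp(-E(x)) / Z.
E = np.array([1.0, 2.0, 0.5, 3.0])   # lower energy -> higher probability
Z = np.exp(-E).sum()                  # partition function (normalization term)
P = np.exp(-E) / Z

print(P)  # the lowest-energy state (index 2) gets the highest probability
```

Note that Z requires a sum over every state, which is exactly what becomes intractable for large models.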

Page 17:

Energy Based Models with Hidden Units

In reality we can't observe the full state of our data and/or we are not aware of indirect influences. Therefore we add hidden units to increase the expressive power of the model.

Page 18:

Restricted Boltzmann Machine

Fancy name for a simple bipartite graph with bidirectional connections:
● No connections inside the same layer
● No loops
● The energy function is used to perform transitions
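The standard binary-RBM energy is E(v, h) = -a.v - b.h - v.W.h, with visible biases a, hidden biases b, and weight matrix W. A minimal sketch (my own numpy code, toy sizes and values):

```python
import numpy as np

# Energy of one joint configuration (v, h) of a binary RBM.
def rbm_energy(v, h, W, a, b):
    # E(v,h) = -a.v - b.h - v.W.h
    return -(a @ v) - (b @ h) - (v @ W @ h)

rng = np.random.default_rng(1)
W = rng.normal(0, 0.1, (3, 2))        # 3 visible, 2 hidden units (made up)
a = np.array([0.1, -0.2, 0.3])        # visible biases
b = np.array([0.05, -0.1])            # hidden biases
v = np.array([1., 0., 1.])
h = np.array([0., 1.])

print(rbm_energy(v, h, W, a, b))
```

Lower energy for a configuration means higher probability under P(v,h) = exp(-E(v,h)) / Z.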

Page 19:

Alternate Gibbs Sampling

Computing the averages over the posterior and the joint is very expensive.

To overcome this, Gibbs sampling is used.

Gibbs sampling inside energy-based models leads to the simple sigmoid function.

The proof is easy but would take too long here: use the normal Gibbs algorithm and plug the energy term into the distribution. You can find it on my webpage.

Page 20:

Alternate Gibbs Sampling

Alternate Gibbs Sampling:

● Sample up (visible to hidden)
● Sample down (hidden to visible)
● Continue...
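One up/down pass can be sketched in Python (my own numpy sketch with made-up shapes; the conditional probabilities are the standard binary-RBM sigmoids):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One alternating Gibbs step for a binary RBM:
# up:   P(h_j = 1 | v) = sigmoid(b_j + sum_i v_i W_ij)
# down: P(v_i = 1 | h) = sigmoid(a_i + sum_j W_ij h_j)
rng = np.random.default_rng(42)
W = rng.normal(0, 0.1, (6, 4))    # 6 visible, 4 hidden units (made up)
a = np.zeros(6)                   # visible biases
b = np.zeros(4)                   # hidden biases
v = rng.integers(0, 2, 6).astype(float)

p_h = sigmoid(b + v @ W)          # sample up (visible to hidden)
h = (rng.random(4) < p_h).astype(float)

p_v = sigmoid(a + W @ h)          # sample down (hidden to visible)
v1 = (rng.random(6) < p_v).astype(float)

print(h, v1)
```

Repeating this up/down alternation is the "continue..." step of the chain.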

Page 21:

Alternate Gibbs Sampling

● Running it for infinitely many iterations would give the exact gradient (~ Markov chain Monte Carlo)

● Surprisingly, even a single iteration works very well in practice
  ○ Geoff Hinton tried this in 2006 and recognized that the system converges well even with a single iteration
  ○ Called Contrastive Divergence

Page 22:

Alternate Gibbs Sampling

Alternate Gibbs Sampling:

● Start with a training vector
● Sample up (visible to hidden)  <- Hebbian
● Sample down (hidden to visible)
● Sample up  <- Anti-Hebbian
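The four steps above can be sketched as a CD-1 update (a minimal sketch in numpy rather than the talk's Matlab/Octave; all names, sizes, and the training pattern are my own, not the files from the author's homepage):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, W, a, b, rng, lr=0.1):
    # One CD-1 step for a binary RBM; W: visible x hidden, a/b: visible/hidden biases.
    p_h0 = sigmoid(b + v0 @ W)                           # sample up on the data
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)   # stochastic binary hiddens
    p_v1 = sigmoid(a + h0 @ W.T)                         # sample down (reconstruction)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    p_h1 = sigmoid(b + v1 @ W)                           # second up step: probabilities, not samples
    # Hebbian (data) statistics minus anti-Hebbian (reconstruction) statistics
    dW = np.outer(v0, h0) - np.outer(v1, p_h1)
    return W + lr * dW, a + lr * (v0 - v1), b + lr * (h0 - p_h1)

rng = np.random.default_rng(0)
W = rng.normal(0, 0.01, (4, 3))
a, b = np.zeros(4), np.zeros(3)
v0 = np.array([1.0, 0.0, 1.0, 0.0])      # a single made-up training pattern

for _ in range(200):
    W, a, b = cd1_update(v0, W, a, b, rng)

# after training, a deterministic up-down pass should reconstruct v0 well
recon = sigmoid(a + sigmoid(b + v0 @ W) @ W.T)
print(np.round(recon))
```

With only one training pattern the RBM simply memorizes it; the same update applied over many patterns learns the data distribution.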

Page 23:

Bring things together

For simplification we use binary input and output units from now on:
● The terms get much easier to compute
● It is also the common choice in practical applications

Page 24:

Bring things together

● Hebbian Part (Up Step):

p(h_j = 1 | v) = sigmoid( b_j + sum_i v_i w_ij )

Sum over all visible units, each multiplied by the corresponding hidden-unit weight.

Sigmoid function: we can use it because Gibbs sampling inside an EBM is sigmoid.

Make the output stochastic (sample h_j from this probability); this simplifies the next steps.

b_j: bias of the hidden unit.

Page 25:

Bring things together

● Hebbian Part (Down Step):

p(v_i = 1 | h) = sigmoid( a_i + sum_j w_ij h_j )

Same as the up step, only using a different bias.

a_i: bias of the visible unit.

Page 26:

Bring things together

● Anti-Hebbian Part (Up Step):

p(h_j = 1 | v¹) = sigmoid( b_j + sum_i v_i¹ w_ij )

Instead of the reconstructed (sampled) output, the probability is now used.

Page 27:

Bring things together

● Full Gradient:

delta w_ij = eps ( <v_i h_j>_data - <v_i h_j>_reconstruction )

Don't forget the biases:

delta a_i = eps ( v_i⁰ - v_i¹ ),   delta b_j = eps ( h_j⁰ - h_j¹ )

First average: over the posterior. Second average: over the joint.

Page 28:

So Far We Have

● The knowledge to train a Restricted Boltzmann Machine

● No need for labels -> our "labels" are the equilibrium level of the energy function

Open question:
● How to perform Deep Learning without the factorial behaviour

Page 29:

Stacking RBMs

To perform Deep Learning we stack multiple RBMs but train them layer by layer

Input

Page 30:

Stacking RBMs

To perform Deep Learning we stack multiple RBMs but train them layer by layer

Input

Hidden

W1

Page 31:

Stacking RBMs

To perform Deep Learning we stack multiple RBMs but train them layer by layer

Input

Hidden

W1 <- fixed (we don't update it anymore)

Hidden

W2
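The layer-by-layer scheme can be sketched as follows (my own numpy illustration; the layer sizes, data, and the pretend pre-trained weights are all made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Greedy layer-wise stacking: train RBM 1 on the input, freeze W1,
# then feed its hidden activities to RBM 2 as that layer's "data".
def up(v, W, b):
    return sigmoid(b + v @ W)

rng = np.random.default_rng(0)
data = rng.integers(0, 2, (10, 8)).astype(float)   # 10 binary input vectors

# pretend W1 was already trained by contrastive divergence and is now fixed
W1, b1 = rng.normal(0, 0.1, (8, 5)), np.zeros(5)
h1 = up(data, W1, b1)            # layer-1 representation of every pattern

# RBM 2 treats h1 as its visible data; only W2 would be updated from here on
W2, b2 = rng.normal(0, 0.1, (5, 3)), np.zeros(3)
h2 = up(h1, W2, b2)

print(h1.shape, h2.shape)
```

Each additional RBM is trained the same way on the representation produced by the frozen layers below it.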

Page 32:

Now we have

● A network which learns
  ○ Without labels
    ■ The labels are the equilibrium level of the energy term
  ○ Every layer learns a significant amount
    ■ Because each layer is trained independently of the others

Page 33:

Get hands on:

Download the example Matlab/Octave files from my homepage.

You will notice that calling runRBM
● will do nothing so far
● because it is missing the implementation of "Contrastive Divergence"

Try to implement "Contrastive Divergence" yourself.
  ○ The solution can also be found on my homepage

Page 34:

Thank you for listening