84
DEEP LEARNING PART THREE - DEEP GENERATIVE MODELS CS/CNS/EE 155 - MACHINE LEARNING & DATA MINING - LECTURE 17

Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

  • Upload
    others

  • View
    10

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

D E E P L E A R N I N GPA RT T H R E E - D E E P G E N E R AT I V E M O D E L S

C S / C N S / E E 1 5 5 - M A C H I N E L E A R N I N G & D ATA M I N I N G - L E C T U R E 1 7

Page 2: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

G E N E R AT I V E M O D E L S

Page 3: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

3

DATA

Page 4: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

4

DATA

Page 5: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

5

example 1

DATA

Page 6: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

6

example 2

DATA

Page 7: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

7

example 3

DATA

Page 8: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

8

num

ber

of d

ata

exam

ple

s

DATA

Page 9: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

9

num

ber

of d

ata

exam

ple

sfeature 1

DATA

Page 10: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

10

num

ber

of d

ata

exam

ple

sfeature 2

DATA

Page 11: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

11

num

ber

of d

ata

exam

ple

sfeature 3

DATA

Page 12: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

12

num

ber

of d

ata

exam

ple

snumber of features

DATA

Page 13: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

13

DATA DISTRIBUTION

feature 1

feature 2

feature 3

example 1

example 2

example 3

Page 14: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

14

DATA DISTRIBUTION

feature 1

feature 2

feature 3

Page 15: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

15

DENSITY ESTIMATION

feature 1

feature 2

feature 3

estimating the density of the empirically observed data distribution

Page 16: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

16

GENERATIVE MODEL

a model of the density of the data distribution

Page 17: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

17

by modeling the data distribution, generative models are able to generate new data examples

feature 1

feature 2

feature 3

generated examples

Page 18: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

18

generative modeldiscriminative model

Page 19: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

19

discriminative models vs. generative models

can both be trained using supervised learning

generative models typically require more modeling assumptions

generative models are often easier to train with unsupervised methods

straightforward to quantify uncertainty with generative models

Page 20: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

20

one of the main benefits of generative modeling is the ability to automatically extract structure from data

reducing the effective dimensionality of the data can make it easier to learn and generalize on new tasks

Page 21: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

21

one of the main benefits of generative modeling is the ability to automatically extract structure from data

reducing the effective dimensionality of the data can make it easier to learn and generalize on new tasks

labeled examples

Page 22: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

22

one of the main benefits of generative modeling is the ability to automatically extract structure from data

reducing the effective dimensionality of the data can make it easier to learn and generalize on new tasks

labeled examples

Page 23: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

23

any model that has an output in the data space can be considered a generative model

nervous systems appear to use this mechanism in part

prediction of sensory input using “top-down” pathways

Page 24: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

24

deep generative model

a generative model that uses deep neural networks to model the data distribution

Page 25: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

25

auto-regressive models

latent variable models

implicit models

FAMILIES OF (DEEP) GENERATIVE MODELS

Page 26: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

A U T O - R E G R E S S I V E M O D E L S

Page 27: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

27

a data example

number of features

x1 x2 x3 xM

p(x) = p(x1, x2, . . . , xM )

Page 28: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

28

x1 x2 x3 xM

p(x) = p(x1, x2, . . . , xM )

use chain rule of probability to split the joint distribution into a product of conditional distributions

note: conditioning order is arbitrary

definition of conditional probability p(a|b) = p(a, b)

p(b)p(a, b) = p(a|b)p(b)

p(x1, x2, . . . , xM ) = p(x1|x2, . . . , xM )p(x2, . . . , xM )

recursively apply to p(x1, x2, . . . , xM ) = p(x1|x2, . . . , xM )p(x2, . . . , xM )

p(x1, x2, . . . , xM ) = p(x1|x2, . . . , xM )p(x2|x3, . . . , xM ) . . . p(xM�1|xM )p(xM )

p(x1, . . . , xM ) =MY

j=1

p(xj |x1, . . . , xj�1)

Page 29: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

29

x1 x2 x3 xM

model the conditional distributions of the data

learn to auto-regress to the missing values

Page 30: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

30

x1 x2 x3 xM

model the conditional distributions of the data

learn to auto-regress to the missing values

p(x1)

Page 31: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

31

x1 x2 x3 xM

model the conditional distributions of the data

learn to auto-regress to the missing values

MODEL

p(x2|x1)

Page 32: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

32

x1 x2 x3 xM

model the conditional distributions of the data

learn to auto-regress to the missing values

MODEL

p(x3|x2, x1)

Page 33: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

33

x1 x2 x3 xM

model the conditional distributions of the data

learn to auto-regress to the missing values

MODEL

p(x4|x3, x2, x1)

Page 34: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

34

x1 x2 x3 xM

model the conditional distributions of the data

learn to auto-regress to the missing values

MODEL

p(x5|x4, x3, x2, x1)

Page 35: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

35

x1 x2 x3 xM

model the conditional distributions of the data

learn to auto-regress to the missing values

MODEL

p(x6|x5, x4, x3, x2, x1)

Page 36: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

36

x1 x2 x3 xM

model the conditional distributions of the data

learn to auto-regress to the missing values

MODEL

p(xM |xM�1, . . . , x1)

Page 37: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

37

maximum likelihood

to fit the model to the empirical data distribution, maximize the likelihood of the true data examples

likelihood: p(x) =MY

j=1

p(xj |x<j)

optimize the parameters to assign high (log) probability to the true data examples

learning: ✓⇤ = argmax✓ log p(x)

auto-regressive conditionals

logarithm for numerical stability

Page 38: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

38

models

can parameterize conditional distributions using a recurrent neural network

unrolling auto-regressive generation from an RNN “teacher forcing”

Deep Learning, Goodfellow et al., 2016

(chapter 10)

Page 39: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

39

models

can parameterize conditional distributions using a recurrent neural network

The Unreasonable Effectiveness of Recurrent Neural Networks, Karpathy, 2015

Pixel Recurrent Neural Networks, van den Oord et al., 2016

Page 40: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

can also condition on a local window using convolutional neural networks

40

models

Pixel Recurrent Neural Networks, van den Oord et al., 2016

Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016

WaveNet: A Generative Model for Raw Audio, van den Oord et al., 2016

Page 41: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

41

output distributions

need to choose a form for the conditional output distribution, i.e. how do we express ?p(xj |x1, . . . , xj�1)

model the data as categorical variables

model the data as continuous variables

Gaussian, logistic, etc. output

multinomial output

Page 42: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

42

example applications

text images

Pixel Recurrent Neural Networks, van den Oord et al., 2016

WaveNet: A Generative Model for Raw Audio, van den Oord et al., 2016

speech

Page 43: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

43

recap: auto-regressive models

difficult to capture “high-level” global structure

need to impose conditioning order

sequential sampling is computationally expensive

x1 x2 x3 xM

MODEL

p(x6|x5, x4, x3, x2, x1)

model conditional distributions to auto-regress to missing values

Pros

tractable and straightforward to evaluate the (log) likelihood

great at capturing details

superior quantitative performance

Cons

Page 44: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

E X P L I C I T L AT E N T VA R I A B L E M O D E L S

Page 45: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

45

reality generates sensory stimuli from underlying latent phenomena

REALITY

matterenergyforces

etc.

laws of nature inference

STIMULI PERCEPTION

object identitiesobject locationscommunication

etc.

can use latent variables to help model these phenomena

Page 46: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

46

probabilistic graphical models provide a framework for modeling relationships between random variables

observed variable

unobserved (latent) variable

x y

x y

directed

undirected

PLATE NOTATION

set of variables

x

N

Page 47: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

47

N

review exercise: represent an auto-regressive model of 3 random variables

with plate notation

x1 x2 x3

p✓(x1) p✓(x2|x1) p✓(x3|x1, x2)

Page 48: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

48

comparing auto-regressive models and latent variable models

N

x1 x2 x3

p✓(x1) p✓(x2|x1) p✓(x3|x1, x2)

auto-regressive model

N

x1 x2 x3

z

p✓(x1|z) p✓(x2|z) p✓(x3|z)

p✓(z)

latent variable model

Page 49: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

49

x

z

N

example: undirected latent variable model

cut f

or tim

e

restricted Boltzmann machine

Page 50: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

50

example: directed latent variable model

x

z

N

Page 51: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

51

example: directed latent variable model

p(x, z) = p(x|z)p(z)GENERATIVE MODEL

jointconditional likelihood

prior

x

z

N

Generation

1. sample from z p(z)

2. use samples to sample from z p(x|z)x

intuitive example

object ~ p(objects)

lighting ~ p(lighting)

background ~ p(bg)

RENDER

Page 52: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

52

example: directed latent variable model

posterior

joint

marginal likelihood

INFERENCE

p(z|x) = p(x, z)

p(x)

x

z

N

Posterior Inference

use Bayes’ rule (def. of cond. prob.)

provides conditional distribution over latent variables

intuitive example

observation

what is the probability that I am observing a cat given these pixel observations?

p(cat | )p( |cat) p(cat)______________=

p( )

Page 53: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

53

example: directed latent variable model

marginal likelihood

joint

MARGINALIZATION

p(x) =

Zp(x, z)dz

x

z

N

Model Evaluation

to evaluate the likelihood of an observation, we need to marginalize over all latent variables

i.e. consider all possible underlying states

intuitive example

observation

how likely is this observation under my model?(what is the probability of observing this?)

for all objects, lighting, backgrounds, etc.:

how plausible is this example?

Page 54: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

54

example: directed latent variable model

to fit the model, we want to evaluate the marginal (log) likelihood of the data

✓⇤ = argmax✓ log p(x)

however, this is generally intractable, due to the integration over latent variables

p(x) =

Zp(x, z)dz

integration in high-dimensions

x

z

N

Page 55: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

55

variational inference

introduce an approximate posterior, then minimize KL-divergence to the true posterior

q⇤(z|x) = argminqDKL(q(z|x)||p(z|x))

✓̃⇤ = argmax✓Lprovides a lower bound onL log p(x), so we can use to (approximately) fit the modelL

instead of optimizing the (log) likelihood, optimize a lower bound on it

main idea

p(z|x)

z

q⇤(z|x) = argmaxqL

where is the evidence lower bound (ELBO), defined asL L ⌘ Ez⇠q(z|x) [log p(x, z)� log q(z|x)]

evaluating KL-divergence involves evaluating , instead maximize :p(z|x) L

q(z|x)

Page 56: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

56

interpreting the ELBO

L ⌘ Ez⇠q(z|x) [log p(x, z)� log q(z|x)]

we can write the ELBO as

= Ez⇠q(z|x) [log p(x|z)p(z)� log q(z|x)]

= Ez⇠q(z|x) [log p(x|z) + log p(z)� log q(z|x)]{reconstruction

{regularization

is optimized to represent the data while staying close to the priorq(z|x)

resembles the “auto-encoding” framework

many connections to compression, information theory

= Ez⇠q(z|x) [log p(x|z)]�DKL(q(z|x)||p(z))

Page 57: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

57

variational inference involves optimizing the approximate posterior for each data example

q⇤(z|x) = argmaxqLcan be solved using gradient ascent and (stochastic) backpropagation / REINFORCE,

but can be computationally expensive

can instead amortize inference over data examples by learning a separate inference model to output approximate posterior estimates

“variational auto-encoder”

x inference model

z

q(z|x)

p(z)

generative model p(

x|z)

Learning to Infer, Marino et al., 2017

Page 58: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

58

hierarchical latent variable models

xN

z1

z2

Learning Hierarchical Features from Generative Models, Zhao et al., 2017

Improving Variational Inference with Inverse

Auto-regressive Flow, Kingma et al., 2016

iterative inference modelssequential latent variable models

A Recurrent Latent Variable Model for

Sequential Data, Chung et al., 2015

Deep Variational Bayes Filters:

Unsupervised Learning of State Space

Models from Raw Data, Karl et al., 2016Learning to Infer, Marino et al., 2017

Page 59: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

59

introducing latent variables to a generative model generally makes evaluating the (log) likelihood intractable

need to consider all possible “hypotheses” to evaluate (marginal) likelihood of the “outcome”

“hypotheses”

“outcome”

Page 60: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

60

change of variables

under certain conditions, we can use the change of variables formula to exactly evaluate the log likelihood

consider a variable in one dimension: x ⇠ Uniform(0, 1)

x

p(x)

1

1

then let be an affine transformation of , e.g. .

y xy = 2x+ 1

1

1

p(y)

y2 3

0.5

x

y

dx

dy

dy

dx> 0

x

y

dx

dy

dy

dx< 0

to conserve probability mass, p(y) = p(x)

����dx

dy

����

Normalizing Flows Tutorial, Eric Jang, 2018

Page 61: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

61

change of variables

in higher dimensions, conservation of probability mass generalizes to

p(y) = p(x)

����detdx

dy

���� = p(x)��detJ�1

��

where is the Jacobian matrix of the transformation, J J =dy

dx

CHANGE OF VARIABLES FORMULA

“law of the unconscious statistician” (LOTUS) can evaluate the probability from one variable’s distribution by evaluating the

probability of a transformed variable and the volume transformation

for certain classes of transformations, this is tractable to evaluate

expresses the local distortion in volume from the linear transformation��detJ�1

��

Page 62: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

62

change of variables

to use the change of variables formula, we need to evaluate��detJ�1

��

for an arbitrary Jacobian matrix, this is worst caseN ⇥N O(N3)

restricting the transformations to those with diagonal or triangular inverse Jacobians allows us to compute in .

��detJ�1��

O(N)

product of diagonal entries

Page 63: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

63

change of variables

Density Estimation Using Real NVP, Dinh et al., 2016

can transform the data into a space that is easier to model

Page 64: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

64

change of variables for variational inference: normalizing flows

use more complex approximate posterior, but evaluate a simpler distribution

chain together multiple transforms to get more expressive model

Variational Inference with Normalizing Flows, Rezende & Mohamed, 2015

target distribution:

transforms

Normalizing Flows Tutorial, Eric Jang, 2018

transform q(z|x)

Page 65: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

65

transforms

additive coupling layer Dinh et al., 2014

yd+1:D = xd+1:D + f(x1:d)y1:d = x1:d

planar flow Rezende & Mohamed, 2015

y = x+ f(x)� g(h(x)|x+ b(x))

affine coupling layer Dinh et al., 2016

yd+1:D = xd+1:D � exp(f(x1:d)) + g(x1:d)

y1:d = x1:d

inverse auto-regressive flow (IAF) Kingma et al., 2016

y =x� f(x)

exp(g(x))

masked auto-regressive flow (MAF) Papamakarios et al., 2017

y = x� exp(g(x)) + f(x)

Page 66: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

66

NICE: Non-linear Independent Components EstimationDinh et al., 2014

Variational Inference with Normalizing FlowsRezende & Mohamed, 2015

Density Estimation Using Real NVPDinh et al., 2016

Improving Variational Inference with Inverse Autoregressive FlowKingma et al., 2016

recent work

Page 67: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

67

recap: explicit latent variable models

difficult to capture details

requires additional assumptions on latent variables

x

z

N

model the data through latent variables

Pros Cons

can capture abstract variables, good for semi supervised learning

relatively fast sampling / training

theoretical foundations from info. theory

likelihood evaluation / inference often intractable

Page 68: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

I M P L I C I T L AT E N T VA R I A B L E M O D E L S

Page 69: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

69

instead of using an explicit probability density, learn a model that defines an implicit density

p(x̂)

x

specify a stochastic procedure for generating the data that does not require an explicit likelihood evaluation

Learning in Implicit Generative Models, Mohamed & Lakshminarayanan, 2016

p(x̃)

Page 70: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

70

Generative Stochastic Networks (GSNs)

train an auto-encoder to learn Monte Carlo sampling transitions

the generative distribution is implicitly defined by this transition

Deep Generative Stochastic Networks Trainable by Backprop, Bengio et al., 2013

Page 71: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

71

estimate density ratio through Bayesian two-sample test

p(x̃)p(x̂)

x

p(x̂)data distribution p(x̃)generated distribution

p(x̂)

p(x̃)=

p(x|data)p(x|gen.)

density estimation becomes a sample discrimination task

p(x̂)

p(x̃)=

p(data|x)p(x)/p(data)p(gen.|x)p(x)/p(gen.) (Bayes’ rule)

p(x̂)

p(x̃)=

p(data|x)p(gen.|x) (assuming equal dist. prob.)

Page 72: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

72

Generative Adversarial Networks (GANs)

learn the discriminator:

p(data|x) = D(x) p(gen.|x) = 1�D(x)

Bernoulli outcome:

log p(y|x) = logD(x̂) + log(1�D(x̃))

y 2 {data, gen.}

Mohamed, 2016

Goodfellow, 2016

two-sample criterion:

minG

maxD

Ep(x̂) [logD(x̂)] + Ep(x̃) [log(1�D(x̃))]

x̂ ⇠ p(x̂)

x̃ ⇠ p(x̃)

Page 73: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

73

Generative Adversarial Networks (GANs)

x̂ ⇠ p(x̂)

x̃ ⇠ p(x̃)

two-sample criterion:

minG

maxD

Ep(x̂) [logD(x̂)] + Ep(x̃) [log(1�D(x̃))]

in practice:maxD

Ep(x̂) [logD(x̂)] + Ep(x̃) [log(1�D(x̃))]

maxG

Ep(x̃) [logD(x̃)]

Generative Adversarial Networks, Goodfellow et al., 2014Goodfellow, 2016

Page 74: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

74

interpretation

data manifold

Aaron Courville

explicit model

explicit models tend to cover the entire data manifold, but are constrained

implicit model

implicit models tend to capture part of the data manifold, but can neglect other parts

“mode collapse”

Page 75: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

75

Generative Adversarial Networks (GANs)

GANs can be difficult to optimize

Improved Training of Wasserstein GANs, Gulrajani et al., 2017

Page 76: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

76

evaluation

inception score

A Note on the Inception Score, Barratt & Sharma, 2018

without an explicit likelihood, it is difficult to quantify the performance

use a pre-trained Inception v3 model to quantify class and distribution entropy

IS(G) = exp�Ep(x̃)DKL(p(y|x̃)||p(y))

p(y|x̃) is the class distribution for a given image

p(y) =

Zp(y|x̃)dx̃ is the marginal class distribution

should be highly peaked (low entropy)

want this to be uniform (high entropy)

Improved Techniques for Training GANs, Salimans et al., 2016

Page 77: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

77

extensions: Wasserstein GAN

under an “ideal” discriminator, the generator minimizes the Jensen-Shannon divergence

DJS(p(x̂)||p(x̃)) =1

2DKL(p(x̂)||

1

2(p(x̂) + p(x̃))) +

1

2DKL(p(x̃)||

1

2(p(x̂) + p(x̃)))

however, this metric can be discontinuous, making it difficult to train

can instead use the Wasserstein (Earth Mover’s) distance (which is continuous and diff. almost everywhere):

W (p(x̂), p(x̃)) = inf�2⇧(p(x̂),p(x̃))

E(x̂,x̃)⇠� [||x̂� x̃||]

think of it as the “minimum cost of transporting points between two distributions”

intractable to actually evaluate Wasserstein distance, but by constraining the discriminator, can evaluate

minG

maxD2D

Ep(x̂) [D(x̂)]� Ep(x̃) [D(x̃)]

is the set of Lipschitz functions, which can be enforced through weight clipping or gradient penaltyD

Was

sers

tein

Jens

en-S

hann

on

✓ is a gen. model parameter

Wasserstein GANs, Arjovsky et al., 2017

Improved Training of Wasserstein GANs, Gulrajani et al., 2017

Page 78: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

78

extensions: inference

Adversarially Learned Inference, Dumoulin et al., 2017

Adversarial Feature Learning, Donahue et al., 2017

can we also learn to infer a latent representation?

Page 79: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

79

applicationsimage to image translation

Image-to-Image Translation with Conditional Adversarial Networks, Isola et al., 2016

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, Zhu et al., 2017

experimental simulation

Learning Particle Physics by Example, de Oliveira et al., 2017

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative

Adversarial Nets, Chen et al., 2016

interpretable representationstext to image synthesis

StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks, Zhang et al., 2016

music synthesis

MIDINET: A CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORK FOR SYMBOLIC-DOMAIN MUSIC GENERATION, Yang et al., 2017

Page 80: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

80

recap: implicit latent variable models

x̂ ⇠ p(x̂)

x̃ ⇠ p(x̃)

Pros

able to learn flexible models

requires fewer modeling assumptions

capable of learning latent representation

Cons

difficult to evaluate

sensitive, difficult to optimize

can be difficult to incorporate model assumptions

Page 81: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

D I S C U S S I O N

Page 82: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

82

generative models: what are they good for?

generative models model the data distribution

1. can generate and simulate data

2. can extract structure from data

Page 83: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord

83

generative models: what’s next?

applying generative models to new forms of data

incorporating generative models into complementary learning systems

Page 84: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord