Disentangling Disentanglement in [-0.5ex] Variational ...12-11-00)-12-11-35-4811... · EmileMathieu TomRainforth N.Siddharth YeeWhyeTeh Code Paper iffsid/disentangling-disentanglement

Disentangling Disentanglement inVariational AutoencodersICML 2019

Emile Mathieu?, Tom Rainforth?, N. Siddharth?, Yee Whye TehJune 12, 2019

Departments of Statistics and Engineering Science, University of Oxford

Variational Autoencoders

x1

x2

x3

x4

x5

x

z1

z2

z3

z4

GenerativeModel Inference

Modelzl(gender)

zm (beard)zn

(makeup)

Factors

1

Disentanglement

= Independence

x1

x2

x3

x4

x5

x

z1

z2

z3

z4


Model

xixj

zl(gender)

zm (beard)zn

(makeup)

MeaningfulFactors

1

Disentanglement = Independence

x1

x2

x3

x4

x5

x

z1

z2

z3

z4


Modelzl(shape)

zm (angle)zn

(scale)

IndependentFactors

1

Decomposition ∈ {Independence, Clustering, Sparsity, …}

x1

x2

x3

x4

x5

x

z1

z2

z3

z4


Modelzl(gender)

zm (beard)zn

(makeup)

Co-RelatedFactors

1

Decomposition: A Generalization of Disentanglement

Characterise decomposition as the fulfilment of two factors:

(a) level of overlap between encodings in the latent space,(b) matching between the marginal posterior qφ(z) and structured

prior p(z) to constrain with the required decomposition.

2

Decomposition: An Analysis

Desired StructureTargetStructure

p(z)

3


Insufficient Overlap

Insu�cient

Overlap

q�(z|x) p✓(x|z)pD(x) q�(z) p(z) p✓(x)

3


Too Much Overlap

q�(z|x) p✓(x|z)

Too Much

Overlap

pD(x) q�(z) p(z) p✓(x)

3


Appropriate Overlap

Appropriate

Overlap

q�(z|x) p✓(x|z)pD(x) q�(z) p(z) p✓(x)

3

Overlap — Deconstructing the β-VAE

Lβ(x) = Eqφ(z|x)[logpθ(x|z)]− β · KL(qφ(z|x)||p(z))= L(x) (πθ,β ,qφ)︸︷︷︸

ELBO with β-annealed prior

+(β − 1) · Hqφ︸︷︷︸maxent

+ log Fβ︸︷︷︸constant

Implicationsβ-VAE disentangles largely by controlling the level of overlapIt places no direct pressure on the latents to be independent!

4

Decomposition: Objective

Lα,β(x) = Eqφ(z|x)[logpθ(x | z)] Reconstruct observations

− β · KL(qφ(z | x) ‖ p(z)) Control level of overlap

− α · D(qφ(z),p(z)) Impose desired structure

5

Decomposition: Generalising Disentanglement

Independence: p(z) = N (0,σ?)

Figure 1: β-VAE trained on 2D Shapes1 computing disentanglement2.

1Matthey et al., dSprites: Disentanglement testing Sprites dataset, p. 1.2Kim and Mnih, “Disentangling by Factorising”, p. 2.

6


Clustering: p(z) = ∑k ρk · N (µk,σk)

β = 0.01 β = 0.5 β = 1.0 β = 1.2

α=0

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

β=0

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

α = 1 α = 3 α = 5 α = 8

Figure 2: Density of aggregate posterior qφ(z) with different α, β for thepinwheel dataset.3

3http://hips.seas.harvard.edu/content/synthetic-pinwheel-data-matlab. 7

http://hips.seas.harvard.edu/content/synthetic-pinwheel-data-matlab


Sparsity: p(z) = ∏d (1− γ) · N (zd; 0, 1) + γ · N (zd; 0, σ20)

0 5 10 15 20 25 30 35 40 45Latent dimension

0.0

0.2

0.4

0.6

Avg.

late

nt m

agni

tude

TrouserDressShirt

Figure 3: Sparsity of learnt representations for the Fashion-MNIST4 dataset.

4Xiao, Rasul, and Vollgraf, Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms.

8



(a) d = 49 (b) d = 30 (c) d = 19 (d) d = 40leg separation dress width shirt fit sleeve style

Figure 3: Latent space traversals for “active” dimensions4.


8



0 200 400 600 800 1000alpha

0.2

0.3

0.4

0.5Av

g. N

orm

alise

d Sp

arsit

y

γ= 0, β= 0.1γ= 0.8, β= 0.1

γ= 0, β= 1γ= 0.8, β= 1

γ= 0, β= 5γ= 0.8, β= 5

Figure 3: Sparsity vs regularisation strength α (higher better)4.


8

Recap

We propose and develop:

• Decomposition: a generalisation of disentanglement involving:(a) overlap of latent encodings(b) match between qφ(z) and p(z)

• A theoretical analysis of the β-VAE objective showing it primarilyonly contributes to overlap.

• An objective that incorporates both factors (a) and (b).• Experiments that showcase efficacy at different decompositions:• independence • clustering • sparsity

9

Emile Mathieu Tom Rainforth N. Siddharth Yee Whye Teh

Code Paper

iffsid/disentangling-disentanglementarXiv:1812.02833

Come talk to us at our poster: #59

https://github.com/iffsid/disentangling-disentanglement

https://arxiv.org/abs/1812.02833

Documents

Disentangling Disentanglement in [-0.5ex] Variational ...12-11-00)-12-11-35-4811... · EmileMathieu TomRainforth N.Siddharth YeeWhyeTeh Code Paper iffsid/disentangling-disentanglement