Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Disentangling Disentanglement inVariational AutoencodersICML 2019
Emile Mathieu?, Tom Rainforth?, N. Siddharth?, Yee Whye TehJune 12, 2019
Departments of Statistics and Engineering Science, University of Oxford
Variational Autoencoders
x1
x2
x3
x4
x5
x
z1
z2
z3
z4
GenerativeModel Inference
Modelzl(gender)
zm (beard)zn
(makeup)
Factors
1
Disentanglement
= Independence
x1
x2
x3
x4
x5
x
z1
z2
z3
z4
GenerativeModel Inference
Model
xixj
zl(gender)
zm (beard)zn
(makeup)
MeaningfulFactors
1
Disentanglement = Independence
x1
x2
x3
x4
x5
x
z1
z2
z3
z4
GenerativeModel Inference
Modelzl(shape)
zm (angle)zn
(scale)
IndependentFactors
1
Decomposition ∈ {Independence, Clustering, Sparsity, …}
x1
x2
x3
x4
x5
x
z1
z2
z3
z4
GenerativeModel Inference
Modelzl(gender)
zm (beard)zn
(makeup)
Co-RelatedFactors
1
Decomposition: A Generalization of Disentanglement
Characterise decomposition as the fulfilment of two factors:
(a) level of overlap between encodings in the latent space,(b) matching between the marginal posterior qφ(z) and structured
prior p(z) to constrain with the required decomposition.
2
Decomposition: An Analysis
Desired StructureTargetStructure
p(z)
3
Decomposition: An Analysis
Insufficient Overlap
Insu�cient
Overlap
q�(z|x) p✓(x|z)pD(x) q�(z) p(z) p✓(x)
3
Decomposition: An Analysis
Too Much Overlap
q�(z|x) p✓(x|z)
Too Much
Overlap
pD(x) q�(z) p(z) p✓(x)
3
Decomposition: An Analysis
Appropriate Overlap
Appropriate
Overlap
q�(z|x) p✓(x|z)pD(x) q�(z) p(z) p✓(x)
3
Overlap — Deconstructing the β-VAE
Lβ(x) = Eqφ(z|x)[logpθ(x|z)]− β · KL(qφ(z|x)||p(z))= L(x) (πθ,β ,qφ)︸ ︷︷ ︸
ELBO with β-annealed prior
+(β − 1) · Hqφ︸ ︷︷ ︸maxent
+ log Fβ︸ ︷︷ ︸constant
Implicationsβ-VAE disentangles largely by controlling the level of overlapIt places no direct pressure on the latents to be independent!
4
Decomposition: Objective
Lα,β(x) = Eqφ(z|x)[logpθ(x | z)] Reconstruct observations
− β · KL(qφ(z | x) ‖ p(z)) Control level of overlap
− α · D(qφ(z),p(z)) Impose desired structure
5
Decomposition: Generalising Disentanglement
Independence: p(z) = N (0,σ?)
Figure 1: β-VAE trained on 2D Shapes1 computing disentanglement2.
1Matthey et al., dSprites: Disentanglement testing Sprites dataset, p. 1.2Kim and Mnih, “Disentangling by Factorising”, p. 2.
6
Decomposition: Generalising Disentanglement
Clustering: p(z) = ∑k ρk · N (µk,σk)
β = 0.01 β = 0.5 β = 1.0 β = 1.2
α=0
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
β=0
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
α = 1 α = 3 α = 5 α = 8
Figure 2: Density of aggregate posterior qφ(z) with different α, β for thepinwheel dataset.3
3http://hips.seas.harvard.edu/content/synthetic-pinwheel-data-matlab. 7
Decomposition: Generalising Disentanglement
Sparsity: p(z) = ∏d (1− γ) · N (zd; 0, 1) + γ · N (zd; 0, σ20)
0 5 10 15 20 25 30 35 40 45Latent dimension
0.0
0.2
0.4
0.6
Avg.
late
nt m
agni
tude
TrouserDressShirt
Figure 3: Sparsity of learnt representations for the Fashion-MNIST4 dataset.
4Xiao, Rasul, and Vollgraf, Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms.
8
Decomposition: Generalising Disentanglement
Sparsity: p(z) = ∏d (1− γ) · N (zd; 0, 1) + γ · N (zd; 0, σ20)
(a) d = 49 (b) d = 30 (c) d = 19 (d) d = 40leg separation dress width shirt fit sleeve style
Figure 3: Latent space traversals for “active” dimensions4.
4Xiao, Rasul, and Vollgraf, Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms.
8
Decomposition: Generalising Disentanglement
Sparsity: p(z) = ∏d (1− γ) · N (zd; 0, 1) + γ · N (zd; 0, σ20)
0 200 400 600 800 1000alpha
0.2
0.3
0.4
0.5Av
g. N
orm
alise
d Sp
arsit
y
γ= 0, β= 0.1γ= 0.8, β= 0.1
γ= 0, β= 1γ= 0.8, β= 1
γ= 0, β= 5γ= 0.8, β= 5
Figure 3: Sparsity vs regularisation strength α (higher better)4.
4Xiao, Rasul, and Vollgraf, Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms.
8
Recap
We propose and develop:
• Decomposition: a generalisation of disentanglement involving:(a) overlap of latent encodings(b) match between qφ(z) and p(z)
• A theoretical analysis of the β-VAE objective showing it primarilyonly contributes to overlap.
• An objective that incorporates both factors (a) and (b).• Experiments that showcase efficacy at different decompositions:• independence • clustering • sparsity
9
Emile Mathieu Tom Rainforth N. Siddharth Yee Whye Teh
Code Paper
iffsid/disentangling-disentanglementarXiv:1812.02833
Come talk to us at our poster: #59