Coulomb Autoencoders
Emanuele Sansone, Hafiz Tiomoko Ali, Sun Jiacheng
Huawei Noah's Ark Lab (London)


Page 2

HUAWEI TECHNOLOGIES CO., LTD.

Contents

• Motivation
• Background on Autoencoders
• The Problem of Local Minima
• Generalization Analysis
• Conclusions

Page 3

Motivation

BigGANs [Brock et al. 2019]

Deep generative models (GANs, flow-based, autoregressive, VAEs) have improved markedly in recent years, yet theoretical understanding lags behind on two fronts:
1. Training (i.e. convergence guarantees to optimal solutions)
2. Generalization (i.e. quality of solutions with a finite number of samples)



Page 7

Background on Autoencoders - I

x ∼ p_X(x) → f_θ(x) ∼ q_Z(z; θ)    z ∼ p_Z(z) → g_γ(z) ∼ q_X(x)

Goal: implicitly learn the unknown density p_X(x).

Problem formulation: to ensure that p_X(x) = q_X(x), we need:
1. Left-invertibility on the support of p_X(x): x = g_γ(f_θ(x))
2. Density matching: q_Z(z; θ) = p_Z(z)

Objective:

ℒ(θ, γ) = REC(g_γ ∘ f_θ) + λ D(q_Z, p_Z)
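The objective above can be sketched numerically. Below is a minimal NumPy sketch, assuming the L2 reconstruction loss for REC and (for illustration only, since the kernel choice is discussed later) a Gaussian-kernel MMD for the divergence D; the function names and toy data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def rec_loss(x, x_hat):
    # REC: mean squared L2 reconstruction error over the batch
    return float(np.mean(np.sum((x - x_hat) ** 2, axis=1)))

def gaussian_mmd2(z, z_prime, sigma=1.0):
    # Unbiased squared-MMD estimate with a Gaussian kernel
    # (one possible choice for D; the Coulomb kernel comes later)
    def gram(a, b):
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    n = len(z)
    kzz, kpp, kzp = gram(z, z), gram(z_prime, z_prime), gram(z, z_prime)
    off = 1.0 / (n * (n - 1))
    return float(off * (kzz.sum() - np.trace(kzz))
                 + off * (kpp.sum() - np.trace(kpp))
                 - (2.0 / n ** 2) * kzp.sum())

def objective(x, x_hat, z, z_prior, lam=1.0):
    # L(theta, gamma) = REC(g_gamma ∘ f_theta) + lambda * D(q_Z, p_Z)
    return rec_loss(x, x_hat) + lam * gaussian_mmd2(z, z_prior)

x = rng.normal(size=(64, 2))          # data batch
z = rng.normal(size=(64, 2))          # stand-in for encoder outputs f_theta(x)
z_prior = rng.normal(size=(64, 2))    # samples from the prior p_Z
loss = objective(x, x, z, z_prior)    # perfect reconstruction: only the MMD term remains
```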


Page 9

Background on Autoencoders - II

ℒ(θ, γ) = REC(g_γ ∘ f_θ) + λ D(q_Z, p_Z)

Properties
1. REC(g_γ ∘ f_θ) is typically the L2 loss, which is convex
2. D(q_Z, p_Z) has many forms, all of them non-convex:
   • Kullback-Leibler divergence (KL) in Variational Autoencoders [Kingma and Welling 2014] [Rezende et al. 2014]
   • Maximum Mean Discrepancy (MMD) in Generative Moment Matching Networks [Li et al. 2015], Wasserstein Autoencoders (WAE) [Tolstikhin et al. 2018], and Coulomb Autoencoders (CouAEs)


Page 15

Background on Autoencoders - III

Why should MMD be preferred over KL?
1. The KL term is not a proper metric, while MMD is an integral probability metric
2. KL is not always defined (e.g. for empirical distributions, i.e. superpositions of Dirac impulses)
3. Local minima (the problem of reconstruction/posterior collapse, due to the stochastic encoder)

[Figure: two reconstruction scenarios for inputs x₁, x₂ with posteriors q(z|x₁), q(z|x₂) and outputs g_γ(z₁), g_γ(z₂); the collapsed case is marked ✗, the correct case ✓]
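Point 2 can be illustrated numerically: for two empirical samples with essentially disjoint supports, a histogram-based KL estimate blows up, while MMD stays finite. A minimal sketch, assuming a Gaussian kernel and hypothetical toy samples:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two empirical samples with essentially disjoint supports
a = rng.normal(loc=-3.0, size=500)
b = rng.normal(loc=3.0, size=500)

# Histogram-based KL: infinite wherever p > 0 but q = 0
bins = np.linspace(-6, 6, 40)
p, _ = np.histogram(a, bins=bins, density=True)
q, _ = np.histogram(b, bins=bins, density=True)
with np.errstate(divide="ignore", invalid="ignore"):
    kl_terms = np.where(p > 0, p * np.log(p / q), 0.0)
kl = kl_terms.sum() * (bins[1] - bins[0])  # diverges: p has mass where q has none

# MMD with a Gaussian kernel stays finite for the same samples
def mmd2(x, y, sigma=1.0):
    def gram(u, v):
        return np.exp(-(u[:, None] - v[None, :]) ** 2 / (2.0 * sigma ** 2))
    n, m = len(x), len(y)
    return (gram(x, x).sum() - n) / (n * (n - 1)) \
         + (gram(y, y).sum() - m) / (m * (m - 1)) \
         - 2.0 * gram(x, y).sum() / (n * m)

m = mmd2(a, b)  # finite and positive: the two distributions differ
```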



Page 19

Properties of MMD

Given a kernel k(z, z′) and samples {z_i}ᴺ_{i=1} ∼ q_Z, {z′_i}ᴺ_{i=1} ∼ p_Z:

MMD({z_i}ᴺ_{i=1}, {z′_i}ᴺ_{i=1}) = 1/(N(N−1)) Σ_i Σ_{j≠i} k(z′_i, z′_j)   [intra-similarity]
                                 + 1/(N(N−1)) Σ_i Σ_{j≠i} k(z_i, z_j)     [intra-similarity]
                                 − (2/N²) Σ_i Σ_j k(z′_i, z_j)            [inter-similarity]

Minimizing MMD w.r.t. {z_i}ᴺ_{i=1} maximizes the inter-similarity and minimizes the intra-similarities. MMD is used for density matching and two-sample tests [Gretton et al. 2012], and recently in autoencoders [Tolstikhin et al. 2018].

The choice of kernel function is related to the problem of local minima.
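The estimator above can be transcribed term by term. A minimal sketch (the Gaussian kernel used in the demo is an arbitrary stand-in; any kernel k works):

```python
import numpy as np

def mmd_hat(z, z_prime, k):
    """Unbiased MMD estimator from the slide:
       1/(N(N-1)) * sum_i sum_{j!=i} k(z'_i, z'_j)   (intra-similarity of p_Z)
     + 1/(N(N-1)) * sum_i sum_{j!=i} k(z_i, z_j)     (intra-similarity of q_Z)
     - 2/N^2     * sum_i sum_j      k(z'_i, z_j)     (inter-similarity)"""
    n = len(z)
    assert len(z_prime) == n
    intra_q = sum(k(z[i], z[j]) for i in range(n) for j in range(n) if j != i)
    intra_p = sum(k(z_prime[i], z_prime[j]) for i in range(n) for j in range(n) if j != i)
    inter = sum(k(z_prime[i], z[j]) for i in range(n) for j in range(n))
    return (intra_p + intra_q) / (n * (n - 1)) - 2.0 * inter / n ** 2

rng = np.random.default_rng(2)
k = lambda a, b: float(np.exp(-np.sum((a - b) ** 2)))  # arbitrary demo kernel
z = rng.normal(size=(20, 2))         # {z_i} ~ q_Z (encoder outputs)
z_prime = rng.normal(size=(20, 2))   # {z'_i} ~ p_Z (prior samples)
near = mmd_hat(z, z_prime, k)        # same distribution: small value
far = mmd_hat(z + 10.0, z_prime, k)  # shifted samples: distributions clearly differ
```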


Page 23

Properties of MMD and the Coulomb kernel

Coulomb kernel: k(z, z′) = 1 / ‖z − z′‖^(h−2), with N > h > 2

Theorem: For the minimization of MMD w.r.t. {z_i}ᴺ_{i=1} with the Coulomb kernel:
1. All local extrema are global
2. The set of saddle points has measure zero

Remark: convergence to a global minimum when optimized through local-search methods!

[Figure: latent samples matched to the ground-truth distribution, comparing a Gaussian kernel against the Coulomb kernel]
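The generalization slide later uses an ε-regularized version of this kernel, which is bounded by K = 1/ε. A minimal sketch of that regularized form (h = 3 and the sample sizes are arbitrary choices for illustration):

```python
import numpy as np

def coulomb_kernel(z, z_prime, h=3, eps=1e-3):
    # Regularized Coulomb kernel k(z, z') = 1 / (||z - z'||^(h-2) + eps);
    # eps > 0 removes the singularity at z = z', so 0 <= k <= K = 1/eps
    d = float(np.linalg.norm(np.asarray(z) - np.asarray(z_prime)))
    return 1.0 / (d ** (h - 2) + eps)

K = 1.0 / 1e-3  # upper bound on the regularized kernel
rng = np.random.default_rng(3)
samples = rng.normal(size=(50, 3))
vals = [coulomb_kernel(a, b) for a in samples for b in samples]
```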


Page 28

Experiments

[Figures: generated samples on toy distributions — Ground truth vs. VAE (KL), WAE (MMD), and CouAE (MMD + Coulomb kernel)]



Page 33

Generalization Analysis

ℒ̂ = R̂EC + λ M̂MD   (finite number of samples)
ℒ = REC + λ MMD   (infinite number of samples)

Theorem: Let 0 ≤ k(z, z′) = 1/(‖z − z′‖^(h−2) + ε) ≤ K and 0 ≤ REC ≤ ξ. For any s, t > 0:

Pr{ |ℒ̂ − ℒ| > t + λs } ≤ 2 exp{ −2N t² / ξ² } + 6 exp{ −2⌊N/2⌋ s² / (9K²) }

The first term is the contribution of the reconstruction error; the second is the contribution of MMD.

How can we make ξ small?
1. Estimation of ξ → maximum reconstruction error on both training and validation data
2. Minimization of ξ → finding a proper network architecture (e.g. layer width, network depth, residual connections)
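The right-hand side of the bound can be evaluated numerically; a minimal sketch (the numeric values of N, t, s, ξ, and K are arbitrary choices):

```python
import math

def generalization_bound(N, t, s, xi, K):
    """Right-hand side of the theorem:
       Pr{|L_hat - L| > t + lam*s} <= 2 exp(-2 N t^2 / xi^2)
                                    + 6 exp(-2 floor(N/2) s^2 / (9 K^2))
    (lam only shifts the deviation threshold t + lam*s, not the bound itself)"""
    rec_term = 2.0 * math.exp(-2.0 * N * t ** 2 / xi ** 2)
    mmd_term = 6.0 * math.exp(-2.0 * (N // 2) * s ** 2 / (9.0 * K ** 2))
    return rec_term + mmd_term

b_small = generalization_bound(N=100, t=0.5, s=0.5, xi=1.0, K=1.0)
b_large = generalization_bound(N=10_000, t=0.5, s=0.5, xi=1.0, K=1.0)
# the bound tightens as the number of samples N grows
```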


Page 38

Experiments

Controlling ξ by changing the total number of hidden neurons (capacity): ×0.25, ×0.5, ×1

Remarks
1. Network architecture is fundamental to controlling generalization
2. Increasing capacity (the number of hidden neurons) leads to better generalization (as long as ξ is decreased)
3. Other architectural choices (e.g. depth, residual connections) may further decrease ξ

Open question: what is/are the optimal network architecture(s) minimizing ξ?
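The estimation of ξ described above can be sketched directly: take the maximum per-sample reconstruction error over both training and validation data. The linear encoder/decoder pair below is a hypothetical stand-in for f_θ and g_γ:

```python
import numpy as np

def estimate_xi(x_train, x_val, encode, decode):
    # xi upper-bounds REC, so estimate it as the maximum per-sample
    # squared reconstruction error over training and validation data
    x_all = np.concatenate([x_train, x_val], axis=0)
    errs = np.sum((x_all - decode(encode(x_all))) ** 2, axis=1)
    return float(errs.max())

rng = np.random.default_rng(4)
x_train = rng.normal(size=(50, 2))
x_val = rng.normal(size=(20, 2))

encode = lambda x: 0.5 * x   # hypothetical f_theta
decode = lambda z: 2.0 * z   # hypothetical g_gamma (exactly inverts encode)
xi = estimate_xi(x_train, x_val, encode, decode)             # perfect reconstruction
xi_lossy = estimate_xi(x_train, x_val, encode, lambda z: z)  # identity decoder loses scale
```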


Page 40

Conclusions

1. The problem of local minima: MMD with the Coulomb kernel behaves similarly to a convex functional
2. Generalization analysis: a probabilistic bound giving insights into directions for improving autoencoders in a principled manner

Page 41

24th May 2019

HUAWEI TECHNOLOGIES CO., LTD.
www.huawei.com

Thank You