Challenge: Learning Large Encoder
Hao Dong, Peking University


Page 1: Challenge: Learning Large Encoder

Challenge: Learning Large Encoder

Hao Dong

Peking University

1

Page 2: Challenge: Learning Large Encoder

2

Challenge: Learning Large Encoder

Previous Lecture: Large Image. This Lecture: Large Encoder.

[Diagram: a generator maps a latent code Z to an image X̂; an encoder maps an image X to a latent code Ẑ.]

We use images for demonstration.

Unsupervised Representation Learning!

Scalable. Reversible.

Page 3: Challenge: Learning Large Encoder

• VAE vs. GAN
• A Naïve Approach
• Another Naïve Approach
• Without Encoder
• Recap: BiGAN
• Adversarial Autoencoder
• VAE+GAN
• 𝛼-GAN
• BigBiGAN
• Multi-code GAN prior
• Implicit vs. Explicit Encoder
• Summary

3

Challenge: Learning Large Encoder

Page 4: Challenge: Learning Large Encoder

4

• VAE vs. GAN
• A Naïve Approach
• Another Naïve Approach
• Without Encoder
• Recap: BiGAN
• Adversarial Autoencoder
• VAE+GAN
• 𝛼-GAN
• BigBiGAN
• Multi-code GAN prior
• Implicit vs. Explicit Encoder
• Summary

Page 5: Challenge: Learning Large Encoder

5

VAE vs. GAN

[Diagram. VAE (variational autoencoder): X → Ẑ → X̂, trained with a KLD loss on Ẑ and an L2 reconstruction loss. Vanilla GAN: Z → X̂, with a discriminator classifying real X vs. fake X̂.]

VAE has an Encoder that can map X to Z.

Vanilla GAN: Z → X only. VAE: Z → X and X → Z.

Page 6: Challenge: Learning Large Encoder

6

VAE vs. GAN

[Diagram: for the Vanilla GAN and the VAE, the mappings between the Latent Space (Z) and the Image Space (X).]

• VAE = Generator + Encoder

• Vanilla GAN = Generator + Discriminator

• Better GAN = Generator + Discriminator + Encoder


Page 7: Challenge: Learning Large Encoder

7

VAE vs. GAN

Interpolation

• Without Encoder: given input vectors z_a and z_b, generate the endpoints G(z_a) and G(z_b) and the interpolations G((1 − α)·z_a + α·z_b), α ∈ [0, 1].

• With Encoder: given input images x_a and x_b, first encode z_a = E(x_a) and z_b = E(x_b), then generate G((1 − α)·z_a + α·z_b), α ∈ [0, 1].
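As a concrete illustration, here is a minimal PyTorch-style sketch of both cases; the generator G, encoder E, and the latent/image shapes are assumptions for illustration, not taken from the slides.

```python
import torch

def interpolate_latents(G, z_a, z_b, steps=8):
    """Without encoder: walk along the straight line between two latent codes."""
    alphas = torch.linspace(0.0, 1.0, steps)
    with torch.no_grad():
        return torch.stack([G((1 - a) * z_a + a * z_b) for a in alphas])

def interpolate_images(G, E, x_a, x_b, steps=8):
    """With encoder: start from two real images, encode them, then interpolate."""
    with torch.no_grad():
        z_a, z_b = E(x_a), E(x_b)
    return interpolate_latents(G, z_a, z_b, steps)
```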

Page 8: Challenge: Learning Large Encoder

8

• VAE vs. GAN
• A Naïve Approach
• Another Naïve Approach
• Without Encoder
• Recap: BiGAN
• Adversarial Autoencoder
• VAE+GAN
• 𝛼-GAN
• BigBiGAN
• Multi-code GAN prior
• Implicit vs. Explicit Encoder
• Summary

Page 9: Challenge: Learning Large Encoder

9

A Naïve Approach

[Diagram. Step 1: pre-train G as a standard GAN (Z → X̂, discriminator classifies real X vs. fake X̂). Step 2: fix G and train E: sample Z, generate X̂ = G(Z), encode Ẑ = E(X̂), and minimize an L2 loss between Ẑ and Z, so that E learns X → Z.]
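A minimal training-loop sketch of Step 2, assuming a pretrained generator G and an encoder E as hypothetical PyTorch modules:

```python
import torch
import torch.nn.functional as F

def train_encoder_on_fakes(G, E, z_dim, steps=1000, batch_size=64, lr=2e-4, device="cpu"):
    """Naive approach: E only ever sees images synthesized by the fixed, pretrained G."""
    G.eval()
    for p in G.parameters():            # freeze the pretrained generator
        p.requires_grad_(False)
    opt = torch.optim.Adam(E.parameters(), lr=lr)
    for _ in range(steps):
        z = torch.randn(batch_size, z_dim, device=device)
        with torch.no_grad():
            x_fake = G(z)               # X̂ = G(Z)
        z_hat = E(x_fake)               # Ẑ = E(X̂)
        loss = F.mse_loss(z_hat, z)     # L2 between Ẑ and the sampled Z
        opt.zero_grad()
        loss.backward()
        opt.step()
```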

Page 10: Challenge: Learning Large Encoder

10

A Naïve Approach

[Diagram. Left, "Given an ACGAN": a pretrained ACGAN generator takes a latent code Z and a class label C (c=1, c=2), trained with a real/fake discriminator. Right, "Learning the Encoder in a Brute Force Way": fix G, sample (Z, C), generate X̂ = G(Z, C), and train E with an L2 loss between E(X̂) and Z, so that E learns X → Z for both classes.]

• Application: Unsupervised/Unpaired Image-to-Image Translation

Unsupervised Image-to-Image Translation with Generative Adversarial Networks. H. Dong, P. Neekhara et al. arXiv 2017.

Z : shared latent representation across two domains

Page 11: Challenge: Learning Large Encoder

11

A Naïve Approach

• Application: Unsupervised/Unpaired Image-to-Image Translation

[Diagram: an image X from domain c=1 is encoded into Ẑ (shared features across domains), then decoded with class label c=2 to produce X̂ in the other domain.]

Unsupervised Image-to-Image Translation with Generative Adversarial Networks. H. Dong, P. Neekhara et al. arXiv 2017.

Page 12: Challenge: Learning Large Encoder

12

A Naïve Approach

• Application: Unsupervised/Unpaired Image-to-Image Translation

[Diagram repeated: X (c=1) → Ẑ (shared features across domains) → X̂ (c=2).]

This only works well for simple images of small size.

Page 13: Challenge: Learning Large Encoder

13

A Naïve Approach

• Limitation: the encoder never sees a real data sample!

[Diagram: E is trained only on synthesized images X̂ = G(Z, C) with an L2 loss to Z; E never sees real X, and X̂ ≠ X.]


Page 14: Challenge: Learning Large Encoder

14

A Naïve Approach

• Limitation: the encoder never sees real data samples, and the synthesized data distribution does not equal the real data distribution.

• Mode Collapse


G may synthesize only part of the dataset X and still fool D.

G may even synthesize only a single data point and still fool D. (Examples of GAN mode collapse.)

Page 15: Challenge: Learning Large Encoder

15

A Naïve Approach

• This only works well if and only if the fake distribution equals the real distribution, which is impossible in practice.

[Diagram repeated: Step 1: pre-train G; Step 2: fix G and train E on X̂ = G(Z) with an L2 loss between E(X̂) and Z (X → Z).]

Page 16: Challenge: Learning Large Encoder

16

• VAE vs. GAN
• A Naïve Approach
• Another Naïve Approach
• Without Encoder
• Recap: BiGAN
• Adversarial Autoencoder
• VAE+GAN
• 𝛼-GAN
• BigBiGAN
• Multi-code GAN prior
• Implicit vs. Explicit Encoder
• Summary

Page 17: Challenge: Learning Large Encoder

17

Another Naïve Approach

[Diagram repeated: Step 1: pre-train G; Step 2: fix G and train E on generated images X̂ = G(Z) with an L2 loss between E(X̂) and Z.]

• Could E see real data samples instead?

Page 18: Challenge: Learning Large Encoder

18

Another Naïve Approach

[Diagram. Step 1: pre-train G as before. Step 2: fix G and train E on real images: encode Ẑ = E(X), reconstruct X̂ = G(Ẑ), and minimize an L2 loss between X̂ and X (X → Z).]

• Could E see real data samples?

Yes: E now sees real data samples.
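A minimal sketch of this variant, again assuming hypothetical PyTorch modules G (frozen) and E and a standard image dataloader:

```python
import torch
import torch.nn.functional as F

def train_encoder_on_reals(G, E, dataloader, epochs=1, lr=2e-4, device="cpu"):
    """E now sees real images; the frozen G must reconstruct them from E(x)."""
    G.eval()
    for p in G.parameters():                 # the pretrained generator stays fixed
        p.requires_grad_(False)
    opt = torch.optim.Adam(E.parameters(), lr=lr)
    for _ in range(epochs):
        for x_real in dataloader:            # assumes the loader yields image batches
            x_real = x_real.to(device)
            z_hat = E(x_real)                # Ẑ = E(X)
            x_rec = G(z_hat)                 # X̂ = G(Ẑ)
            loss = F.mse_loss(x_rec, x_real) # L2 between X̂ and X
            opt.zero_grad()
            loss.backward()
            opt.step()
```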

Page 19: Challenge: Learning Large Encoder

19

Another Naïve Approach

• Problem: Difficult to converge (even using a super-deep E)

• Reasons:
Mode collapse: G cannot synthesize the input image, so the loss cannot be reduced.
The quality of the synthesized images does not match that of real images, so the loss cannot be reduced.

Page 20: Challenge: Learning Large Encoder

20

Another Naïve Approach

[Diagram repeated: Step 1: pre-train G; Step 2: fix G and train E with an L2 reconstruction loss between X and G(E(X)).]

• This only works well if and only if the fake distribution equals the real distribution, which is impossible in practice.

E can now see real data samples, but G cannot always reproduce the input samples.

Page 21: Challenge: Learning Large Encoder

21

• VAE vs. GAN
• A Naïve Approach
• Another Naïve Approach
• Without Encoder
• Recap: BiGAN
• Adversarial Autoencoder
• VAE+GAN
• 𝛼-GAN
• BigBiGAN
• Multi-code GAN prior
• Implicit vs. Explicit Encoder
• Summary

Page 22: Challenge: Learning Large Encoder

22

Without Encoder

[Diagram. Step 1: pre-train G as a standard GAN (Z → X̂, real/fake discriminator on X). Step 2: fix G and the target image X; optimize the latent code Ẑ directly so that X̂ = G(Ẑ) matches X under an L1 loss.]

• Optimisation-based method: find the optimal z iteratively.
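A minimal sketch of this optimisation-based inversion, assuming a frozen pretrained generator G (hypothetical module) and a single target image x_target:

```python
import torch

def invert_image(G, x_target, z_dim, steps=500, lr=0.05, device="cpu"):
    """Optimisation-based inversion: iteratively search for the z that reconstructs x_target."""
    G.eval()
    for p in G.parameters():                  # the generator and the target image stay fixed
        p.requires_grad_(False)
    z = torch.randn(1, z_dim, device=device, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        loss = (G(z) - x_target).abs().mean() # L1 distance between G(z) and the target
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```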

Page 23: Challenge: Learning Large Encoder

23

Without Encoder

• Limitation?

Slow. A naïve way to speed up this method:

use one of the previous naïve approaches to pretrain an encoder, then

Step 1: use the encoder to initialize the latent code z for a given image x.
Step 2: find the optimal z iteratively.

Page 24: Challenge: Learning Large Encoder

24

• VAE vs. GAN
• A Naïve Approach
• Another Naïve Approach
• Without Encoder
• Recap: BiGAN
• Adversarial Autoencoder
• VAE+GAN
• 𝛼-GAN
• BigBiGAN
• Multi-code GAN prior
• Implicit vs. Explicit Encoder
• Summary

Page 25: Challenge: Learning Large Encoder

25

Recap: Bidirectional GAN

BiGAN (Bidirectional GAN)

[Diagram: the encoder maps X → Ẑ and the generator maps Z → X̂; a joint discriminator classifies the pairs (X, Ẑ) as real and (X̂, Z) as fake.]

BiGAN: Adversarial Feature Learning. Jeff Donahue, Philipp Krahenbuhl, Trevor Darrell. ICLR 2017.
ALI: Adversarially Learned Inference. Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Olivier Mastropietro, Alex Lamb, Martin Arjovsky, Aaron Courville. ICLR 2017.
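A minimal sketch of the BiGAN objective, assuming a joint discriminator D(x, z) that scores (image, latent) pairs; G, E, D are hypothetical modules, and in practice the discriminator and generator/encoder updates use separate passes:

```python
import torch
import torch.nn.functional as F

def bigan_losses(G, E, D, x_real, z_dim, device="cpu"):
    """One step of BiGAN-style losses. D scores joint (image, latent) pairs."""
    bce = F.binary_cross_entropy_with_logits
    z = torch.randn(x_real.size(0), z_dim, device=device)
    x_fake = G(z)                       # X̂ = G(Z)
    z_hat = E(x_real)                   # Ẑ = E(X)
    s_real = D(x_real, z_hat)           # pair (X, Ẑ) should be classified as real
    s_fake = D(x_fake, z)               # pair (X̂, Z) should be classified as fake
    d_loss = bce(s_real, torch.ones_like(s_real)) + bce(s_fake, torch.zeros_like(s_fake))
    # G and E are trained jointly to fool D (labels flipped)
    ge_loss = bce(s_real, torch.zeros_like(s_real)) + bce(s_fake, torch.ones_like(s_fake))
    return d_loss, ge_loss
```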

Page 26: Challenge: Learning Large Encoder

26

• VAE vs. GAN
• A Naïve Approach
• Another Naïve Approach
• Without Encoder
• Recap: BiGAN
• Adversarial Autoencoder
• VAE+GAN
• 𝛼-GAN
• BigBiGAN
• Multi-code GAN prior
• Implicit vs. Explicit Encoder
• Summary

Page 27: Challenge: Learning Large Encoder

27

Adversarial Autoencoder

AAE (Adversarial Autoencoder)

[Diagram: an autoencoder X → Ẑ → X̂ trained with an L2 reconstruction loss; a latent discriminator Dz distinguishes prior samples Z (real) from encoded codes Ẑ (fake).]

AAE: Adversarial Autoencoder. Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly. ICLR 2016.
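A minimal sketch of the AAE losses, assuming hypothetical modules E, G, and a latent discriminator Dz:

```python
import torch
import torch.nn.functional as F

def aae_losses(E, G, Dz, x_real, z_dim, device="cpu"):
    """Adversarial autoencoder: L2 reconstruction plus an adversarial loss on the latent code."""
    bce = F.binary_cross_entropy_with_logits
    z_hat = E(x_real)                           # Ẑ = E(X)
    x_rec = G(z_hat)                            # X̂ = G(Ẑ)
    rec_loss = F.mse_loss(x_rec, x_real)        # L2 reconstruction

    z_prior = torch.randn(x_real.size(0), z_dim, device=device)
    d_real = Dz(z_prior)                        # prior samples should be classified as real
    d_fake = Dz(z_hat.detach())                 # encoded codes should be classified as fake
    d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    e_adv = bce(Dz(z_hat), torch.ones_like(d_fake))  # E tries to make Ẑ look like a prior sample
    return rec_loss, d_loss, e_adv
```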

Page 28: Challenge: Learning Large Encoder

28

• VAE vs. GAN
• A Naïve Approach
• Another Naïve Approach
• Without Encoder
• Recap: BiGAN
• Adversarial Autoencoder
• VAE+GAN
• 𝛼-GAN
• BigBiGAN
• Multi-code GAN prior
• Implicit vs. Explicit Encoder
• Summary

Page 29: Challenge: Learning Large Encoder

29

VAE+GAN

VAE+GAN

[Diagram: the VAE pipeline X → Ẑ → X̂ with a KLD loss on Ẑ and a reconstruction loss, plus a GAN discriminator classifying real X vs. fake X̂.]

Discriminator as the feature extractor: the L2 reconstruction loss is computed on discriminator features rather than on pixels.

VAE-GAN: Autoencoding Beyond Pixels Using a Learned Similarity Metric. Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, Ole Winther. ICML 2016.
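A minimal sketch of the VAE-GAN idea of measuring reconstruction in discriminator feature space; D_features (a hypothetical callable returning an intermediate discriminator feature map) and the Gaussian encoder outputs mu/logvar are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def vaegan_recon_loss(D_features, x_real, x_rec):
    """VAE-GAN style reconstruction: L2 measured in discriminator feature space, not pixel space."""
    return F.mse_loss(D_features(x_rec), D_features(x_real))

def kld_loss(mu, logvar):
    """Standard VAE KL divergence between N(mu, sigma^2) and the prior N(0, I)."""
    return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
```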

Page 30: Challenge: Learning Large Encoder

30

• VAE vs. GAN
• A Naïve Approach
• Another Naïve Approach
• Without Encoder
• Recap: BiGAN
• Adversarial Autoencoder
• VAE+GAN
• 𝛼-GAN
• BigBiGAN
• Multi-code GAN prior
• Implicit vs. Explicit Encoder
• Summary

Page 31: Challenge: Learning Large Encoder

31

𝜶-GAN

[Diagram (𝛼-GAN): an autoencoder X → Ẑ → X̂ with an L2 reconstruction loss; an image discriminator classifies real X vs. fake X̂, and a latent discriminator Dz classifies prior samples Z vs. encoded codes Ẑ.]

• Training G and E in an autoencoder fashion forces G to be able to generate every X, which helps avoid GAN mode collapse.

α-GAN: Variational Approaches for Auto-Encoding Generative Adversarial Networks. Mihaela Rosca, Balaji Lakshminarayanan, David Warde-Farley, Shakir Mohamed. arXiv preprint arXiv:1706.04987.

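A simplified sketch of the combined generator/encoder objective in the spirit of 𝛼-GAN; E, G, the image discriminator Dx, and the latent discriminator Dz are hypothetical modules, and the discriminator updates are omitted:

```python
import torch
import torch.nn.functional as F

def alpha_gan_style_ge_loss(E, G, Dx, Dz, x_real, z_dim, rec_weight=10.0, device="cpu"):
    """Reconstruction + image-level adversarial loss + latent-level adversarial loss."""
    bce = F.binary_cross_entropy_with_logits
    z_hat = E(x_real)                               # reconstruction path
    x_rec = G(z_hat)
    z = torch.randn(x_real.size(0), z_dim, device=device)
    x_fake = G(z)                                   # sampling path

    rec = F.mse_loss(x_rec, x_real)                 # L2; forces G to cover every X
    img_logits = Dx(torch.cat([x_rec, x_fake]))     # G wants both outputs to look real
    adv_img = bce(img_logits, torch.ones_like(img_logits))
    lat_logits = Dz(z_hat)                          # E wants Ẑ to look like a prior sample
    adv_lat = bce(lat_logits, torch.ones_like(lat_logits))
    return rec_weight * rec + adv_img + adv_lat
```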

Page 32: Challenge: Learning Large Encoder

32

• VAE vs. GAN
• A Naïve Approach
• Another Naïve Approach
• Without Encoder
• Recap: BiGAN
• Adversarial Autoencoder
• VAE+GAN
• 𝛼-GAN
• BigBiGAN
• Multi-code GAN prior
• Implicit vs. Explicit Encoder
• Summary

Page 33: Challenge: Learning Large Encoder

33

BigBiGAN

• Works on large images
• Combines BigGAN and BiGAN

BigBiGAN: Large Scale Adversarial Representation Learning. Jeff Donahue, Karen Simonyan. NIPS 2019.

[Diagram: the BiGAN structure (Z → X̂, X → Ẑ, joint real/fake discriminator) combined with a BigGAN generator and discriminator.]

Page 34: Challenge: Learning Large Encoder

34

BigBiGAN

• Limitation

BigBiGAN: Large Scale Adversarial Representation Learning. Jeff Donahue, Karen Simonyan. NIPS 2019.

An image of size 512×512×3 is mapped to a latent code of size 1×512:

512 / (512×512×3) ≈ 0.000651

It is difficult for such a heavily compressed encoding to be lossless.

Page 35: Challenge: Learning Large Encoder

35

BigBiGAN

• Limitation

BigBiGAN: Large Scale Adversarial Representation Learning. Jeff Donahue, Karen Simonyan. NIPS 2019.

Page 36: Challenge: Learning Large Encoder

36

BigBiGAN

• Limitation

BigBiGAN: Large Scale Adversarial Representation Learning. Jeff Donahue, Karen Simonyan. NIPS 2019.

Page 37: Challenge: Learning Large Encoder

37

BigBiGAN

• Main Goal: Large Scale Adversarial Representation Learning

BigBiGAN: Large Scale Adversarial Representation Learning. Jeff Donahue, Karen Simonyan. NIPS 2019.

Page 38: Challenge: Learning Large Encoder

38

BigBiGAN

• Main Goal: Large Scale Adversarial Representation Learning

BigBiGAN: Large Scale Adversarial Representation Learning. Jeff Donahue, Karen Simonyan. NIPS 2019.

Page 39: Challenge: Learning Large Encoder

39

BigBiGAN

• Summary

• A single latent code cannot represent a high-resolution image
• Other information inside the generator
• High compression rate

• Next: any solution?

BigBiGAN: Large Scale Adversarial Representation Learning. Jeff Donahue, Karen Simonyan. NIPS 2019.

Page 40: Challenge: Learning Large Encoder

40

• VAE vs. GAN
• A Naïve Approach
• Another Naïve Approach
• Without Encoder
• Recap: BiGAN
• Adversarial Autoencoder
• VAE+GAN
• 𝛼-GAN
• BigBiGAN
• Multi-code GAN prior
• Implicit vs. Explicit Encoder
• Summary

Page 41: Challenge: Learning Large Encoder

41

Multi-code GAN prior

• An Optimisation-based Method

Image Processing Using Multi-Code GAN Prior. Gu, Jinjin. Shen, Yujun. Zhou, Bolei. arXiv 2019.

A single latent code is not enough to recover all detailed information. We can use multiple latent codes to recover different feature maps.
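A minimal sketch of this idea. G_early and G_late are hypothetical helpers splitting the generator into an early part that produces feature maps and a late part that renders the image; the per-code channel-importance weights follow the paper's idea, but their exact form here is an assumption:

```python
import torch

def multi_code_inversion(G_early, G_late, x_target, z_dim, n_codes=10,
                         steps=500, lr=0.05, device="cpu"):
    """Invert an image with several latent codes: each code produces an intermediate
    feature map via G_early; the maps are blended with per-code channel weights and
    rendered into an image by G_late."""
    zs = torch.randn(n_codes, 1, z_dim, device=device, requires_grad=True)
    with torch.no_grad():
        n_channels = G_early(zs[0]).size(1)               # probe the feature-map width
    alphas = torch.ones(n_codes, n_channels, 1, 1, device=device, requires_grad=True)
    opt = torch.optim.Adam([zs, alphas], lr=lr)
    for _ in range(steps):
        feats = torch.stack([G_early(z) for z in zs])     # (n_codes, 1, C, H, W)
        blended = (feats * alphas.unsqueeze(1)).sum(dim=0)  # weighted combination
        loss = (G_late(blended) - x_target).abs().mean()  # L1 reconstruction
        opt.zero_grad()
        loss.backward()
        opt.step()
    return zs.detach(), alphas.detach()
```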

Page 42: Challenge: Learning Large Encoder

42

Multi-code GAN prior

• Reconstruction

Image Processing Using Multi-Code GAN Prior. Gu, Jinjin. Shen, Yujun. Zhou, Bolei. arXiv 2019.

Page 43: Challenge: Learning Large Encoder

43

Multi-code GAN prior

• Inpainting

Image Processing Using Multi-Code GAN Prior. Gu, Jinjin. Shen, Yujun. Zhou, Bolei. arXiv 2019.

Page 44: Challenge: Learning Large Encoder

44

Multi-code GAN prior

• More

Image Processing Using Multi-Code GAN Prior. Gu, Jinjin. Shen, Yujun. Zhou, Bolei. arXiv 2019.

Page 45: Challenge: Learning Large Encoder

45

Multi-code GAN prior

• Discussion

• Why does it work?

• Limitations?

Image Processing Using Multi-Code GAN Prior. Gu, Jinjin. Shen, Yujun. Zhou, Bolei. arXiv 2019.

Page 46: Challenge: Learning Large Encoder

46

• VAE vs. GAN
• A Naïve Approach
• Another Naïve Approach
• Without Encoder
• Recap: BiGAN
• Adversarial Autoencoder
• VAE+GAN
• 𝛼-GAN
• BigBiGAN
• Multi-code GAN prior
• Implicit vs. Explicit Encoder
• Summary

Page 47: Challenge: Learning Large Encoder

Implicit vs. Explicit Encoder

[Diagram. UNIT (learn the encoder explicitly): images X_A and X_B are encoded into a shared latent code Ẑ with a KLD loss, then decoded back into both domains (X̂_A, X̂_B) with L1 reconstruction losses and a real/fake discriminator per domain. CycleGAN (learn the encoder implicitly): X_A → X̂_B → X̂_A and X_B → X̂_A → X̂_B, with L1 cycle-consistency losses and a real/fake discriminator per domain, but no explicit latent code.]

CycleGAN: Learn the Encoder Implicitly
UNIT: Learn the Encoder Explicitly

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. J. Zhu, T. Park et al. ICCV 2017.
Unsupervised image-to-image translation networks. M.Y. Liu, T. Breuel, J. Kautz. NIPS 2017.

47
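A minimal sketch of the implicit-encoder idea, assuming two hypothetical translators G_AB and G_BA; the per-domain adversarial losses are omitted:

```python
import torch

def cycle_consistency_losses(G_AB, G_BA, x_a, x_b):
    """CycleGAN-style cycle losses: translating to the other domain and back
    should reconstruct the input (the encoder is learned only implicitly)."""
    x_ab = G_AB(x_a)                     # X_A -> X̂_B
    x_aba = G_BA(x_ab)                   # X̂_B -> X̂_A
    x_ba = G_BA(x_b)                     # X_B -> X̂_A
    x_bab = G_AB(x_ba)                   # X̂_A -> X̂_B
    loss_a = (x_aba - x_a).abs().mean()  # L1 cycle loss for domain A
    loss_b = (x_bab - x_b).abs().mean()  # L1 cycle loss for domain B
    return loss_a + loss_b
```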

Page 48: Challenge: Learning Large Encoder

Implicit vs. Explicit Encoder

CycleGAN: Learn the Encoder Implicitly
Liu et al. (UNIT): Learn the Encoder Explicitly

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. J. Zhu, T. Park et al. ICCV 2017.
Unsupervised image-to-image translation networks. M.Y. Liu, T. Breuel, J. Kautz. NIPS 2017.

48

Page 49: Challenge: Learning Large Encoder

Implicit vs. Explicit Encoder

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. J. Zhu, T. Park et al. ICCV 2017.
Unsupervised image-to-image translation networks. M.Y. Liu, T. Breuel, J. Kautz. NIPS 2017.

49

Page 50: Challenge: Learning Large Encoder

Implicit vs. Explicit Encoder

50

• A simple normal latent distribution has difficulty modeling complex images

• 3D tensors can contain more spatial information than vectors

• Many applications do not need interpolation

[Diagram: X_A → X̂_B]

• Image inpainting
• Image super-resolution
• Image-to-image translation
• …

Page 51: Challenge: Learning Large Encoder

51

• VAE vs. GAN
• A Naïve Approach
• Another Naïve Approach
• Without Encoder
• Recap: BiGAN
• Adversarial Autoencoder
• VAE+GAN
• 𝛼-GAN
• BigBiGAN
• Multi-code GAN prior
• Implicit vs. Explicit Encoder
• Summary

Page 52: Challenge: Learning Large Encoder

Summary

52

• GAN: G + D → G + D + E
• Learning E from real data is important
• GAN mode collapse
• BiGAN, AAE, VAE+GAN, 𝛼-GAN, BigBiGAN
• Autoencoder training can help avoid mode collapse
• Learning E implicitly
• The encoder can be extended to text and any other data type
• Still on the way …

Page 53: Challenge: Learning Large Encoder

Thanks

53