Conditional generation

Source: lazebni.cs.illinois.edu/fall18/lec14_cgan.pdf


Page 2:

Outline
• General conditional GANs
• BigGAN
• Paired image-to-image translation
• Unpaired image-to-image translation: CycleGAN

Page 3:

Conditional generation
• Suppose we want to condition the generation of samples on discrete side information (label) y
• How do we add y to the basic GAN framework?

[Diagram: noise z → generator G → sample G(z) → discriminator D → D(G(z))]

Page 4:

Conditional generation
• Suppose we want to condition the generation of samples on discrete side information (label) y
• How do we add y to the basic GAN framework?

[Diagram: noise z and label y → generator G(z, y) → discriminator D(G(z, y), y)]
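
For reference, this corresponds to the conditional GAN objective of Mirza & Osindero, in which both players receive the label:

$$\min_G \max_D \; \mathbb{E}_{x,y \sim p_{data}}[\log D(x, y)] + \mathbb{E}_{z \sim p(z),\, y}[\log(1 - D(G(z, y), y))]$$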

Page 5:

Conditional generation
• Example: simple network for generating 28 x 28 MNIST digits

M. Mirza and S. Osindero, Conditional Generative Adversarial Nets, arXiv 2014

[Figure: generator network taking a noise vector and a 10-dim one-hot class label vector as input]

Figure source: F. Fleuret

Page 6:

Conditional generation
• Example: simple network for generating 28 x 28 MNIST digits

M. Mirza and S. Osindero, Conditional Generative Adversarial Nets, arXiv 2014

[Figure: discriminator network taking an image and a 10-dim one-hot class label vector as input]

Figure source: F. Fleuret
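
A minimal PyTorch sketch of this conditioning scheme (layer sizes are illustrative, not the exact ones from the paper): both networks simply concatenate the one-hot label with their usual input.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Maps (noise, one-hot label) to a 28x28 image."""
    def __init__(self, z_dim=100, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + n_classes, 256), nn.ReLU(),
            nn.Linear(256, 28 * 28), nn.Tanh(),
        )

    def forward(self, z, y_onehot):
        # Condition by concatenating the label with the noise vector
        return self.net(torch.cat([z, y_onehot], dim=1)).view(-1, 1, 28, 28)

class ConditionalDiscriminator(nn.Module):
    """Scores an (image, one-hot label) pair as real or fake."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(28 * 28 + n_classes, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, x, y_onehot):
        # Condition by concatenating the label with the flattened image
        return self.net(torch.cat([x.view(x.size(0), -1), y_onehot], dim=1))
```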

Page 7:

Conditional generation
• Example: simple network for generating 28 x 28 MNIST digits

M. Mirza and S. Osindero, Conditional Generative Adversarial Nets, arXiv 2014

Page 8:

Conditional generation
• Another example: text-to-image synthesis

S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, H. Lee, Generative adversarial text to image synthesis, ICML 2016

[Figure: LSTM text encoder; the text embedding is spatially replicated and depth-concatenated with convolutional feature maps]
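
The spatial replication and depth concatenation step can be sketched as follows (tensor shapes and the helper name are assumptions for illustration):

```python
import torch

def condition_on_text(feat_map, text_emb):
    """Tile a text embedding over space and concatenate along channels.

    feat_map: (B, C, H, W) convolutional features
    text_emb: (B, D) sentence embedding from the text encoder
    returns:  (B, C + D, H, W)
    """
    B, _, H, W = feat_map.shape
    tiled = text_emb[:, :, None, None].expand(B, text_emb.size(1), H, W)
    return torch.cat([feat_map, tiled], dim=1)

feats = torch.randn(4, 512, 4, 4)   # e.g., late discriminator features
emb = torch.randn(4, 128)           # text embedding
print(condition_on_text(feats, emb).shape)  # torch.Size([4, 640, 4, 4])
```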

Page 9:

Conditional generation
• Another example: text-to-image synthesis

S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, H. Lee, Generative adversarial text to image synthesis, ICML 2016

[Results: captions seen in the training set vs. previously unseen captions (zero-shot setting)]

Page 10:

BigGAN
• Synthesize ImageNet images, conditioned on class label, up to 512 x 512 resolution

A. Brock, J. Donahue, K. Simonyan, Large scale GAN training for high fidelity natural image synthesis, arXiv 2018

Page 11:

BigGAN
• Self-attention GAN to capture spatial structure

Page 12:

BigGAN
• Self-attention GAN to capture spatial structure
• Spectral normalization for generator and discriminator
• Conditioning the generator: conditional batch norm (see the sketch below)
• Conditioning the discriminator: projection
• 8x larger batch size, 50% more feature channels than baseline
• Hierarchical latent space: feed different “chunks” of the noise vector into multiple layers of the generator
• Lots of other tricks (initialization, training, z sampling, etc.)
• Training observed to be unstable, but good results are achieved before collapse
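
A minimal sketch of class-conditional batch norm: the class label selects a per-channel gain and bias. This is a simplified version; BigGAN actually derives these parameters from a shared class embedding concatenated with a chunk of the noise vector.

```python
import torch
import torch.nn as nn

class ConditionalBatchNorm2d(nn.Module):
    """BatchNorm whose scale and shift are predicted from the class label."""
    def __init__(self, num_features, n_classes):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        self.gain = nn.Embedding(n_classes, num_features)
        self.bias = nn.Embedding(n_classes, num_features)
        nn.init.ones_(self.gain.weight)   # start as identity transform
        nn.init.zeros_(self.bias.weight)

    def forward(self, x, y):
        out = self.bn(x)                      # normalize without affine params
        g = self.gain(y)[:, :, None, None]    # (B, C, 1, 1) per-class scale
        b = self.bias(y)[:, :, None, None]    # (B, C, 1, 1) per-class shift
        return g * out + b
```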

Page 13:

BigGAN
• Results at 512 x 512 resolution

Page 14:

BigGAN
• Results at 512 x 512 resolution

Page 15:

BigGAN

[Figure: generated samples with nearest neighbors in pixel space and in FC7 feature space]

Page 16:

BigGAN
• Results at 512 x 512 resolution

[Figure: easy classes vs. difficult classes]

Page 17:

Image-to-image translation

P. Isola, J.-Y. Zhu, T. Zhou, A. Efros, Image-to-Image Translation with Conditional Adversarial Networks, CVPR 2017

Page 18:

Image-to-image translation
• Produce modified image y conditioned on input image x (note change of notation)
• Generator receives x as input
• Discriminator receives an (x, y) pair and has to decide whether it is real or fake

Page 19:

Image-to-image translation
• Generator architecture: U-Net
• Note: no noise z used as input, so the transformation is basically deterministic

Page 20:

Image-to-image translation
• Generator architecture: U-Net

Figure source

Encode: convolution → BatchNorm → ReLU

Decode: transposed convolution → BatchNorm → ReLU
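
A sketch of these encoder/decoder blocks with one skip connection (channel counts and image size are placeholders):

```python
import torch
import torch.nn as nn

def down_block(c_in, c_out):
    # Encode: convolution -> BatchNorm -> ReLU (stride 2 halves resolution)
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 4, stride=2, padding=1),
        nn.BatchNorm2d(c_out), nn.ReLU(),
    )

def up_block(c_in, c_out):
    # Decode: transposed convolution -> BatchNorm -> ReLU (doubles resolution)
    return nn.Sequential(
        nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1),
        nn.BatchNorm2d(c_out), nn.ReLU(),
    )

enc1, enc2 = down_block(3, 64), down_block(64, 128)
dec1, dec2 = up_block(128, 64), up_block(128, 3)  # 128 = 64 + 64 after skip

x = torch.randn(1, 3, 256, 256)
e1 = enc1(x)                        # (1, 64, 128, 128)
e2 = enc2(e1)                       # (1, 128, 64, 64)
d1 = dec1(e2)                       # (1, 64, 128, 128)
out = dec2(torch.cat([d1, e1], 1))  # skip: concat matching encoder features
```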

Page 21:

Image-to-image translation
• Generator architecture: U-Net

Effect of adding skip connections to the generator

Page 22:

Image-to-image translation
• Generator loss: GAN loss plus L1 reconstruction penalty

$$G^* = \arg\min_G \max_D \mathcal{L}_{GAN}(G, D) + \lambda \sum_i \| y_i - G(x_i) \|_1$$

Generated output G(x_i) should be close to ground truth target y_i
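
In code, the generator objective might look as follows (a sketch: a sigmoid-output PatchGAN discriminator with a binary cross-entropy GAN term is assumed; the paper uses λ = 100):

```python
import torch
import torch.nn.functional as F

def generator_loss(D, G, x, y, lam=100.0):
    """GAN loss plus lambda * L1 between output and ground truth."""
    fake = G(x)
    pred_fake = D(x, fake)                 # PatchGAN map of scores in [0, 1]
    real_labels = torch.ones_like(pred_fake)
    gan_loss = F.binary_cross_entropy(pred_fake, real_labels)
    l1_loss = F.l1_loss(fake, y)           # reconstruction penalty
    return gan_loss + lam * l1_loss
```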

Page 23:

Image-to-image translation
• Generator loss: GAN loss plus L1 reconstruction penalty

$$G^* = \arg\min_G \max_D \mathcal{L}_{GAN}(G, D) + \lambda \sum_i \| y_i - G(x_i) \|_1$$

Page 24:

Image-to-image translation
• Discriminator: PatchGAN
• Given input image x and second image y, decide whether y is a ground truth target or produced by the generator

Page 25:

Image-to-image translation
• Discriminator: PatchGAN
• Given input image x and second image y, decide whether y is a ground truth target or produced by the generator
• Output is a 30 x 30 map where each value (0 to 1) represents the quality of the corresponding section of the output image
• Fully convolutional network; the effective patch size can be increased by increasing the depth

Figure source
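
A sketch of such a fully convolutional discriminator; the layer configuration below is the standard 70 x 70 PatchGAN, which maps a 256 x 256 input to the 30 x 30 score map described above:

```python
import torch
import torch.nn as nn

class PatchGAN(nn.Module):
    """Fully convolutional discriminator: 256x256 input -> 30x30 score map."""
    def __init__(self, in_ch=6):  # input x and candidate y, concatenated
        super().__init__()
        def block(ci, co, stride):
            return [nn.Conv2d(ci, co, 4, stride, 1),
                    nn.BatchNorm2d(co), nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, 2, 1), nn.LeakyReLU(0.2),  # 128x128
            *block(64, 128, 2),    # 64x64
            *block(128, 256, 2),   # 32x32
            *block(256, 512, 1),   # 31x31
            nn.Conv2d(512, 1, 4, 1, 1), nn.Sigmoid(),          # 30x30
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=1))

D = PatchGAN()
print(D(torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256)).shape)
# torch.Size([1, 1, 30, 30])
```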

Page 26:

Image-to-image translation
• Discriminator: PatchGAN
• Given input image x and second image y, decide whether y is a ground truth target or produced by the generator
• Output is a 30 x 30 map where each value (0 to 1) represents the quality of the corresponding section of the output image
• Fully convolutional network; the effective patch size can be increased by increasing the depth

Effect of discriminator patch size on generator output

Page 27:

Image-to-image translation: Results
• Translating between maps and aerial photos

Page 28:

Image-to-image translation: Results
• Translating between maps and aerial photos
• Human study:

Page 29:

Image-to-image translation: Results
• Semantic labels to scenes

Page 30:

Image-to-image translation: Results
• Semantic labels to scenes
• Evaluation: FCN score
• The higher the quality of the output, the better the FCN should do at recovering the original semantic labels

Page 31:

Image-to-image translation: Results
• Scenes to semantic labels

Page 32:

Image-to-image translation: Results
• Scenes to semantic labels
• Accuracy is worse than that of regular FCNs or a generator trained with L1 loss

Page 33:

Image-to-image translation: Results
• Semantic labels to facades

Page 34:

Image-to-image translation: Results
• Day to night

Page 35:

Image-to-image translation: Results
• Edges to photos

Page 36:

Image-to-image translation: Results
• pix2pix demo

Page 37:

Image-to-image translation: Limitations
• Visual quality could be improved
• Requires (x, y) pairs for training
• Does not model the conditional distribution P(y|x); returns a single mode instead

Page 38:

Unpaired image-to-image translation
• Given two unordered image collections X and Y, learn to “translate” an image from one into the other and vice versa

J.-Y. Zhu, T. Park, P. Isola, A. Efros, Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks, ICCV 2017

Page 39:

Unpaired image-to-image translation
• Given two unordered image collections X and Y, learn to “translate” an image from one into the other and vice versa

J.-Y. Zhu, T. Park, P. Isola, A. Efros, Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks, ICCV 2017

Page 40:

CycleGAN
• Given: domains X and Y
• Train two generators G and F and two discriminators D_X and D_Y
• G translates from X to Y, F translates from Y to X
• D_X recognizes images from X, D_Y from Y
• We want F(G(x)) ≈ x and G(F(y)) ≈ y

Page 41:

CycleGAN: Architecture
• Generators: [see figure]
• Discriminators: PatchGAN on 70 x 70 patches

Figure source

Page 42:

CycleGAN: Loss
• Requirements:
  • G translates from X to Y, F translates from Y to X
  • D_X recognizes images from X, D_Y from Y
  • We want F(G(x)) ≈ x and G(F(y)) ≈ y
• CycleGAN discriminator loss: LSGAN

$$\mathcal{L}_{dis}(D_Y) = \mathbb{E}_{y \sim p_{data}(y)}[(D_Y(y) - 1)^2] + \mathbb{E}_{x \sim p_{data}(x)}[D_Y(G(x))^2]$$

$$\mathcal{L}_{dis}(D_X) = \mathbb{E}_{x \sim p_{data}(x)}[(D_X(x) - 1)^2] + \mathbb{E}_{y \sim p_{data}(y)}[D_X(F(y))^2]$$

• CycleGAN generator loss:

$$\mathcal{L}_{gen}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[(D_Y(G(x)) - 1)^2] + \mathbb{E}_{y \sim p_{data}(y)}[(D_X(F(y)) - 1)^2] + \mathbb{E}_{x \sim p_{data}(x)}[\| F(G(x)) - x \|_1] + \mathbb{E}_{y \sim p_{data}(y)}[\| G(F(y)) - y \|_1]$$
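
These losses translate directly into a sketch (G: X→Y, F: Y→X; the discriminators are assumed to output raw least-squares scores):

```python
import torch
import torch.nn.functional as F

def d_loss(D_Y, G, x, y):
    """LSGAN discriminator loss: push real scores to 1, fake scores to 0."""
    real = ((D_Y(y) - 1) ** 2).mean()
    fake = (D_Y(G(x).detach()) ** 2).mean()  # detach: don't update G here
    return real + fake

def g_loss(G, F_, D_X, D_Y, x, y):
    """LSGAN generator terms plus the two L1 cycle-consistency terms."""
    fake_y, fake_x = G(x), F_(y)
    adv = ((D_Y(fake_y) - 1) ** 2).mean() + ((D_X(fake_x) - 1) ** 2).mean()
    cyc = F.l1_loss(F_(fake_y), x) + F.l1_loss(G(fake_x), y)
    return adv + cyc
```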

Page 43:

CycleGAN
• Illustration of cycle consistency:

Page 44:

CycleGAN: Results
• Translation between maps and aerial photos

Page 45:

CycleGAN: Results
• Other pix2pix tasks

Page 46:

CycleGAN: Results
• Scene to labels and labels to scene
• Worse performance than pix2pix due to lack of paired training data

Page 47:

CycleGAN: Results
• Tasks for which paired data is unavailable

Page 48:

CycleGAN: Results
• Style transfer

Page 49:

CycleGAN: Failure cases

Page 50:

CycleGAN: Failure cases

Page 51:

CycleGAN: Limitations
• Cannot handle shape changes (e.g., dog to cat)
• Can get confused on images outside of the training domains (e.g., horse with rider)
• Cannot close the gap with paired translation methods
• Does not account for the fact that one transformation direction may be more challenging than the other

Page 52:

Multimodal image-to-image translation

J.-Y. Zhu, R. Zhang, D. Pathak, T. Darrell, A. A. Efros, O. Wang, E. Shechtman, Toward Multimodal Image-to-Image Translation, NIPS 2017

Page 53:

Unsupervised image-to-image translation

M.-Y. Liu, T. Breuel, and J. Kautz, Unsupervised Image-to-Image Translation Networks, NIPS 2017

Page 54:

M.-Y. Liu, T. Breuel, and J. Kautz, Unsupervised Image-to-Image Translation Networks, NIPS 2017

Unsupervised image-to-image translation

Page 55:

M.-Y. Liu, T. Breuel, and J. Kautz, Unsupervised Image-to-Image Translation Networks, NIPS 2017

Unsupervised image-to-image translation

Page 56:

Interesting New Yorker article

https://www.newyorker.com/magazine/2018/11/12/in-the-age-of-ai-is-seeing-still-believing