Multimodal Unsupervised Image-to-Image Translation
Ming-Yu Liu, NVIDIA

Page 1:

Multimodal Unsupervised Image-to-Image Translation

Ming-Yu Liu

NVIDIA

Page 2:

Supervised vs. Unsupervised

Supervised / Paired / Aligned / Registered  vs.  Unsupervised / Unpaired / Unaligned / Unregistered

Page 3:

Image Domain Transfer

Page 4:

Example Applications

Page 5:

Generative Adversarial Networks (GANs)

Goodfellow et al. 2014

[Figure: a generator maps random samples to fake images; a discriminator sees real data and generated samples and outputs real/fake.]
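As background, one adversarial training step might look like the following PyTorch sketch; the toy MLP networks and sizes are hypothetical, since the slide does not specify architectures.

```python
import torch
import torch.nn as nn

# Toy MLP networks; the slide does not specify architectures.
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def gan_step(real):                        # real: (batch, 784) flattened images
    z = torch.randn(real.size(0), 64)      # sample noise
    fake = G(z)

    # Discriminator: push real data toward "real", generated samples toward "fake".
    loss_d = (bce(D(real), torch.ones(real.size(0), 1)) +
              bce(D(fake.detach()), torch.zeros(real.size(0), 1)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: fool the discriminator into calling its samples real.
    loss_g = bce(D(fake), torch.ones(real.size(0), 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

gan_step(torch.rand(8, 784) * 2 - 1)       # one update on a dummy batch
```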

Page 6:

Plain GAN for Unsupervised Image-to-Image Translation

[Figure: a translation network 𝐹1→2 maps Domain 1 images into Domain 2; a discriminator compares its outputs against real Domain 2 images and outputs real/fake.]

Page 7:

CycleGAN and UNIT

• CycleGAN (cycle consistency) [Zhu et al. 2017]

• UNIT (shared latent space) [Liu et al. 2017]


shared latent space ⟹ cycle consistency
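For concreteness, the cycle-consistency constraint can be written as an L1 penalty on round-trip translations; a minimal PyTorch sketch, with the two translators passed in as callables (hypothetical here; any image-to-image module works):

```python
import torch
import torch.nn.functional as F

def cycle_loss(x1, x2, F12, F21):
    """L1 cycle-consistency penalty on round-trip translations.

    F12, F21 are the two translation networks, passed as callables.
    """
    loss_121 = F.l1_loss(F21(F12(x1)), x1)  # X1 -> X2 -> X1
    loss_212 = F.l1_loss(F12(F21(x2)), x2)  # X2 -> X1 -> X2
    return loss_121 + loss_212

# Dummy usage with identity "translators":
x1, x2 = torch.randn(2, 3, 32, 32), torch.randn(2, 3, 32, 32)
print(cycle_loss(x1, x2, lambda x: x, lambda x: x))  # tensor(0.)
```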

Page 8:

Unimodality

Page 9:

Towards Multimodality

Page 10:

Cycle consistency does not allow multimodality

[Figure: an input 𝑥1 ∈ 𝒳1 with two candidate translations 𝑥2′, 𝑥2′′ ∈ 𝒳2.]

Suppose 𝑥1 could be translated to two different outputs 𝑥2′ and 𝑥2′′. Enforcing cycle consistency in both directions, 𝒳1 → 𝒳2 → 𝒳1 and 𝒳2 → 𝒳1 → 𝒳2, forces 𝑥2′ = 𝑥2′′, so the learned conditional collapses to a deterministic mapping:

𝑝(𝑥2|𝑥1) = 𝛿(𝑥2 − 𝑥2′)

A shared latent space does not allow multimodality either: since a shared latent space implies cycle consistency, the same argument applies.

Page 11:

Disentangling the Latent Space

• UNIT: a single shared, domain-invariant latent space 𝒵

Page 12:

Disentangling the Latent Space

• Multimodal UNIT (MUNIT)
  • A content space 𝒞 that is shared and domain-invariant
  • Two style spaces 𝒮1, 𝒮2 that are unshared and domain-specific
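To make the decomposition concrete, here is a toy PyTorch sketch of the resulting interface; the stand-in networks and sizes are hypothetical (MUNIT's real architectures are in the repo linked in the conclusion):

```python
import torch
import torch.nn as nn

style_dim = 8  # MUNIT uses a small style code (8-dim in the paper)

# Toy stand-ins for MUNIT's networks.
Ec1 = nn.Conv2d(3, 64, 1)                                    # content encoder, domain 1
Es1 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                    nn.Linear(3, style_dim))                 # style encoder, domain 1

class ToyDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.to_img = nn.Conv2d(64, 3, 1)
        self.style = nn.Linear(style_dim, 64)
    def forward(self, c, s):
        # Inject the style code into the content feature, then decode.
        return self.to_img(c + self.style(s)[:, :, None, None])

G1, G2 = ToyDecoder(), ToyDecoder()

x1 = torch.randn(4, 3, 64, 64)
c1, s1 = Ec1(x1), Es1(x1)       # disentangle: content in C, style in S1
x1_rec = G1(c1, s1)             # within-domain reconstruction
s2 = torch.randn(4, style_dim)  # sample a style from S2's Gaussian prior
x12 = G2(c1, s2)                # translate x1 into domain 2 with a random style
```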

Page 13:

Training

• Notation:
  • 𝑥: image
  • 𝑐: content code
  • 𝑠: style code

• Losses:
  • Bidirectional reconstruction loss
    • Image reconstruction loss
    • Latent reconstruction loss
  • GAN loss

[Figure: the losses are computed along two paths, within-domain reconstruction and cross-domain translation; the combined objective is sketched below.]
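The full objective jointly trains encoders E, decoders G, and discriminators D, with λx, λc, λs weighting the reconstruction terms (reconstructed from the paper linked in the conclusion):

```latex
\min_{E_1, E_2, G_1, G_2}\; \max_{D_1, D_2}\;
\mathcal{L}_{\mathrm{GAN}}^{x_1} + \mathcal{L}_{\mathrm{GAN}}^{x_2}
+ \lambda_x \left( \mathcal{L}_{\mathrm{recon}}^{x_1} + \mathcal{L}_{\mathrm{recon}}^{x_2} \right)
+ \lambda_c \left( \mathcal{L}_{\mathrm{recon}}^{c_1} + \mathcal{L}_{\mathrm{recon}}^{c_2} \right)
+ \lambda_s \left( \mathcal{L}_{\mathrm{recon}}^{s_1} + \mathcal{L}_{\mathrm{recon}}^{s_2} \right)
```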

Page 14:

Bidirectional Reconstruction Loss: Image Reconstruction

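As formulated in the paper, the image reconstruction term for domain 1 (domain 2 is symmetric) encodes an image into content and style and requires decoding to reproduce it:

```latex
\mathcal{L}_{\mathrm{recon}}^{x_1} =
\mathbb{E}_{x_1 \sim p(x_1)}
\left[ \left\lVert G_1\!\left( E_1^{c}(x_1),\, E_1^{s}(x_1) \right) - x_1 \right\rVert_1 \right]
```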

Page 15:

Bidirectional Reconstruction Loss: Latent Reconstruction

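The latent reconstruction terms translate with a content code 𝑐1 and a style code 𝑠2 sampled from its prior, then require both codes to be recoverable from the translated image (the mirrored terms for 𝑐2, 𝑠1 are analogous):

```latex
\mathcal{L}_{\mathrm{recon}}^{c_1} =
\mathbb{E}_{c_1 \sim p(c_1),\, s_2 \sim q(s_2)}
\left[ \left\lVert E_2^{c}\!\left( G_2(c_1, s_2) \right) - c_1 \right\rVert_1 \right]
\qquad
\mathcal{L}_{\mathrm{recon}}^{s_2} =
\mathbb{E}_{c_1 \sim p(c_1),\, s_2 \sim q(s_2)}
\left[ \left\lVert E_2^{s}\!\left( G_2(c_1, s_2) \right) - s_2 \right\rVert_1 \right]
```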

Page 16:

GAN Loss

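The adversarial term for domain 2 makes translated images indistinguishable from real domain-2 images (the domain-1 term is symmetric):

```latex
\mathcal{L}_{\mathrm{GAN}}^{x_2} =
\mathbb{E}_{c_1 \sim p(c_1),\, s_2 \sim q(s_2)}
\left[ \log\left( 1 - D_2\!\left( G_2(c_1, s_2) \right) \right) \right]
+ \mathbb{E}_{x_2 \sim p(x_2)} \left[ \log D_2(x_2) \right]
```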

Page 17:

Background: Instance Normalization (IN)

"Instance Normalization: The Missing Ingredient for Fast Stylization", Ulyanov et al. 2017

[Figure: feedforward transfer of a single style; a content feature 𝑐 passes through normalization (remove style) and then an affine transformation (apply style) to produce the output.]
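IN normalizes each channel of each sample over its spatial positions and then applies a learned affine transformation, so the learned γ, β encode a single style:

```latex
\mathrm{IN}(z) = \gamma \left( \frac{z - \mu(z)}{\sigma(z)} \right) + \beta
```

Here μ(z) and σ(z) are the mean and standard deviation computed per channel, per instance.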

Page 18:

Adaptive Instance Normalization (AdaIN)

Feedforward transfer of arbitrary styles

[Figure: a VGG encoder extracts a content feature 𝑐 and a style feature 𝑠; AdaIN normalizes the content feature (remove style) and rescales it with the style feature's statistics (apply style); a decoder produces the output.]
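AdaIN replaces IN's learned affine parameters with the channel-wise statistics of the style feature, so a single network handles arbitrary styles:

```latex
\mathrm{AdaIN}(c, s) = \sigma(s) \left( \frac{c - \mu(c)}{\sigma(c)} \right) + \mu(s)
```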

Page 19:

AdaIN in a Generative Network

[Figure: AdaIN in style transfer vs. AdaIN in a generative network. In style transfer, two encoders extract content and style features that meet at an AdaIN layer before a decoder. In a generative network, an encoder extracts content while an MLP maps the style code to AdaIN parameters inside the decoder.]

Page 20:

AdaIN in a Generative Network

[Figure: the same comparison in more detail; the generative network's decoder interleaves Conv layers with several AdaIN layers, all of whose parameters are produced by the MLP from the style code.]
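A minimal sketch of this pattern (hypothetical sizes; not the repo's exact implementation): an MLP turns the style code into per-channel γ, β, which an AdaIN layer applies to a decoder feature map.

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Normalize each channel over its spatial positions, then apply an
    affine transform whose parameters are supplied from outside."""
    def forward(self, x, gamma, beta):
        mu = x.mean(dim=(2, 3), keepdim=True)
        sigma = x.std(dim=(2, 3), keepdim=True) + 1e-5
        return gamma[:, :, None, None] * (x - mu) / sigma + beta[:, :, None, None]

style_dim, channels = 8, 64                    # hypothetical sizes
mlp = nn.Linear(style_dim, 2 * channels)       # style code -> (gamma, beta)
adain = AdaIN()

s = torch.randn(4, style_dim)                  # style code
feat = torch.randn(4, channels, 16, 16)        # a decoder feature map
gamma, beta = mlp(s).chunk(2, dim=1)           # split into scale and shift
out = adain(feat, gamma, beta)                 # restyled feature map
```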

Page 21:

Architectural Implementation

Page 22:

Sketches ↔ Photos

Input Outputs

Page 23:

Cats ↔ Dogs

Input Outputs

Page 24:

Synthetic ↔ Real

Input Outputs

Page 25:

Summer ↔ Winter

Input Outputs

Page 26:

Example-guided Translation

Page 27:

Example-guided Translation
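In MUNIT terms, example-guided translation swaps the randomly sampled style for the style code extracted from a reference image 𝑥2 in the target domain:

```latex
x_{1 \to 2} = G_2\!\left( E_1^{c}(x_1),\, E_2^{s}(x_2) \right)
```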

Page 28:

Conclusion

• Translate one input image to multiple corresponding images in the target domain.

• Content and style decomposition via the AdaIN design

• ECCV 2018

• MUNIT code: https://github.com/nvlabs/munit/

• Paper: https://arxiv.org/abs/1804.04732

Xun Huang (NVIDIA, Cornell)

Serge Belongie (Cornell)

Jan Kautz (NVIDIA)

Ming-Yu Liu (NVIDIA)