

Invertible Conditional GANs for image editing

Guim Perarnau, Joost van de Weijer*, Bogdan Raducanu*, Jose M. Álvarez†

* Computer Vision Center, Barcelona, Spain † Data61 @ CSIRO, Canberra, Australia

guimperarnau@gmail.com, {joost,bogdan}@cvc.uab.es, jose.alvarez@data61.csiro.au


Overview

Problem

Complex image editing often requires human supervision and professional image editing tools. How can we automate these complex operations?

Solution

We propose Invertible Conditional GANs (IcGANs), a model that combines a conditional GAN with an encoder.

How?

1. Generate realistic images via GANs.
2. Condition generated images on attributes.
3. Encode real images in order to reconstruct them with the desired changes.

4. Invertible Conditional GANs (IcGANs)

Now we can combine the cGAN and the encoder to create an IcGAN. With the encoder, we can invert the generator and map an image into a latent representation z and a conditional vector y. In this space, we can arbitrarily change key aspects of the image and then reconstruct the modified image with the generator.
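The editing pipeline above can be sketched as follows. The encoders, generator, and attribute names are hypothetical stand-ins, not the trained networks from the paper:

```python
import numpy as np

ATTRS = ["female", "black hair", "brown hair", "make-up", "sunglasses"]

def edit_image(x, encode_z, encode_y, generate, changes):
    """IcGAN editing: invert the image into (z, y), edit y, regenerate."""
    z = encode_z(x)                   # map the image into latent z ...
    y = encode_y(x)                   # ... and attribute vector y
    for attr, value in changes.items():
        y[ATTRS.index(attr)] = value  # change key aspects in this space
    return generate(z, y)             # reconstruct the modified image

# Toy stand-ins so the sketch runs end to end (not the trained networks):
enc_z = lambda x: x[:3]
enc_y = lambda x: x[3:].copy()
gen = lambda z, y: np.concatenate([z, y])

x = np.array([0.1, 0.2, 0.3, 1.0, 0.0, 1.0, 1.0, 0.0])
x_edited = edit_image(x, enc_z, enc_y, gen, {"sunglasses": 1.0})
```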

[Figure: GAN training scheme. The generator produces fake images; the discriminator classifies real and fake images; both networks are trained via backpropagation.]

1. Generative Adversarial Networks (GANs)

GANs are composed of two networks, a generator and a discriminator. The generator is trained to fool the discriminator by creating realistic images, and the discriminator is trained not to be fooled by the generator.
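The two opposing objectives can be sketched with a toy binary cross-entropy loss; G, D, and their single parameters here are minimal stand-ins, not real networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def bce(p, target):
    """Binary cross-entropy of a predicted probability p against a {0, 1} target."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -(target * np.log(p) + (1 - target) * np.log(1 - p))

# Hypothetical one-parameter stand-ins for the two networks:
def G(z, theta):                      # generator: latent z -> "image" (a scalar here)
    return theta * z

def D(x, w):                          # discriminator: "image" -> probability it is real
    return 1.0 / (1.0 + np.exp(-w * x))

z, x_real = rng.standard_normal(), 1.0
x_fake = G(z, theta=0.5)

# Discriminator objective: call real images real and fake images fake.
d_loss = bce(D(x_real, w=2.0), 1.0) + bce(D(x_fake, w=2.0), 0.0)
# Generator objective: make the discriminator call its fakes real.
g_loss = bce(D(x_fake, w=2.0), 1.0)
```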

5. Results

One way to evaluate these models is to look directly at how visually appealing the generated samples are. Here we show some qualitative examples of what an IcGAN is capable of by playing with both the latent space z and the conditional information y.

We fix z for every row and modify y for each column to obtain variations of real images.

[Figure: interpolations between faces.]

[Figure: swapping two face attributes.]
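A qualitative grid of this kind (fixed z per row, varying y per column) could be assembled as in this sketch; the generator is a stand-in and the grid dimensions are illustrative:

```python
import numpy as np

gen = lambda z, y: np.outer(z, y)               # hypothetical generator stand-in

zs = [np.random.randn(4) for _ in range(3)]     # one fixed z per row
ys = [np.eye(5)[i] for i in range(5)]           # one attribute configuration per column
grid = [[gen(z, y) for y in ys] for z in zs]    # 3 rows x 5 columns of samples
```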

Acknowledgments

This work is funded by project TIN2013-41751-P of the Spanish Ministry of Science and the CHIST-ERA project PCIN-2015-226.

[Figure: scheme of an IcGAN and how it is used.]

[Figure: example of complex editing operations. Columns: Original, Bangs, Blonde, Smile, Male.]

Code available! → https://github.com/Guim3/IcGAN

2. Conditional GANs (cGANs)

With cGANs, we add to the model conditional information y that describes some aspect of the data. This allows us to control certain aspects of the generated images, e.g. generate a blonde woman with sunglasses. We refine cGANs by testing the optimal position at which y is inserted in the generator and discriminator.
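A minimal sketch of the insertion positions we found optimal: y concatenated to z at the generator input, and replicated spatially at the discriminator's first layer. Tensor sizes and attribute names are illustrative assumptions:

```python
import numpy as np

def generator_input(z, y):
    """Insert y at the generator's input: concatenate [z, y] before the first layer."""
    return np.concatenate([z, y])

def discriminator_layer1(feat, y):
    """Insert y at the discriminator's first layer: replicate y spatially and
    concatenate it to the layer-1 feature map along the channel axis."""
    _, h, w = feat.shape
    y_maps = np.broadcast_to(y[:, None, None], (y.shape[0], h, w))
    return np.concatenate([feat, y_maps], axis=0)

z = np.random.randn(100)               # latent vector (size 100)
y = np.array([1.0, 0.0, 1.0])          # hypothetical attributes, e.g. [blonde, male, sunglasses]
g_in = generator_input(z, y)           # shape (103,)
feat = np.random.randn(64, 32, 32)     # assumed layer-1 feature map size
d_in = discriminator_layer1(feat, y)   # shape (67, 32, 32)
```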

[Figure: IcGAN architecture. The encoder (Conv 1–5) maps an input image to a latent vector z (size 100) and an attribute vector y (female, black hair, brown hair, make-up, sunglasses). A change vector edits y, and the conditional generator (Full conv 1–5) reconstructs the image. Columns: Input, Reconstruction, Swap y.]
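The change vector in the scheme simply adds or removes attribute activations before regeneration; the attribute order and values here are illustrative:

```python
import numpy as np

# Attribute order (hypothetical): [female, black hair, brown hair, make-up, sunglasses]
y = np.array([1.0, 0.0, 1.0, 1.0, 0.0])        # attributes encoded from the input image
change = np.array([0.0, 1.0, -1.0, 0.0, 1.0])  # +black hair, -brown hair, +sunglasses
y_new = np.clip(y + change, 0.0, 1.0)          # edited attribute vector fed to the generator
```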

6. Conclusions

• We introduce IcGANs, which solve the problem of GANs lacking the ability to reconstruct real images, while also allowing explicit control of complex attributes of generated samples.

• We refine the performance of cGANs by inserting the conditional information 𝑦 at the input level for the generator and at the first layer for the discriminator.

• We evaluate several approaches to training an encoder; two independent encoders 𝐸𝑧 and 𝐸𝑦 (IND) are the best option.

[Chart: cGAN evaluation (F1-score) depending on the position (input, layers 1–4) at which y is inserted, shown separately for the generator and the discriminator. Higher is better.]

[Chart: reconstruction loss for each encoder type (SNG, IND, IND-COND). Lower is better.]

3. Encoder

Then, we train an encoder to reconstruct real images. It is trained after the cGAN and is composed of two sub-encoders: 𝐸𝑧, which encodes an image to 𝑧, and 𝐸𝑦, which encodes an image to 𝑦′. We test different strategies to make them interact and improve the encoding process:

• SNG: 𝐸𝑧 and 𝐸𝑦 are embedded in a single encoder.
• IND: 𝐸𝑧 and 𝐸𝑦 are trained separately.
• IND-COND: two independent encoders, where 𝐸𝑧 is conditioned on the output of 𝐸𝑦.
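The IND strategy can be sketched with toy linear encoders fitted on generator samples — the cGAN gives supervised pairs (G(z, y), z, y) for free. The linear generator and least-squares fit are simplifying assumptions, not the convolutional encoders of the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in linear generator (assumption): x = Wg @ [z; y]
Wg = rng.standard_normal((10, 8))

z = rng.standard_normal((5, 500))               # 500 latent samples, z has 5 dims
y = rng.integers(0, 2, (3, 500)).astype(float)  # 3 binary attributes per sample
x = Wg @ np.vstack([z, y])                      # generated "images"

# IND: fit Ez and Ey separately, each by regression of z (resp. y) on x.
Ez = z @ np.linalg.pinv(x)
Ey = y @ np.linalg.pinv(x)

z_hat, y_hat = Ez @ x, Ey @ x                   # recovered latent and attribute codes
```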
