
Generating Images Part by Part with Composite Generative Adversarial Networks

Hanock Kwak and Byoung-Tak Zhang
Department of Computer Science and Engineering, Seoul National University
{hnkwak, btzhang}@bi.snu.ac.kr

Methods

Biointelligence Lab, Seoul National University | Seoul 151-744, Korea (http://bi.snu.ac.kr)

Backgrounds

• Images are composed of several different objects, forming a hierarchical structure with various styles and shapes.

• Deep learning models are used to implicitly disentangle complex underlying patterns of data, forming distributed feature representations.

• Generative adversarial networks (GAN) are successful unsupervised learning models that can generate samples of natural images generalized from the training data.

• It has been proven that, given enough capacity, the data distribution formed by a GAN converges to the distribution over real data.

• Composite generative adversarial network (CGAN) can generate images part by part.

• CGAN uses an alpha channel for opacity along with the RGB channels to stack images iteratively via an alpha blending process.

• The alpha blending process keeps the previous image in some areas and completely replaces it with the new image in other areas.
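The iterative stacking described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: it assumes images are float arrays in [0, 1], with each component image carrying a trailing alpha channel, and the helper names are ours.

```python
import numpy as np

def alpha_blend(prev_rgb, new_rgba):
    """Blend a new RGBA component image onto the previous RGB canvas.

    Where the new image's alpha is 1 its pixels replace the canvas;
    where alpha is 0 the previous canvas is kept unchanged.
    """
    rgb, alpha = new_rgba[..., :3], new_rgba[..., 3:4]
    return alpha * rgb + (1.0 - alpha) * prev_rgb

def composite(components):
    """Stack component images C_1..C_n into the final output O_n."""
    canvas = np.zeros_like(components[0][..., :3])  # start from a blank canvas
    for c in components:
        canvas = alpha_blend(canvas, c)
    return canvas
```

Each generator in CGAN would supply one RGBA component, so `composite` plays the role of the sequential blending that produces the outputs O_1 through O_n.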

Key Ideas

[Figure] The structure of CGAN: component images are combined sequentially by the alpha blending process to form the final output O_n.

[Figure] Examples of generated images from CGAN with three generators.

• The alpha blending combines two translucent images, producing a new blended image.

• The objective of the generator (G) is to fit the true data distribution while deceiving the discriminator (D) by playing the following minimax game:
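This is the standard GAN value function (Goodfellow et al., 2014), which the bullet above refers to:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)]
  + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
```

The discriminator D maximizes its accuracy on real versus generated samples, while each generator G minimizes the same objective to make its samples indistinguishable from real data.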

Experimental Results

• Samples drawn from CGAN after training on the CelebA, Pororo, Oxford-102 Flowers, and MS COCO datasets, respectively.

[Figure] Component images C1, C2, C3 and the blended outputs O_2, O_3 produced at each blending step.

Conclusion & Discussion

• We found implicit possibilities of structure learning from images without any labels by constructing the hierarchical structures of the images.

• Our model could be extended to other domains such as video, text, and audio, or a combination of them.

• Since most data has hierarchical structure, studies on decomposing the combined data are essential to finding correlations between multimodal data.

• In addition to the empirical results, theoretical analysis and quantitative evaluation are needed to verify this work and other generation tasks.
