3D Volumetric Data Generation with Generative Adversarial Networks

Hiroyuki Vincent Yamazaki
Keio University
Preferred Networks Summer Internship, 2016


1. Introduction

Background
Generative Adversarial Networks (GANs) [1] have achieved state-of-the-art performance in unsupervised learning, generating synthetic images after training on datasets such as MNIST or, for multi-channel images, ImageNet. However, these networks have not yet been extended to higher-dimensional data such as volumetric 3D data. Generated 3D models have various applications in entertainment and could serve as an alternative to existing procedural methods for creating graphics. This study demonstrates the capabilities of GAN-based architectures for generating practical 3D models by applying 3-dimensional convolutions and deconvolutions* to voxel data.

Goal
• Extension of GANs to 3D volumetric data, training on a single class
• Control of the shapes of the generated models, e.g. by interpolation

*Deconvolutions here means transposed convolutions.
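Shape control by interpolation amounts to decoding points along a path between two latent vectors. The sketch below assumes plain linear interpolation and 128-dimensional noise drawn from Unif(−1, 1); `generator` is a hypothetical trained model and is not defined here.

```python
import numpy as np

def interpolate_latents(z_start, z_end, num_steps):
    """Return num_steps latent vectors evenly spaced on the line from z_start to z_end."""
    z_start = np.asarray(z_start, dtype=np.float32)
    z_end = np.asarray(z_end, dtype=np.float32)
    # Interpolation weights from 0 to 1, inclusive at both ends, as a column.
    t = np.linspace(0.0, 1.0, num_steps, dtype=np.float32)[:, None]
    return (1.0 - t) * z_start + t * z_end

rng = np.random.default_rng(0)
z_a = rng.uniform(-1.0, 1.0, size=128)  # assumption: uniform noise, 128 dims
z_b = rng.uniform(-1.0, 1.0, size=128)
path = interpolate_latents(z_a, z_b, num_steps=8)  # shape (8, 128)
# Each row would then be decoded by the (hypothetical) trained generator:
# models = [generator(z) for z in path]
```

If the latent space is smooth, neighbouring rows should decode to similar chairs; abrupt jumps between consecutive outputs are the "sharp edges" discussed under the GAN issues.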

2. Training Data

3D CAD models from ShapeNet [2]
• Class: Chair
• Instances: 4846

Preprocessing
• Voxelization
  • 3D CAD models are converted into binary (0/1) voxel grids with dimensions (32, 32, 32) [3].
• Normalization
  • No normalization is applied; the data is binary and already lies in [0, 1].
• Other
  • Bad samples are removed and the models are centred in the grid.
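The slides do not say how the models are centred. A minimal sketch, assuming "centring" means translating the bounding box of the occupied voxels to the middle of the 32 × 32 × 32 grid; `center_voxels` is a hypothetical helper, not the author's code.

```python
import numpy as np

def center_voxels(vox):
    """Shift the occupied voxels of a binary grid so their bounding box is centred.

    Assumption: centring by bounding box (not by centre of mass).
    """
    vox = np.asarray(vox, dtype=np.uint8)
    occupied = np.argwhere(vox > 0)          # (N, 3) indices of filled voxels
    if occupied.size == 0:
        return vox.copy()                    # empty grid: nothing to centre
    lo = occupied.min(axis=0)
    hi = occupied.max(axis=0)
    size = np.array(vox.shape)
    # Integer offset that moves the bounding box to the centre of the grid.
    shift = (size - (hi - lo + 1)) // 2 - lo
    out = np.zeros_like(vox)
    new_idx = occupied + shift
    out[new_idx[:, 0], new_idx[:, 1], new_idx[:, 2]] = 1
    return out

grid = np.zeros((32, 32, 32), dtype=np.uint8)
grid[0:4, 0:6, 0:8] = 1                      # a small block in one corner
centred = center_voxels(grid)
print(np.argwhere(centred).min(axis=0), np.argwhere(centred).max(axis=0))
# [14 13 12] [17 18 19]
```

Writing into a fresh zero grid, rather than using `np.roll`, avoids wrapping voxels around the grid boundary.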

[Figure: Training data volume distribution; mean 3D model]
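Both statistics are simple reductions over the voxelized dataset. A sketch, assuming the data is stored as a binary array of shape (N, 32, 32, 32) and that a model's "volume" is its number of occupied voxels; the random toy data below only stands in for the ShapeNet chairs.

```python
import numpy as np

def volume_statistics(voxels):
    """Per-model volumes (occupied-voxel counts) and the voxel-wise mean model."""
    voxels = np.asarray(voxels, dtype=np.float32)            # (N, 32, 32, 32), values in {0, 1}
    volumes = voxels.reshape(len(voxels), -1).sum(axis=1)    # (N,)
    mean_model = voxels.mean(axis=0)                         # occupancy frequency per voxel
    return volumes, mean_model

# Toy stand-in for the 4846 voxelized chairs.
rng = np.random.default_rng(0)
data = (rng.random((10, 32, 32, 32)) < 0.05).astype(np.uint8)
volumes, mean_model = volume_statistics(data)
hist, edges = np.histogram(volumes, bins=5)  # the "volume distribution"
```

Rendering `mean_model` thresholded at some occupancy level gives a "mean 3D model"; the threshold is a free choice.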

3. Generative Adversarial Network

A GAN consists of a generator G and a discriminator D. Here, both are feed-forward neural networks that are trained simultaneously.
• Random noise vectors z are sampled from a uniform or Gaussian distribution.

Loss
• Softmax cross-entropies based on the predictions of D
• Separate losses for G and D, defined by the minimax game

Optimization
• Adam for both G and D
• The learning rate of G is larger than that of D

[Figure: GAN architecture. Random noise is fed to the generator (Linear, Deconvolution, Batch Normalization, ReLU, Sigmoid), which outputs a generated 3D model. The discriminator (Convolution, Linear, Leaky ReLU) receives either a generated 3D model or a real 3D model selected from the training data by a random index, and outputs a generated/real prediction.]

See the Appendix for the network architecture and Adam parameters.

Minimax game:

$$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

Optimal discriminator strategy for a fixed G:

$$D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_G(x)}$$
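Because D ends in a two-way output (Appendix: Output FC 2), the minimax objective above can be written with softmax cross-entropy, where the softmax probability of the "real" class plays the role of D(x). A minimal NumPy sketch, assuming class index 1 means "real" and using the common non-saturating generator loss from [1] (G is trained to make D predict "real"); the slides do not state which generator variant was used.

```python
import numpy as np

REAL, GENERATED = 1, 0  # assumed label convention for D's two outputs

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under a softmax over the last axis."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def gan_losses(d_logits_real, d_logits_fake):
    """Discriminator loss and (non-saturating) generator loss from D's 2-way logits."""
    n_real, n_fake = len(d_logits_real), len(d_logits_fake)
    # D: label training data as real and generated data as generated.
    loss_d = (softmax_cross_entropy(d_logits_real, np.full(n_real, REAL))
              + softmax_cross_entropy(d_logits_fake, np.full(n_fake, GENERATED)))
    # G: push D's prediction on generated data towards "real" (assumed variant).
    loss_g = softmax_cross_entropy(d_logits_fake, np.full(n_fake, REAL))
    return loss_d, loss_g
```

With all-zero logits, D is maximally uncertain, so each cross-entropy term equals log 2, giving loss_d = 2 log 2 and loss_g = log 2.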

Issues with GAN
• Collapsing generator
  • G outputs similar 3D models for different inputs
• Non-semantic input z
  • Interpolating z reveals sharp edges (abrupt changes) in the latent space, so there is no way to control the shape of the output

Improving the GAN
• Prevent the generator from collapsing
  • Add a minibatch discrimination [4] layer to D
• Embed semantic meaning into the input [5]
  • Concatenate additional latent codes with z before feeding it to G
  • Add a loss based on mutual information reconstruction by D

[Figure: Improved GAN architecture. Random noise plus latent codes are fed to the generator (Linear, Deconvolution, Batch Normalization, ReLU, Sigmoid). The discriminator (Convolution, Linear, Leaky ReLU, Minibatch Discrimination) receives generated or real 3D models (training data selected by a random index) and outputs a generated/real prediction and a mutual information reconstruction.]

Minibatch Discrimination
Motivation: Prevent the generator from collapsing to a single point.
Idea: Reproduce the diversity of the training data.
• Add a minibatch discrimination layer to D, just before the generated/real prediction.
• For each minibatch fed to this layer, compute the L1 distances between all pairs of input vectors and append this information to each sample in the minibatch.
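The layer described above follows Salimans et al. [4]: each feature vector is projected through a learned tensor T into B rows of dimension C, and for each row the L1 distances to all other samples are turned into a closeness score. A NumPy sketch of the forward pass only, using the Appendix sizes (64 kernels of dimension 16) and a random T in place of learned weights.

```python
import numpy as np

def minibatch_discrimination(features, T):
    """Forward pass of a minibatch discrimination layer, following [4].

    features: (N, A) activations entering the layer.
    T:        (A, B, C) kernel tensor; B kernels of dimension C.
    Returns:  (N, A + B) features with per-kernel closeness statistics appended.
    """
    n, a = features.shape
    _, b, c = T.shape
    M = (features @ T.reshape(a, b * c)).reshape(n, b, c)     # (N, B, C)
    # Pairwise L1 distances between samples, per kernel: (N, N, B).
    l1 = np.abs(M[:, None, :, :] - M[None, :, :, :]).sum(axis=3)
    # Closeness summed over the minibatch (includes the j == i term, as in [4]).
    closeness = np.exp(-l1).sum(axis=1)                        # (N, B)
    return np.concatenate([features, closeness], axis=1)

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 1024))                 # e.g. output of D's FC 1024 layer
T = rng.normal(scale=0.01, size=(1024, 64, 16))    # 64 kernels, dimension 16
out = minibatch_discrimination(feats, T)           # shape (8, 1088)
```

If G collapses and every sample in the minibatch is identical, each closeness score rises to the minibatch size, a signal D can learn to associate with generated data.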

Mutual Information Reconstruction
Motivation: Embed semantic meaning in z.
Idea: Maximize the mutual information preserved for the latent codes C as they pass through the networks, i.e. D should be able to reconstruct C from G's output [5].
Latent codes, input to G
• C = [C1, C2, C3] (concatenated)
• Categorical one-hot vector C1 ~ Cat(K = 2, p = 0.5)
• Continuous C2 ~ Unif(−1, 1)
• Continuous C3 ~ Unif(−1, 1)
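Sampling G's input is a concatenation of z with the three codes, which gives the 128 + 2 + 2 input size listed in the Appendix. A sketch, assuming z ~ Unif(−1, 1) (the slides allow uniform or Gaussian noise); `sample_generator_input` is a hypothetical helper.

```python
import numpy as np

def sample_generator_input(batch_size, rng, noise_dim=128):
    """Sample [z, C1, C2, C3] for G; total size noise_dim + 2 + 1 + 1."""
    # Assumption: uniform noise; a Gaussian would also match the slides.
    z = rng.uniform(-1.0, 1.0, size=(batch_size, noise_dim))
    # C1 ~ Cat(K=2, p=0.5), encoded as a one-hot vector.
    c1_index = rng.integers(0, 2, size=batch_size)
    c1 = np.eye(2)[c1_index]
    # C2, C3 ~ Unif(-1, 1).
    c23 = rng.uniform(-1.0, 1.0, size=(batch_size, 2))
    x = np.concatenate([z, c1, c23], axis=1).astype(np.float32)
    # The sampled codes are returned too, as targets for D's reconstruction.
    return x, c1_index, c23

rng = np.random.default_rng(0)
x, c1_index, c23 = sample_generator_input(128, rng)  # x.shape == (128, 132)
```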

Reconstruction, output from D
• Categorical
  • Softmax cross-entropy
• Continuous
  • Assume a fixed variance and compute the Gaussian negative log-likelihood based on the predicted mean.
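Both reconstruction terms can be computed from D's four-value head (Appendix: Output FC 2+2). A sketch, assuming the head's outputs are ordered as two logits for C1 followed by the means μ2, μ3, and assuming σ = 1 for the fixed variance (the slides give no value).

```python
import numpy as np

def info_reconstruction_loss(d_code_outputs, c1_index, c23, sigma=1.0):
    """Mutual information reconstruction loss from D's 'FC 2+2' head.

    d_code_outputs: (N, 4) = [logit_0, logit_1, mu2, mu3] (assumed ordering).
    c1_index:       (N,) sampled category of C1.
    c23:            (N, 2) sampled continuous codes C2, C3.
    sigma:          fixed standard deviation (assumed value).
    """
    out = np.asarray(d_code_outputs, dtype=np.float64)
    logits, mu = out[:, :2], out[:, 2:]
    # Categorical code: softmax cross-entropy.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss_cat = -log_probs[np.arange(len(c1_index)), c1_index].mean()
    # Continuous codes: Gaussian negative log-likelihood with fixed variance.
    nll = 0.5 * np.log(2.0 * np.pi * sigma**2) + (c23 - mu) ** 2 / (2.0 * sigma**2)
    loss_cont = nll.sum(axis=1).mean()
    return loss_cat + loss_cont
```

With a fixed variance, the continuous term is a scaled squared error plus a constant, so the constant has no effect on the gradients.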

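[Figure: G's input as the concatenation of z, c1 (e.g. [0, 1]), c2 and c3; D's reconstruction outputs (softmax for c1, means μ2 and μ3); the minibatch discrimination layer and its kernels.]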

4. Results

• Minibatch size: 128
• Epochs: 100

[Figure: Generated 3D models. The blue models are their nearest models in the training dataset.]

[Figure: 3D volume distributions (learned vs. true distribution), chair-likeness, and losses.]
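The slides do not say how the nearest training models were found. A sketch, assuming generator outputs are binarized at 0.5 and closeness is the number of differing voxels (Hamming distance); `nearest_training_models` is a hypothetical helper.

```python
import numpy as np

def nearest_training_models(generated, training, threshold=0.5):
    """Index of and distance to the closest training model for each generated model.

    Assumptions: outputs binarized at `threshold`; Hamming distance on voxels.
    """
    gen = (np.asarray(generated) > threshold).reshape(len(generated), -1).astype(np.float32)
    train = (np.asarray(training) > 0).reshape(len(training), -1).astype(np.float32)
    # For binary vectors, Hamming(g, t) = |g| + |t| - 2 <g, t>; one matrix product
    # avoids materializing an (num_generated, num_training, 32**3) array.
    dist = gen.sum(axis=1)[:, None] + train.sum(axis=1)[None, :] - 2.0 * gen @ train.T
    idx = dist.argmin(axis=1)
    return idx, dist[np.arange(len(idx)), idx]
```

All intermediate values are integer counts of at most 32³ = 32768, which float32 represents exactly, so the distances are exact.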

5. Conclusions
• GANs can be extended to 3D volumetric data using 3-dimensional convolutions and deconvolutions
• Smaller datasets (sparse data) lead to worse-looking, noisy models
  • Partially mitigated by mutual information reconstruction and minibatch discrimination
• In many cases, D improves faster than G
  • Gradients back-propagated through G saturate and G stops learning
  • Training does not converge

Future Work
• Larger dataset, potentially with multiple classes
• Balance training between G and D
  • Heuristics, e.g. stop updating D while it is too strong
  • Larger G, i.e. more parameters
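One way to make "stop updating D while it is too strong" concrete is to gate D's update on its accuracy on the current minibatch. A sketch, assuming class index 1 means "real" and using a hypothetical 0.8 accuracy threshold; neither the criterion nor the value comes from the slides.

```python
import numpy as np

def discriminator_accuracy(d_logits_real, d_logits_fake, real_index=1):
    """Fraction of samples D classifies correctly (index 1 = real is an assumption)."""
    correct_real = np.argmax(d_logits_real, axis=1) == real_index
    correct_fake = np.argmax(d_logits_fake, axis=1) != real_index
    return np.concatenate([correct_real, correct_fake]).mean()

def should_update_discriminator(accuracy, max_accuracy=0.8):
    """Freeze D while it is 'too strong' (hypothetical threshold)."""
    return accuracy < max_accuracy

# Sketch of the training loop (optimizer steps not shown):
# for each iteration:
#     acc = discriminator_accuracy(d_logits_real, d_logits_fake)
#     if should_update_discriminator(acc):
#         update D
#     update G
```

G is still updated every iteration, so a frozen D gives G time to catch up before D resumes training.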

References
[1] Goodfellow et al. (2014). Generative Adversarial Networks. CoRR, abs/1406.2661.
[2] Angel X. Chang et al. (2015). ShapeNet: An Information-Rich 3D Model Repository. CoRR, abs/1512.03012.
[3] Patrick Min. Binvox, 3D Mesh Voxelizer. http://www.patrickmin.com/binvox/
[4] Tim Salimans et al. (2016). Improved Techniques for Training GANs. CoRR, abs/1606.03498.
[5] Xi Chen et al. (2016). InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. CoRR, abs/1606.03657.

Appendix

GAN Architecture

Generator
• Input ∈ R^(128+2+2)
• FC 1024, BN, ReLU
• FC 16384, BN, ReLU
• DC 256 → 128, Kernel 4, Stride 2, BN, ReLU
• DC 128 → 64, Kernel 4, Stride 2, BN, ReLU
• Output DC 64 → 1, Kernel 4, Stride 2, BN, ReLU

Discriminator
• Input 32 × 32 × 32 3D voxel data
• Conv 1 → 64, Kernel 4, Stride 2, lReLU (leaky ReLU)
• Conv 64 → 128, Kernel 4, Stride 2, BN, lReLU
• Conv 128 → 256, Kernel 4, Stride 2, BN, lReLU
• FC 1024, BN, lReLU
• Minibatch Discrimination, Kernels 64, Kernel Dimension 16
• Output FC 2 (Generated/Real prediction)
• FC 256, BN, lReLU
• Output FC 2+2 (Mutual Information Reconstruction)

(FC: fully connected; DC: deconvolution, i.e. transposed convolution; BN: batch normalization)

Adam Optimizer Parameters
        Generator   Discriminator
α       0.001       0.00005
β1      0.5         0.5
β2      0.999       0.999
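The layer sizes in the table are consistent with each other: FC 16384 reshapes to 256 channels of 4 × 4 × 4, and three stride-2 layers move between 4³ and 32³. The check below assumes a padding of 1 per side, which the table does not list but which is the value that makes these sizes work for kernel 4, stride 2.

```python
def deconv_out(size, kernel=4, stride=2, pad=1):
    """Output length of a transposed convolution along one axis."""
    return (size - 1) * stride - 2 * pad + kernel

def conv_out(size, kernel=4, stride=2, pad=1):
    """Output length of a convolution along one axis."""
    return (size + 2 * pad - kernel) // stride + 1

# Generator: FC 16384 reshaped to 256 channels of 4x4x4 (16384 = 256 * 4**3).
assert 16384 == 256 * 4 ** 3
g_sizes = [4]
for _ in range(3):  # DC 256->128, DC 128->64, DC 64->1
    g_sizes.append(deconv_out(g_sizes[-1]))
print(g_sizes)  # [4, 8, 16, 32]

# Discriminator: three stride-2 convolutions on the 32^3 input.
d_sizes = [32]
for _ in range(3):  # Conv 1->64, Conv 64->128, Conv 128->256
    d_sizes.append(conv_out(d_sizes[-1]))
print(d_sizes)  # [32, 16, 8, 4]
print(256 * d_sizes[-1] ** 3)  # 16384 inputs to D's FC 1024 layer
```

The generator and discriminator are thus mirror images: 256 channels at 4³ on one side, one channel at 32³ on the other.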