
Evolution of features in Computer Vision

Towards deep visual features for Generative Adversarial Networks

Computer vision

• Aim of CV: imitate human sight, i.e. the functionality of the eye and the brain

Feature extraction

• Example problem: human detection

• Pipeline: Histogram of Oriented Gradients (HOG) → features → classification (see the sketch below)
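A minimal sketch of this pre-CNN pipeline, assuming scikit-image and scikit-learn; the patch size and the data arrays are illustrative placeholders, not from the slides:

```python
# HOG-based detection pipeline: hand-crafted features + linear classifier.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(patch):
    """Histogram of Oriented Gradients descriptor for one grayscale patch."""
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# patches: fixed-size grayscale crops; labels: 1 = human, 0 = background
patches = [np.random.rand(128, 64) for _ in range(10)]   # placeholder data
labels = np.random.randint(0, 2, size=10)                 # placeholder labels

X = np.stack([hog_features(p) for p in patches])
clf = LinearSVC().fit(X, labels)   # HOG -> features -> classification
```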

Convolutional neural networks

• From hand-crafted features to CNN features

• Feature maps are progressively down-sampled

• The number of filters progressively increases

Typical CNNs used in Computer Vision

• AlexNet: the CNN that brought deep learning to mainstream CV (ImageNet 2012 winner)

• VGGNet: deep stacks of simple 3×3 filters


• GoogLeNet: the first Inception network

Visualization of filters

• Use gradient ascent on the input image to maximise the activation of individual filters (sketch below)
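A minimal sketch of this technique, assuming PyTorch/torchvision; the choice of VGG16 and the layer/filter indices are illustrative, not from the slides:

```python
# Filter visualization: gradient ascent on the input to maximise one filter.
import torch
import torchvision.models as models

model = models.vgg16(weights="IMAGENET1K_V1").features.eval()  # downloads weights
layer_idx, filter_idx = 10, 0      # which conv layer / which filter to maximise

img = torch.randn(1, 3, 224, 224, requires_grad=True)   # start from noise
opt = torch.optim.Adam([img], lr=0.1)

for _ in range(100):
    opt.zero_grad()
    x = img
    for i, layer in enumerate(model):
        x = layer(x)
        if i == layer_idx:
            break
    loss = -x[0, filter_idx].mean()    # negative activation -> gradient ascent
    loss.backward()
    opt.step()
# `img` now shows a pattern that strongly activates the chosen filter.
```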

DeepDream

• Start from a random image and optimize it for a chosen class

• Iterate the process over the network's own output (sketch below)
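A rough sketch of the DeepDream loop under the same assumptions (PyTorch/torchvision; layer choice and step size are arbitrary). Real DeepDream also works over multiple image scales ("octaves"), omitted here:

```python
# DeepDream-style loop: amplify what a layer already sees, then feed the
# result back in repeatedly.
import torch
import torchvision.models as models

net = models.vgg16(weights="IMAGENET1K_V1").features[:16].eval()  # mid layer
img = torch.rand(1, 3, 224, 224)

for _ in range(20):                    # iterate over the network's own output
    img = img.detach().requires_grad_(True)
    act = net(img)
    act.norm().backward()              # maximise the layer's overall activation
    with torch.no_grad():
        img = img + 0.01 * img.grad / (img.grad.abs().mean() + 1e-8)
        img.clamp_(0, 1)
```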


• GAN: Generative Adversarial Networks

• DCGAN: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

• ConGAN: Conditional Generative Adversarial Networks

• GAN-T2I: Generative Adversarial Text to Image Synthesis

• InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

• AAE: Adversarial Autoencoders

• VAE/GAN: Autoencoding beyond pixels using a learned similarity metric

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

An unsupervised learning approach that works well for both generating and discriminating images across several datasets

GANs are focused on the optimization of competing criteria, proposed in: Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … Bengio, Y. (2014). Generative Adversarial Nets. Advances in Neural Information Processing Systems, 2672–2680.
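For reference, the competing criteria from the cited paper form the minimax objective:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$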

Main contribution: "extensive model exploration" to identify "a family of architectures that resulted in stable training across a range of datasets and allowed for training higher resolution and deeper generative models":

→ scaling GANs using CNN architectures is not trivial (scaling problem)

¹ Radford, A., Metz, L., & Chintala, S. (2016). Unsupervised representation learning with deep convolutional generative adversarial networks. Proceedings of the International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1511.06434v2

Original slides from http://kawahara.ca/deep-convolutional-generative-adversarial-nets-slides/

Generated samples: Faces

Generated samples: Album Covers

Generated samples: ImageNet

GAN setup

• Generator G(·): input = a uniform noise vector z (random numbers), output = a generated image G(z)

• Discriminator D(·): input = a real or generated image, output = a prediction of whether the input is real

• For a real image x, the discriminator's goal is D(x) = 1

• For a generated image G(z), the discriminator's goal is D(G(z)) = 0

Discriminator goal: discriminate between real and generated images, i.e., D(x) = 1, where x is a real image, and D(G(z)) = 0, where G(z) is a generated one

Generator goal: fool the discriminator, i.e., generate an image G(z) such that D(G(z)) is wrong, i.e., D(G(z)) = 1

Training: stochastic gradient descent (SGD) for each training iteration, with stochastic gradient ascent (SGA) for each discriminator step within a training iteration (see the training-loop sketch below)

0. The two goals conflict

1. Both goals are unsupervised

2. Optimal when D(·) = 0.5 (i.e., the discriminator cannot tell the difference between real and generated images) and G(z) has learned the distribution of the training images
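A minimal sketch of one alternating update, assuming PyTorch; `G`, `D`, and the optimisers are placeholders defined elsewhere, and `D` is assumed to output a (batch, 1) probability:

```python
# One GAN training step: discriminator ascent-style update, then generator update.
import torch
import torch.nn.functional as F

def train_step(G, D, real, opt_g, opt_d, z_dim=100):
    b = real.size(0)
    # --- Discriminator step: push D(x) -> 1 and D(G(z)) -> 0 ---
    z = torch.rand(b, z_dim) * 2 - 1              # uniform noise in [-1, 1]
    fake = G(z).detach()                          # don't backprop into G here
    loss_d = F.binary_cross_entropy(D(real), torch.ones(b, 1)) + \
             F.binary_cross_entropy(D(fake), torch.zeros(b, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # --- Generator step: fool D, i.e. push D(G(z)) -> 1 ---
    z = torch.rand(b, z_dim) * 2 - 1
    loss_g = F.binary_cross_entropy(D(G(z)), torch.ones(b, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```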

Generator

• Input: uniform noise vector (random numbers)

• A fully-connected layer (composed of weights) is reshaped to have width, height and feature maps

• Fractionally-strided convolutions up-sample: e.g. an 8×8 input with a 5×5 conv window gives a 16×16 output

• Batch Normalization: normalize responses to have zero mean and unit variance over the entire mini-batch, preventing the generator from collapsing all samples to a single point, a common failure mode observed in GANs

• Uses ReLU activation functions

• Uses Tanh to scale the generated image output between -1 and 1
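A minimal DCGAN-style generator sketch, assuming PyTorch; the exact channel counts and the use of a transposed convolution for the initial projection are implementation choices, not prescribed by the slides:

```python
# DCGAN-style generator for 64x64 images: noise -> fractionally-strided convs
# with BatchNorm and ReLU, Tanh output in [-1, 1].
import torch
import torch.nn as nn

def dcgan_generator(z_dim=100, ch=128):
    return nn.Sequential(
        # project z (shaped (batch, z_dim, 1, 1)) to 4x4 feature maps
        nn.ConvTranspose2d(z_dim, ch * 8, 4, stride=1, padding=0),
        nn.BatchNorm2d(ch * 8), nn.ReLU(True),
        nn.ConvTranspose2d(ch * 8, ch * 4, 4, stride=2, padding=1),  # -> 8x8
        nn.BatchNorm2d(ch * 4), nn.ReLU(True),
        nn.ConvTranspose2d(ch * 4, ch * 2, 4, stride=2, padding=1),  # -> 16x16
        nn.BatchNorm2d(ch * 2), nn.ReLU(True),
        nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1),      # -> 32x32
        nn.BatchNorm2d(ch), nn.ReLU(True),
        nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1),           # -> 64x64
        nn.Tanh(),                                                   # in [-1, 1]
    )

G = dcgan_generator()
img = G(torch.rand(1, 100, 1, 1) * 2 - 1)   # one sample from uniform noise
```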

Fractionally-strided convolutions

• Regular convolution: input = 5×5, zero-padded at the border to 6×6; filter = 3×3, stride = 2 → output = 3×3

• Fractionally-strided convolution: input = 3×3, zero-padding interlaced with the inputs to give 7×7; filter = 3×3, stride = 2 → output = 5×5

• Clear dashed squares = zero-padded inputs

More info: http://datascience.stackexchange.com/questions/6107/what-are-deconvolutional-layers

https://www.reddit.com/r/MachineLearning/comments/4bbod7/which_deep_learning_frameworks_support_fractional/

https://github.com/vdumoulin/conv_arithmetic
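A quick shape check of the fractionally-strided example above, assuming PyTorch's `ConvTranspose2d`:

```python
# A stride-2 transposed convolution with a 3x3 filter maps 3x3 -> 5x5.
import torch
import torch.nn as nn

x = torch.randn(1, 1, 3, 3)
up = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=1)
print(up(x).shape)   # torch.Size([1, 1, 5, 5])
```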

Discriminator

• Input: a real or generated image

• Goal: discriminate between real and generated images, i.e., D(x) = 1 if x is a real image, and D(G(z)) = 0 if G(z) is a generated image

• Uses LeakyReLU activation functions

• Batch Normalization

• No max pooling: spatial dimensionality is reduced through strided convolutions

• Sigmoid output (between 0 and 1)

Image from: Yeh, R., Chen, C., Lim, T. Y., Hasegawa-Johnson, M., & Do, M. N. (2016). Semantic Image Inpainting with Perceptual and Contextual Losses.

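A matching discriminator sketch under the same assumptions (PyTorch, 64×64 inputs; channel counts are a choice):

```python
# DCGAN-style discriminator: strided convs instead of pooling, LeakyReLU,
# BatchNorm, sigmoid "real" probability.
import torch.nn as nn

def dcgan_discriminator(ch=128):
    return nn.Sequential(
        nn.Conv2d(3, ch, 4, stride=2, padding=1),              # 64 -> 32
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1),         # 32 -> 16
        nn.BatchNorm2d(ch * 2), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(ch * 2, ch * 4, 4, stride=2, padding=1),     # 16 -> 8
        nn.BatchNorm2d(ch * 4), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(ch * 4, ch * 8, 4, stride=2, padding=1),     # 8 -> 4
        nn.BatchNorm2d(ch * 8), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(ch * 8, 1, 4, stride=1, padding=0),          # 4 -> 1
        nn.Sigmoid(), nn.Flatten(),                            # (batch, 1) in [0, 1]
    )
```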

Strides

• Input = 4×4, filter = 3×3, stride = 1 → output = 2×2

• Input = 5×5, filter = 3×3, stride = 2 → output = 2×2

• Larger strides reduce dimensionality faster

https://github.com/vdumoulin/conv_arithmetic
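For reference, the output size of a regular convolution with input size $i$, kernel size $k$, stride $s$ and padding $p$ is:

$$o = \left\lfloor \frac{i + 2p - k}{s} \right\rfloor + 1$$

which reproduces the two examples above (with $p = 0$): $\lfloor(4-3)/1\rfloor + 1 = 2$ and $\lfloor(5-3)/2\rfloor + 1 = 2$.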

Interpolating between a series of random points in Z (Z = the 100-d vector of random numbers)

• Note the smooth transition between each scene, e.g. a window slowly appearing

• This indicates that the learned space has smooth transitions (and the model is not simply memorizing)
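A sketch of the interpolation, assuming a trained generator `G` that maps 100-d noise vectors to images (a placeholder name, e.g. the generator sketched earlier):

```python
# Walk a straight line between two random points in Z and render each step.
import torch

z0 = torch.rand(1, 100) * 2 - 1      # two random points in Z
z1 = torch.rand(1, 100) * 2 - 1

frames = [G(z0 + t * (z1 - z0)) for t in torch.linspace(0, 1, steps=9)]
# If the latent space is smooth (not memorised), the 9 images morph
# gradually from the first scene into the second.
```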

(Top) Unmodified generated samples

(Bottom) Samples generated after dropping out the "window" concept: some windows are removed or transformed, but the overall scene stays the same, indicating the generator has separated objects (windows) from the scene

Vector arithmetic in Z

• Find 3 exemplar images per concept (e.g., 3 smiling women, 3 neutral women and 3 neutral men)

• Average their Z vectors

• Do simple vector arithmetic operations on the averaged vectors and generate an image based on the resulting vector

• Add small uniform noise to the new vector: the resulting samples all show similar visual concepts

• By contrast, the same arithmetic done in pixel space does not produce coherent images
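A sketch of the procedure, assuming a trained generator `G` and hand-picked exemplar Z vectors; all names and the noise scale are placeholders:

```python
# Latent vector arithmetic: average exemplars, combine, add small noise.
import torch

z_smiling_woman = torch.stack(smiling_woman_zs).mean(0)   # 3 exemplars each
z_neutral_woman = torch.stack(neutral_woman_zs).mean(0)
z_neutral_man = torch.stack(neutral_man_zs).mean(0)

z_new = z_smiling_woman - z_neutral_woman + z_neutral_man   # "smiling man"
noise = 0.25 * (torch.rand(9, 100) * 2 - 1)                 # small uniform noise
samples = G(z_new.unsqueeze(0) + noise)
# The 9 samples should all depict the same visual concept: a smiling man.
```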

• Average 4 Z vectors from exemplar faces looking left and 4 looking right

• Interpolating between the left and right average vectors creates a "turn" vector

http://karpathy.github.io/2011/04/27/manually-classifying-cifar10/

CIFAR-10

1) Train on ImageNet

2) Get the responses from all of the Discriminator's layers

3) Max-pool each layer's responses to get a 4×4 spatial grid

4) Flatten and concatenate them to form a 1-D feature vector

5) Train a regularized linear L2-SVM classifier on these features for CIFAR-10 (sketch below)

(Note: while other approaches achieve higher performance, this network was never trained on CIFAR-10!)

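A sketch of steps 2–5, assuming PyTorch for the discriminator and scikit-learn for the SVM; `D_layers`, `train_images`, and `train_labels` are placeholders:

```python
# Use the discriminator's conv responses as a fixed feature extractor.
import torch
import torch.nn.functional as F
from sklearn.svm import LinearSVC

def features(D_layers, images):
    feats, x = [], images
    for layer in D_layers:                     # every discriminator layer
        x = layer(x)
        pooled = F.adaptive_max_pool2d(x, 4)   # max-pool to a 4x4 spatial grid
        feats.append(pooled.flatten(1))        # flatten per layer
    return torch.cat(feats, dim=1)             # concatenate into one 1-D vector

X_train = features(D_layers, train_images).detach().numpy()
clf = LinearSVC(C=1.0).fit(X_train, train_labels)   # regularized linear L2-SVM
```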

State-of-the-art results on StreetView House Numbers (SVHN)

• The results are not just due to the modified architecture; they come from the GAN training

• Similar training procedure as for CIFAR-10 above

Summary

✔ Visualizations indicate that the Generator is learning something close to the true distribution of real images.

✔ Classification performance using the Discriminator features indicates that features learned are discriminative of the underlying classes.

➔ Instability: as models are trained longer, they sometimes collapse a subset of filters to a single oscillating mode.

➔ Further research on GANs: investigate properties of the learnt latent space.

