
  • Generative Adversarial Networks: Advanced Machine Learning

    Sourish Das and Madhavan Mukund

    Chennai Mathematical Institute

    Aug-Nov, 2020

  • Introduction

    1. "Generative Adversarial Networks" (Nov 2020) by Goodfellow et al., Communications of the ACM, Vol. 63, No. 11, DOI: 10.1145/3422622 (original version: NIPS 2014)

    2. "Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow" by Aurélien Géron, pages 592–606

  • The Idea

    I Problem Statement:

    1. Suppose x is data generated from p(x)

    2. The end goal is to learn the unknown p(x)

    I Main Idea of GAN:

    1. Two neural networks compete against each other

    2. The competition will push them to excel and learn the data-generating process p(x)

  • Structure of GAN

    I Generator Network

    1. A network that generates data (for example, like the decoder part of a VAE); its output is often known as fake data

    2. Goal: Trick the Discriminator

    I Discriminator Network

    1. Takes as input either fake data from the generator or real data from the training set

    2. Guesses whether the input data is fake or real

    3. Goal: Tell real data from fake data (a minimal code sketch of the two networks follows below)
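    As a concrete illustration of this two-network structure, here is a minimal Keras sketch, loosely in the spirit of Géron's example; the latent dimension, layer sizes, and 28×28 image shape are illustrative assumptions, not part of the slides.

    from tensorflow import keras

    codings_size = 30  # dimension of the latent variable z (illustrative choice)

    # Generator: maps a latent vector z to a 28x28 "fake" image
    generator = keras.Sequential([
        keras.Input(shape=[codings_size]),
        keras.layers.Dense(100, activation="selu"),
        keras.layers.Dense(150, activation="selu"),
        keras.layers.Dense(28 * 28, activation="sigmoid"),
        keras.layers.Reshape([28, 28]),
    ])

    # Discriminator: maps an image (real or fake) to a probability that it is real
    discriminator = keras.Sequential([
        keras.Input(shape=[28, 28]),
        keras.layers.Flatten(),
        keras.layers.Dense(150, activation="selu"),
        keras.layers.Dense(100, activation="selu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])

    # Stacking the two gives the combined model used later when updating the generator
    gan = keras.Sequential([generator, discriminator])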

  • Generative Adversarial Network

    Source: Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, by Aurélien Géron, page 592

  • Example of GAN

    Source: Goodfellow et al. 2020

  • Example of GAN

    Source: Goodfellow et al. 2020

  • Unique Feature of GAN

    I It is based on game theory, while other generative models are based on optimization

    I GANs avoid the entire issue of designing a tractable density function and learn only a tractable sample generation process.

    I These are called implicit generative models.

  • Let’s look inside GAN

    I Suppose G is the generator model such that

    x = G(z),

    where

    1. G is a differentiable function

    2. z is the latent variable sampled from some simple prior distribution

    3. x is drawn from p_model

    I This idea is common in Probability and Statistics

    I If we want to simulate from p_model = Exponential(λ), we can simulate z ∼ Unif(0, 1) and then pass it through G(z) = −ln(z)/λ = x (a quick NumPy check of this follows below)
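    A quick NumPy check of this inverse-transform idea; the rate λ = 2 and the sample size are arbitrary choices, not from the slides.

    import numpy as np

    rng = np.random.default_rng(0)
    lam = 2.0                                # illustrative rate parameter

    z = rng.uniform(0.0, 1.0, size=100_000)  # z ~ Unif(0, 1)
    x = -np.log(z) / lam                     # x = G(z) = -ln(z) / lam

    # x behaves like Exponential(lam): mean ~ 1/lam = 0.5, variance ~ 1/lam^2 = 0.25
    print(x.mean(), x.var())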

  • GAN is about solving Game

    I The generator function G(z, θ^(G)) = x uses θ^(G) as its parameters

    I D(x, θ^(D)) is the discriminator network, where x is the input and θ^(D) are the parameters of D

    I Both players have cost functions that are defined in terms of both players' parameters.

    I The discriminator wishes to minimize J^(D)(θ^(D), θ^(G)) while controlling only θ^(D):

    min_{θ^(D)} J^(D)(θ^(D), θ^(G))

    I The generator wishes to minimize J^(G)(θ^(D), θ^(G)) while controlling only θ^(G):

    min_{θ^(G)} J^(G)(θ^(D), θ^(G))

  • Solution of GAN is Nash equilibrium

    I The solution to a game is a Nash equilibrium, more precisely a local differential Nash equilibrium.

    I In this context, a Nash equilibrium is a tuple (θ^(D), θ^(G)) that is a local minimum of J^(D) with respect to θ^(D) and a local minimum of J^(G) with respect to θ^(G)

  • Discriminator’s cost

    I The cost used for the discriminator is:

    J^(D)(θ^(D), θ^(G)) = −(1/2) E_{x∼p_data} log D(x) − (1/2) E_z log(1 − D(G(z)))

    I This is the standard cross-entropy cost that is minimized when training a standard binary classifier with a sigmoid output

    I the classifier is trained on two minibatches of data:

    1. one coming from the dataset, where the label is 1 for all examples

    2. the other coming from the generator, where the label is 0 for all examples.

    I All versions of the GAN game encourage the discriminator to minimize the above cost function (a sketch of one such update follows below)
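    A minimal sketch of one discriminator update in Keras; it assumes the generator, discriminator, and codings_size from the earlier sketch, plus a minibatch X_real of real images, and the batch size and optimizer are illustrative choices.

    import tensorflow as tf

    batch_size = 32  # illustrative choice; X_real below is assumed to hold 32 real images
    discriminator.compile(loss="binary_crossentropy", optimizer="rmsprop")

    def train_discriminator_step(X_real):
        # Fake minibatch from the generator (label 0) plus a real minibatch (label 1)
        noise = tf.random.normal(shape=[batch_size, codings_size])
        X_fake = generator(noise)
        X_both = tf.concat([X_fake, X_real], axis=0)
        y = tf.constant([[0.0]] * batch_size + [[1.0]] * batch_size)
        # Minimizing binary cross-entropy on these labels is J^(D) above (up to the 1/2 factor)
        discriminator.train_on_batch(X_both, y)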

  • Generator’s cost and Minimax

    I The simplest version of the game is a zero-sum game.

    I In this version of the game,

    J^(G) = −J^(D)

    I Because J^(G) is tied directly to J^(D), we can summarise the game with a value function specifying the discriminator's payoff:

    V(θ^(D), θ^(G)) = −J^(D)(θ^(D), θ^(G))

    I Zero-sum games are also called minimax games, with solution

    θ^(G)* = arg min_{θ^(G)} max_{θ^(D)} V(θ^(D), θ^(G))
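    In practice (for example in Géron's book) the generator update is usually implemented not with the exact J^(G) = −J^(D) but with a non-saturating variant: freeze the discriminator and train the stacked gan model on fake images labelled as real. A minimal sketch, assuming the models, codings_size, and batch_size from the earlier sketches:

    import tensorflow as tf

    discriminator.trainable = False  # freeze D's weights inside the stacked model
    gan.compile(loss="binary_crossentropy", optimizer="rmsprop")

    def train_generator_step():
        noise = tf.random.normal(shape=[batch_size, codings_size])
        y_target = tf.constant([[1.0]] * batch_size)  # G wants D to say "real"
        # Gradients flow through the frozen D into the generator's parameters theta^(G)
        gan.train_on_batch(noise, y_target)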

  • More on Value Functions

    Let's look into the value function:

    V(θ^(D), θ^(G)) = E_{x∼p̂} log D(x) + E_{z∼p_z(z)} log(1 − D(G(z)))

    = ∫_x p̂(x) log(D(x)) dx + ∫_z p_z(z) log(1 − D(G(z))) dz

    Now x = G(z) =⇒ z = G^(−1)(x) =⇒ dz = (G^(−1))′(x) dx, so the second integral becomes an integral over x:

    = ∫_x p̂(x) log(D(x)) dx + ∫_x p_z(G^(−1)(x)) log(1 − D(x)) (G^(−1))′(x) dx

    = ∫_x p̂(x) log(D(x)) dx + ∫_x p_g(x) log(1 − D(x)) dx, where p_g(x) = p_z(G^(−1)(x)) (G^(−1))′(x)

    = ∫_x {p̂(x) log(D(x)) + p_g(x) log(1 − D(x))} dx

  • Understanding the Value Functions

    max_D V(D, G) = max_D ∫_x {p̂(x) log(D(x)) + p_g(x) log(1 − D(x))} dx

    Setting the pointwise derivative with respect to D(x) to zero:

    ∂/∂D(x) {p̂(x) log(D(x)) + p_g(x) log(1 − D(x))} = 0

    =⇒ p̂(x)/D(x) − p_g(x)/(1 − D(x)) = 0

    =⇒ D*_G(x) = p̂(x)/(p̂(x) + p_g(x))

    So this is the optimal D for a given G
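    A quick numerical sanity check of this formula; the density values p̂(x) = 0.7 and p_g(x) = 0.2 at some fixed x are arbitrary choices. Maximizing the pointwise integrand over D recovers p̂/(p̂ + p_g).

    import numpy as np

    p_hat, p_g = 0.7, 0.2                      # arbitrary fixed densities at one point x
    D = np.linspace(1e-6, 1 - 1e-6, 100_000)   # candidate values of D(x)

    integrand = p_hat * np.log(D) + p_g * np.log(1 - D)
    D_best = D[np.argmax(integrand)]

    print(D_best, p_hat / (p_hat + p_g))       # both approximately 0.778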

  • Understanding the Cost function of G

    C(G) = max_D V(D, G)

    = max_D ∫_x p̂(x) log(D(x)) + p_g(x) log(1 − D(x)) dx

    = ∫_x p̂(x) log(D*_G(x)) + p_g(x) log(1 − D*_G(x)) dx

    = ∫_x p̂(x) log( p̂(x)/(p̂(x) + p_g(x)) ) + p_g(x) log( p_g(x)/(p̂(x) + p_g(x)) ) dx

    = ∫_x p̂(x) log( p̂(x) / ((p̂(x) + p_g(x))/2) ) + p_g(x) log( p_g(x) / ((p̂(x) + p_g(x))/2) ) dx − log(4)

    = KL( p̂(x) || (p̂(x) + p_g(x))/2 ) + KL( p_g(x) || (p̂(x) + p_g(x))/2 ) − log(4)

  • Understanding the Cost Function of G

    C(G) = KL( p̂(x) || (p̂(x) + p_g(x))/2 ) + KL( p_g(x) || (p̂(x) + p_g(x))/2 ) − log(4)

    C(G) will be at its minimum when KL( p̂(x) || (p̂(x) + p_g(x))/2 ) = 0 and KL( p_g(x) || (p̂(x) + p_g(x))/2 ) = 0, i.e., when

    p̂(x) = (p̂(x) + p_g(x))/2 and p_g(x) = (p̂(x) + p_g(x))/2

    =⇒ p̂(x) = p_g(x)
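    A small numerical illustration with discrete distributions (the probability vectors are arbitrary choices): C(G) stays above −log(4) unless p_g = p̂, where it equals exactly −log(4).

    import numpy as np

    def kl(p, q):
        return float(np.sum(p * np.log(p / q)))

    def cost_G(p_hat, p_g):
        m = (p_hat + p_g) / 2
        return kl(p_hat, m) + kl(p_g, m) - np.log(4)

    p_hat = np.array([0.5, 0.3, 0.2])   # stands in for the data distribution
    p_far = np.array([0.1, 0.2, 0.7])   # a generator distribution far from p_hat

    print(cost_G(p_hat, p_far))         # greater than -log(4) ~ -1.386
    print(cost_G(p_hat, p_hat))         # exactly -log(4): the minimum, at p_g = p_hat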

  • Example of GAN

    Source: "Generative Adversarial Networks" (Nov 2020) by Goodfellow et al., Communications of the ACM, Vol. 63, No. 11, DOI: 10.1145/3422622 (original version: NIPS 2014)

  • Convergence of GANs

    I In many different models (other than Goodfellow et al. 2014 and some other models), it is an open question whether a Nash equilibrium exists

    I whether any spurious Nash equilibria exist!

    I whether the learning algorithm converges to a Nash equilibrium, and if so, how quickly

    I "In many cases of practical interest, these theoretical questions are open, and the best learning algorithms seem empirically to often fail to converge."
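    A toy illustration of this difficulty (not from the paper): on the simple zero-sum game V(θ^(D), θ^(G)) = θ^(D) · θ^(G), whose Nash equilibrium is (0, 0), plain simultaneous gradient ascent/descent spirals away from the equilibrium instead of converging.

    import numpy as np

    theta_D, theta_G, lr = 1.0, 1.0, 0.1   # starting point and step size (illustrative)

    for step in range(200):
        grad_D = theta_G   # dV/d theta_D: D ascends V
        grad_G = theta_D   # dV/d theta_G: G descends V
        theta_D, theta_G = theta_D + lr * grad_D, theta_G - lr * grad_G

    # Distance from the equilibrium (0, 0) has grown instead of shrinking
    print(np.hypot(theta_D, theta_G))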

  • Summary

    I GANs are generative models based on game theory.

    I They have had great practical success in terms of generating realistic data, especially images.

    I It is currently still difficult to train them.

    I For GANs to become a more reliable technology, it will be necessary to design models, costs, or training algorithms for which it is possible to find good Nash equilibria consistently and quickly.

  • Thank You

    [email protected]

    [email protected]