Generative Adversarial Networks
Advanced Machine Learning
Sourish Das and Madhavan Mukund
Chennai Mathematical Institute
Aug-Nov, 2020
Introduction
1. "Generative Adversarial Networks" (Nov 2020) by Goodfellow et al., Communications of the ACM, Vol. 63, No. 11, DOI: 10.1145/3422622 (original version: NIPS 2014)
2. "Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow" by Aurélien Géron, pages 592–606
The Idea
• Problem statement:
  1. Suppose x is data generated from an unknown distribution p(x).
  2. The end goal is to learn about the unknown p(·).
• Main idea of GAN:
  1. Two neural networks compete against each other.
  2. The competition pushes both to improve, and in doing so the generator learns the data-generating process p(x).
Structure of GAN
• Generator network
  1. A network that generates data (for example, the decoder part of a VAE), often called fake data.
  2. Goal: trick the discriminator.
• Discriminator network
  1. Takes as input either fake data from the generator or real data from the training set.
  2. Guesses whether the input data is fake or real.
  3. Goal: tell real data from fake data.
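To make the two roles concrete, here is a minimal sketch in Python (standard library only). It assumes a hypothetical one-dimensional setup: an affine generator $G(z) = az + b$ and a logistic discriminator $D(x) = \sigma(wx + c)$, with made-up parameter values and Gaussian stand-in "real" data. Real GANs use deep networks for both players, and nothing here is trained.

```python
import math
import random

def generator(z, theta_g):
    """Toy generator G(z; theta_g) = a*z + b: maps latent noise z to a fake sample."""
    a, b = theta_g
    return a * z + b

def discriminator(x, theta_d):
    """Toy discriminator D(x; theta_d) = sigmoid(w*x + c): estimated probability that x is real."""
    w, c = theta_d
    return 1.0 / (1.0 + math.exp(-(w * x + c)))

rng = random.Random(0)
theta_g = (0.5, 2.0)   # hypothetical generator parameters
theta_d = (1.0, -1.0)  # hypothetical discriminator parameters

# Stand-in "training data" and a batch of fake samples from the generator.
real_batch = [rng.gauss(3.0, 1.0) for _ in range(4)]
fake_batch = [generator(rng.gauss(0.0, 1.0), theta_g) for _ in range(4)]

for x in real_batch + fake_batch:
    p_real = discriminator(x, theta_d)
    print(f"x = {x:6.3f}  D(x) = {p_real:.3f}  guess = {'real' if p_real > 0.5 else 'fake'}")
```

The generator only ever produces samples; it never evaluates a density. The discriminator outputs a probability, which is what the cross-entropy cost on a later slide is built from.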
Generative Adversarial Network
Source: "Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow" by Aurélien Géron, page 592
Example of GAN
Source: Goodfellow et al. 2020
Example of GAN
Source: Goodfellow et al. 2020
Unique Features of GANs
• GANs are based on game theory, while most other generative models are based on optimization.
• GANs avoid the entire issue of designing a tractable density function and learn only a tractable sample-generation process.
• Such models are called implicit generative models.
Let’s look inside a GAN
• Suppose G is the generator model such that
$$x = G(z),$$
where
  1. G is a differentiable function,
  2. z is a latent variable sampled from some simple prior distribution, and
  3. x is then a draw from $p_{\text{model}}$.
• This idea is common in probability and statistics.
• For example, to simulate from $p_{\text{model}} = \text{Exponential}(\lambda)$, draw $z \sim \text{Unif}(0, 1)$ and pass it through $G(z) = -\frac{\ln z}{\lambda} = x$.
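A minimal sketch of this fixed generator in Python (standard library only). The rate λ = 2 and the sample size are arbitrary choices for illustration. Because $1 - z$ is also Unif(0, 1), $-\ln(z)/\lambda$ is the inverse-CDF method for Exponential(λ); here G is known in closed form rather than learned, which is the part a GAN's generator network replaces.

```python
import math
import random

def sample_exponential(lam, n, rng):
    """Inverse-transform sampling: z ~ Unif(0, 1), then x = G(z) = -ln(z) / lam."""
    samples = []
    for _ in range(n):
        z = 1.0 - rng.random()  # rng.random() lies in [0, 1); this maps it to (0, 1], avoiding ln(0)
        samples.append(-math.log(z) / lam)
    return samples

rng = random.Random(42)
lam = 2.0
xs = sample_exponential(lam, 100_000, rng)
print(sum(xs) / len(xs))  # should be close to the Exponential(2) mean, 1/lam = 0.5
```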
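A GAN is about solving a game
• The generator function $G(z; \theta^{(G)}) = x$ uses $\theta^{(G)}$ as its parameters.
• $D(x; \theta^{(D)})$ is the discriminator network, where x is the input and $\theta^{(D)}$ are the parameters of D.
• Both players have cost functions that are defined in terms of both players’ parameters, but each player controls only its own parameters.
• The discriminator wishes to minimize $J^{(D)}(\theta^{(D)}, \theta^{(G)})$ by choosing $\theta^{(D)}$:
$$\min_{\theta^{(D)}} J^{(D)}(\theta^{(D)}, \theta^{(G)})$$
• The generator wishes to minimize $J^{(G)}(\theta^{(D)}, \theta^{(G)})$ by choosing $\theta^{(G)}$:
$$\min_{\theta^{(G)}} J^{(G)}(\theta^{(D)}, \theta^{(G)})$$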
Solution of a GAN is a Nash equilibrium
• The solution to a game is a Nash equilibrium; for GANs, more precisely, a local differential Nash equilibrium.
• In this context, a Nash equilibrium is a tuple $(\theta^{(D)}, \theta^{(G)})$ that is a local minimum of $J^{(D)}$ with respect to $\theta^{(D)}$ and a local minimum of $J^{(G)}$ with respect to $\theta^{(G)}$.
Discriminator’s cost
• The cost used for the discriminator is:
$$J^{(D)}(\theta^{(D)}, \theta^{(G)}) = -\frac{1}{2}\mathbb{E}_{x \sim p_{\text{data}}} \log D(x) - \frac{1}{2}\mathbb{E}_{z} \log\bigl(1 - D(G(z))\bigr)$$
• This is the standard cross-entropy cost that is minimized when training a standard binary classifier with a sigmoid output.
• The classifier is trained on two minibatches of data:
  1. one coming from the dataset, where the label is 1 for all examples;
  2. the other coming from the generator, where the label is 0 for all examples.
• All versions of the GAN game encourage the discriminator to minimize the above cost function.
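The cost above can be estimated from the two minibatches with a sketch like the following. The discriminator outputs are hypothetical numbers rather than outputs of a real network. A discriminator that outputs 0.5 everywhere (one that cannot tell real from fake) gets cost log 2, while a confident, correct discriminator gets a cost near 0.

```python
import math

def discriminator_cost(d_real, d_fake):
    """Minibatch estimate of J_D = -1/2 E_data[log D(x)] - 1/2 E_z[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on a real minibatch (label 1)
    d_fake: discriminator outputs D(G(z)) on a generated minibatch (label 0)
    """
    real_term = -sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return 0.5 * real_term + 0.5 * fake_term

# Hypothetical discriminator outputs, for illustration only.
confident = discriminator_cost([0.9, 0.95, 0.99], [0.05, 0.1, 0.01])
fooled = discriminator_cost([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
print(confident, fooled)
```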
Generator’s cost and Minimax
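• The simplest version of the game is a zero-sum game.
• In this version of the game,
$$J^{(G)} = -J^{(D)}$$
• Because $J^{(G)}$ is tied directly to $J^{(D)}$, the whole game can be summarized by a value function specifying the discriminator’s payoff:
$$V(\theta^{(D)}, \theta^{(G)}) = -J^{(D)}(\theta^{(D)}, \theta^{(G)})$$
• Zero-sum games are also called minimax games:
$$\theta^{(G)*} = \arg\min_{\theta^{(G)}} \max_{\theta^{(D)}} V(\theta^{(D)}, \theta^{(G)})$$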
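The nested min-max can be illustrated by brute force on a toy game. This sketch assumes a hypothetical value function $V(\theta^{(D)}, \theta^{(G)}) = \theta^{(D)}\theta^{(G)}$ with both parameters restricted to a five-point grid; it is not the GAN value function, only a stand-in that shows the inner max over $\theta^{(D)}$ followed by the outer argmin over $\theta^{(G)}$.

```python
# Hypothetical zero-sum game on a small grid, so the min-max can be solved exhaustively.
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]

def value(theta_d, theta_g):
    """Discriminator's payoff V; in a zero-sum game the generator's cost is J_G = V."""
    return theta_d * theta_g

def best_response_value(theta_g):
    """Inner problem: max over theta_d of V(theta_d, theta_g)."""
    return max(value(theta_d, theta_g) for theta_d in grid)

# Outer problem: theta_g* = argmin over theta_g of max over theta_d of V.
theta_g_star = min(grid, key=best_response_value)
print(theta_g_star, best_response_value(theta_g_star))
```

Here the inner maximum equals $|\theta^{(G)}|$, so the outer problem picks $\theta^{(G)*} = 0$. For a real GAN neither problem can be solved exactly, and the two players are instead updated with gradient steps.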
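More on Value Functions
Let’s look into the value function, writing $\hat p = p_{\text{data}}$ for the data distribution and dropping the constant factor 1/2 from $J^{(D)}$ (a positive constant does not change the optimal D or G):
$$V(\theta^{(D)}, \theta^{(G)}) = \mathbb{E}_{x \sim \hat p} \log D(x) + \mathbb{E}_{z \sim p_z(z)} \log\bigl(1 - D(G(z))\bigr)$$
$$= \int_x \hat p(x) \log D(x)\,dx + \int_z p_z(z) \log\bigl(1 - D(G(z))\bigr)\,dz$$
Now, assuming G is invertible (and, for simplicity, one-dimensional),
$$x = G(z) \implies z = G^{-1}(x) \implies dz = \bigl|(G^{-1})'(x)\bigr|\,dx$$
so
$$V = \int_x \hat p(x) \log D(x)\,dx + \int_x p_z\bigl(G^{-1}(x)\bigr) \log\bigl(1 - D(x)\bigr) \bigl|(G^{-1})'(x)\bigr|\,dx$$
$$= \int_x \hat p(x) \log D(x)\,dx + \int_x p_g(x) \log\bigl(1 - D(x)\bigr)\,dx$$
$$= \int_x \bigl\{\hat p(x) \log D(x) + p_g(x) \log\bigl(1 - D(x)\bigr)\bigr\}\,dx,$$
where $p_g(x) = p_z\bigl(G^{-1}(x)\bigr)\bigl|(G^{-1})'(x)\bigr|$ is the density of the generated samples x = G(z).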
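The change of variables can be checked numerically on the Exponential(λ) generator from earlier, $G(z) = -\ln(z)/\lambda$ with $z \sim \text{Unif}(0, 1)$, so $G^{-1}(x) = e^{-\lambda x}$. This sketch assumes λ = 2 and estimates $(G^{-1})'$ with a central finite difference; the resulting $p_g$ should match the Exponential(λ) density $\lambda e^{-\lambda x}$. Because this G is decreasing, $(G^{-1})'(x)$ is negative, which is why the absolute value is needed.

```python
import math

lam = 2.0  # hypothetical rate for the Exponential(lam) generator

def p_z(z):
    """Density of the prior z ~ Unif(0, 1)."""
    return 1.0 if 0.0 < z < 1.0 else 0.0

def g_inv(x):
    """Inverse of G(z) = -ln(z) / lam, i.e. z = exp(-lam * x)."""
    return math.exp(-lam * x)

def g_inv_prime(x, h=1e-6):
    """Central finite-difference estimate of (G^{-1})'(x)."""
    return (g_inv(x + h) - g_inv(x - h)) / (2 * h)

def p_g(x):
    """Change of variables: p_g(x) = p_z(G^{-1}(x)) * |(G^{-1})'(x)|."""
    return p_z(g_inv(x)) * abs(g_inv_prime(x))

for x in [0.1, 0.5, 1.0, 2.0]:
    print(x, p_g(x), lam * math.exp(-lam * x))  # the last two columns should agree
```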
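Understanding the Value Function
$$\max_D V(D, G) = \max_D \int_x \bigl\{\hat p(x) \log D(x) + p_g(x) \log\bigl(1 - D(x)\bigr)\bigr\}\,dx$$
The integrand can be maximized separately at each x:
$$\frac{\partial}{\partial D(x)}\bigl\{\hat p(x) \log D(x) + p_g(x) \log\bigl(1 - D(x)\bigr)\bigr\} = 0$$
$$\implies \frac{\hat p(x)}{D(x)} - \frac{p_g(x)}{1 - D(x)} = 0$$
$$\implies D^*_G(x) = \frac{\hat p(x)}{\hat p(x) + p_g(x)}$$
The integrand is concave in D(x), so this stationary point is the maximum: it is the optimal D for a given G.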
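The pointwise maximization can be checked numerically. This sketch picks hypothetical density values $\hat p(x) = 0.3$ and $p_g(x) = 0.1$ at a single point x, searches a grid of candidate values D(x) in (0, 1), and compares the best one with the closed form $D^*_G(x) = \hat p(x) / (\hat p(x) + p_g(x))$.

```python
import math

def pointwise_value(d, p_data, p_g):
    """Integrand of V at a single x: p_data * log D + p_g * log(1 - D)."""
    return p_data * math.log(d) + p_g * math.log(1.0 - d)

def optimal_d(p_data, p_g):
    """Closed form D*_G(x) = p_data(x) / (p_data(x) + p_g(x))."""
    return p_data / (p_data + p_g)

# Hypothetical density values at one point x, for illustration.
p_data, p_g = 0.3, 0.1
grid = [i / 1000 for i in range(1, 1000)]  # candidate values of D(x) in (0, 1)
best_on_grid = max(grid, key=lambda d: pointwise_value(d, p_data, p_g))
print(best_on_grid, optimal_d(p_data, p_g))  # both should be (close to) 0.75
```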
Understanding the Cost function of G
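$$C(G) = \max_D V(D, G)$$
$$= \max_D \int_x \bigl\{\hat p(x) \log D(x) + p_g(x) \log\bigl(1 - D(x)\bigr)\bigr\}\,dx$$
$$= \int_x \bigl\{\hat p(x) \log D^*_G(x) + p_g(x) \log\bigl(1 - D^*_G(x)\bigr)\bigr\}\,dx$$
$$= \int_x \Bigl\{\hat p(x) \log \frac{\hat p(x)}{\hat p(x) + p_g(x)} + p_g(x) \log \frac{p_g(x)}{\hat p(x) + p_g(x)}\Bigr\}\,dx$$
Dividing each denominator by 2 adds $\log 2$ to each logarithm; since $\hat p$ and $p_g$ each integrate to 1, the two added terms contribute $\log 4$ in total, which is subtracted back:
$$= \int_x \Bigl\{\hat p(x) \log \frac{\hat p(x)}{\frac{\hat p(x) + p_g(x)}{2}} + p_g(x) \log \frac{p_g(x)}{\frac{\hat p(x) + p_g(x)}{2}}\Bigr\}\,dx - \log 4$$
$$= \mathrm{KL}\Bigl(\hat p(x) \,\Big\|\, \frac{\hat p(x) + p_g(x)}{2}\Bigr) + \mathrm{KL}\Bigl(p_g(x) \,\Big\|\, \frac{\hat p(x) + p_g(x)}{2}\Bigr) - \log 4$$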
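Understanding the Cost Function of G
$$C(G) = \mathrm{KL}\Bigl(\hat p(x) \,\Big\|\, \frac{\hat p(x) + p_g(x)}{2}\Bigr) + \mathrm{KL}\Bigl(p_g(x) \,\Big\|\, \frac{\hat p(x) + p_g(x)}{2}\Bigr) - \log 4$$
Since KL divergence is always non-negative, $C(G) \ge -\log 4$, and C(G) reaches this minimum exactly when both KL terms are zero, i.e., when
$$\hat p(x) = \frac{\hat p(x) + p_g(x)}{2} \quad \text{and} \quad p_g(x) = \frac{\hat p(x) + p_g(x)}{2} \implies \hat p(x) = p_g(x).$$
So the generator is optimal exactly when it reproduces the data distribution.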
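For discrete distributions the integrals become sums, which makes the result easy to check. This sketch uses a hypothetical three-outcome data distribution and computes C(G) two ways: from the KL form and directly as $V(D^*_G, G)$. The two should agree, and C(G) should reach $-\log 4$ only when $p_g = \hat p$.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def generator_cost(p_data, p_g):
    """C(G) = KL(p_data || m) + KL(p_g || m) - log 4, with m = (p_data + p_g) / 2."""
    m = [(a + b) / 2 for a, b in zip(p_data, p_g)]
    return kl(p_data, m) + kl(p_g, m) - math.log(4)

def generator_cost_direct(p_data, p_g):
    """C(G) evaluated directly as V(D*_G, G), with D*_G = p_data / (p_data + p_g)."""
    total = 0.0
    for a, b in zip(p_data, p_g):
        d_star = a / (a + b)
        if a > 0:
            total += a * math.log(d_star)
        if b > 0:
            total += b * math.log(1.0 - d_star)
    return total

p_data = [0.2, 0.5, 0.3]  # hypothetical data distribution on three outcomes
for p_g in ([0.6, 0.2, 0.2], [0.3, 0.4, 0.3], p_data):
    print(p_g, generator_cost(p_data, p_g), generator_cost_direct(p_data, p_g))
```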
Example of GAN
Source: "Generative Adversarial Networks" (Nov 2020) by Goodfellow et al., Communications of the ACM, Vol. 63, No. 11, DOI: 10.1145/3422622 (original version: NIPS 2014)
Convergence of GANs
• For many GAN models (other than the one in Goodfellow et al. 2014 and a few others), it is not known:
  1. whether a Nash equilibrium exists;
  2. whether any spurious Nash equilibria exist;
  3. whether the learning algorithm converges to a Nash equilibrium and, if so, how quickly.
• "In many cases of practical interest, these theoretical questions are open, and the best learning algorithms seem empirically to often fail to converge."
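A classic toy example, not taken from the source, shows how simple gradient-based learning can fail to converge even when an equilibrium exists. This sketch assumes the hypothetical zero-sum game $V(d, g) = d \cdot g$, whose equilibrium is (0, 0), with the discriminator taking gradient-ascent steps on V and the generator taking gradient-descent steps, both updated simultaneously.

```python
import math

def simultaneous_gda(d, g, lr, steps):
    """Simultaneous gradient steps on the toy game V(d, g) = d * g.

    The discriminator ascends V (dV/dd = g); the generator descends V (dV/dg = d).
    """
    history = [(d, g)]
    for _ in range(steps):
        d, g = d + lr * g, g - lr * d  # both players update from the same old point
        history.append((d, g))
    return history

history = simultaneous_gda(d=1.0, g=0.0, lr=0.1, steps=100)
radii = [math.hypot(d, g) for d, g in history]
print(radii[0], radii[-1])  # distance from the equilibrium (0, 0) grows instead of shrinking
```

Each step multiplies the distance from (0, 0) by $\sqrt{1 + \text{lr}^2}$, so the iterates spiral outward for any positive learning rate. This is one concrete way a learning algorithm can fail to reach a Nash equilibrium that does exist.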
Summary
• GANs are generative models based on game theory.
• They have had great practical success in generating realistic data, especially images.
• They are currently still difficult to train.
• For GANs to become a more reliable technology, it will be necessary to design models, costs, or training algorithms for which good Nash equilibria can be found consistently and quickly.
Thank You