
Page 1: Distributional Smoothing with Virtual Adversarial Training

Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, Shin Ishii (Kyoto Univ.)

Presenter: Takeru Miyato

Page 2: Classification

Model: p(y|x,θ)

Input: x

Output: y (“Cat”)

Page 3: Purpose

● Get a good classifier
○ A good classifier: one that predicts labels correctly even on examples not included in the training set (i.e., high generalization performance).

● How to get a good classifier?
○ Data augmentation
○ Bayesian inference
○ Regularization <- I am working on this and propose a new method.

Page 4: What are the properties of a good classifier?

● This shall NOT happen!

[Goodfellow et al., 2015]

Page 5: What are the properties of a good classifier?

Page 6: Adversarial training [Goodfellow et al., 2015]

Adversarial perturbation:
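The formula on this slide was an image; a standard statement, following Goodfellow et al. (2015), where L is the classification loss (e.g., negative log-likelihood):

\[
r_{\mathrm{adv}} \equiv \arg\max_{r;\,\|r\|\le\epsilon} L(y, x + r, \theta),
\qquad
r_{\mathrm{adv}} \approx \epsilon\,\mathrm{sign}\big(\nabla_x L(y, x, \theta)\big)
\]

The right-hand side is the fast gradient sign approximation from the cited paper.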

Page 7: Adversarial examples from overfitting

Page 8: Adversarial examples from underfitting

Page 9: [Proposal] Adversarial examples for unlabeled inputs

● Define the difference between p(y|x,θ) and p(y|x+r,θ) as:

● Define the *virtual* adversarial perturbation as:

(ε: norm constraint)
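The two definitions, restated from the paper (arXiv:1507.00677):

\[
\Delta_{\mathrm{KL}}(r, x, \theta) \equiv \mathrm{KL}\big[\,p(y \mid x, \theta)\,\big\|\,p(y \mid x + r, \theta)\,\big]
\]
\[
r_{\text{v-adv}} \equiv \arg\max_{r;\,\|r\|_2 \le \epsilon} \Delta_{\mathrm{KL}}(r, x, \theta)
\]

Unlike r_adv, no label y is needed, so r_v-adv is defined for unlabeled inputs as well.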

Page 10: Virtual Adversarial Training (VAT)

● Local Distributional Smoothness

● Maximize:

(λ: balance factor)
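Restated from the paper, the local distributional smoothness at x and the training objective are:

\[
\mathrm{LDS}(x, \theta) \equiv -\Delta_{\mathrm{KL}}(r_{\text{v-adv}}, x, \theta)
\]
\[
\max_{\theta}\;\; \frac{1}{N_{\ell}}\sum_{n=1}^{N_{\ell}} \log p(y_n \mid x_n, \theta)
\;+\; \lambda\,\frac{1}{N}\sum_{n=1}^{N} \mathrm{LDS}(x_n, \theta)
\]

where the LDS average runs over all N inputs, labeled and unlabeled (N = N_ℓ in the purely supervised case).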

Page 11: Computation of LDS

● We will need to compute r_v-adv (a maximization of ΔKL) for every input.

● Can r_v-adv be computed fast?
○ Our method allows for a fast approximation!

Page 12: Approximation of r_v-adv

● Approximate ΔKL with a second-order Taylor expansion:
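Since ΔKL(r, x, θ) attains its minimum of 0 at r = 0, the zeroth- and first-order terms of the expansion vanish, leaving:

\[
\Delta_{\mathrm{KL}}(r, x, \theta) \approx \tfrac{1}{2}\, r^{\mathrm{T}} H(x, \theta)\, r,
\qquad
H(x, \theta) \equiv \nabla_{r}^{2}\, \Delta_{\mathrm{KL}}(r, x, \theta)\,\big|_{r=0}
\]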

Page 13: Approximation of r_v-adv

● Under the second-order Taylor approximation, r_v-adv is the dominant (first) eigenvector u(x,θ) of H(x,θ), scaled to magnitude ε, as written below.
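In symbols, with an overline denoting normalization to unit L2 norm:

\[
r_{\text{v-adv}} \approx \arg\max_{r;\,\|r\|_2 \le \epsilon} r^{\mathrm{T}} H(x, \theta)\, r
\;=\; \epsilon\, \overline{u(x, \theta)}
\]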

Page 14: Approximation of r_v-adv

● Power iteration method:

● Finite difference method:
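Restated from the paper: power iteration approaches the dominant eigenvector of H by repeatedly applying H and renormalizing, and a finite difference avoids forming H explicitly (using that the gradient of ΔKL vanishes at r = 0):

\[
d \leftarrow \overline{H d},
\qquad
H d \approx \frac{\nabla_{r}\, \Delta_{\mathrm{KL}}(r, x, \theta)\,\big|_{r = \xi d}}{\xi}
\quad (\xi \ne 0,\ \text{small})
\]

Starting from a random d, this needs only gradient evaluations, i.e., a few extra forward/backward passes per input.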

Page 15: Algorithm for generating the approximated r_v-adv
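The algorithm itself was shown as an image. A minimal sketch of the procedure, assuming a PyTorch classifier with softmax outputs and batched inputs (the released code at https://github.com/takerum/vat uses Theano; the names virtual_adversarial_perturbation, xi, and n_iters here are illustrative):

import torch
import torch.nn.functional as F

def _l2_normalize(d):
    # Normalize each example's perturbation to unit L2 norm.
    d_flat = d.reshape(d.size(0), -1)
    return (d_flat / (d_flat.norm(dim=1, keepdim=True) + 1e-12)).reshape_as(d)

def virtual_adversarial_perturbation(model, x, eps, xi=1e-6, n_iters=1):
    """Approximate r_v-adv by power iteration with finite differences.

    Each iteration applies H(x, theta) to d via
    H d ~= grad_r DeltaKL(r, x, theta)|_{r = xi*d} / xi,
    then renormalizes (the power-iteration step).
    """
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)  # p(y|x, theta), held fixed

    d = _l2_normalize(torch.randn_like(x))  # random initial direction
    for _ in range(n_iters):
        d.requires_grad_(True)
        log_p_hat = F.log_softmax(model(x + xi * d), dim=1)
        delta_kl = F.kl_div(log_p_hat, p, reduction="batchmean")
        grad = torch.autograd.grad(delta_kl, d)[0]
        d = _l2_normalize(grad)  # power-iteration step: d <- normalize(H d)

    return eps * d.detach()  # r_v-adv ~= eps * dominant eigenvector direction

One power iteration (n_iters = 1) sufficed in the paper's experiments; the returned r_v-adv is then used to evaluate ΔKL(r_v-adv, x, θ) for the LDS term of the objective.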

Page 16: Demo: 2D synthetic dataset

● Use a 1-hidden-layer NN.
● Use gradient descent with momentum.

Page 17: Learning curves

Page 18: Visualization of p(y|x,θ): MLE and L2 regularization

[Figure: p(y|x,θ) and average LDS for each method: no regularization vs. L2 regularization]

Page 19: Visualization of p(y|x,θ): MLE and VAT (proposed)

[Figure: p(y|x,θ) and average LDS for each method: no regularization vs. VAT (proposed)]

Page 20: Visualization of p(y|x,θ)

[Figure: p(y|x,θ) and average LDS for each method: no regularization, L2 regularization, VAT (proposed)]

Page 21: Benchmark results: Supervised learning for MNIST

● MNIST (permutation-invariant task)
○ 60,000 training samples
○ Use the Adam optimizer [Kingma & Ba, 2015]

Page 22: Supervised learning for MNIST

● MNIST (permutation-invariant task)

Page 23: Semi-supervised learning (permutation-invariant task)

● MNIST

● CIFAR10
○ 4,000 labeled and 50,000 unlabeled samples

● SVHN (Street View House Numbers)
○ 1,000 labeled and 70,000 unlabeled samples

Page 24: Semi-supervised learning (permutation-invariant task)

● MNIST (1,000 labeled samples)

TSVM: Transductive SVM
MTC: Manifold Tangent Classifier
DG: Deep Generative Model

Page 25: Semi-supervised learning on CIFAR10

● CIFAR10 (4,000 labeled)

Page 26: Semi-supervised learning on SVHN

Page 27: Conclusion

● Our approach was effective for both supervised and semi-supervised learning on benchmark datasets.

● By tuning only one hyperparameter, ε, our method achieved good performance.

Page 28: References

● T. Miyato, S. Maeda, M. Koyama, K. Nakae and S. Ishii. Distributional Smoothing with Virtual Adversarial Training. ICLR 2016.
○ arXiv: http://arxiv.org/abs/1507.00677
○ GitHub: https://github.com/takerum/vat

● I. Goodfellow, J. Shlens and C. Szegedy. Explaining and Harnessing Adversarial Examples. ICLR 2015.

● D. Kingma and J. Ba. Adam: A Method for Stochastic Optimization. ICLR 2015.