Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised...

Virtual Adversarial Training for Semi-Supervised Text Classification

Takeru Miyato, Andrew M. Dai, Ian Goodfellow

Slides By Roee Aharoni BIU NLP summer 2016 reading group

Motivation

Outline• Introduction to (virtual) adversarial training

• Virtual adversarial training for text classification

• Experimental Setup

• Results (and some analysis)

• Conclusions

A basic NN training procedure

Training Example

Prediction

Training Example

True Label

Prediction

Training Example

True Label

Prediction

Compute Loss

Training Example

True Label

Prediction

Compute Loss

Backpropagate loss

Adversarial Examples

That’s problematic. How can we solve it?

Adversarial Training for NN’s

Training Example

Perturbed Example

Training Example

Perturbed Example

Training Example

Perturbed Example

Training Example

Prediction

Perturbed Example

Training Example

True Label

Prediction

Perturbed Example

“panda”

Training Example

True Label

Prediction

Compute Loss Function

Perturbed Example

“panda”

Training Example

True Label

Prediction

Compute Loss Function

Backpropagate loss

Perturbed Example

“panda”

• The general addition to the cost function for adversarial training:

• In practice, take a change depending on the gradient :

Virtual Adversarial Training

• Extends adversarial training to the semi-supervised regime

• The key idea - make the output distribution for an original and perturbrated example close to each other

• Enables the use of large amounts of unlabeled data

Virtual Adversarial Training for NN’s

Unlabeled Example

Perturbed Example

Unlabeled Example

Perturbed Example

Unlabeled Example

Perturbed Example

Prediction Distribution Prediction Distribution

Unlabeled Example

Perturbed Example

Measure KL Divergence-based loss

Unlabeled Example

Backpropagate loss

Perturbed Example

Measure KL Divergence-based loss

• The general addition to the cost function for virtual adversarial training:

• Again, in practice, there is an efficient way to approximate this (as detailed in Miyato et. al., 2016)

Model - Adversarial Training for Text Classification

• Adversarial perturbations typically consist of making small modifications to very many real-valued inputs (i.e. pixels in the previous examples)

• For text classification, the input is discrete, and usually represented as a series of high-dimensional one-hot vectors (where such small modifications are impossible).

• Solution: define the perturbation on continuous word embeddings instead of discrete word inputs.

Model - Adversarial Training for Text Classification

• The perturbation is introduced to normalized embeddings to avoid the network from learning to ignore them:

• As we model the input text as:

• The perturbation is defined as:

• And the addition to the loss function is:

Adversarial Training for Text Classification

Virtual Adversarial Training for Text Classification

• Here, the perturbation is defined as:

• And the addition to the loss function is then:

Experimental Settings• 5 datasets:

• Sentiment classification (binary): IMDB, Rotten Tomatoes, Elec

• Topic classification (multiclass): DBpedia, RCV1

• Treat punctuation as spaces

• Convert words to lower case

• Remove words which appear in only one document

• RCV1 - remove stop words

Experimental Settings - Preprocessing

Pre-Training Tricks and Hyperparams

• Initialize word embeddings and LSTM weights with RNNLM on labeled and unlabeled examples

• Single layer LSTM, 1024 units (512 for BiLSTM)

• Embedding size: 256(IMDB, BiLSTM)/512(Rest)

• Sampled softmax loss with 1024 candidate samples (?)

• Adam optimization, 256 samples per batch

• 0.5 dropout rate on the word embeddings

Classification Model Tricks and Hyperparams

• 1 Hidden layer before softmax, 30(IMDB, Elec, Rotten)/128(Rest) units

• ReLU activation function

• batch size - 64(IMDB, Elec, RCV1)/128(Rest)

• 10k-20k training steps for each model

• Truncated back propagation - stop back propagating after 400 steps

• Generate perturbation after dropout

• Optimize epsilon, dropout rate on validation set

Results - IMDB• Adversarial and virtual adversarial training show lower

negative log-likelihood

• Virtual adversarial training also improves the adversarial training loss

Results - IMDB

Embedding-Based Analysis

Results - Topic classifiction: Elec, RCV1

• Improved SOTA on Elec, RCV1, without using CNN’s

Results - Sentiment analysis: Rotten Tomatoes

• Adversarial+Virtual adv. performs equally to SOTA

• Virtual adversarial is weaker than baseline - could be due to small amount of supervised examples, short sentences

Results - DBpedia• Baseline itself improves over SOTA, virtual

adversarial performs best

Related Work

• Dropout/Random Noise

• Generative Models

• Pre-Training as semi-supervised learning

Conclusion

• Adversarial and virtual adversarial training provides good regularization performance for text classification with RNN’s

• Provides SOTA results or on-par results for the examined datasets

• “Improved Quality” of word embeddings

Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised...

Documents

Adversarial Robustness of Supervised Sparse Coding

Adversarial Dropout for Supervised and Semi-Supervised ... · (Hinton et al. 2012; Srivastava et al. 2014). ... Szegedy et al. (2013) showed that a cer-tain neural network is vulnerable

SeqVAT: Virtual Adversarial Training for Semi-Supervised ...training (VAT). “Virtual” means label informa-tion is not required in this new adversarial train-ing approach and consequently

Triangle Generative Adversarial Networkspeople.ee.duke.edu/~lcarin/TriangleGAN.pdfA Triangle Generative Adversarial Network ( -GAN) is developed for semi-supervised cross-domain joint

Adversarial Dropout for Supervised and Semi-Supervised Learning … · 2017-09-19 · Adversarial Dropout for Supervised and Semi-Supervised Learning Sungrae Park and Jun-Keon Park

Weakly Supervised Adversarial Domain Adaptation for ...static.tongtianta.site/paper_pdf/25bfc95e-6e7b-11e9-944d-00163e08bb86.pdfoutputs objects category and domain class. Similar to

Weakly-Supervised Concept-based Adversarial Learning for ... · l word-level discriminator, cl concept-level discriminator. Because the sub-distribution conditioned on the concept

Self-Supervised Adversarial Hashing Networks for Cross ...openaccess.thecvf.com/content_cvpr_2018/papers/Li_Self...Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval

arXiv:1507.00677v9 [stat.ML] 11 Jun 2016 · Published as a conference paper at ICLR 2016 DISTRIBUTIONAL SMOOTHING WITH VIRTUAL ADVERSARIAL TRAINING Takeru Miyato 1, Shin-ichi Maeda

Adversarial Training Methods for Semi-Supervised Text ... · Adversarial Training Methods for Semi-Supervised Text Classification Takeru Miyato(†)(Kyoto University, Google Brain),

Weakly Supervised Adversarial Learning for 3D Human Pose ...humanmotion.ict.ac.cn/papers/2019P7_POSE/paper.pdf · weakly supervised adversarial learning framework for 3D human pose

Semi-supervised adversarial audio source

arXiv:1912.03283v2 [quant-ph] 5 Apr 2020 · quantum machine learning; adversarial examples; Support Vector Machine; quantum theory. I. INTRODUCTION Supervised learning is one of the

Adversarial Self-Supervised Learning for Semi-Supervised 3D … · 2020. 8. 7. · Adversarial Self-Supervised Learning for Semi-Supervised 3D Action Recognition Chenyang Si 1;23[0000

arXiv:1802.05957v1 [cs.LG] 16 Feb 2018Published as a conference paper at ICLR 2018 SPECTRAL NORMALIZATION FOR GENERATIVE ADVERSARIAL NETWORKS Takeru Miyato 1, Toshiki Kataoka , Masanori

FRENCH ET AL.: SEMI-SUPERVISED SEMANTIC …arXiv:1906.01916v5 [cs.CV] 11 Aug 2020 {Laine and Aila} 2017 {Miyato, Maeda, Koyama, and Ishii} 2017 {Oliver, Odena, Raffel, Cubuk, and Goodfellow}

Self-Supervised Vessel Segmentation via Adversarial Learning

Adversarial Robustness: From Self-Supervised Pre-Training ... · Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning Tianlong Chen1, Sijia Liu2, Shiyu Chang2,

Semi-supervised Adversarial Learning to Generate Photorealistic … · 2018. 8. 28. · Semi-supervised Adversarial Learning to Generate Photorealistic Face Images of New Identities

Adversarial Constraint Learning for Structured Prediction · 2018. 7. 4. · 2.2 Constraint-Based Learning Collecting a large labeled dataset for supervised learning can often be