Upload
others
View
5
Download
0
Embed Size (px)
Citation preview
Special Session: Deep Learning for Computational Paralinguistics
Towards Conditional Adversarial Training for Predicting Emotions from Speech
Jing Han1, Zixing Zhang2, Zhao Ren1, Fabien Ringeval3, Björn Schuller1,2
1. University of Augsburg, Germany2. Imperial College London, UK3. Université Grenoble Alpes, France
Motivation
Adversarial training (also called GAN for Generative Adversarial Networks), and the variations that are now being proposed is the most interesting idea in the last 10 years in ML.
-- Yann LeCun, Director of AI Research at Facebook
April 18, 2018, ICASSP @ Calgary Jing Han
Donahue, Chris, et al. "Semantically decomposing the latent spaces of generative adversarial networks." ICLR 2018.Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." ICLR 2017.
Motivation
April 18, 2018, ICASSP @ Calgary Jing Han
Isola, Phillip, et al. "Image-to-image translation with conditional adversarial networks." CVPR 2017.Antipov, Grigory, Moez Baccouche, and Jean-Luc Dugelay. "Face aging with conditional generative adversarial networks." ICIP 2017.
Adversarial Training
April 18, 2018, ICASSP @ Calgary Jing Han
Generator DiscriminatorLoss
Real
Fake
Real
Latent random variable z
generated samples
The Discriminator tries to distinguish between the generated (fake) and real data.
The Generator tries to turn the input noise into fake data to try to fool the discriminator.
Conditional Adversarial Training
April 18, 2018, ICASSP @ Calgary Jing Han
Generator DiscriminatorLoss
Real
Fake
Real
Latent random variable z
generated samples
The Condition y tries to control the modes of the data being generated, could be based on class labels or data from different modalities, etc.Condition y
Proposed Conditional Adversarial Training for SER
April 18, 2018, ICASSP @ Calgary Jing Han
Generator DiscriminatorLoss
Real
Fake
Real
Latent random variable z
generated samples
Condition y
Regressor
audio features
predictions
Regressor DiscriminatorLoss
Real
Fake
Real y
predictions y'
Audio features x
Proposed Conditional Adversarial Training for SER
April 18, 2018, ICASSP @ Calgary Jing Han
Condition yaudio features
Regressor DiscriminatorLoss
Real
Fake
Real y
Proposed Conditional Adversarial Training for SER
April 18, 2018, ICASSP @ Calgary Jing Han
Regressor(G) Discriminator
Loss
Real
Fake
Real y
predictions y'
Audio features x
Objective 1:
Objective 2:(wasserstein distance)
Database
● recordings of spontaneous collaborative and affective interactions in French
● partially used in AVEC 2015 and 2016
● 46 subjects (23 pairs) and 5 minutes per pair
● 16 training / 15 development / 15 test
● http://diuf.unifr.ch/diva/recola/index.html
April 18, 2018, ICASSP @ Calgary Jing Han
Experiments
● feature set: mean & var of MFCC 0-12 and log energy
● structure for both NN1 and NN2: LSTM-RNN with 2 layers, 20 nodes per layer.
● measurement: CCC (concordance correlation coefficient)
April 18, 2018, ICASSP @ Calgary Jing Han
Experimental Results
April 18, 2018, ICASSP @ Calgary Jing Han
CCCarousal valence
dev test dev test
RNN(2 layers) .777 .718 .491 .435
RNN (4 layers) .761 .723 .487 .390
conditional adversarial training .780 .732 .501 .455
conditional adversarial training with wasserstein .797 .737 .474 .444
without adversarial
proposed
Experimental Results
April 18, 2018, ICASSP @ Calgary Jing Han
CCCarousal valence
dev test dev test
CCC-objective .412 .350 .242 .199
end-to-end .752 .699 .406 .311
reconstruction-error based .785 .729 .378 .360
prediction-based .774 .744 .412 .377
conditional adversarial training .780 .732 .501 .455
conditional adversarial training with wasserstein .797 .737 .474 .444proposed
state of the art
Experimental Results
April 18, 2018, ICASSP @ Calgary Jing Han
Conclusion & Perspective
Conclusion
● adversarial training for speech emotion recognition● two variants both perform better than baseline● arousal prediction benefits more from wasserstein distance than valence
Future work
● train two models based on a loss threshold rather than fixed schedule● experiments on other dataset● perform an end-to-end structure
April 18, 2018, ICASSP @ Calgary Jing Han
Thank you for [email protected]