Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
DCASE 2016CONVOLUTIONAL NEURAL NETWORKS FOR ACOUSTIC SCENE CLASSIFICATION
Michele Valenti1 ([email protected]),
Aleksandr Diment2, Giambattista Parascandolo2,
Stefano Squartini1, Tuomas Virtanen2
1Università Politecnica delle Marche, Italy2Tampere University of Technology, Finland
DCASE 2016CONVOLUTIONAL NEURAL NETWORKS FOR ACOUSTIC SCENE CLASSIFICATION
Michele Valenti1 ([email protected]),
Aleksandr Diment2, Giambattista Parascandolo2,
Stefano Squartini1, Tuomas Virtanen2
1Università Politecnica delle Marche, Italy2Tampere University of Technology, Finland
Outline
• Introduction
• Our system
•Training modes
• Results
• Challenge ranking
IntroductionWhat is “acoustic scene classification”?
Introduction
ForestpathCarHome
Audio
What is “acoustic scene classification”?
Our system
Audio Featureextraction
Sequence splitting
LabelCNN
Scores averaging
Overview
Our system
Log-mel spectrogramRaw audio
Features
AudioFeatures
Our system
Log-mel spectrogramRaw audio segment
Sequence
Sequence splitting
Sequence splitting
Features
Our system
Sequence
Convolutional neural network
Our system
SequenceFeature maps
128
Convolutional neural network
CNNSequences
Our system
SequenceFeature maps
128
Convolutional neural network
CNNSequences
Batch normalization
Our system
Sequence
128 128
Feature maps Subsampled feature maps
Convolutional neural network
CNNSequences
Our system
Feature mapsSequence
Subsampled feature maps
128 128 256
Newfeature maps
Convolutional neural network
CNNSequences
Our system
Sequence
256128 128
Feature maps Subsampled feature maps
Newfeature maps
Time shrinking
Convolutional neural network
CNNSequences
Our system
Sequence
256128 128
Flattening
Feature maps Subsampled feature maps
Newfeature maps
Convolutional neural network
CNNSequences
Our system
Sequence
256128 128
Fully-connected softmax layer
Feature maps Subsampled feature maps
Newfeature maps
Convolutional neural network
CNNSequences
Our system
Sequence
256128 128
Feature maps Subsampled feature maps
Newfeature maps
Convolutional neural network
CNNSequences
Our systemScores averaging
Class prediction scores
Scores averaging
Predictionscores
Our systemScores averaging
Class prediction scores
File’s class
!"Σargmax
Scores averaging
Predictionscores
Training
TrainingCross-validation setup
Test
Training + validation Test
Test
Test
Fold 1
Fold 2
Fold 3
Fold 4
TrainingNon-full training
Training
Validation
Training + validation Test
Fold n
TrainingNon-full training
Training
Validation
Training + validation Test
Fold n
Non-full training
Training Non-full training
Training
Validation
Acc
urac
ies
Epochs
Training
Training + validation Test
Fold n
Validation
Training Non-full training
Training
Validation
Acc
urac
ies
Epochs
Training
Training + validation Test
Fold n
Validation
Convergence time
Training Non-full training
Training
Validation
Training + validation Test
Fold n
Training
Training Non-full training
Training
Validation
Training + validation Test
Fold n
Training
Full training
Results
Test
Training + validation Test
Test
Test
Fold 1
Fold 2
Fold 3
Fold 4
Test data
ResultsSequence length
65
70
75
80
0,5 1,5 3 5 10 30
Acc
urac
y (%
)
Sequence length (s)
Non-full training Full training
ResultsSequence length
65
70
75
80
0,5 1,5 3 5 10 30
Acc
urac
y (%
)
Sequence length (s)
Non-full training Full training
ResultsSequence length
65
70
75
80
0,5 1,5 3 5 10 30
Acc
urac
y (%
)
Sequence length (s)
Non-full training Full training
ResultsClass accuracies
Class Accuracy (%)Beach 75.6
Bus 76.9
Café/Restaurant 74.4
Car 91.0
City center 93.6
Forest path 96.2
Grocery store 88.5
Home 80.8
Class Accuracy (%)Library 66.6
Metro station 96.2
Office 97.4
Park 59.0
Residential area 73.1
Train 46.2
Tram 78.2
Class Accuracy (%)Beach 75.6
Bus 76.9
Café/Restaurant 74.4
Car 91.0
City center 93.6
Forest path 96.2
Grocery store 88.5
Home 80.8
Class Accuracy(%)Library 66.6
Metrostation 96.2Office 97.4Park 59.0
Residentialarea 73.1Train 46.2Tram 78.2
ResultsClass accuracies
34.6% Residential area
29.5% Bus
ResultsOther classifiers
SystemSequence length (s)
Accuracy (%)
Non-full training Full training
Baseline GMM (MFCC) - - 72.6
Two-layer CNN (MFCC) 5 67.7 72.6
Two-layer MLP (log-mel) - 66.6 69.3
One-layer CNN (log-mel) 3 70.3 74.8
Two-layer CNN (log-mel) 3 75.9 79.0
Challenge rankingFinal training
Training + validation + test Secret challenge data
Extended training set Evaluation set
Challenge rankingFinal training
Training + validation + test
Extended training set Evaluation set
New training New validation
Secret challenge data
Challenge rankingFinal training
Training + validation + test
Extended training set Evaluation set
New training New validation
Secret challenge data
400 epochsconvergence
Challenge rankingFinal training
Training + validation + test
Extended training set Evaluation set
Final training for 400 epochs
Secret challenge data
Challenge ranking89,7 88,7 87,7 87,2 86,4 86,4 86,2 85,9 85,6 85,4 84,6 84,1
77,2
62,8
0
10
20
30
40
50
60
70
80
90
100
DCASE 2016CONVOLUTIONAL NEURAL NETWORKS FOR ACOUSTIC SCENE CLASSIFICATION
Michele Valenti1 ([email protected]),
Aleksandr Diment2, Giambattista Parascandolo2,
Stefano Squartini1, Tuomas Virtanen2
1Università Politecnica delle Marche, Italy2Tampere University of Technology, Finland
ResultsFeature comparison
SystemSequence length (s)
Accuracy (%)Non-full training Full training
Two-layer CNN (MFCC) 5 67.7 72.6
Two-layer CNN (log-mel) 5 74.1 78.3