DCASE 2016 Workshop

CONVOLUTIONAL NEURAL NETWORKS FOR ACOUSTIC SCENE CLASSIFICATION

Michele Valenti (1) ([email protected]), Aleksandr Diment (2), Giambattista Parascandolo (2), Stefano Squartini (1), Tuomas Virtanen (2)

(1) Università Politecnica delle Marche, Italy
(2) Tampere University of Technology, Finland


Outline

• Introduction

• Our system

• Training modes

• Results

• Challenge ranking

Introduction

What is “acoustic scene classification”?

Audio → scene label, e.g. Forest path, Car, Home

Our system: Overview

Audio → Feature extraction → Sequence splitting → CNN → Scores averaging → Label

Our system: Features

Raw audio → Log-mel spectrogram

Our system: Sequence splitting

The log-mel spectrogram of the raw audio segment is split into sequences.
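The splitting step can be sketched in a few lines of plain Python. This is only an illustration: the function name `split_into_sequences`, the use of non-overlapping windows, and the dropping of leftover frames are assumptions, since the slides do not state how the split is done.

```python
def split_into_sequences(frames, seq_len):
    """Split a list of feature frames into non-overlapping sequences.

    Trailing frames that do not fill a whole sequence are dropped.
    Both choices (no overlap, drop the remainder) are assumptions.
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    n_full = len(frames) // seq_len
    return [frames[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]


# Toy "spectrogram": 10 frames, each holding 3 made-up mel-band values.
spectrogram = [[float(t)] * 3 for t in range(10)]
sequences = split_into_sequences(spectrogram, seq_len=4)
print(len(sequences), [len(s) for s in sequences])
```

With 10 frames and a sequence length of 4 frames, two full sequences are produced and the last two frames are discarded.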

Our system: Convolutional neural network

Stages applied to each sequence, in the order the slides introduce them:

• Feature maps (128)
• Batch normalization
• Subsampled feature maps (128)
• New feature maps (256)
• Time shrinking
• Flattening
• Fully-connected softmax layer
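To make the stages above concrete, the sketch below tracks how the tensor shape could change from one stage to the next. Only the feature-map counts (128 and 256) and the 15 classes come from the slides. The input size (60 bands by 150 frames), the 5x5 kernels without padding, the 5x5 subsampling, and the reading of "time shrinking" as pooling over the whole time axis are hypothetical values chosen for illustration.

```python
def conv2d_shape(shape, n_kernels, kernel):
    """Output shape of an unpadded, stride-1 2-D convolution."""
    _, height, width = shape
    kh, kw = kernel
    return (n_kernels, height - kh + 1, width - kw + 1)


def max_pool_shape(shape, pool):
    """Output shape of non-overlapping max pooling."""
    channels, height, width = shape
    ph, pw = pool
    return (channels, height // ph, width // pw)


# Hypothetical input: 1 channel, 60 mel bands, 150 time frames.
shape = (1, 60, 150)
shape = conv2d_shape(shape, 128, (5, 5))      # feature maps (128)
# Batch normalization leaves the shape unchanged.
pooled = max_pool_shape(shape, (5, 5))        # subsampled feature maps (128)
shape = conv2d_shape(pooled, 256, (5, 5))     # new feature maps (256)
shape = max_pool_shape(shape, (1, shape[2]))  # time shrinking: pool the full time axis
flat = shape[0] * shape[1] * shape[2]         # flattening
n_classes = 15                                # size of the softmax layer
print(pooled, shape, flat, n_classes)
```

Under these assumed sizes the flattened vector has 1792 values, which the fully-connected softmax layer would map to 15 class scores.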

Our system: Scores averaging

Prediction scores: the CNN outputs class prediction scores for each sequence.

File’s class: the prediction scores are averaged over the file’s sequences (Σ), and the argmax of the averaged scores gives the file’s class.
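The averaging and argmax step described above can be illustrated as follows. The function name `classify_file` and the score values are made up for the example; only the three class names are taken from the introduction slide.

```python
def classify_file(sequence_scores, class_names):
    """Average per-sequence class scores and return the winning class."""
    if not sequence_scores:
        raise ValueError("need at least one sequence")
    n_sequences = len(sequence_scores)
    n_classes = len(class_names)
    # Mean score of each class over all sequences of the file.
    mean_scores = [
        sum(scores[c] for scores in sequence_scores) / n_sequences
        for c in range(n_classes)
    ]
    # argmax over the averaged scores.
    best = max(range(n_classes), key=mean_scores.__getitem__)
    return class_names[best], mean_scores


classes = ["Forest path", "Car", "Home"]
scores = [
    [0.6, 0.3, 0.1],  # sequence 1
    [0.2, 0.5, 0.3],  # sequence 2 (on its own it would vote "Car")
    [0.7, 0.2, 0.1],  # sequence 3
]
label, mean_scores = classify_file(scores, classes)
print(label)
```

Averaging lets confident sequences outweigh a single ambiguous one: the second sequence alone favours "Car", but the file as a whole is labelled "Forest path".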

Training

Cross-validation setup

Fold 1 to Fold 4: each fold has a training + validation portion and a test portion.

Training: Non-full training

Fold n: Training | Validation | Test. The training + validation portion is split into a training part and a validation part.

[Plot: training and validation accuracies versus epochs, with the convergence time marked.]

Training: Full training

Fold n: Training | Test. The whole training + validation portion is used for training.
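One plausible reading of the two training modes is that non-full training is used to find a convergence time on the validation data, and full training then reuses that number of epochs on the larger training set. The sketch below illustrates only the first half. Defining convergence as the epoch with the highest validation accuracy is an assumption, and the accuracy values are invented.

```python
def convergence_epoch(val_accuracies):
    """Return the 1-based epoch with the highest validation accuracy.

    Ties are resolved in favour of the earliest epoch. This definition
    of "convergence time" is an assumption, not taken from the slides.
    """
    if not val_accuracies:
        raise ValueError("need at least one epoch")
    best = max(val_accuracies)
    return val_accuracies.index(best) + 1


# Made-up validation accuracies, one value per epoch of non-full training.
val_curve = [0.52, 0.61, 0.68, 0.71, 0.70, 0.69]
epochs_for_full_training = convergence_epoch(val_curve)
print(epochs_for_full_training)
```

In this toy curve the validation accuracy peaks at epoch 4, so a full-training run would train on the whole training + validation portion for 4 epochs.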

Results

Test data: the test portions of Fold 1 to Fold 4.

Results: Sequence length

[Plot: accuracy (%) versus sequence length (s) for non-full training and full training. Sequence lengths: 0.5, 1.5, 3, 5, 10 and 30 s. Accuracy axis: 65 to 80 %.]

Results: Class accuracies

Class              Accuracy (%)
Beach              75.6
Bus                76.9
Café/Restaurant    74.4
Car                91.0
City center        93.6
Forest path        96.2
Grocery store      88.5
Home               80.8
Library            66.6
Metro station      96.2
Office             97.4
Park               59.0
Residential area   73.1
Train              46.2
Tram               78.2

Callouts on the slide: 34.6% Residential area, 29.5% Bus.

Results: Other classifiers

System                     Sequence length (s)   Accuracy (%), non-full training   Accuracy (%), full training
Baseline GMM (MFCC)        -                     -                                 72.6
Two-layer CNN (MFCC)       5                     67.7                              72.6
Two-layer MLP (log-mel)    -                     66.6                              69.3
One-layer CNN (log-mel)    3                     70.3                              74.8
Two-layer CNN (log-mel)    3                     75.9                              79.0

Challenge ranking: Final training

Extended training set: training + validation + test data
Evaluation set: secret challenge data

The extended training set is split into a new training part and a new validation part; convergence at 400 epochs.

Final training on the extended training set for 400 epochs.

Challenge ranking

[Bar chart, vertical axis 0 to 100. Bar values: 89.7, 88.7, 87.7, 87.2, 86.4, 86.4, 86.2, 85.9, 85.6, 85.4, 84.6, 84.1, 77.2, 62.8.]


Results: Feature comparison

System                     Sequence length (s)   Accuracy (%), non-full training   Accuracy (%), full training
Two-layer CNN (MFCC)       5                     67.7                              72.6
Two-layer CNN (log-mel)    5                     74.1                              78.3