A small footprint for audio and music classification
Hamid Eghbal-zadeh
Outline
1. Introduction
2. I-Vector representation
3. Some results
4. Conclusion
INTRODUCTION
A small footprint for Audio and Music classification
Pipeline: Audio -> Acoustic features -> Front-end -> Small footprint -> Classifier
Front-end options:
• Block-level features (genre classification) [Seyerlehner, 2010]: signal processing
• Adapted GMM means (genre classification) [Charbuillet, 2011]: machine learning
• Adapted RBM weights (speaker verification) [Ghahabi, 2014]: machine learning
• Factor Analysis (artist recognition) [Eghbal-zadeh, 2015]: machine learning
UBM-based front-end (diagram):
Dev: Dev db -> train Universal Background Model (UBM)
Train: Train db + UBM -> UBM adaptation -> adapted UBM params -> classifier (train)
Test: Test db + UBM -> UBM adaptation -> adapted UBM params -> classifier (test)
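The UBM adaptation step can be illustrated with relevance-MAP adaptation of GMM means, one common way to obtain adapted GMM means. The sketch below is an assumption for illustration and not necessarily the adaptation used in the cited systems: it assumes a diagonal-covariance UBM, and the function name `map_adapt_means` and the relevance factor of 16 are hypothetical choices.

```python
import numpy as np

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Relevance-MAP adaptation of the means of a diagonal-covariance GMM.

    frames:    (T, D) acoustic feature frames of one recording
    weights:   (K,)   UBM mixture weights
    means:     (K, D) UBM means
    variances: (K, D) UBM diagonal variances
    Returns the (K, D) adapted means.
    """
    diff = frames[:, None, :] - means[None, :, :]              # (T, K, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_gauss                     # (T, K)
    log_post -= log_post.max(axis=1, keepdims=True)            # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                    # component posteriors

    n = post.sum(axis=0)                                       # zeroth-order statistics (K,)
    f = post.T @ frames                                        # first-order statistics (K, D)
    data_mean = f / np.maximum(n, 1e-10)[:, None]
    alpha = (n / (n + relevance))[:, None]                     # data vs. prior trade-off
    return alpha * data_mean + (1.0 - alpha) * means

# Toy example (assumed values): two components, frames near the first one.
rng = np.random.default_rng(0)
ubm_weights = np.array([0.5, 0.5])
ubm_means = np.array([[0.0, 0.0], [10.0, 10.0]])
ubm_variances = np.ones((2, 2))
toy_frames = rng.normal(loc=1.0, scale=0.1, size=(200, 2))
adapted_means = map_adapt_means(toy_frames, ubm_weights, ubm_means, ubm_variances)
supervector = adapted_means.reshape(-1)   # stacked adapted means
```

Components that see little data stay close to the UBM mean, while well-observed components move toward the data mean. Stacking the adapted means gives one fixed-length vector per recording.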
UBM-based front-end with a Factor Analysis step (diagram):
Dev: Dev db -> train Universal Background Model (UBM)
Train: Train db + UBM -> UBM adaptation -> adapted UBM params -> Factor Analysis (train) -> classifier (train)
Test: Test db + UBM -> UBM adaptation -> adapted UBM params -> Factor Analysis (test) -> classifier (test)
Effect of Factor Analysis step
[Figure: songs from 3 genres in the GTZAN dataset [Eghbal-zadeh, ISMIR2015]. Left: with Factor Analysis. Right: without Factor Analysis.]
[Figure: artist recognition performance on Artist20 with and without Factor Analysis [Eghbal-zadeh, Eusipco2015]. Panels: without FA, with FA.]
Why apply Factor Analysis?
• The resulting vectors provide an information-rich, fixed-length, low-dimensional representation
• They have a single-Gaussian distribution
  – We can use the properties of Gaussians
• They can be scored easily
  – Using cosine distance
• They are estimated latent factors with good discriminative power, obtained from a Factor Analysis procedure
Other benefits:
• Noise-robust features [Eghbal-zadeh, ISMIR2016]
• Can be combined with neural networks [Eghbal-zadeh, DAFx2016]
• Successfully used in different tasks:
  – Speaker verification
  – Language recognition
  – Artist recognition
  – Music similarity
  – Audio scene classification
I-VECTOR REPRESENTATION AS A SMALL FOOTPRINT
I-vector front-end (diagram):
Dev: Dev db -> train UBM (GMM)
Train: Train db + UBM -> adapted GMM params (statistical representation) -> Factor Analysis (train) -> classifier (train)
Test: Test db + UBM -> adapted GMM params (statistical representation) -> Factor Analysis (test) -> classifier (test)
Different Factor Analysis approaches:
• Eigenvoice FA: M = m + V y
  – M: adapted GMM mean; m: UBM mean; V: eigenvoice subspace; y: hidden vector
• Joint Factor Analysis (JFA): M = m + V y + U x + D z
  – M: adapted GMM mean; m: UBM mean; V: artist subspace; U: song subspace; D z: residual
• I-vector FA: M = m + T y
  – M: adapted GMM mean; m: UBM mean; T: low-rank matrix that models artist and song variability together; y: hidden vector (i-vector)
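To make the i-vector model M = m + T y concrete, the toy sketch below builds a supervector from a known hidden vector and recovers it again. It is a simplification under stated assumptions: real i-vector extractors estimate the posterior mean of y from Baum-Welch statistics and the UBM covariances, whereas this example uses plain least squares on a noise-free supervector. The dimensions and function names are hypothetical.

```python
import numpy as np

def synthesize_supervector(m, T, y):
    """Apply the i-vector model: M = m + T y."""
    return m + T @ y

def estimate_hidden_vector(M, m, T):
    """Least-squares estimate of y from M - m = T y (simplified stand-in
    for the posterior-mean computation used in real extractors)."""
    y_hat, *_ = np.linalg.lstsq(T, M - m, rcond=None)
    return y_hat

rng = np.random.default_rng(0)
supervector_dim, ivector_dim = 60, 5                   # toy sizes (assumed)
m = rng.normal(size=supervector_dim)                   # UBM mean supervector
T = rng.normal(size=(supervector_dim, ivector_dim))    # low-rank matrix
y_true = rng.normal(size=ivector_dim)                  # hidden vector

M = synthesize_supervector(m, T, y_true)
y_hat = estimate_hidden_vector(M, m, T)
```

The point of the example is the size reduction: a 60-dimensional supervector is summarized by a 5-dimensional hidden vector, which is what makes the representation a small footprint.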
An example of i-vector based systems
Extract features {MFCC} -> Compute statistics -> Extract i-vectors {I-vector extraction} -> Post-processing {LDA/WCCN/...} -> Classification {Cosine score, ...}
Within-Class Covariance Normalization
Quantities used:
• Averaged i-vector for class c
• i-th i-vector from class c
• Number of i-vectors from class c
• Number of classes
• Within-class covariance matrix
• WCCN projection matrix
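For reference, the quantities listed on this slide match the WCCN formulation commonly used in the i-vector literature. The symbols below ($w_i^{c}$, $\bar{w}_c$, $n_c$, $C$, $W$, $B$) are chosen here for illustration and are assumptions, since the slide's own notation is not available.

```latex
\[
\begin{aligned}
\bar{w}_c &= \frac{1}{n_c} \sum_{i=1}^{n_c} w_i^{c}
  && \text{(averaged i-vector for class } c\text{)} \\
W &= \frac{1}{C} \sum_{c=1}^{C} \frac{1}{n_c} \sum_{i=1}^{n_c}
     \left(w_i^{c} - \bar{w}_c\right)\left(w_i^{c} - \bar{w}_c\right)^{\top}
  && \text{(within-class covariance matrix)} \\
W^{-1} &= B B^{\top}
  && \text{(Cholesky decomposition; } B \text{ is the WCCN projection matrix)} \\
\hat{w} &= B^{\top} w
  && \text{(projected i-vector)}
\end{aligned}
\]
```

Here $w_i^{c}$ is the i-th i-vector from class $c$, $n_c$ is the number of i-vectors from class $c$, and $C$ is the number of classes.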
Within-Class Covariance Normalization (illustration)
[Figure: Class A and Class B before and after the WCCN projection.]
The within-class variability is reduced.
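The reduction in within-class variability can be checked numerically. The sketch below assumes the commonly used WCCN recipe, where the projection matrix is the Cholesky factor of the inverse within-class covariance; the function names and the toy data are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def within_class_covariance(ivectors, labels):
    """Average over classes of the per-class covariance of the i-vectors."""
    ivectors = np.asarray(ivectors, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    dim = ivectors.shape[1]
    W = np.zeros((dim, dim))
    for c in classes:
        class_vectors = ivectors[labels == c]
        centered = class_vectors - class_vectors.mean(axis=0)
        W += centered.T @ centered / len(class_vectors)
    return W / len(classes)

def wccn_projection(ivectors, labels):
    """Return B such that B @ B.T equals the inverse within-class covariance."""
    W = within_class_covariance(ivectors, labels)
    return np.linalg.cholesky(np.linalg.inv(W))

# Toy data (assumed): two classes with strongly anisotropic within-class scatter.
rng = np.random.default_rng(0)
scale = np.array([5.0, 1.0, 0.2])
class_a = rng.normal(size=(50, 3)) * scale
class_b = rng.normal(size=(50, 3)) * scale + np.array([0.0, 3.0, 0.0])
ivectors = np.vstack([class_a, class_b])
labels = np.array([0] * 50 + [1] * 50)

B = wccn_projection(ivectors, labels)
projected = ivectors @ B    # row-wise version of B^T w

W_before = within_class_covariance(ivectors, labels)
W_after = within_class_covariance(projected, labels)   # close to the identity
```

After the projection, the within-class covariance of the training data is the identity matrix, so directions with large within-class scatter no longer dominate cosine scoring.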
Some results
Tasks
• Audio Scene Classification
  – DCASE-2016 challenge
  – 15 different scenes (30-second recordings from: train, tram, office, outdoor, etc.)
  – We won the challenge!
• Music Similarity
  – GTZAN and 1517Artists
  – Evaluated using genre labels
• Music Artist Recognition
  – Artist20 and MSD
  – Noise-robust MAR using 12 different kinds and levels of noise
⢠Our approach: an i-vector DNN hybrid (4 submissions Among 49 participants)
â 1st place: hybrid
â 2nd place: i-vector
â 5th place: i-vector
â 14th place: DNN
18
Audio Scene Classification Challenge (đđđđđ â đđđđ[đ])
[1] http://www.cs.tut.fi/sgn/arg/dcase2016/
⢠UBM trained on 1517Artists db, tested on GTZAN
⢠I-vectors are extracted unsupervised
⢠Evaluated with genre labels
19
Music Similarity [ISMIR-2015]
⢠Artist20 dbâ 20 artists
â 1413 songs
20
Music Artist Recognition [Eusipco-2015]
⢠MSD dbâ 50 Artists
â 5,000 songs
21
Music Artist Recognition [DAFx-2016]
CDB-Net
Experiment 2 â Raw i-vectors
⢠Artist20 dbâ 4 different noises :
⢠festival noise
⢠humming noise
⢠pink noise
⢠PUB noise
â 3 different SNR levels
22
Noise-Robust Music Artist Recognition [ISMIR-2016]
Conclusion
• A small footprint using FA
• Useful for different audio- and music-related tasks
• Robust against noise
• Useful as neural network features
Thank you for your time