ICML Talk on deep learning for music recommendation

Deep learning for music recommendation and personalized radio stations

Aloïs GRUSON

niland.io @nilandmusic

Can we recommend music with a pure content-based approach?

Question

Content-based music recommendation

[Diagram: Audio music signal → Processing and modeling (?) → Embedding space]

Close in embedding space ⇔ can be recommended together
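The "close in embedding space" idea can be illustrated with a nearest-neighbour lookup. This is a minimal sketch, not the niland.io implementation: the slides do not say which distance is used, so cosine similarity is an assumption, and the names `cosine_similarity`, `recommend`, and the toy vectors are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def recommend(seed_id, embeddings, k=2):
    """Return the k track ids closest to seed_id, most similar first."""
    seed = embeddings[seed_id]
    scored = [
        (cosine_similarity(seed, vec), track_id)
        for track_id, vec in embeddings.items()
        if track_id != seed_id
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [track_id for _, track_id in scored[:k]]

# Toy 3-dimensional embeddings (made-up values, for illustration only).
toy_embeddings = {
    "track_a": [0.9, 0.1, 0.0],
    "track_b": [0.8, 0.2, 0.1],
    "track_c": [0.0, 0.1, 0.9],
    "track_d": [0.1, 0.9, 0.2],
}
nearest = recommend("track_a", toy_embeddings, k=2)
```

With a catalogue of real size, the exhaustive scan would typically be replaced by an approximate nearest-neighbour index.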

Evaluation metrics

• One of our metrics: precision@50 on a dataset of scraped playlists, with 8,083 tracks classified into 142 playlists.

• Perceptual evaluations with real users showed a correlation between this metric and the users' average rating.
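For reference, precision@k over a playlist dataset could be computed as below. This is a sketch under assumptions: the slides do not define relevance, so a retrieved track is counted as relevant when it belongs to the same playlist as the query track, and each track is assumed to belong to exactly one playlist. The function names and toy data are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k=50):
    """Fraction of the first k ranked ids that are relevant (k must be > 0)."""
    hits = sum(1 for track_id in ranked_ids[:k] if track_id in relevant_ids)
    return hits / k

def mean_precision_at_k(rankings, playlist_of, k=50):
    """Average precision@k over all query tracks.

    rankings: query id -> ranked list of recommended ids.
    playlist_of: track id -> playlist label (one playlist per track, assumed).
    """
    scores = []
    for query_id, ranked_ids in rankings.items():
        relevant = {
            track_id
            for track_id, playlist in playlist_of.items()
            if playlist == playlist_of[query_id] and track_id != query_id
        }
        scores.append(precision_at_k(ranked_ids, relevant, k))
    return sum(scores) / len(scores)

# Toy data: two playlists of two tracks each, evaluated with k=2.
playlist_of = {"a": "jazz", "b": "jazz", "c": "rock", "d": "rock"}
rankings = {"a": ["b", "c", "d"], "c": ["a", "d", "b"]}
score = mean_precision_at_k(rankings, playlist_of, k=2)
```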

Our results at niland.io

[Figure: precision@50 (y-axis, 0 to 18) per year (x-axis, 2011 to 2016), comparing classic approaches with deep learning. Annotations: "MIREX 2011 ranked 1st submission" and "+66.8% relative improvement".]

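MIREX 2011 Ranked 1st Submission

[Figure: precision@50 (y-axis, 0 to 20) per year (x-axis, 2011 to 2016), with a pipeline diagram. Diagram labels: Audio; MFCC, SFM, and OC, each followed by GMM-SV; Spectrogram; Res followed by GMM-SV. Remaining labels: "2011" and "4000".]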

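Work on more descriptors

[Figure: precision@50 (y-axis, 0 to 20) per year (x-axis, 2011 to 2016), with a pipeline diagram. Diagram labels: Audio; MFCC, SFM, and OC, each followed by GMM-SV; Spectrogram; Res, Gabor, and HoG, each followed by GMM-SV. Remaining labels: "2014" and "9000".]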

“Bridge the semantic gap”?

• We worked to bring the human perception of similarity into our model

• We train deep neural networks to classify songs into playlists.

• We then remove the classification layer; the output of the remaining network is our embedding space

• Our training set: 115,412 tracks in 3,032 playlists
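The "train a playlist classifier, then drop the classification layer" idea can be sketched with a tiny fully connected network. This only illustrates the principle: the weights below are made-up numbers standing in for trained parameters, training itself is omitted, and the layer sizes, the ReLU activation, and the names `embed` and `classify` are assumptions rather than details from the talk.

```python
import math

def dense(x, weights, bias):
    """Affine layer: weights holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(x):
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [v / total for v in exps]

# Hypothetical "already trained" parameters (made-up numbers).
HIDDEN_W = [[1.0, -1.0, 0.5], [0.0, 2.0, -1.0]]   # 3 input features -> 2 hidden units
HIDDEN_B = [0.0, 0.1]
CLASS_W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # 2 hidden units -> 3 playlists
CLASS_B = [0.0, 0.0, 0.0]

def embed(features):
    """Network with the classification layer removed: last hidden activations."""
    return relu(dense(features, HIDDEN_W, HIDDEN_B))

def classify(features):
    """Full network as used during training: playlist probabilities."""
    return softmax(dense(embed(features), CLASS_W, CLASS_B))

track_features = [0.2, 0.4, 0.6]
embedding = embed(track_features)          # used for recommendation
playlist_probs = classify(track_features)  # used only to train the network
```

The point of the design is that the playlist labels shape the hidden representation during training, while only that representation is kept at recommendation time.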

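Bending the space

[Figure: precision@50 (y-axis, 0 to 20) per year (x-axis, 2011 to 2016), with a pipeline diagram. Diagram labels: Audio; MFCC, SFM, and OC, each followed by GMM-SV; Spectrogram; Res, Gabor, and HoG, each followed by GMM-SV; DNN. Remaining labels: "2015", "1000", and "9000".]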

Learning the low-level features

[Figure: precision@50 (y-axis, 0 to 20) per year (x-axis, 2011 to 2016), with a pipeline diagram. Diagram labels: Audio; Spectrogram; Convolutional Net. Remaining labels: "2016" and "1000".]

An example of CNN structure

• 1D convolutions
• Global temporal pooling layer: mean + max + variance
• 2 fully connected layers + classification layer
• Residual connections


Global temporal pooling layer: mean + max + variance
• Allows processing of variable-length tracks
• Generates some temporal invariance
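A global temporal pooling layer of this kind can be sketched in a few lines. The sketch assumes the input is a list of time frames, each holding one value per channel, and uses the population variance; the function name and the toy inputs are hypothetical.

```python
def global_temporal_pooling(frames):
    """Concatenate per-channel mean, max, and variance over the time axis.

    frames: non-empty list of time steps, each a list of channel values.
    Returns a vector of length 3 * n_channels, whatever the number of frames.
    """
    n_frames = len(frames)
    channels = list(zip(*frames))  # one tuple of values per channel
    means = [sum(c) / n_frames for c in channels]
    maxes = [max(c) for c in channels]
    variances = [
        sum((v - m) ** 2 for v in c) / n_frames
        for c, m in zip(channels, means)
    ]
    return means + maxes + variances

# Two tracks of different lengths (2 and 4 frames, 2 channels each).
short_track = [[1.0, 0.0], [3.0, 2.0]]
long_track = [[1.0, 0.0], [3.0, 2.0], [1.0, 0.0], [3.0, 2.0]]
pooled_short = global_temporal_pooling(short_track)
pooled_long = global_temporal_pooling(long_track)
```

Both tracks yield a vector of the same size, which is what lets the fully connected layers that follow accept tracks of any duration. The three statistics also ignore frame order, which is one source of the temporal invariance mentioned above.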


Our best system has:
• 1 frequency convolution layer
• 15 residual blocks, with 5 convolution layers in each
• A global pooling layer: mean + max + variance
• 2 fully connected layers (2000 + 1000)
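To make the "residual block with 5 convolution layers" concrete, here is a deliberately simplified single-channel sketch. The talk does not give kernel sizes, channel counts, activations, or where the skip connection attaches, so the ReLU after each convolution, the odd-length kernels, and the skip from block input to block output are all assumptions.

```python
def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding; output length equals input length.

    Assumes an odd-length kernel.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]

def residual_block(signal, kernels):
    """Apply each convolution followed by ReLU, then add the block input back."""
    out = signal
    for kernel in kernels:
        out = [max(0.0, v) for v in conv1d_same(out, kernel)]
    return [a + b for a, b in zip(signal, out)]

x = [1.0, 2.0, 3.0]
# Five all-zero kernels: the block reduces to the identity through the skip path.
identity_out = residual_block(x, [[0.0, 0.0, 0.0]] * 5)
# Five pass-through kernels: the convolution path reproduces x, so the sum is 2x.
doubled_out = residual_block(x, [[0.0, 1.0, 0.0]] * 5)
```

The first case shows why residual connections help when stacking many blocks: a block whose convolutions contribute nothing still passes its input through unchanged.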

How to generate personalized radio stations?

We have this embedding space, and we can recommend tracks for a given track.

How do we create a personalized radio station for a user?

Let you discover music you like

Understand your various tastes

What do you want to listen to right now?

Fast convergence into the wanted music style
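The slides list goals for the radio station but not the mechanism. Purely as an illustration of how such goals could be approached in an embedding space, the sketch below keeps a query point that moves toward liked tracks and away from skipped ones, and always plays the nearest unplayed track. The update rule, the `rate` parameter, the Euclidean distance, and all names and vectors are assumptions, not the Scarlett.fm algorithm.

```python
import math

def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def next_track(query, embeddings, played):
    """Pick the unplayed track whose embedding is closest to the query point."""
    candidates = [t for t in embeddings if t not in played]
    return min(candidates, key=lambda t: distance(query, embeddings[t]))

def update_query(query, track_vec, liked, rate=0.5):
    """Move the query toward a liked track, or away from a skipped one."""
    sign = 1.0 if liked else -1.0
    return [q + sign * rate * (t - q) for q, t in zip(query, track_vec)]

# Toy 2-dimensional catalogue (made-up values).
catalogue = {
    "ambient_1": [0.0, 0.0],
    "ambient_2": [0.1, 0.0],
    "techno_1": [1.0, 1.0],
    "techno_2": [0.9, 1.0],
}
query = list(catalogue["techno_1"])  # the station starts from a seed track
played = {"techno_1"}

first = next_track(query, catalogue, played)
played.add(first)
query = update_query(query, catalogue[first], liked=True)
second = next_track(query, catalogue, played)
```

A larger `rate` would converge faster on the style the listener is reacting to, at the cost of less diversity.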

Scarlett.fm : our streaming app

http://scarlett.fm

1M tracks from soundcloud.com

Pure content-based recommendations

Conclusion

• A very effective way to incorporate human knowledge into an acoustic model

• What's next?
   • Generating music
   • Using raw audio as an input
   • More diversity/risk in radio stations
