Upload
alois-gruson
View
267
Download
4
Embed Size (px)
Citation preview
Deeplearningformusicrecommendationand
personalizedradiostations
Aloïs GRUSON
niland.io @nilandmusic
Can we recommend music with a pure content-based approach ?
Question
Content based music recommendation
?
Embedding space
Audiomusicsignal Processingandmodeling
Closeinembedding spaceó can be recommended together
Evaluation metrics
§ One of our metrics : Precision @50 on a dataset of scrapped playlists of 8083 tracks classified in 142 playlists.
§ Perceptive evaluations with real users showed correlation between this metric and the users average rating
Our results at niland.io
0
2
4
6
8
10
12
14
16
18
2011 2012 2013 2014 2015 2016
precision@50
ClassicApproaches DeepLearning
Mirex 2011 Ranked 1st Submission
+ 66.8% relative improvement
Audio
MFCC
SFM
OC
GMM-SV
GMM-SV
GMM-SV
Spectrogram
Res GMM-SV
0
5
10
15
20
2011
2012
2013
2014
2015
2016
precision@50
MIREX 2011 Ranked 1st Submission
2011 4000
0
5
10
15
20
2011
2012
2013
2014
2015
2016
precision@50
AudioMFCC
SFM
OC
GMM-SV
GMM-SV
GMM-SV
SpectrogramRes GMM-SV
Gabor GMM-SV
HoG GMM-SV
Work on more descriptors
2014 9000
“Bridge the semantic gap” ?
• We worked to bring the human perception of similarity into our model
• We train deep neural networks to classify songs into playlists.
• And we remove the classification layer to get our embedding space
• Our training set : 115.412 tracks in 3032 playlists
0
5
10
15
20
2011
2012
2013
2014
2015
2016
precision@50
AudioMFCC
SFM
OC
GMM-SV
GMM-SV
GMM-SV
SpectrogramRes GMM-SV
Gabor GMM-SV
HoG GMM-SV
Bending the space
2015
DNN
1000
9000
0
5
10
15
20
2011
2012
2013
2014
2015
2016
precision@50
ConvolutionalNet
2016
Audio Spectrogram
Learning the low-level features
1000
An example of CNN structure
• 1DConvolutions• GlobalTemporalPoolingLayer:Mean+Max+Variance• 2fullyconnectedlayers+classificationlayer• ResidualConnections
An example of CNN structure
GlobalTemporalPoolingLayer: Mean+Max+Variance• Allowstoprocessvariablelength tracks• Generatesometemporalinvariance
An example of CNN structure
Ourbestsystemhas:• 1FrequencyConvolution layer• 15residualblocks,with5convolution layersineach• Aglobalpooling layer:Mean+Max+Variance• 2fullyconnectedlayers(2000+1000)
How to generate personalized radio stations ?
We havethis embedding space,andwe can recommend tracks foragiven track.
Howdowe create apersonalized radiostationforanuser?
Let you discover music you like
Understand your various tastes
What do you want to listen to right now ?
Fast convergence into the wanted music style
Scarlett.fm : our streaming app
http://scarlett.fm
1M tracks from soundcloud.com
Pure content-based recommendations
Conclusion
• Averyeffectivewaytoincorporatehumanknowledgeintoanacousticmodel
• What’snext?Ø GeneratingmusicØ UsingrawaudioasaninputØ Morediversity/riskinradiostations