MUSIC SEARCH AND RECOMMENDATION
Gert Lanckriet
LA ML Meetup, Santa Monica, March 8th, 2012
mellow rock with female vocals Search
... from millions of songs
Music getting big...
Music Search and Recommendation
The right music to the right
people at the right time/place
100M+ sold
300M+ sold
Short Head and Long Tail
[Figure: popularity plotted against songs]
Short Head - Popular
Long Tail - Obscure
The right music to the right people at the right time/place:
in the gym: happy rock
dating: romantic jazz
Halloween: scary
Computers Analyzing Music
[Figure: Music, Auto Tagger]
Computers Annotating Music
[Figure: song/tag table. Tag categories and tags: genre (rock, jazz), mood (romantic, happy, scary), instrument (piano, guitar), usage (gym, date). Rows are songs, starting with Ray Charles - “There is no you” and indexed 00000001, 00000002, 00000003, 00000004, ..., 18939234. Check marks (✓) show which tags apply to each song.]
Finding the Right Music
[Figure: the song/tag table (songs 00000001, ..., 18939234; tags rock, jazz, romantic, happy, scary, piano, guitar, gym, date) queried through a Search Engine]
Search by text query: happy rock for the gym
Search by seed song: 00000001
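The text-query search over the song/tag table can be sketched as follows. This is a minimal illustration, not the system from the talk: the tag assignments in `TAG_INDEX` and the match-count scoring in `search` are assumptions made up for the example.

```python
# Hypothetical tag index: song ID -> set of tags (the check marks in the table).
TAG_INDEX = {
    "00000001": {"jazz", "romantic", "piano", "date"},
    "00000002": {"rock", "happy", "guitar", "gym"},
    "00000003": {"rock", "scary", "guitar"},
    "00000004": {"jazz", "happy", "piano"},
}

def search(query, index):
    """Rank songs by how many query words match their tags."""
    vocabulary = set().union(*index.values())
    # Keep only the query words that are known tags ("for", "the" are dropped).
    wanted = {w for w in query.lower().split() if w in vocabulary}
    scored = [(len(wanted & tags), song) for song, tags in index.items()]
    # Most matches first; ties broken by song ID for a stable order.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [song for _, song in ranked]

print(search("happy rock for the gym", TAG_INDEX))
# → ['00000002', '00000003', '00000004']
```

A real index would hold tag probabilities from the auto-tagger rather than binary check marks, so the score would be a weighted sum instead of a count.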
Markovoni radio
Pandora and iTunes’ Genius
• Pandora: limited to songs manually indexed
• Genius: not for relatively unknown music
• CALab: any song
[Figure: search boxes for "artist name / song title", for seed song 00000001, and for the text query "happy rock for the gym"]
Collaborative filtering
‣ Explicit feedback ( )
‣ Implicit feedback (# listens)
The cold start problem
‣ No feedback = no recommendation
‣ Use audio features to bypass cold-start
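The cold start problem can be made concrete with a small sketch of item-to-item collaborative filtering on implicit feedback. The listen counts, user names, and the choice of cosine similarity are all assumptions for illustration; the talk does not specify a particular collaborative filter.

```python
import math

# Hypothetical implicit feedback: user -> {song: number of listens}.
LISTENS = {
    "user_a": {"song_1": 12, "song_2": 7},
    "user_b": {"song_1": 3, "song_2": 9, "song_3": 1},
    "user_c": {"song_2": 4, "song_3": 6},
}

def song_vector(song, listens):
    """Listen counts for one song, ordered by user name."""
    return [listens[user].get(song, 0) for user in sorted(listens)]

def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def similar_songs(seed, catalog, listens):
    """Rank the other catalog songs by similarity of their listen counts."""
    seed_vec = song_vector(seed, listens)
    scores = {s: cosine(seed_vec, song_vector(s, listens)) for s in catalog if s != seed}
    return sorted(scores.items(), key=lambda kv: -kv[1])

catalog = ["song_1", "song_2", "song_3", "new_song"]  # new_song has no listens yet
for song, score in similar_songs("song_1", catalog, LISTENS):
    print(song, round(score, 3))
```

`new_song` always scores 0.0 because no user has played it, so feedback alone can never recommend it. Audio-based similarity does not depend on listen counts, which is why it can bypass this case.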
Pandora and iTunes’ Genius
• CALab: any song with mp3
Competitive
Computers Annotating Music
[Figure: Music, Auto Tagger, output tags by category: genre, mood, instrument, usage]
signal processing
machine learning
[Figure: songs labeled rock; Auto Tagger; rock ✓; p( | )]
• Given: training set of songs associated with tag t
• Estimate: p(x|t) - distribution of audio content x for each tag t (i.e., for songs associated with the tag)
Modeling Audio Content of a Song
• Segment audio signal
• Extract feature vector from each short-time segment
• Estimate model with EM algorithm
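The three steps above can be sketched end to end in plain Python. This is a simplified illustration under stated assumptions: the feature is a single log-energy value per frame (a stand-in for the MFCC-style vectors shown on the slides), the mixture has two 1-D Gaussian components, and the function names and synthetic test signal are hypothetical.

```python
import math

def frames(signal, frame_len):
    """Split a signal into consecutive non-overlapping short-time frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def log_energy(frame):
    """One scalar feature per frame (a stand-in for an MFCC vector)."""
    return math.log(sum(s * s for s in frame) / len(frame) + 1e-12)

def gaussian(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def fit_gmm_em(xs, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with EM."""
    weights = [0.5, 0.5]
    means = [min(xs), max(xs)]  # simple deterministic initialization
    mean_all = sum(xs) / len(xs)
    spread = sum((x - mean_all) ** 2 for x in xs) / len(xs)
    variances = [spread, spread]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each feature.
        resp = []
        for x in xs:
            p = [w * gaussian(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means, and variances (with a small floor).
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var_k, 1e-6)
    return weights, means, variances

# Synthetic "song": one second of a quiet tone, then one second of a loud tone.
rate = 8000
quiet = [0.1 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
loud = [1.0 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
features = [log_energy(f) for f in frames(quiet + loud, frame_len=160)]  # 20 ms frames
weights, means, variances = fit_gmm_em(features)
print(weights, means)
```

With this input the two components settle on the quiet and loud halves of the signal. Real audio would use multi-dimensional features and more components, and a library implementation of EM would be the usual choice.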
[Figure: audio feature vectors (Delta-MFCC) of a song, with a model fit by EM]
Modeling Audio Content of Tag: p(x|t)
• Identify songs associated with tag t
• Merge all features
• Estimate p(x|t)
[Figure: Songs tagged Rock; their features are merged and “EM” estimates the Tag Model p(x|t = Rock)]
Annotating a Song
• Audio clips represented as a “bag” of feature vectors
extracted from 20-80 msec frames
• Each vector sampled independently from GMM
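Because each vector is treated as an independent draw, the likelihood of a whole clip under a tag model is a product over frames, or a sum in the log domain. The sketch below scores a bag of features against two tag models. The 1-D models in `TAG_MODELS`, the feature values, and the use of the average log-likelihood as the tag score are assumptions for illustration only.

```python
import math

# Hypothetical 1-D tag models p(x|t): lists of (weight, mean, variance).
TAG_MODELS = {
    "rock": [(0.6, 2.0, 0.5), (0.4, 3.5, 0.8)],
    "jazz": [(0.5, -1.0, 0.6), (0.5, 0.5, 0.4)],
}

def log_gmm_density(x, components):
    """log p(x|t) for one feature under a 1-D Gaussian mixture."""
    density = sum(
        w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
        for w, m, v in components
    )
    return math.log(density)

def tag_scores(bag, models):
    """Average per-frame log-likelihood of a bag of features under each tag model."""
    return {
        tag: sum(log_gmm_density(x, comps) for x in bag) / len(bag)
        for tag, comps in models.items()
    }

bag = [1.8, 2.4, 3.1, 2.2, 3.6]  # hypothetical features of one clip
scores = tag_scores(bag, TAG_MODELS)
best = max(scores, key=scores.get)
print(best, scores)
```

Averaging over frames keeps scores comparable between clips of different lengths. Turning these scores into calibrated tag probabilities would need an extra normalization step that is not shown here.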
[Figure: p(x|t) - Gaussian Mixture Models. Delta-MFCC feature vectors of a clip, with a GMM fit by EM]
p(x|t) - Dynamic Texture Mixtures
• “Glue” consecutive frames together into sequences (5-10 sec)
• Model sequence as a dynamic texture (DT)
‣ dynamics: temporal evolution
‣ observation: instantaneous spectral characteristics
• To accommodate different temporal dynamics (DTs) in one song: dynamic texture mixture (DTM)
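A dynamic texture is commonly written as a linear dynamical system with a hidden state for the dynamics and a linear map for the observation. The notation below is the standard textbook form and is an assumption here, since the slides do not give the equations; the symbols A, C, Q, R, the mean term, and the mixture weights are not taken from the talk.

```latex
% Hidden state x_t carries the temporal evolution (dynamics);
% y_t is the observed audio feature vector (instantaneous spectral characteristics).
\begin{align}
  x_{t+1} &= A\,x_t + v_t,            & v_t &\sim \mathcal{N}(0, Q) \\
  y_t     &= C\,x_t + \bar{y} + w_t,  & w_t &\sim \mathcal{N}(0, R)
\end{align}
% A dynamic texture mixture (DTM) with K components and weights \pi_k
% assigns a sequence y_{1:\tau} the density
\begin{equation}
  p(y_{1:\tau}) = \sum_{k=1}^{K} \pi_k \, p(y_{1:\tau} \mid \Theta_k),
  \qquad \Theta_k = \{A_k, C_k, Q_k, R_k, \bar{y}_k\}.
\end{equation}
```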
Dynamic Texture Mixtures
[Figure: B. B. King - Sweet Little Angel. Sequence of audio feature vectors and the resulting song DTM]
Estimating p(x|t) efficiently & robustly
• Estimate p(x|t) from song models using hierarchical EM
[Figure: audio and sequence of audio feature vectors; standard EM (EM algorithm) gives one song DTM per song; the HEM algorithm turns the song DTMs into the tag DTM p(x|t)]
Training Data
[Figure: songs labeled with tags: rock, happy, date, ...]
Training Data: CAL500
Herd It
8,000+ registered players
145,000+ rounds played
Evaluating Herd It Data
[Figure: Herd It; songs tagged rock, happy, date, ...; Music, Auto Tagger; output tags by category: genre, mood, instrument, usage; music search engine with the query "mellow rock"]
96%
Active Learning
Herd It: Including Demographics
[Figure: two players (Female, 73, Paris; Female, 24, San Diego), each with the tag "mellow rock"; personalized p( | ) for rock]
The right music to the right people at the right time/place: personalized
Semantic Retrieval
[Figure: song/tag table searched with the text query "happy rock for the gym"]
Query by example
‣ Rank database by similarity to a query song
[Figure: seed song 00000001 submitted to Search against the song database (00000001, 00000002, 00000003, 00000004, ..., 18939234)]
‣ Learn a distance metric between audio clips
‣ Based on audio content
‣ Optimized to induce good rankings
Direct Audio-based Ranking
Audio Features
Optimize rankings induced by distance
Metric Learning to Rank
Supervision: collaborative filter (CF)
Benefits of CF similarity
Background: Structural SVM
‣ For certain choice of : efficiently optimized by sorting in descending order of some
Metric learning to rank
‣ MLR learns a distance
‣ Given a query, rankings are produced through sorting in descending order by
‣ Score function:
MLR optimization
Comparison
Markovoni radio - 1 million songs
Non-linear embedding
‣ Learning W ~ learning projection: g(x) = Nx, W = N^T N
‣ Non-linear embedding
‣ Map X into RKHS H: φ: X → H
‣ Kernel K_ij = <φ(i), φ(j)>
‣ g(i) = Nφ(i)
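The equivalence between learning W and learning a projection can be checked numerically in the linear case: the distance (x - y)^T W (x - y) with W = N^T N equals the squared Euclidean distance between g(x) = Nx and g(y) = Ny. The matrix `N`, the query, and the three database items below are made-up values, and the helper names are hypothetical.

```python
def project(N, x):
    """g(x) = N x, with the matrix N given as a list of rows."""
    return [sum(n_ij * x_j for n_ij, x_j in zip(row, x)) for row in N]

def gram(N):
    """W = N^T N."""
    d = len(N[0])
    return [[sum(row[a] * row[b] for row in N) for b in range(d)] for a in range(d)]

def dist_W(W, x, y):
    """(x - y)^T W (x - y)."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    return sum(diff[a] * W[a][b] * diff[b] for a in range(n) for b in range(n))

def dist_projected(N, x, y):
    """||N x - N y||^2, the squared Euclidean distance after projection."""
    gx, gy = project(N, x), project(N, y)
    return sum((a - b) ** 2 for a, b in zip(gx, gy))

N = [[1.0, 0.5, 0.0], [0.0, 2.0, -1.0]]  # hypothetical 2x3 projection
query = [0.0, 1.0, 2.0]
database = {"a": [1.0, 1.0, 2.0], "b": [0.0, 3.0, 2.0], "c": [0.5, 1.0, 1.0]}

W = gram(N)
# Nearest first: ascending distance, i.e. descending order of negative distance.
ranking = sorted(database, key=lambda s: dist_W(W, query, database[s]))
print(ranking)  # → ['a', 'c', 'b']
```

In the kernelized version the same identity holds with φ(i) in place of x, so distances can be computed from kernel values without forming φ explicitly.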
Improving Music Recommendation
‣ In some cases, various information other than audio content is available: could improve recommendations
‣ Explore other audio features: SAI, chroma, etc.
‣ Define embedding that integrates this variety of information
Multiple kernel embedding
• Every modality: represented by its own feature space (feature space 1, feature space 2, ..., with maps φ(1), φ(2), ..., φ(m))
‣ may capture complex non-linear structure
• Characterized by kernel: K_ij(p) = <φ(p)(i), φ(p)(j)>, giving K(1), K(2), ..., K(m)
• Integrate by concatenation: g(i) = [φ(1)(i); ...; φ(m)(i)]
• Learn weights w(1), w(2), ..., w(m) for each feature space: g(i) = [w(1)φ(1)(i); ...; w(m)φ(m)(i)]
• Learn projections N(1), N(2), ..., N(m) for each feature space, concatenated and jointly optimized (W(p) = N(p)^T N(p) ⪰ 0): g(i) = [N(1)φ(1)(i); ...; N(m)φ(m)(i)]
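It may help to spell out what the concatenated embeddings imply for inner products and distances. The identities below follow directly from the definitions of g(i) on the slides; the index p runs over the m feature spaces, and the superscript notation is chosen here for readability rather than taken from the talk.

```latex
% Weighted concatenation: the inner product is a weighted sum of kernels.
\begin{equation}
  g(i) = \begin{bmatrix} w^{(1)}\phi^{(1)}(i) \\ \vdots \\ w^{(m)}\phi^{(m)}(i) \end{bmatrix}
  \;\Longrightarrow\;
  \langle g(i), g(j) \rangle = \sum_{p=1}^{m} \bigl(w^{(p)}\bigr)^{2} K^{(p)}_{ij}
\end{equation}
% Per-space projections: the squared distance is a sum of per-space
% distances, each governed by W^{(p)} = N^{(p)\top} N^{(p)} \succeq 0.
\begin{equation}
  \lVert g(i) - g(j) \rVert^{2}
  = \sum_{p=1}^{m} \bigl(\phi^{(p)}(i) - \phi^{(p)}(j)\bigr)^{\top}
    W^{(p)} \bigl(\phi^{(p)}(i) - \phi^{(p)}(j)\bigr)
\end{equation}
```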
Multiple kernel embedding
• Music recommendation
‣ Partial Order Embedding (JMLR 2011)
• Object Recognition
‣ Large Margin Nearest Neighbor (CVPR 2010)
‣ Metric Learning to Rank (CVPR 2011)
Zero-click Recommendation
• User interaction required: Search with a text query ("working out") or a seed song ("Teen Spirit")
• Automated activity / mood detection from sensors: accelerometer, GPS, proximity sensor, microphone, light sensor, gyroscopic sensor, digital compass, heart rate sensor
• Detected activity / mood (working out, happy) gives the query "happy work out music"
• Automated activity / mood detection & playlist generation
• Example moods: energetic, mellow
Music Search and Recommendation
The right music to the right
people at the right time/place
Ph.D. students: Luke Barrington, Emanuele Coviello, Kat Ellis,
Brian McFee, Bharath Sriperumbudur, Doug Turnbull, Antoni Chan,
Carolina Galleguillos
M.S. student: David Torres
Undergraduate students: Andrew Huynh, Justin Nguyen, Reid Oda,
Aaron Presley, Wanda Seto, David Vanoni, Peida Zhao
Music Search and Recommendation
Gert Lanckriet
University of California, San Diego