Upload
oscar-celma
View
2.986
Download
5
Embed Size (px)
DESCRIPTION
This paper presents some experiments to analyse the popularity effect in music recommendation. Popularity is measured in terms of total playcounts, and the Long Tail model is used in order to rank music artists. Furthermore, metrics derived from complex network analysis are used to detect the influence of the most popular artists in the network of similar artists. The results from the experiments reveal that, as expected by its inherent social component, the collaborative filtering approach is prone to popularity bias. This has some consequences on the discovery ratio as well as in the navigation through the Long Tail. On the other hand, in both audio content-based and human expert-based approaches artists are linked independently of their popularity. This allows one to navigate from a mainstream artist to a Long Tail artist in just two or three clicks.
Citation preview
2nd netflix workshop // ACM KDD / Las Vegas, US // August, 24th 2008
From Hits to Niches?...or how popular artists can bias
music recommendation and discovery
Òscar Celma, Pedro Cano(Music Technology Group ~ UPF)
(Barcelona Music and Audio Technologies, BMAT)
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
The problem
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
music overload
• Today(August, 2007)
iTunes: 6M tracks / 3B Sales P2P: 15B tracks 53% buy music on line
• Tomorrow All music will be on line Billions of tracks Millions more arriving every week
• Finding new, relevant music is hard!
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
Can recommender systems help us?
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
the Long Tail of popularity
• Help me find it!
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
If you like The Beatles you might like ...
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
the musical Turing Test
• Which recommendation is from a human, which is from a machine?
List #2
• Chuck Berry
• Harry Nilsson
• XTC
• Marshall Crenshaw
• Super Furry Animals
• Badfinger
• The Raspberries
• The Flaming Lips
• Jason Faulkner
• Michael Penn
List #1
• Bob Dylan
• Beach Boys
• Billy Joel
• Rolling Stones
• Animals
• Aerosmith
• The Doors
• Simon & Garfunkel
• Crosby, Stills Nash & Young
• Paul Simon
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
the musical Turing Test
• Which recommendation is from a human, which is from a machine?
List #2
• Chuck Berry
• Harry Nilsson
• XTC
• Marshall Crenshaw
• Super Furry Animals
• Badfinger
• The Raspberries
• The Flaming Lips
• Jason Faulkner
• Michael Penn
List #1
• Bob Dylan
• Beach Boys
• Billy Joel
• Rolling Stones
• Animals
• Aerosmith
• The Doors
• Simon & Garfunkel
• Crosby, Stills Nash & Young
• Paul Simon
Machine:
Up to 11
Human
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
• popularity bias
• low novelty
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
Our proposal
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
Long Tail popularity + item similarity network
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
Long Tail popularity + item similarity network
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
Item-to-item Complex Network Analysis
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
complex network analysis
• Datasets: Artist recommendation networks CF*: Social-based, incl. item-based CF (Last.fm)
“people who listen to X also listen to Y”
CB: Content-based Audio similarity “X and Y sound similar”
EX: Human expert-based (AllMusicGuide) “X similar to (or influenced by) Y”
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
complex network analysis
• Small-world networks
Network traverse in a few clicks
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
complex network analysis
• Last.fm is a scale-free network power law exponent for the cumulative indegree
distribution
A few artists (hubs) control the network
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
complex network analysis
• Indegree – avg. neighbor indegree correlation
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
complex network analysis
• Indegree – avg. neighbor indegree correlation
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
complex network analysis
• Indegree – avg. neighbor indegree correlation
Kin(Robert Palmer)=430=>avg(Kin(sim(Robert Palmer)))=342
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
complex network analysis
• Indegree – avg. neighbor indegree correlation
Kin(Robert Palmer)=430=>avg(Kin(sim(Robert Palmer)))=342
Kin(Ed Alton)=11=>avg(Kin(sim(Ed Alton)))=18
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
• Indegree – avg. neighbor indegree correlation
• Last.fm presents assortative mixing Artists with high indegree are connected together,
similarly for low indegree artists r = Pearson correlation
complex network analysis
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
complex network analysis
|------------|---------|-----|-----------|
| | Last.fm | CB | Exp (AMG) |
|------------|---------|-----|-----------|
|Small World | yes | yes | yes |
| | | | |
| Scale-free | yes | No | No |
| | | | |
|Ass. mixing | yes | No | No |
|------------|---------|-----|-----------|
• Summary
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
complex network analysis
• But, still some remaining questions...
Are the hubs the most popular artists?
Can we navigate through the Long Tail, via artist similarity?
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
Long Tail popularity + item similarity network
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
Long Tail popularity + item similarity network
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
Item popularity analysis
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
the Long Tail in music
• last.fm dataset ~260K artists
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
the Long Tail in music
• last.fm dataset ~260K artists
radiohead (40,762,895)
red hot chili peppers (37,564,100)
muse (30,548,064)
the beatles (50,422,827)
death cab for cutie (29,335,085)pink floyd (28,081,366)coldplay (27,120,352)
metallica (25,749,442)
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
the Long Tail model [Kilkki, 2007]
• F(x) = Cumulative distribution up to x
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
the Long Tail model [Kilkki, 2007]
• Top-8 artists: F(8)~ 3.5% of total plays
50,422,827 the beatles40,762,895 radiohead37,564,100 red hot chili peppers30,548,064 muse29,335,085 death cab for cutie28,081,366 pink floyd27,120,352 coldplay25,749,442 metallica
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
the Long Tail model [Kilkki, 2007]
• Split the curve in three parts
(82 artists) (6,573 artists) (~254K artists)
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
Combining
complex network analysis
(item-item similarity)
with the Long Tail
(item popularity)
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
Long Tail popularity + item similarity network
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
artist indegree vs. artist popularity
• Are the network hubs the most popular items?
???
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
artist indegree vs. artist popularity Last.fm: correlation between Kin and playcounts
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
artist indegree vs. artist popularity Audio CB similarity: no correlation
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
artist indegree vs. artist popularity Expert: correlation between Kin and playcounts
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
artist similarity vs. artist popularity
• “From Hits to Niches” navigation in the Long Tail using artist similarity
how many clicks?
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
artist similarity vs. artist popularity
• “From Hits to Niches” Audio CB similarity example (VIDEO)
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
artist similarity vs. artist popularity
• “From Hits to Niches” Audio CB similarity example
Bruce Springsteen (14,433,411 plays)
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
artist similarity vs. artist popularity
• “From Hits to Niches” Audio CB similarity example
Bruce Springsteen (14,433,411 plays) The Rolling Stones (27,720,169 plays)
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
artist similarity vs. artist popularity
• “From Hits to Niches” Audio CB similarity example
Bruce Springsteen (14,433,411 plays) The Rolling Stones (27,720,169 plays) Mike Shupp (577 plays)
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
artist similarity vs. artist popularity
• navigation in the Long Tail Similar artists, given an artist in the HEAD part:
Also, it can be seen as a Markovian Stochastic process...
54,68%
(0%)
45,32%64,74%
28,80%6,46%
60,92%33,26%
5,82%
CF CB EXP
Head Mid Tail Head Mid Tail Head Mid Tail
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
artist similarity vs. artist popularity
• navigation in the Long Tail Markov transition matrix
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
artist similarity vs. artist popularity
• navigation in the Long Tail Markov transition matrix
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
artist similarity vs. artist popularity
• navigation in the Long Tail Last.fm Markov transition matrix
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
artist similarity vs. artist popularity
• navigation in the Long Tail From Head to Tail, with P(T|H) > 0.4 Number of clicks needed
CF : 5 CB : 2 EXP: 2 HEAD
TAIL#clicks?
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
artist popularity
|-----------------------|---------|-----|-----------|
| | Last.fm | CB | Exp (AMG) |
|-----------------------|---------|-----|-----------|
| Indegree / popularity| yes | no | yes |
| | | | |
|Similarity / popularity| yes | no | no |
|-----------------------|---------|-----|-----------|
Summary
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
conclusions|-----------------------|---------|-----|-----------|
| | Last.fm | CB | Exp (AMG) |
|-----------------------|---------|-----|-----------|
| Small World | yes | yes | yes |
| | | | |
| Scale-free | yes | no | no |
| | | | |
| Ass. mixing | yes | no | no |
|-----------------------|---------|-----|-----------|
| Indegree / popularity| yes | no | yes |
| | | | |
|Similarity / popularity| yes | no | no |
|-----------------------|---------|-----|-----------|
| POPULARITY BIAS | YES | NO | FAIRLY |
|-----------------------|---------|-----|-----------|
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
future work
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
future work
• 3D Long Tail item popularity recommendation
network (item-to-item)
user profile
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
future work
• 3D Long Tail item popularity recommendation
network (item-to-item)
user profile
relevant items
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
future work
• 3D Long Tail item popularity recommendation
network (item-to-item)
user profile
relevant items
relevant and novel?
...paper @ ACM RecSys'08
2nd netflix workshop // ACM KDD / Las Vegas, US // August, 24th 2008
From Hits to Niches?...or how popular artists can bias music recommendation
and discovery
THANKS!!!
Òscar Celma, Pedro Cano(Music Technology Group ~ UPF)
(Barcelona Music and Audio Technologies, BMAT)