Music: Tuned to you

Mohitdeep Singh, Data Scientist

Predictive Analytics Innovation Summit, Feb 12-13, 2015, San Diego

http://www.rdio.com/about/

Big Data @Rdio

•  Tracks metadata

•  Signal processing

•  Millions of hours of music streamed every month

•  Clicks

•  User demography

•  Social info

•  Every single interaction

Committed to open source

Scenario

The answer lies in the matrix

[Figure: matrix with labels 2, 7, 44, 22, 17 and 9, 12, 21, 18, 77, 44]

Baseline: Popularity

Recommend based on popularity of tracks.

Pros:

•  Again, a very simple model

•  Easy to implement

•  More efficient on Apache Giraph (by exploiting its property)

•  Always a good baseline

Cons:

•  Not really recommending anything

•  No element of discovery
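
A minimal sketch of the popularity baseline, assuming listening events arrive as (user, track) pairs. The event format, the function name `popularity_recommend`, and the toy data are illustrative assumptions, not taken from the slides.

```python
from collections import Counter

def popularity_recommend(plays, user, n=3):
    """Return the n most-played tracks that `user` has not played yet.

    plays: list of (user_id, track_id) events (hypothetical format).
    """
    counts = Counter(track for _, track in plays)
    heard = {track for u, track in plays if u == user}
    # most_common() yields tracks from most to least played.
    ranked = [track for track, _ in counts.most_common() if track not in heard]
    return ranked[:n]

# Toy data: t1 has 3 plays, t2 has 2, t3 has 1.
plays = [
    ("ann", "t1"), ("bob", "t1"), ("cat", "t1"),
    ("ann", "t2"), ("bob", "t2"),
    ("cat", "t3"),
]
print(popularity_recommend(plays, "cat"))
```

Every user gets essentially the same ranking, minus what they have already heard, which is why this baseline offers no discovery.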

Long Tail Problem

Nearest Neighbors

[Figure: matrix with labels 2, 7, 44, 22, 17 and 9, 12, 21, 18, 77, 44]

Distance matrix

1       0       0       0.0873  0       1       0       0
0       1       0       0       0       0       0       0.3603
0       0       1       0       1       0       0       0
0.0873  0       0       1       0       0.0873  0.2621  0.8967
0       0       1       0       1       0       0       0
..      ..      ..      ..      ..      ..      ..      ..
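
The slides do not say how the matrix entries are computed. The ones on the diagonal suggest a similarity measure, so the sketch below assumes item-item cosine similarity over a small user x item matrix. The function name and the toy matrix `R` are hypothetical.

```python
import math

def cosine_similarity_matrix(R):
    """Item-item cosine similarity for a user x item matrix R (list of rows)."""
    n_items = len(R[0])
    # Column j holds every user's value for item j.
    cols = [[row[j] for row in R] for j in range(n_items)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
    D = [[0.0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        for j in range(n_items):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(cols[i], cols[j]))
                D[i][j] = dot / (norms[i] * norms[j])
    return D

# Toy data: 3 users x 3 items, 1 = the user played the item.
R = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
]
D = cosine_similarity_matrix(R)
for row in D:
    print([round(v, 4) for v in row])
```

The result is symmetric, and items that share no listeners get a score of 0.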

Top-N Recommendations

P = R * D
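
A small sketch of how P = R * D can produce top-N lists, assuming R is a binary user x item matrix, D is an item x item similarity matrix, and items the user already has are excluded from the ranking. These readings of R and D, the helper names, and the numbers are assumptions for illustration.

```python
def matmul(A, B):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def top_n(R, D, user, n=2):
    """Rank the items `user` has not interacted with by their score in P = R * D."""
    P = matmul(R, D)
    scores = P[user]
    candidates = [j for j in range(len(scores)) if R[user][j] == 0]
    candidates.sort(key=lambda j: scores[j], reverse=True)
    return candidates[:n]

# Toy data: 2 users x 4 items, and a symmetric 4 x 4 similarity matrix.
R = [
    [1, 0, 0, 1],
    [0, 1, 0, 0],
]
D = [
    [1.0, 0.2, 0.5, 0.0],
    [0.2, 1.0, 0.1, 0.3],
    [0.5, 0.1, 1.0, 0.4],
    [0.0, 0.3, 0.4, 1.0],
]
print(top_n(R, D, user=0))
print(top_n(R, D, user=1))
```

Each score is the summed similarity between a candidate item and everything the user already played.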

Pros

•  Models are easy to reason about

•  Easily scaled via MapReduce

•  Gives decent performance on the test set

Cons

•  If the user and item spaces are not stable, things can and will go wrong

•  Lacks serendipity

•  No guarantee on the number of predictions per user

Latent Factor Models

Approach pioneered during the Netflix Prize competition.

The key idea is to approximate the rating matrix by a product of lower-rank matrices.
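
A minimal sketch of a low-rank factorization, assuming explicit ratings with missing entries marked as None and plain stochastic gradient descent with L2 regularization. The slides do not specify the training procedure, so the function names, hyperparameters, and toy ratings are all assumptions.

```python
import random

def predict(U, V, u, i):
    """Predicted rating: dot product of user and item factor vectors."""
    return sum(a * b for a, b in zip(U[u], V[i]))

def factorize(R, k=2, epochs=1000, lr=0.05, reg=0.01, seed=0):
    """Fit R ~ U * V^T on the observed (non-None) entries with SGD."""
    rng = random.Random(seed)
    n_users, n_items = len(R), len(R[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    observed = [(u, i) for u in range(n_users) for i in range(n_items)
                if R[u][i] is not None]
    for _ in range(epochs):
        for u, i in observed:
            err = R[u][i] - predict(U, V, u, i)
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V

# Toy ratings: users 0-1 prefer items 0-1, users 2-3 prefer items 2-3.
R = [
    [5, 5, 1, None],
    [5, None, 1, 1],
    [1, 1, None, 5],
    [None, 1, 5, 5],
]
U, V = factorize(R)
print(round(predict(U, V, 0, 3), 2))  # estimate for a missing entry
```

Here k = 2 latent factors are enough because the toy data has two taste groups. On real data, k and the regularization strength would need tuning.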

[Figure: matrix factorization diagram (≈, *, =), shown in three build steps]

Pros

•  Tries to learn the underlying concepts

•  Supplementary user/item information can be baked into the learning algorithm (factorization machines)

Cons

•  Doesn’t perform as well as simple nearest-neighbor models

•  Interpretation of the latent space is hard

Bayesian Personalized Ranking

•  Constructs a preference order for each user

•  Directly optimizes the ranking function

•  Takes the order of preference into account

•  Implemented in a scalable fashion on top of Apache Giraph
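
A single-machine sketch of the idea behind Bayesian Personalized Ranking, assuming a matrix-factorization scorer trained on sampled (user, played item, unplayed item) triples. This is not the Giraph implementation mentioned above. Function names, hyperparameters, and data are illustrative assumptions.

```python
import math
import random

def score(U, V, u, i):
    """Preference score of user u for item i."""
    return sum(a * b for a, b in zip(U[u], V[i]))

def train_bpr(interactions, n_users, n_items, k=4, steps=20000,
              lr=0.1, reg=0.01, seed=0):
    """SGD on ln(sigmoid(x_ui - x_uj)) for played item i and unplayed item j."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    seen = {u: set() for u in range(n_users)}
    for u, i in interactions:
        seen[u].add(i)
    for _ in range(steps):
        u, i = rng.choice(interactions)
        j = rng.randrange(n_items)
        if j in seen[u]:
            continue  # j must be an item the user has not played
        x_uij = score(U, V, u, i) - score(U, V, u, j)
        g = 1.0 / (1.0 + math.exp(x_uij))  # equals 1 - sigmoid(x_uij)
        for f in range(k):
            uf, vif, vjf = U[u][f], V[i][f], V[j][f]
            U[u][f] += lr * (g * (vif - vjf) - reg * uf)
            V[i][f] += lr * (g * uf - reg * vif)
            V[j][f] += lr * (-g * uf - reg * vjf)
    return U, V

# Toy data: users 0-1 played items 0-1, users 2-3 played items 2-3.
interactions = [(0, 0), (0, 1), (1, 0), (1, 1),
                (2, 2), (2, 3), (3, 2), (3, 3)]
U, V = train_bpr(interactions, n_users=4, n_items=4)
print(score(U, V, 0, 0) > score(U, V, 0, 2))
```

The loss depends only on the difference between two scores, so the model is trained to order items correctly rather than to reproduce rating values.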

Results

[Figure: bar chart comparing algorithms, with popularity as the baseline. Bars: Popularity, Nearest Neighbors, Matrix Factorization, Weighted Matrix Factorization, Bayesian PR. Axis ticks: 0, 50%, 100%, 150%.]

Note: Offline metrics tracking MAP
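
For reference, a short sketch of MAP (mean average precision) as an offline metric. It assumes each user has a ranked recommendation list and a set of held-out relevant tracks, and that average precision is normalized by the number of relevant tracks. Other normalizations exist, and the slides do not say which one was used.

```python
def average_precision(recommended, relevant):
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(recs_by_user, relevant_by_user):
    """Average of per-user average precision."""
    users = list(recs_by_user)
    return sum(
        average_precision(recs_by_user[u], relevant_by_user.get(u, set()))
        for u in users
    ) / len(users)

# Hypothetical ranked lists and held-out relevant tracks.
recs = {"ann": ["a", "b", "c", "d"], "bob": ["x", "y"]}
relevant = {"ann": {"a", "c"}, "bob": {"y"}}
print(round(mean_average_precision(recs, relevant), 4))
```

Dividing each algorithm's MAP by the popularity model's MAP gives the kind of relative comparison the chart describes.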

Candidate Tracks

Catalogue of around 32M tracks

P(Relevant | , Artist)

Track Id        Artist similarity  Track popularity  Artist popularity  Track duration  .. .. .. ..  Relevant (0/1)
“My December”   1                  0.992             0.433              482             .. .. .. ..  1
“Shake it Off”  0.03               0.04              0.88               329             .. .. .. ..  0
“Sugar”         0.772              0.95              0.77               220             .. .. .. ..  1
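
The slides do not name the model behind P(Relevant | ...). As one possible reading, the sketch below fits a logistic regression on the three features that are fully visible in the table (artist similarity, track popularity, artist popularity). The choice of model, the feature subset, and the function names are assumptions. Three rows are far too few for a real model, so this only illustrates the mechanics.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Estimated probability that a candidate track is relevant."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Logistic regression fitted with per-example gradient steps."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, b, xi)
            w = [wj + lr * err * xj for wj, xj in zip(w, xi)]
            b += lr * err
    return w, b

# Rows from the table: artist similarity, track popularity, artist popularity.
X = [
    [1.0, 0.992, 0.433],   # "My December"
    [0.03, 0.04, 0.88],    # "Shake it Off"
    [0.772, 0.95, 0.77],   # "Sugar"
]
y = [1, 0, 1]
w, b = train_logistic(X, y)
for row in X:
    print(round(predict_proba(w, b, row), 3))
```

Ranking candidate tracks by this probability turns the classifier into a recommender over the candidate set.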

Many open problems

It’s a tough problem!

Current/Future work

•  Build an ensemble model to incorporate other models

•  Simplify the A/B testing framework

•  Integrate content-based recommendations

•  Experiment with deep-learning techniques

•  Incorporate information from the web

Questions

Interested? Check out https://www.rdio.com/careers/
