Recommender System Introduction
What is a good recommender system?
Outline
• What is a recommender system?
  – Mission
  – History
  – Problems
• What is a good recommender system?
  – Experiment Methods
  – Evaluation Metrics
Information Overload
How to solve information overload
• Catalog
  – Yahoo, DMOZ
• Search Engine
  – Google, Bing
Mission
• Help users find items of interest.
• Help item providers deliver their items to the right users.
• Help websites improve user engagement.
[Diagram: Recommender System]
Search Engine vs. Recommender System
• Users will try a search engine if
  – they have specific needs
  – they can describe their needs with keywords
• Users will try a recommender system if
  – they do not know what they want yet
  – they cannot describe their needs with keywords
History: Before 1992
• Content Filtering
  – An architecture for large scale information systems [1985] (Gifford, D.K.)
  – MAFIA: An active mail-filter agent for an intelligent document processing support [1990] (Lutz, E.)
  – A rule-based message filtering system [1988] (Pollock, S.)
History: 1992-1998
• Tapestry by Xerox Palo Alto [1992]
  – First system designed around collaborative filtering
• GroupLens [1994]
  – First recommender system using rating data
• MovieLens [1997]
  – First movie recommender system
  – Provides a well-known dataset for researchers
History: 1992-1998
• Fab: content-based, collaborative recommendation [1997]
  – First unified (hybrid) recommender system
• Empirical Analysis of Predictive Algorithms for Collaborative Filtering [1998] (John S. Breese)
  – Systematically evaluated user-based collaborative filtering
History: 1999-2005
• Amazon proposed item-based collaborative filtering (patent filed in 1998, issued in 2001) [link]
• Thomas Hofmann proposed pLSA [1999] and applied a similar method to collaborative filtering [2004]
• Pandora began the Music Genome Project [2000]
History: 1999-2005
• Last.fm used Audioscrobbler to generate user taste profiles on music.
• Evaluating collaborative filtering recommender systems [2004] (Jonathan L. Herlocker)
History: 2005-2009
• Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions [2005] (Alexander Tuzhilin)
• Netflix Prize [link]
  – Latent factor models (SVD, RSVD, NSVD, SVD++)
  – Temporal dynamics in collaborative filtering
  – Yehuda Koren [link]'s team won the prize
History: 2005-2009
• First ACM Conference on Recommender Systems (RecSys) [2007] (Minneapolis, Minnesota, USA)
• Digg and YouTube experiment with recommender systems.
History: 2010-now
• Context-Aware Recommender Systems
• Music Recommendation and Discovery
• Recommender Systems and the Social Web
• Information Heterogeneity and Fusion in Recommender Systems
• Human Decision Making in Recommender Systems
• Personalization in Mobile Applications
• Novelty and Diversity in Recommender Systems
• User-Centric Evaluation
History: 2010-now
• Facebook launches Instant Personalization [2010]
  – Clicker
  – Bing
  – TripAdvisor
  – Rotten Tomatoes
  – Pandora
  – ……
Problems
• Main Problems
  – Top-N Recommendation
  – Rating Prediction
Problems
• Top-N Recommendation
  – Input: a table of observed (user, item) interactions

      user   item
      A      a
      B      a
      B      b
      …      …

  – Output: a ranked list of N items for each user (a toy sketch follows below)
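A minimal sketch (not from the original slides) of the Top-N task, using a most-popular baseline; all names are illustrative:

    from collections import Counter

    def top_n_most_popular(interactions, n=10):
        """Recommend the globally most popular items the user has not
        interacted with yet. `interactions` is a list of (user, item)
        pairs, as in the input table above."""
        popularity = Counter(item for _, item in interactions)
        seen = {}
        for user, item in interactions:
            seen.setdefault(user, set()).add(item)
        ranked = [item for item, _ in popularity.most_common()]
        return {user: [i for i in ranked if i not in items][:n]
                for user, items in seen.items()}

    print(top_n_most_popular([("A", "a"), ("B", "a"), ("B", "b")], n=2))
    # {'A': ['b'], 'B': []}

Real systems replace the popularity ranking with collaborative filtering or latent factor models, but the input/output contract stays the same.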
Problems
• Rating Prediction
  – Input: a table of observed (user, item, rating) triples

      user   item   rating
      A      a      …
      B      a      …
      B      b      …
      …      …      …

  – Output: predicted values for the unknown ("?") ratings (a toy sketch follows below)
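A minimal sketch of the rating prediction task, filling each "?" with a user-mean baseline (illustrative only, not a method from the slides):

    def predict_user_mean(ratings, user, item):
        """Predict an unknown (user, item) rating as the user's mean
        observed rating, falling back to the global mean. `ratings` is
        a list of (user, item, rating) triples. This simple baseline
        ignores which `item` is asked for."""
        user_ratings = [r for u, _, r in ratings if u == user]
        if user_ratings:
            return sum(user_ratings) / len(user_ratings)
        return sum(r for _, _, r in ratings) / len(ratings)

    ratings = [("A", "a", 4.0), ("B", "a", 3.0), ("B", "b", 5.0)]
    print(predict_user_mean(ratings, "B", "c"))  # 4.0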
What is a good recommender system?
Experiment Methods
• Offline Experiment
• User Survey
• Online Experiment
  – A/B Testing
Experiment Methods
• Offline Experiment
  – Split the dataset into a Train set and a Test set (a split sketch follows below)
  – Advantages:
    • Relies only on a dataset
  – Disadvantages:
    • Offline metrics cannot fully reflect business goals
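A sketch of the dataset split on this slide, assuming a simple random holdout (the slides do not specify a split strategy):

    import random

    def split_dataset(records, test_ratio=0.2, seed=42):
        """Randomly hold out a fraction of the records as the Test
        set; the remainder becomes the Train set."""
        rng = random.Random(seed)
        shuffled = records[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * (1 - test_ratio))
        return shuffled[:cut], shuffled[cut:]

    train, test = split_dataset([("A", "a"), ("B", "a"), ("B", "b"),
                                 ("A", "c"), ("C", "b")])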
Experiment Methods
• User Survey
  – Advantages:
    • Can collect subjective metrics
    • Lower risk than online testing
  – Disadvantages:
    • Higher cost than offline experiments
    • Some results may not reach statistical significance
    • Users may behave differently in a test environment than in the real environment
    • It is difficult to design double-blind experiments.
Experiment Methods
• Online Experiments (A/B Testing)
  – Advantages:
    • Can measure metrics tied directly to business goals (a bucketing sketch follows below)
  – Disadvantages:
    • High risk and cost
    • Needs a large user base to reach statistically significant results
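One common way to run an A/B test is to assign users to buckets by deterministic hashing; a sketch (the experiment name and ratio are hypothetical):

    import hashlib

    def ab_bucket(user_id, experiment="rec_algo_v2", treatment_ratio=0.5):
        """Deterministically map a user to 'treatment' or 'control' by
        hashing (experiment, user_id); the same user always lands in
        the same bucket for the whole experiment."""
        digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
        fraction = int(digest[:8], 16) / 0xFFFFFFFF
        return "treatment" if fraction < treatment_ratio else "control"

    print(ab_bucket("user_42"))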
Experiment Metrics
• User Satisfaction
• Prediction Accuracy
• Coverage
• Diversity
• Novelty
• Serendipity
• Trust
• Robustness
• Real-time Performance
Experiment Metrics
• User Satisfaction
  – A subjective metric
  – Measured by user surveys or online experiments
Experiment Metrics
• Prediction Accuracy
  – Measured by offline experiments
  – Top-N Recommendation
    • Precision / Recall
  – Rating Prediction
    • MAE, RMSE (standard definitions follow below)
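The standard definitions (not spelled out on the slide), with R(u) the Top-N list recommended to user u, T(u) the items u interacted with in the test set, and \hat{r}_{ui} a predicted rating over the test set T:

    \mathrm{Precision} = \frac{\sum_{u \in U} |R(u) \cap T(u)|}{\sum_{u \in U} |R(u)|},
    \qquad
    \mathrm{Recall} = \frac{\sum_{u \in U} |R(u) \cap T(u)|}{\sum_{u \in U} |T(u)|}

    \mathrm{MAE} = \frac{1}{|T|} \sum_{(u,i) \in T} |\hat{r}_{ui} - r_{ui}|,
    \qquad
    \mathrm{RMSE} = \sqrt{\frac{1}{|T|} \sum_{(u,i) \in T} (\hat{r}_{ui} - r_{ui})^2}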
Experiment Metrics
• Coverage
  – Measures the ability of a recommender system to recommend long-tail items
  – Can also be measured by entropy or the Gini index
    \mathrm{Coverage} = \frac{\left| \bigcup_{u \in U} R(u, N) \right|}{|I|}

where R(u, N) is the Top-N list recommended to user u and I is the full item set.
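A direct implementation of the formula above (function and argument names are illustrative):

    def coverage(recommendations, all_items):
        """Coverage = |union of all users' recommended lists| / |I|.
        `recommendations` maps each user u to R(u, N); `all_items`
        is the full item set I."""
        recommended = set()
        for items in recommendations.values():
            recommended.update(items)
        return len(recommended) / len(all_items)

    print(coverage({"A": ["b"], "B": ["c"]}, {"a", "b", "c", "d"}))  # 0.5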
Experiment Metrics
• Diversity
  – Measures the ability of a recommender system to cover a user's different interests
  – Different similarity metrics yield different diversity metrics (a sketch follows below)
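One common formulation (an assumption; the slides do not fix one) is intra-list diversity: 1 minus the average pairwise similarity of a user's recommended items, where `similarity` is whichever item-item similarity function you choose:

    from itertools import combinations

    def intra_list_diversity(recommended, similarity):
        """Diversity of one recommendation list: 1 - mean pairwise
        similarity. A different `similarity` function gives a
        different diversity metric, as the slide notes."""
        pairs = list(combinations(recommended, 2))
        if not pairs:
            return 0.0
        avg_sim = sum(similarity(i, j) for i, j in pairs) / len(pairs)
        return 1.0 - avg_sim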
Experiment Metrics
• Diversity (Example)
[Figure: a user's Watch History alongside the Related Items recommended for it]
Experiment Metrics
• Novelty
  – Measures the ability of a recommender system to introduce long-tail items to users (a common proxy is sketched below)
  – International Workshop on Novelty and Diversity in Recommender Systems [link]
  – Music Recommendation and Discovery in the Long Tail [Oscar Celma]
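A common proxy for novelty (an assumption; the slides do not fix a formula) is the average self-information of the recommended items, so long-tail items score higher:

    import math

    def novelty(recommended, popularity, total_interactions):
        """Mean self-information -log2(p(i)) of the recommended items,
        where p(i) = popularity[i] / total_interactions and
        popularity[i] counts interactions with item i."""
        scores = [-math.log2(popularity[i] / total_interactions)
                  for i in recommended]
        return sum(scores) / len(scores)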
Experiment Metrics
• Serendipity
  – A recommendation is serendipitous if:
    • it is not related to the user's historical interests
    • it is novel to the user
    • the user finds it interesting after viewing it
Experiment Metrics
• Trust
  – If users trust a recommender system, they will interact with it.
  – Ways to improve trust:
    • Transparency
    • Social connections
    • Trust systems (e.g., Epinions)
Experiment Metrics
• Robustness
  – The ability of a recommender system to withstand attacks
  – Neil Hurley. Tutorial on Robustness of Recommender Systems. ACM RecSys 2011.
Experiment Metrics
• Real-time Performance
  – Generate new recommendations immediately when a user shows new behavior
Too many metrics! Which is the most important?
How to make trade-offs
• Business goals
• Our beliefs
• Evaluate new algorithms in three steps:
  – Offline testing
  – User survey
  – Online testing
Thanks!