
Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial


DESCRIPTION

There is more to recommendation algorithms than rating prediction. And there is more to recommender systems than algorithms. In this tutorial, given at the 2012 ACM Recommender Systems Conference in Dublin, I review topics such as different interaction and user feedback mechanisms, offline experimentation and A/B testing, and software architectures for recommender systems.


Page 1: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Building Industrial-scale Real-world Recommender Systems. September 11, 2012. Xavier Amatriain, Personalization Science and Engineering, Netflix. @xamat

Page 2: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Outline

1. Anatomy of Netflix Personalization 2. Data & Models 3. Consumer (Data) Science 4. Architectures

Page 3: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Anatomy of Netflix Personalization

Everything is a Recommendation

Page 4: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Everything is personalized

4

Note: Recommendations are per household, not individual user

Rows

Ranking

Page 5: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Top 10

5

Personalization awareness

Diversity

(Figure: per-profile Top 10 labels: Dad, All, Son, Daughter, Dad & Mom, Mom, All, Daughter, Mom, All?)

Page 6: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Support for Recommendations

Social Support

Page 7: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Watch again & Continue Watching

7

Page 8: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Genres

8

Page 9: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Genre rows
§  Personalized genre rows focus on user interest
§  Also provide context and “evidence”
§  Important for member satisfaction: moving personalized rows to the top on devices increased retention
§  How are they generated?
§  Implicit: based on user’s recent plays, ratings, & other interactions
§  Explicit taste preferences
§  Hybrid: combine the above
§  Also take into account:
§  Freshness: has this been shown before?
§  Diversity: avoid repeating tags and genres, limit number of TV genres, etc.

Page 10: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Genres - personalization

10

Page 11: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Genres - personalization

11

Page 12: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

12

Genres - explanations

Page 13: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Genres - explanations

13

Page 14: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

14

Genres – user involvement

Page 15: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Genres – user involvement

15

Page 16: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Similars
§  Displayed in many different contexts
§  In response to user actions/context (search, queue add…)
§  More like… rows

Page 17: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Anatomy of Personalization - Recap
§  Everything is a recommendation: not only rating prediction, but also ranking, row selection, similarity…
§  We strive to make it easy for the user, but…
§  We want the user to be aware of and involved in the recommendation process
§  Deal with implicit/explicit and hybrid feedback
§  Add support/explanations for recommendations
§  Consider issues such as diversity or freshness

17

Page 18: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Data & Models

Page 19: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

19

Big Data
§  Plays
§  Behavior
§  Geo-information
§  Time
§  Ratings
§  Searches
§  Impressions
§  Device info
§  Metadata
§  Social
§  …

Page 20: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

20

Big Data @Netflix

§  25M+ subscribers

§  Ratings: 4M/day

§  Searches: 3M/day

§  Plays: 30M/day

§  2B hours streamed in Q4 2011

§  1B hours in June 2012

Page 21: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Models §  Logistic/linear regression §  Elastic nets §  Matrix Factorization §  Markov Chains §  Clustering §  LDA §  Association Rules §  Gradient Boosted Decision Trees §  …

21

Page 22: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Rating Prediction

22

Page 23: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

2007 Progress Prize
§  KorBell team (AT&T) improved by 8.43%
§  Spent ~2,000 hours
§  Combined 107 prediction algorithms with a linear equation
§  Gave us the source code

Page 24: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

2007 Progress Prize
§  Top 2 algorithms
§  SVD - Prize RMSE: 0.8914
§  RBM - Prize RMSE: 0.8990
§  Linear blend - Prize RMSE: 0.88
§  Limitations
§  Designed for 100M ratings; we have 5B ratings
§  Not adaptable as users add ratings
§  Performance issues
§  Currently in use as part of Netflix’s rating prediction component

Page 25: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

SVD: X[m×n] = U[m×r] S[r×r] (V[n×r])T

§  X: m×n matrix (e.g., m users, n videos)
§  U: m×r matrix (m users, r concepts)
§  S: r×r diagonal matrix (strength of each ‘concept’) (r: rank of the matrix)
§  V: n×r matrix (n videos, r concepts)

Page 26: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Simon Funk’s SVD
§  One of the most interesting findings during the Netflix Prize came out of a blog post
§  Incremental, iterative, and approximate way to compute the SVD using gradient descent

http://sifter.org/~simon/journal/20061211.html
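A minimal sketch of this idea (toy data; factor count, learning rate, and regularization are made-up illustration values, not Funk's actual settings): learn user and item factor vectors by stochastic gradient descent over the observed ratings only.

```python
import random

def funk_svd(ratings, n_users, n_items, f=2, lr=0.05, reg=0.02, epochs=1000):
    """Approximate matrix factorization via SGD over observed ratings only.
    ratings: list of (user, item, rating) triples."""
    random.seed(0)
    p = [[random.uniform(0.01, 0.1) for _ in range(f)] for _ in range(n_users)]
    q = [[random.uniform(0.01, 0.1) for _ in range(f)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, v, r in ratings:
            err = r - sum(p[u][k] * q[v][k] for k in range(f))
            for k in range(f):
                pu, qv = p[u][k], q[v][k]
                p[u][k] += lr * (err * qv - reg * pu)  # step toward lower squared error
                q[v][k] += lr * (err * pu - reg * qv)  # with L2 shrinkage
    return p, q

# Toy 2x2 rating matrix: user 0 loves item 0, user 1 loves item 1.
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 1.0), (1, 1, 5.0)]
p, q = funk_svd(ratings, n_users=2, n_items=2)
predict = lambda u, v: sum(p[u][k] * q[v][k] for k in range(2))
```

Unlike a full SVD, this never materializes the (mostly unobserved) rating matrix, which is what made it practical at Netflix Prize scale.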

Page 27: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

SVD for Rating Prediction
§  Associate each user u with a user-factors vector pu ∈ ℝf
§  Associate each item v with an item-factors vector qv ∈ ℝf
§  Define a baseline estimate buv = µ + bu + bv to account for user and item deviation from the average
§  Predict rating using the rule r'uv = buv + puT qv
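The prediction rule above, sketched with illustrative (made-up) bias and factor values:

```python
def predict_rating(mu, b_u, b_v, p_u, q_v):
    """r'_uv = b_uv + p_u^T q_v, where b_uv = mu + b_u + b_v."""
    baseline = mu + b_u + b_v  # user and item deviation from the global average
    return baseline + sum(pk * qk for pk, qk in zip(p_u, q_v))

# Hypothetical values: global mean 3.6, a generous user (+0.3), a well-liked item (+0.4)
r = predict_rating(3.6, 0.3, 0.4, [0.5, -0.2], [0.8, 0.1])
# 3.6 + 0.3 + 0.4 + (0.40 - 0.02) = 4.68
```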

Page 28: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

SVD++
§  Koren et al. proposed an asymmetric variation that includes implicit feedback:

r'uv = buv + qvT ( |R(u)|−1/2 Σj∈R(u) (ruj − buj) xj  +  |N(u)|−1/2 Σj∈N(u) yj )

§  Where qv, xv, yv ∈ ℝf are three item factor vectors
§  Users are not parametrized, but rather represented by:
§  R(u): items rated by user u
§  N(u): items for which the user has given an implicit preference (e.g. rated vs. not rated)
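A sketch of evaluating the SVD++ scoring rule for one (user, item) pair. All factor values here are made up for illustration; `rated` maps each item j in R(u) to its residual (r_uj − b_uj):

```python
import math

def svdpp_score(b_uv, q_v, rated, x, y, implicit, f=2):
    """SVD++ prediction: the user is represented by sums over the items they
    rated (weighted by rating residuals) and items with implicit feedback."""
    user_vec = [0.0] * f
    if rated:
        w = 1.0 / math.sqrt(len(rated))      # |R(u)|^(-1/2)
        for j, resid in rated.items():
            for k in range(f):
                user_vec[k] += w * resid * x[j][k]
    if implicit:
        w = 1.0 / math.sqrt(len(implicit))   # |N(u)|^(-1/2)
        for j in implicit:
            for k in range(f):
                user_vec[k] += w * y[j][k]
    return b_uv + sum(q_v[k] * user_vec[k] for k in range(f))

# Tiny made-up example: one rated item (residual +1.0), two implicit items.
x = {0: [0.2, 0.0]}
y = {0: [0.1, 0.0], 1: [0.1, 0.0]}
score = svdpp_score(b_uv=3.5, q_v=[1.0, 0.0], rated={0: 1.0}, x=x, y=y, implicit=[0, 1])
```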

Page 29: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

RBM

Page 30: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

First generation neural networks (~60s)
§  Perceptrons (~1960)
§  Single layer of hand-coded features
§  Linear activation function
§  Fundamentally limited in what they can learn to do

(Figure: input units - features; non-adaptive hand-coded features; output units - class labels, “Like”/“Hate”)

Page 31: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Second generation neural networks (~80s)
§  Non-linear activation function
§  Compare output to the correct answer to compute an error signal
§  Back-propagate the error signal to get derivatives for learning

(Figure: input features → hidden layers → outputs)

Page 32: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Belief Networks (~90s)
§  Directed acyclic graph composed of stochastic variables with weighted connections
§  Can observe some of the variables
§  Solve two problems:
§  Inference: infer the states of the unobserved variables
§  Learning: adjust the interactions between variables to make the network more likely to generate the observed data

(Figure: stochastic hidden cause → visible effect)

Page 33: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Restricted Boltzmann Machine
§  Restrict the connectivity to make learning easier
§  Only one layer of hidden units
§  Although multiple layers are possible
§  No connections between hidden units
§  Hidden units are independent given the visible states
§  So we can quickly get an unbiased sample from the posterior distribution over hidden “causes” when given a data-vector
§  RBMs can be stacked to form Deep Belief Nets (DBN)

(Figure: bipartite graph of hidden and visible units)

Page 34: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

RBM for the Netflix Prize

34

Page 35: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

What about the final prize ensembles?
§  Our offline studies showed they were too computationally intensive to scale
§  Expected improvement not worth the engineering effort
§  Plus, focus had already shifted to other issues that had more impact than rating prediction…

35

Page 36: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Ranking: key algorithm, sorts titles in most contexts

Page 37: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Ranking
§  Ranking = Scoring + Sorting + Filtering bags of movies for presentation to a user
§  Goal: find the best possible ordering of a set of videos for a user within a specific context, in real time
§  Objective: maximize consumption
§  Aspiration: played & “enjoyed” titles have the best scores
§  Akin to CTR forecast for ads/search results
§  Factors:
§  Accuracy
§  Novelty
§  Diversity
§  Freshness
§  Scalability
§  …

Page 38: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Ranking
§  Popularity is the obvious baseline
§  Ratings prediction is a clear secondary data input that allows for personalization
§  We have added many other features (and tried many more that have not proved useful)
§  What about the weights?
§  Based on A/B testing
§  Machine-learned

Page 39: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Example: two features, linear model

(Figure: titles plotted by Popularity (x-axis) vs. Predicted Rating (y-axis, 1-5); the model induces the Final Ranking)

Linear Model: frank(u,v) = w1 p(v) + w2 r(u,v) + b
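The two-feature linear model above, sketched in code. The weights, title names, and feature values are made up; in practice the weights come from A/B testing or machine learning, as the next slide discusses:

```python
def rank_titles(titles, w1=0.4, w2=0.6, b=0.0):
    """Score f_rank(u,v) = w1*p(v) + w2*r(u,v) + b, then sort descending.
    titles: list of (name, popularity, predicted_rating), features scaled to [0, 1]."""
    scored = [(w1 * pop + w2 * rating + b, name) for name, pop, rating in titles]
    return [name for _, name in sorted(scored, reverse=True)]

catalog = [
    ("blockbuster", 0.9, 0.50),   # popular, middling predicted rating
    ("niche_match", 0.2, 0.95),   # unpopular, but a strong personal fit
    ("filler",      0.3, 0.30),
]
ranking = rank_titles(catalog)
```

With these particular weights the popular title and the personal fit end up nearly tied, which is exactly the trade-off the weights control.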

Page 40: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Results

40

Page 41: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Learning to rank
§  Machine learning problem: the goal is to construct a ranking model from training data
§  Training data can have partial order or binary judgments (relevant/not relevant)
§  Resulting order of the items typically induced from a numerical score
§  Learning to rank is a key element for personalization
§  You can treat the problem as a standard supervised classification problem

41

Page 42: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Learning to Rank Approaches
1.  Pointwise
§  Ranking function minimizes a loss function defined on individual relevance judgments
§  Ranking score based on regression or classification
§  Ordinal regression, logistic regression, SVM, GBDT, …
2.  Pairwise
§  Loss function is defined on pairwise preferences
§  Goal: minimize the number of inversions in the ranking
§  The ranking problem is then transformed into a binary classification problem
§  RankSVM, RankBoost, RankNet, FRank, …

Page 43: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Learning to rank - metrics
§  Quality of ranking measured using metrics such as:
§  Normalized Discounted Cumulative Gain:
NDCG = DCG / IDCG, where DCG = relevance1 + Σi=2..n relevancei / log2(i), and IDCG is the DCG of the ideal ranking
§  Mean Reciprocal Rank:
MRR = (1/|H|) Σhi∈H 1/rank(hi), where hi are the positive “hits” from the user
§  Mean Average Precision:
MAP = (Σn=1..N AveP(n)) / N, where N can be the number of users, items, … and P = tp / (tp + fp)
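A sketch of the NDCG and MRR formulas above (relevance grades are made up; this uses the slide's DCG form, rel_1 + Σ_{i≥2} rel_i/log2(i), rather than the gain/discount variants used elsewhere):

```python
import math

def dcg(rels):
    """DCG = rel_1 + sum_{i=2..n} rel_i / log2(i), positions 1-based."""
    return rels[0] + sum(r / math.log2(i) for i, r in enumerate(rels[1:], start=2))

def ndcg(rels):
    """Normalize by the DCG of the ideal (sorted best-first) ordering."""
    return dcg(rels) / dcg(sorted(rels, reverse=True))

def mrr(hit_ranks):
    """Mean reciprocal rank over the 1-based ranks of a user's positive hits."""
    return sum(1.0 / r for r in hit_ranks) / len(hit_ranks)

perfect = [3, 3, 2, 1, 0]     # relevance grades in ranked order, best-first
shuffled = [3, 2, 3, 0, 1]    # same grades, imperfect ordering
```

The ideal ordering gets NDCG = 1.0 by construction; any other ordering of the same grades scores strictly between 0 and 1.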

Page 44: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Learning to rank - metrics
§  Quality of ranking measured using metrics such as:
§  Fraction of Concordant Pairs (FCP)
§  Given items xi and xj, user preference P and a ranking method R, a concordant pair (CP) is {xi, xj} s.t. P(xi) > P(xj) ⇔ R(xi) < R(xj)
§  Then FCP = Σi≠j CP(xi, xj) / (n(n−1)/2)
§  Others…
§  But it is hard to optimize machine-learned models directly on these measures
§  They are not differentiable
§  Recent research on models that directly optimize ranking measures
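A sketch of FCP (one assumption: pairs tied in preference are skipped, so the denominator counts only strictly ordered pairs; with all-distinct preferences this equals n(n−1)/2):

```python
from itertools import combinations

def fcp(preference, ranking):
    """Fraction of concordant pairs. preference[x]: true preference (higher = better);
    ranking[x]: position the ranker assigned (1 = top)."""
    concordant = comparable = 0
    for xi, xj in combinations(preference, 2):
        if preference[xi] == preference[xj]:
            continue                                  # skip ties (assumption)
        comparable += 1
        better, worse = (xi, xj) if preference[xi] > preference[xj] else (xj, xi)
        if ranking[better] < ranking[worse]:          # ranked higher = smaller position
            concordant += 1
    return concordant / comparable

pref = {"a": 3, "b": 2, "c": 1}
agreeing = {"a": 1, "b": 2, "c": 3}   # ranker agrees with preference
inverted = {"a": 3, "b": 2, "c": 1}   # ranker fully disagrees
```

Unlike NDCG, FCP only looks at pairwise order, so it is a natural offline metric for pairwise learning-to-rank models.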

Page 45: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Learning to Rank Approaches
3.  Listwise
a.  Directly optimizing IR measures (difficult since they are not differentiable)
§  Directly optimize IR measures through Genetic Programming
§  Directly optimize measures with Simulated Annealing
§  Gradient descent on a smoothed version of the objective function
§  SVM-MAP relaxes the MAP metric by adding it to the SVM constraints
§  AdaRank uses boosting to optimize NDCG
b.  Indirect loss function
§  RankCosine uses similarity between the ranking list and the ground truth as loss function
§  ListNet uses KL-divergence as loss function by defining a probability distribution
§  Problem: optimizing the listwise loss function does not necessarily optimize IR metrics

Page 46: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

46

Similars

§  Different similarities computed from different sources: metadata, ratings, viewing data…

§  Similarities can be treated as data/features

§  Machine Learned models improve our concept of “similarity”

Page 47: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Data & Models - Recap
§  All sorts of feedback from the user can help generate better recommendations
§  Need to design systems that capture and take advantage of all this data
§  The right model is as important as the right data
§  It is important to come up with new theoretical models, but also to think about application to a domain and practical issues
§  Rating prediction models are only part of the solution to recommendation (think about ranking, similarity…)

47

Page 48: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Consumer (Data) Science

Page 49: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Consumer Science
§  Main goal is to effectively innovate for customers
§  Innovation goals:
§  “If you want to increase your success rate, double your failure rate.” – Thomas Watson, Sr., founder of IBM
§  The only real failure is the failure to innovate
§  Fail cheaply
§  Know why you failed/succeeded

49

Page 50: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Consumer (Data) Science
1.  Start with a hypothesis:
§  Algorithm/feature/design X will increase member engagement with our service, and ultimately member retention
2.  Design a test
§  Develop a solution or prototype
§  Think about dependent & independent variables, control, significance…
3.  Execute the test
4.  Let the data speak for itself

50

Page 51: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Offline/Online testing process

(Flow: Offline testing (days) → [success] → Online A/B testing (weeks to months) → [success] → Rollout Feature to all users; [fail] exits the pipeline)

Page 52: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Offline testing process

(Flow diagram: Initial Hypothesis → Decide Model → Train Model offline → Test → “Hypothesis validated offline?”. [yes] → Online A/B testing: Rollout Prototype → Wait for Results → Analyze Results → “Significant improvement on users?” → [success] → Rollout Feature to all users. [no] → “Try different model?”: [yes] → Decide Model, [no] → Reformulate Hypothesis. [fail] → back to offline testing.)

Page 53: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Offline testing
§  Optimize algorithms offline
§  Measure model performance using metrics such as: Mean Reciprocal Rank, Normalized Discounted Cumulative Gain, Fraction of Concordant Pairs, Precision/Recall & F-measures, AUC, RMSE, Diversity…
§  Offline performance is used as an indication to make informed decisions on follow-up A/B tests
§  A critical (and unsolved) issue is how well offline metrics correlate with A/B test results
§  Extremely important to define a coherent offline evaluation framework (e.g. how to create training/testing datasets is not trivial)

53

Page 54: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Online A/B testing process

(Flow diagram: Offline testing: Decide Model → Train Model offline → Test → “Hypothesis validated offline?”. [yes] → Design A/B Test → Rollout Prototype → Choose Control Group → Wait for Results → Analyze Results → “Significant improvement on users?” → [success] → Rollout Feature to all users. [no] → “Try different model?”: [yes] → Decide Model, [no] → Reformulate Hypothesis.)

Page 55: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Executing A/B tests
§  Many different metrics, but ultimately trust user engagement (e.g. hours of play and customer retention)
§  Think about significance and hypothesis testing
§  Our tests usually have thousands of members and 2-20 cells
§  A/B tests allow you to try radical ideas or test many approaches at the same time
§  We typically have hundreds of customer A/B tests running
§  Decisions on the product are always data-driven
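To make "think about significance" concrete, here is a minimal two-proportion z-test sketch; the cell sizes and retention counts are made up, and real A/B analysis involves much more (multiple cells, repeated looks, etc.):

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z-statistic for the difference between two cells' rates (e.g. retention)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)          # rate under the null
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical test: control retains 800/10000 members, treatment 880/10000.
z = two_proportion_z(800, 10_000, 880, 10_000)
# |z| > 1.96 would be significant at the 5% level (two-sided).
```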

55

Page 56: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

What to measure
§  OEC: Overall Evaluation Criteria
§  In an A/B test framework, the measure of success is key
§  Short-term metrics do not always align with long-term goals
§  E.g. CTR: generating more clicks might mean that our recommendations are actually worse
§  Use long-term metrics such as LTV (lifetime value) whenever possible
§  At Netflix, we use member retention

56

Page 57: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

What to measure
§  Short-term metrics can sometimes be informative, and may allow for faster decision-making
§  At Netflix we use many, such as hours streamed by users or % hours from a given algorithm
§  But be aware of several caveats when using early decision mechanisms

Initial effects appear to trend. See “Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained” [Kohavi et al., KDD 2012]

Page 58: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Consumer Data Science - Recap
§  Consumer Data Science aims to innovate for the customer by running experiments and letting the data speak
§  This is mainly done through online A/B testing
§  However, we can speed up innovation by experimenting offline
§  But for both online and offline experimentation, it is important to choose the right metric and experimental framework

58

Page 59: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

59

Architectures

Page 60: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Technology

http://techblog.netflix.com

Page 61: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

61

Page 62: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

62

Event & Data Distribution

Page 63: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

63

Event & Data Distribution

•  UI devices should broadcast many different kinds of user events
•  Clicks
•  Presentations
•  Browsing events
•  …
•  Events vs. data
•  Some events only need to be propagated and trigger an action (low latency, low information per event)
•  Others need to be processed and “turned into” data (higher latency, higher information quality)
•  And… there are many in between
•  Real-time event flow managed through an internal tool (Manhattan)
•  Data flow mostly managed through Hadoop

Page 64: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

64

Offline Jobs

Page 65: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

65

•  Two kinds of offline jobs •  Model training •  Batch offline computation of

recommendations/intermediate results

•  Offline queries either in Hive or PIG

•  Need a publishing mechanism that solves several issues

•  Notify readers when result of query is ready

•  Support different repositories (s3, cassandra…)

•  Handle errors, monitoring… •  We do this through Hermes

Offline Jobs

Page 66: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

66

Computation

Page 67: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

67

Computation

•  Two ways of computing personalized results
•  Batch/offline
•  Online
•  Each approach has pros/cons
•  Offline
+  Allows more complex computations
+  Can use more data
-  Cannot react to quick changes
-  May result in staleness
•  Online
+  Can respond quickly to events
+  Can use most recent data
-  May fail because of SLA
-  Cannot deal with “complex” computations
•  It’s not an either/or decision
•  Both approaches can be combined

Page 68: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

68

Signals & Models

Page 69: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

69

Signals & Models

•  Both offline and online algorithms are based on three different inputs:
•  Models: previously trained from existing data
•  (Offline) Data: previously processed and stored information
•  Signals: fresh data obtained from live services
•  User-related data
•  Context data (session, date, time…)

Page 70: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

70

Results

Page 71: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

71

Results

•  Recommendations can be serviced from:
•  Previously computed lists
•  Online algorithms
•  A combination of both
•  The decision on where to service the recommendation from can respond to many factors, including context
•  Also important to think about the fallbacks (what if plan A fails?)
•  Previously computed lists/intermediate results can be stored in a variety of ways
•  Cache
•  Cassandra
•  Relational DB

Page 72: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

Alerts and Monitoring
§  A non-trivial concern in large-scale recommender systems
§  Monitoring: continuously observe the quality of the system
§  Alerting: fast notification if the quality of the system goes below a certain pre-defined threshold
§  Questions:
§  What do we need to monitor?
§  How do we know something is “bad enough” to alert on?

72

Page 73: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

What to monitor
§  Staleness
§  Monitor time since last data update

(Chart annotation: “Did something go wrong here?”)
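A minimal staleness check of this kind (the threshold, timestamps, and function name are illustrative, not an actual Netflix tool):

```python
import time

def staleness_alert(last_update_ts, threshold_hours=24, now=None):
    """Fire an alert when the data behind a service hasn't been refreshed in time."""
    now = time.time() if now is None else now
    age_hours = (now - last_update_ts) / 3600.0
    return age_hours > threshold_hours

now = 1_700_000_000  # fixed "current time" so the example is deterministic
fresh = staleness_alert(now - 3600, now=now)        # updated an hour ago
stale = staleness_alert(now - 48 * 3600, now=now)   # last update two days ago
```

In a real system the check would run periodically per data feed, with a threshold tuned to each feed's expected refresh cadence.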

Page 74: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

What to monitor
§  Algorithmic quality
§  Monitor different metrics by comparing what users do and what your algorithm predicted they would do

74

Page 75: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

What to monitor
§  Algorithmic quality
§  Monitor different metrics by comparing what users do and what your algorithm predicted they would do

(Chart annotation: “Did something go wrong here?”)

Page 76: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

What to monitor
§  Algorithmic source for users
§  Monitor how users interact with different algorithms

(Chart: Algorithm X vs. new version; annotation: “Did something go wrong here?”)

Page 77: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

When to alert
§  Alerting thresholds are hard to tune
§  Avoid unnecessary alerts (the “learn-to-ignore” problem)
§  Avoid important issues being noticed before the alert happens
§  Rules of thumb
§  Alert on anything that will impact user experience significantly
§  Alert on issues that are actionable
§  If a noticeable event happens without an alert… add a new alert for next time

77

Page 78: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

78

Conclusions

Page 79: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

The Personalization Problem
§  The Netflix Prize simplified the recommendation problem to predicting ratings
§  But…
§  User ratings are only one of the many data inputs we have
§  Rating predictions are only part of our solution
§  Other algorithms such as ranking or similarity are very important
§  We can reformulate the recommendation problem
§  Function to optimize: probability a user chooses something and enjoys it enough to come back to the service

79

Page 80: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

More to Recsys than Algorithms
§  Not only is there more to algorithms than rating prediction
§  There is more to Recsys than algorithms
§  User Interface & Feedback
§  Data
§  A/B Testing
§  Systems & Architectures

80

Page 81: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

More data
+ Better models
+ More accurate metrics
+ Better approaches & architectures

Lots of room for improvement!

Page 82: Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial

We’re hiring!

Xavier Amatriain (@xamat) [email protected]