Page 1: Utrecht University - Universiteit Utrecht

Recommendation Systems

Unsupervised Machine Learning

Prof. Yannis Velegrakis
Utrecht University

[email protected]
https://velgias.github.io

Page 2: Utrecht University - Universiteit Utrecht

2

Disclaimer

The following set of slides originates from a number of different presentations and courses of the following people.
- Yannis Velegrakis (Utrecht University)
- Jeff Ullman (Stanford University)
- Bill Howe (U of Washington)
- Martin Fowler (ThoughtWorks)
- Ekaterini Ioannou (Tilburg University)
- Themis Palpanas (U of Paris-Descartes)

Copyright stays with the authors.

No distribution is allowed without prior permission by the authors.

Page 3: Utrecht University - Universiteit Utrecht

3

Example: Recommender Systems

- Customer X
  - Buys Metallica CD
  - Buys Megadeth CD
- Customer Y
  - Does search on Metallica
  - Recommender system suggests Megadeth from data collected about customer X

Page 4: Utrecht University - Universiteit Utrecht

4

Recommendations

[Figure: items (products, web sites, blogs, news items, …) reached either through search or through recommendations, with examples.]

Page 5: Utrecht University - Universiteit Utrecht

5

From Scarcity to Abundance

- Shelf space is a scarce commodity for traditional retailers
  - Also: TV networks, movie theaters, …
- Web enables near-zero-cost dissemination of information about products
  - From scarcity to abundance
- More choice necessitates better filters
  - Recommendation engines
  - How Into Thin Air made Touching the Void a bestseller: http://www.wired.com/wired/archive/12.10/tail.html

Page 6: Utrecht University - Universiteit Utrecht

6

Sidenote: The Long Tail

Source: Chris Anderson (2004)


Page 7: Utrecht University - Universiteit Utrecht

7

Physical vs. Online

Read http://www.wired.com/wired/archive/12.10/tail.html to learn more!

Page 8: Utrecht University - Universiteit Utrecht

8

Types of Recommendations

- Editorial and hand curated
  - List of favorites
  - Lists of “essential” items
- Simple aggregates
  - Top 10, Most Popular, Recent Uploads
- Tailored to individual users
  - Amazon, Netflix, …

Page 9: Utrecht University - Universiteit Utrecht

9

Formal Model

- X = set of Customers
- S = set of Items
- Utility function u: X × S → R
  - R = set of ratings
  - R is a totally ordered set
  - e.g., 0-5 stars, real number in [0,1]

Page 10: Utrecht University - Universiteit Utrecht

10

Utility Matrix

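[Table: utility matrix with users Alice, Bob, Carol, David as rows and movies Avatar, LOTR, Matrix, Pirates as columns; only some cells hold a rating in [0,1], the rest are unknown.]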

Page 11: Utrecht University - Universiteit Utrecht

11

Key Problems

- (1) Gathering “known” ratings for matrix
  - How to collect the data in the utility matrix
- (2) Extrapolate unknown ratings from the known ones
  - Mainly interested in high unknown ratings
    - We are not interested in knowing what you don’t like but what you like
- (3) Evaluating extrapolation methods
  - How to measure success/performance of recommendation methods

Page 12: Utrecht University - Universiteit Utrecht

12

(1) Gathering Ratings

- Explicit
  - Ask people to rate items
  - Doesn’t work well in practice: people can’t be bothered
- Implicit
  - Learn ratings from user actions
    - E.g., purchase implies high rating
  - What about low ratings?

Page 13: Utrecht University - Universiteit Utrecht

13

(2) Extrapolating Utilities

- Key problem: Utility matrix U is sparse
  - Most people have not rated most items
  - Cold start:
    - New items have no ratings
    - New users have no history
- Two approaches to recommender systems:
  - 1) Content-based
  - 2) Collaborative

Page 14: Utrecht University - Universiteit Utrecht

Content-based Recommender Systems

Page 15: Utrecht University - Universiteit Utrecht

15

Content-based Recommendations

- Main idea: Recommend items to customer x similar to previous items rated highly by x

Example:
- Movie recommendations
  - Recommend movies with same actor(s), director, genre, …
- Websites, blogs, news
  - Recommend other sites with “similar” content

Page 16: Utrecht University - Universiteit Utrecht

16

Plan of Action

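[Figure: the user likes some items; item profiles (e.g., red, circles, triangles) are built for the items; a user profile is built from the liked items' profiles and matched against item profiles to recommend new items.]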

Page 17: Utrecht University - Universiteit Utrecht

17

Item Profiles

- For each item, create an item profile
- Profile is a set (vector) of features
  - Movies: author, title, actor, director, …
  - Text: Set of “important” words in document
- How to pick important features?
  - Usual heuristic from text mining is TF-IDF (Term frequency * Inverse Doc Frequency)
    - Term … Feature
    - Document … Item

Page 18: Utrecht University - Universiteit Utrecht

18

Sidenote: TF-IDF

$f_{ij}$ = frequency of term (feature) i in doc (item) j

Note: we normalize TF to discount for “longer” documents

$n_i$ = number of docs that mention term i

$N$ = total number of docs

TF-IDF score: $w_{ij} = TF_{ij} \times IDF_i$

Doc profile = set of words with highest TF-IDF scores, together with their scores
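A minimal sketch of the score, assuming the common choices $TF_{ij} = f_{ij} / \max_k f_{kj}$ (normalizing by the most frequent term in the document) and $IDF_i = \log_2(N / n_i)$. These two definitions, the function name `tf_idf`, and the toy documents are assumptions made for illustration only.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document.

    Assumed definitions (illustrative, not taken from the slide):
      TF_ij = f_ij / max_k f_kj,  IDF_i = log2(N / n_i)
    Every document must contain at least one term.
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]   # f_ij for each document j
    doc_freq = Counter()                      # n_i: docs mentioning term i
    for c in counts:
        doc_freq.update(c.keys())
    scores = []
    for c in counts:
        max_f = max(c.values())               # most frequent term in this doc
        scores.append({
            term: (f / max_f) * math.log2(n_docs / doc_freq[term])
            for term, f in c.items()
        })
    return scores

# Hypothetical toy corpus: each document is a list of terms.
docs = [
    ["space", "alien", "space", "ship"],
    ["love", "paris", "love"],
    ["space", "love", "story"],
]
profiles = tf_idf(docs)
```

In this toy corpus "alien" outscores "space" in the first document even though "space" occurs more often there, because "space" also appears in another document and so has a lower IDF.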

Page 19: Utrecht University - Universiteit Utrecht

19

User Profiles

Page 20: Utrecht University - Universiteit Utrecht

20

Example 1: Boolean Utility Matrix

Page 21: Utrecht University - Universiteit Utrecht

21

Example 2: Star Ratings

Page 22: Utrecht University - Universiteit Utrecht

22

Making Predictions

Page 23: Utrecht University - Universiteit Utrecht

23

User Profiles and Prediction

- User profile possibilities:
  - Weighted average of rated item profiles
  - Variation: weight by difference from average rating for item
  - …
- Prediction heuristic:
  - Given user profile x and item profile i, estimate

$$u(x, i) = \cos(x, i) = \frac{x \cdot i}{\lVert x \rVert \cdot \lVert i \rVert}$$
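The prediction heuristic can be sketched as follows. The feature vectors are hypothetical, and returning 0.0 for an all-zero profile is an assumption of this sketch rather than part of the slide.

```python
import math

def cosine(x, i):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(x, i))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_i = math.sqrt(sum(b * b for b in i))
    if norm_x == 0 or norm_i == 0:
        return 0.0  # assumption: an empty profile counts as "no similarity"
    return dot / (norm_x * norm_i)

# Hypothetical profiles over three features (e.g., three actors).
user_profile = [0.5, 0.0, 1.0]
item_profile = [1.0, 0.0, 1.0]
estimate = cosine(user_profile, item_profile)
```

Items whose profile points in nearly the same direction as the user profile get an estimate close to 1; items sharing no features with it get 0.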

Page 24: Utrecht University - Universiteit Utrecht

24

Pros: Content-based Approach

- (+) No need for data on other users
  - No cold-start or sparsity problems
- (+) Able to recommend to users with unique tastes
- (+) Able to recommend new & unpopular items
  - No first-rater problem
- (+) Able to provide explanations
  - Can provide explanations of recommended items by listing content-features that caused an item to be recommended

Page 25: Utrecht University - Universiteit Utrecht

25

Cons: Content-based Approach

- (-) Finding the appropriate features is hard
  - E.g., images, movies, music
- (-) Recommendations for new users
  - How to build a user profile?
- (-) Overspecialization
  - Never recommends items outside user’s content profile
  - People might have multiple interests
  - Unable to exploit quality judgments of other users

Page 26: Utrecht University - Universiteit Utrecht

Collaborative Filtering

Harnessing the judgment of other users

Page 27: Utrecht University - Universiteit Utrecht

27

Collaborative Filtering

- Consider user x
- Find set N of other users whose ratings are “similar” to x’s ratings
- Estimate x’s ratings based on ratings of users in N

Page 28: Utrecht University - Universiteit Utrecht

28

Page 29: Utrecht University - Universiteit Utrecht

29

Option 1: Jaccard Similarity

Page 30: Utrecht University - Universiteit Utrecht

30

Option 2: Cosine Similarity

Page 31: Utrecht University - Universiteit Utrecht

31

Option 3: Centered Cosine

Page 32: Utrecht University - Universiteit Utrecht

32

Centered Cosine Similarity (2)

Page 33: Utrecht University - Universiteit Utrecht

33

Rating Predictions

Page 34: Utrecht University - Universiteit Utrecht

34

Item-Item Collaborative Filtering

- So far: User-user collaborative filtering
- Another view: Item-item
  - For item i, find other similar items
  - Estimate rating for item i based on ratings for similar items
  - Can use same similarity metrics and prediction functions as in user-user model

$$r_{xi} = \frac{\sum_{j \in N(i;x)} s_{ij} \cdot r_{xj}}{\sum_{j \in N(i;x)} s_{ij}}$$

- $s_{ij}$ … similarity of items i and j
- $r_{xj}$ … rating of user x on item j
- $N(i;x)$ … set of items rated by x similar to i

Page 35: Utrecht University - Universiteit Utrecht

35

Item-Item CF (|N|=2)

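            users
    movies   1   2   3   4   5   6   7   8   9  10  11  12
       1     1   .   3   .   .   5   .   .   5   .   4   .
       2     .   .   5   4   .   .   4   .   .   2   1   3
       3     2   4   .   1   2   .   3   .   4   3   5   .
       4     .   2   4   .   5   .   .   4   .   .   2   .
       5     .   .   4   3   4   2   .   .   .   .   2   5
       6     1   .   3   .   3   .   .   2   .   .   4   .

    .  unknown rating; numbers are ratings between 1 and 5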

Page 36: Utrecht University - Universiteit Utrecht

36

Item-Item CF (|N|=2)

?  estimate rating of movie 1 by user 5 (the ? cell in row 1 of the users × movies matrix)

Page 37: Utrecht University - Universiteit Utrecht

37

Item-Item CF (|N|=2)

Neighbor selection: Identify movies similar to movie 1, rated by user 5

sim(1,m) for movies m = 1, …, 6: 1.00, -0.18, 0.41, -0.10, -0.31, 0.59

Here we use Pearson correlation as similarity:
1) Subtract mean rating $m_i$ from each movie i
   $m_1 = (1+3+5+5+4)/5 = 3.6$
   row 1: [-2.6, 0, -0.6, 0, 0, 1.4, 0, 0, 1.4, 0, 0.4, 0]
2) Compute cosine similarities between rows
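The two steps can be sketched with NumPy. The cell layout of `R` below is an assumption: it is a reconstruction chosen to agree with the row 1 deviations and the sim(1,m) values on the slide. The helper names `centered` and `cosine_to_row` are hypothetical.

```python
import numpy as np

# Ratings matrix: rows = movies 1..6, columns = users 1..12, 0 = unknown.
# Assumption: this layout is a reconstruction consistent with the slide's
# row 1 deviations and sim(1,m) values.
R = np.array([
    [1, 0, 3, 0, 0, 5, 0, 0, 5, 0, 4, 0],
    [0, 0, 5, 4, 0, 0, 4, 0, 0, 2, 1, 3],
    [2, 4, 0, 1, 2, 0, 3, 0, 4, 3, 5, 0],
    [0, 2, 4, 0, 5, 0, 0, 4, 0, 0, 2, 0],
    [0, 0, 4, 3, 4, 2, 0, 0, 0, 0, 2, 5],
    [1, 0, 3, 0, 3, 0, 0, 2, 0, 0, 4, 0],
], dtype=float)

def centered(R):
    """Step 1: subtract each movie's mean from its known ratings; unknowns stay 0."""
    known = R > 0
    means = R.sum(axis=1) / known.sum(axis=1)
    return np.where(known, R - means[:, None], 0.0)

def cosine_to_row(C, row):
    """Step 2: cosine similarity between one row of C and every row of C."""
    norms = np.linalg.norm(C, axis=1)
    return (C @ C[row]) / (norms * norms[row])

C = centered(R)
sim_1 = cosine_to_row(C, 0)      # similarities of movie 1 to movies 1..6
print(np.round(sim_1, 2))
```

Under the assumed layout, the rounded result agrees with the sim(1,m) values listed on the slide (1.00, -0.18, 0.41, -0.10, -0.31, 0.59).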

Page 38: Utrecht University - Universiteit Utrecht

38

Item-Item CF (|N|=2)

Compute similarity weights: $s_{1,3} = 0.41$, $s_{1,6} = 0.59$ (movies 3 and 6 are the two movies rated by user 5 that are most similar to movie 1)

Page 39: Utrecht University - Universiteit Utrecht

39

Item-Item CF (|N|=2)

Predict by taking weighted average:

$$r_{1,5} = \frac{0.41 \cdot 2 + 0.59 \cdot 3}{0.41 + 0.59} \approx 2.6$$
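The weighted average for this example can be checked with a few lines. The function name `predict_rating` is hypothetical; the similarities and user 5's ratings of movies 3 and 6 are the values used on the slide.

```python
def predict_rating(similarities, ratings):
    """Similarity-weighted average of the user's ratings on the neighbor items."""
    total_weight = sum(similarities)
    if total_weight == 0:
        raise ValueError("neighbor similarities sum to zero")
    return sum(s * r for s, r in zip(similarities, ratings)) / total_weight

# Neighbors of movie 1 among the movies rated by user 5: movies 3 and 6.
r_1_5 = predict_rating([0.41, 0.59], [2, 3])
print(round(r_1_5, 1))  # 2.6
```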

Page 40: Utrecht University - Universiteit Utrecht

40

Item-Item vs. User-User

[Table: utility matrix with users Alice, Bob, Carol, David as rows and movies Avatar, LOTR, Matrix, Pirates as columns.]

- In practice, it has been observed that item-item often works better than user-user
- Why? Items are simpler, users have multiple tastes

Page 41: Utrecht University - Universiteit Utrecht

41

Implementing Collaborative Filtering

Page 42: Utrecht University - Universiteit Utrecht

42

Collaborative Filtering: Complexity

Page 43: Utrecht University - Universiteit Utrecht

43

Pros/Cons of Collaborative Filtering

- (+) Works for any kind of item
  - No feature selection needed
- (-) Cold Start:
  - Need enough users in the system to find a match
- (-) Sparsity:
  - The user/ratings matrix is sparse
  - Hard to find users that have rated the same items
- (-) First rater:
  - Cannot recommend an item that has not been previously rated
  - New items, Esoteric items
- (-) Popularity bias:
  - Cannot recommend items to someone with unique taste
  - Tends to recommend popular items

Page 44: Utrecht University - Universiteit Utrecht

44

Hybrid Methods

- Implement two or more different recommenders and combine predictions
  - Perhaps using a linear model
- Add content-based methods to collaborative filtering
  - Item profiles for new item problem
  - Demographics to deal with new user problem

Page 45: Utrecht University - Universiteit Utrecht

45

Global Baseline Estimate

Page 46: Utrecht University - Universiteit Utrecht

46

Combining Global Baseline Estimate with CF

Page 47: Utrecht University - Universiteit Utrecht

47

Evaluation

[Figure: matrix R of ratings (1 to 5) for 480,000 users and 17,700 movies; most cells are empty.]

Page 48: Utrecht University - Universiteit Utrecht

48

Evaluation

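[Figure: matrix R (480,000 users, 17,700 movies) split into a training data set and a test data set; the test ratings are hidden and shown as ?, e.g., $r_{3,6}$.]

$$\text{RMSE} = \sqrt{\frac{1}{|R|} \sum_{(i,x) \in R} \left(\hat{r}_{xi} - r_{xi}\right)^2}$$

where $\hat{r}_{xi}$ is the predicted rating and $r_{xi}$ is the true rating of user x on item i.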
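A minimal sketch of the RMSE computation over a held-out test set. The ratings and predictions below are hypothetical values chosen only to exercise the function.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error over paired predicted and true ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical held-out ratings and a model's predictions for them.
true_ratings = [4, 2, 5, 3]
predictions = [3.5, 2.5, 4.0, 3.0]
error = rmse(predictions, true_ratings)
```

Because the errors are squared before averaging, the single miss of 1.0 contributes as much as four misses of 0.5 would.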

Page 49: Utrecht University - Universiteit Utrecht

49

Problems with Error Measures

- Narrow focus on accuracy sometimes misses the point
  - Prediction Diversity
  - Prediction Context
  - Order of predictions
- In practice, we care only to predict high ratings:
  - RMSE might penalize a method that does well for high ratings and badly for others

Page 50: Utrecht University - Universiteit Utrecht

50

Collaborative Filtering: Complexity

- Expensive step is finding k most similar customers: O(|X|)
- Too expensive to do at runtime
  - Could pre-compute
- Naïve pre-computation takes time O(k · |X|)
  - X … set of customers
- We already know how to do this!
  - Near-neighbor search in high dimensions (LSH)
  - Clustering
  - Dimensionality reduction

Page 51: Utrecht University - Universiteit Utrecht

51

Tip: Add Data

- Leverage all the data
  - Don’t try to reduce data size in an effort to make fancy algorithms work
  - Simple methods on large data do best
- Add more data
  - e.g., add IMDB data on genres
- More data beats better algorithms: http://anand.typepad.com/datawocky/2008/03/more-data-usual.html

Page 52: Utrecht University - Universiteit Utrecht

52

Performance of Various Methods (RMSE)

- Global average: 1.1296
- User average: 1.0651
- Movie average: 1.0533
- Netflix: 0.9514
- Basic Collaborative filtering: 0.94
- Collaborative filtering++: 0.91
- Latent factors: 0.90
- Latent factors+Biases: 0.89
- Latent factors+Biases+Time: 0.876
- Grand Prize: 0.8563

When no prize… :( Getting desperate. Try a “kitchen sink” approach!

Page 53: Utrecht University - Universiteit Utrecht

Dimensionality Reduction

Page 54: Utrecht University - Universiteit Utrecht

54

Dimensionality Reduction

- Assumption: Data lies on or near a low d-dimensional subspace
- Axes of this subspace are an effective representation of the data

Page 55: Utrecht University - Universiteit Utrecht

55

Dimensionality Reduction

- Compress / reduce dimensionality:
  - 10^6 rows; 10^3 columns; no updates
  - Random access to any cell(s); small error: OK

The above matrix is really “2-dimensional.” All rows can be reconstructed by scaling [1 1 1 0 0] or [0 0 0 1 1]

Page 56: Utrecht University - Universiteit Utrecht

56

Rank is “Dimensionality”

- Cloud of points in 3D space:
  - Think of point positions as a matrix (1 row per point: A, B, C)
- We can rewrite coordinates more efficiently!
  - Old basis vectors: [1 0 0] [0 1 0] [0 0 1]
  - New basis vectors: [1 2 1] [-2 -3 1]
  - Then A has new coordinates: [1 0], B: [0 1], C: [1 1]
    - Notice: We reduced the number of coordinates!

Page 57: Utrecht University - Universiteit Utrecht

57

Dimensionality Reduction

- Goal of dimensionality reduction is to discover the axis of data!

Rather than representing every point with 2 coordinates we represent each point with 1 coordinate (corresponding to the position of the point on the red line).

By doing this we incur a bit of error as the points do not exactly lie on the line.

Page 58: Utrecht University - Universiteit Utrecht

58

Why Reduce Dimensions?

- Discover hidden correlations/topics
  - Words that occur commonly together
- Remove redundant and noisy features
  - Not all words are useful
- Interpretation and visualization
- Easier storage and processing of the data

Page 59: Utrecht University - Universiteit Utrecht

59

SVD - Definition

$$A_{[m \times n]} = U_{[m \times r]} \; S_{[r \times r]} \; (V_{[n \times r]})^T$$

- A: Input data matrix
  - m x n matrix (e.g., m documents, n terms)
- U: Left singular vectors
  - m x r matrix (m documents, r concepts)
- S: Singular values
  - r x r diagonal matrix (strength of each ‘concept’) (r: rank of the matrix A)
- V: Right singular vectors
  - n x r matrix (n terms, r concepts)
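A minimal sketch of these shapes using `numpy.linalg.svd` on the 7 x 5 users-to-movies matrix from the example in these slides. NumPy returns min(m, n) singular values, so trimming the factors to the rank r is a step this sketch adds to match the definition above.

```python
import numpy as np

# Users-to-movies ratings (columns: Matrix, Alien, Serenity, Casablanca, Amelie).
A = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2],
], dtype=float)

# full_matrices=False gives compact factors: U is m x k, Vt is k x n, with
# k = min(m, n); s holds the singular values in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

r = np.linalg.matrix_rank(A)              # number of nonzero singular values
U_r, S_r, Vt_r = U[:, :r], np.diag(s[:r]), Vt[:r, :]
A_rebuilt = U_r @ S_r @ Vt_r              # U S V^T with r concepts

print(U_r.shape, S_r.shape, Vt_r.shape)   # (7, 3) (3, 3) (3, 5)
```

Individual columns of U and rows of Vt may come back with flipped signs compared with a printed decomposition; the product is unaffected.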

Page 60: Utrecht University - Universiteit Utrecht

60

SVD - Properties

It is always possible to decompose a real matrix A into $A = U S V^T$, where

- S: unique; U, V: unique up to sign when the singular values are distinct
- U, V: column orthonormal
  - $U^T U = I$; $V^T V = I$ (I: identity matrix)
  - (Columns are orthogonal unit vectors)
- S: diagonal
  - Entries (singular values) are positive, and sorted in decreasing order ($\sigma_1 \ge \sigma_2 \ge \dots \ge 0$)
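These properties can be checked numerically. The sketch below uses an arbitrary random matrix (an assumption made purely for illustration) and `numpy.linalg.svd`.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))      # arbitrary real matrix, for illustration only

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# U and V are column orthonormal: U^T U = I and V^T V = I.
columns_orthonormal = (
    np.allclose(U.T @ U, np.eye(U.shape[1]))
    and np.allclose(V.T @ V, np.eye(V.shape[1]))
)
# Singular values are non-negative and sorted in decreasing order.
sorted_nonnegative = bool(np.all(s[:-1] >= s[1:]) and np.all(s >= 0))
# The factors multiply back to A.
reconstructs = np.allclose(U @ np.diag(s) @ Vt, A)
```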

Page 61: Utrecht University - Universiteit Utrecht

61

SVD – Example: Users-to-Movies

- $A = U S V^T$ - example: Users to Movies

A (rows: users, grouped as SciFi and Romance fans; columns: movies):

                 Matrix  Alien  Serenity  Casablanca  Amelie
    SciFi          1       1       1          0          0
                   3       3       3          0          0
                   4       4       4          0          0
                   5       5       5          0          0
    Romance        0       2       0          4          4
                   0       0       0          5          5
                   0       1       0          2          2

U:

    0.13   0.02  -0.01
    0.41   0.07  -0.03
    0.55   0.09  -0.04
    0.68   0.11  -0.05
    0.15  -0.59   0.65
    0.07  -0.73  -0.67
    0.07  -0.29   0.32

S:

    12.4   0     0
    0      9.5   0
    0      0     1.3

V^T:

    0.56   0.59   0.56   0.09   0.09
    0.12  -0.02   0.12  -0.69  -0.69
    0.40  -0.80   0.40   0.09   0.09

Page 62: Utrecht University - Universiteit Utrecht

62

SVD – Example: Users-to-Movies

- $A = U S V^T$ - example: Users to Movies
  - The first two concepts are labeled SciFi-concept and Romance-concept

Page 63: Utrecht University - Universiteit Utrecht

63

SVD – Example: Users-to-Movies

- $A = U S V^T$ - example:
  - U is “user-to-concept” similarity matrix (concepts: SciFi-concept, Romance-concept)

Page 64: Utrecht University - Universiteit Utrecht

64

SVD – Example: Users-to-Movies

- $A = U S V^T$ - example:
  - The first singular value in S (12.4) is the “strength” of the SciFi-concept

Page 65: Utrecht University - Universiteit Utrecht

65

SVD – Example: Users-to-Movies

- $A = U S V^T$ - example:
  - V is “movie-to-concept” similarity matrix (its first concept is the SciFi-concept)

Page 66: Utrecht University - Universiteit Utrecht

Dimensionality Reduction with SVD

Page 67: Utrecht University - Universiteit Utrecht

67

SVD – Dimensionality Reduction

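- Goal: Minimize the sum of reconstruction errors:

$$\sum_{i=1}^{N} \sum_{j=1}^{D} \lVert x_{ij} - z_{ij} \rVert^2$$

    - where $x_{ij}$ are the “old” and $z_{ij}$ are the “new” coordinates
- SVD gives ‘best’ axis to project on:
  - ‘best’ = minimizing the reconstruction errors
- In other words, minimum reconstruction error

[Figure: users plotted by Movie 1 rating and Movie 2 rating, with $v_1$, the first right singular vector, as the axis to project on.]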

Page 68: Utrecht University - Universiteit Utrecht

68

SVD - Interpretation #2

More details
- Q: How exactly is dim. reduction done?

Page 69: Utrecht University - Universiteit Utrecht

69

SVD - Interpretation #2

More details
- Q: How exactly is dim. reduction done?
- A: Set smallest singular values to zero

Page 70: Utrecht University - Universiteit Utrecht

70

SVD - Interpretation #2

More details
- Q: How exactly is dim. reduction done?
- A: Set smallest singular values to zero

With the smallest singular value (1.3) set to zero, $A \approx U S V^T$.

Page 71: Utrecht University - Universiteit Utrecht

71

SVD - Interpretation #2


Page 72: Utrecht University - Universiteit Utrecht

72

SVD - Interpretation #2

More details
- Q: How exactly is dim. reduction done?
- A: Set smallest singular values to zero

A ≈ U × S × V^T, keeping only the two largest singular values:

A:

    1  1  1  0  0
    3  3  3  0  0
    4  4  4  0  0
    5  5  5  0  0
    0  2  0  4  4
    0  0  0  5  5
    0  1  0  2  2

U (first two columns):

    0.13   0.02
    0.41   0.07
    0.55   0.09
    0.68   0.11
    0.15  -0.59
    0.07  -0.73
    0.07  -0.29

S:

    12.4   0
    0      9.5

V^T (first two rows):

    0.56   0.59   0.56   0.09   0.09
    0.12  -0.02   0.12  -0.69  -0.69
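The truncation step can be sketched with NumPy on the same users-to-movies matrix. Keeping k = 2 concepts mirrors the slide; measuring the error with the Frobenius norm is a choice made for this sketch.

```python
import numpy as np

A = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                          # number of concepts to keep
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # rank-k approximation of A

# Frobenius norm of the difference: the overall reconstruction error.
frobenius_error = np.linalg.norm(A - A_k)
dropped = np.sqrt((s[k:] ** 2).sum())          # norm of the dropped singular values
```

The error equals the norm of the dropped singular values, so zeroing only the small third value changes A very little compared with the two retained values.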