Utrecht University - Universiteit Utrecht


Recommendation Systems

Unsupervised Machine Learning

Prof. Yannis Velegrakis
Utrecht University
i.velegrakis@uu.nl
https://velgias.github.io

Disclaimer

The following set of slides originates from a number of different presentations and courses by the following people:
- Yannis Velegrakis (Utrecht University)
- Jeff Ullman (Stanford University)
- Bill Howe (University of Washington)
- Martin Fowler (ThoughtWorks)
- Ekaterini Ioannou (Tilburg University)
- Themis Palpanas (University of Paris-Descartes)

Copyright stays with the authors. No distribution is allowed without prior permission by the authors.

Example: Recommender Systems

- Customer X
  - Buys Metallica CD
  - Buys Megadeth CD
- Customer Y
  - Does a search on Metallica
  - Recommender system suggests Megadeth from data collected about customer X

Recommendations

[Diagram: a large set of items (products, web sites, blogs, news items, …) reaches users through two channels: search and recommendations]

Examples:

From Scarcity to Abundance

- Shelf space is a scarce commodity for traditional retailers
  - Also: TV networks, movie theaters, …
- The Web enables near-zero-cost dissemination of information about products
  - From scarcity to abundance
- More choice necessitates better filters
  - Recommendation engines
  - How Into Thin Air made Touching the Void a bestseller: http://www.wired.com/wired/archive/12.10/tail.html

Sidenote: The Long Tail

[Figure: the long-tail distribution of item popularity. Source: Chris Anderson (2004)]

Physical vs. Online

Read http://www.wired.com/wired/archive/12.10/tail.html to learn more!

Types of Recommendations

- Editorial and hand curated
  - Lists of favorites
  - Lists of "essential" items
- Simple aggregates
  - Top 10, Most Popular, Recent Uploads
- Tailored to individual users
  - Amazon, Netflix, …

Formal Model

- X = set of customers
- S = set of items
- Utility function u: X × S → R
  - R = set of ratings
  - R is a totally ordered set
  - e.g., 0–5 stars, real number in [0,1]

Utility Matrix

        Avatar  LOTR  Matrix  Pirates
Alice     1             0.2
Bob               0.5             0.3
Carol    0.2                       1
David                   0.4
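Since most entries are unknown, the utility matrix is naturally stored sparsely. Below is a minimal sketch (not from the deck) of the formal model u: X × S → R as a Python dict; all names are illustrative.

# Sparse utility matrix u: X x S -> R, stored as {(customer, item): rating}.
# Missing pairs mean "no rating yet"; ratings here are in [0, 1].
utility = {
    ("Alice", "Avatar"): 1.0,  ("Alice", "Matrix"): 0.2,
    ("Bob",   "LOTR"):   0.5,  ("Bob",   "Pirates"): 0.3,
    ("Carol", "Avatar"): 0.2,  ("Carol", "Pirates"): 1.0,
    ("David", "Matrix"): 0.4,
}

def u(x, s):
    """Return the known rating of customer x for item s, or None if unrated."""
    return utility.get((x, s))

print(u("Alice", "Avatar"))   # 1.0
print(u("Alice", "Pirates"))  # None -> this is what a recommender must predict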

Key Problems

- (1) Gathering "known" ratings for the matrix
  - How to collect the data in the utility matrix
- (2) Extrapolating unknown ratings from the known ones
  - Mainly interested in high unknown ratings
  - We are not interested in knowing what you don't like, but what you like
- (3) Evaluating extrapolation methods
  - How to measure the success/performance of recommendation methods

(1) Gathering Ratings

- Explicit
  - Ask people to rate items
  - Doesn't work well in practice – people can't be bothered
- Implicit
  - Learn ratings from user actions
  - E.g., purchase implies high rating
  - What about low ratings?

(2) Extrapolating Utilities

- Key problem: the utility matrix U is sparse
  - Most people have not rated most items
  - Cold start:
    - New items have no ratings
    - New users have no history
- Two approaches to recommender systems:
  - 1) Content-based
  - 2) Collaborative

Content-based Recommender Systems

Content-based Recommendations

- Main idea: recommend to customer x items similar to previous items rated highly by x

Examples:
- Movie recommendations
  - Recommend movies with the same actor(s), director, genre, …
- Websites, blogs, news
  - Recommend other sites with "similar" content

Plan of Action

[Diagram: from the items a user likes, build item profiles; from those, build a user profile; match the user profile against item profiles to recommend new items]

Item Profiles

- For each item, create an item profile
- A profile is a set (vector) of features
  - Movies: author, title, actor, director, …
  - Text: set of "important" words in the document
- How to pick important features?
  - Usual heuristic from text mining is TF-IDF (Term Frequency × Inverse Document Frequency)
    - Term … feature
    - Document … item

Sidenote: TF-IDF

f_ij = frequency of term (feature) i in doc (item) j

TF_ij = f_ij / max_k f_kj
(Note: we normalize TF to discount for "longer" documents)

n_i = number of docs that mention term i
N = total number of docs

IDF_i = log(N / n_i)

TF-IDF score: w_ij = TF_ij × IDF_i

Doc profile = set of words with highest TF-IDF scores, together with their scores
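A minimal sketch (not from the deck) of building TF-IDF item profiles for text items, following the definitions above; the function name and toy documents are illustrative.

import math
from collections import Counter

def tfidf_profiles(docs, profile_size=5):
    """docs: {item_id: list of terms}. Returns {item_id: {term: tf-idf weight}}."""
    N = len(docs)
    # n_i = number of docs that mention term i
    doc_freq = Counter(term for terms in docs.values() for term in set(terms))

    profiles = {}
    for item, terms in docs.items():
        counts = Counter(terms)              # f_ij
        max_f = max(counts.values())         # normalize TF by the most frequent term
        weights = {
            term: (f / max_f) * math.log(N / doc_freq[term])   # TF_ij * IDF_i
            for term, f in counts.items()
        }
        # Item profile: the terms with the highest TF-IDF scores
        top = sorted(weights.items(), key=lambda kv: -kv[1])[:profile_size]
        profiles[item] = dict(top)
    return profiles

docs = {
    "doc1": ["space", "ship", "alien", "space"],
    "doc2": ["love", "ship", "paris"],
}
print(tfidf_profiles(docs))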

User Profiles

Example 1: Boolean Utility Matrix

Example 2: Star Ratings

Making Predictions

User Profiles and Prediction

- User profile possibilities:
  - Weighted average of rated item profiles
  - Variation: weight by difference from the average rating for the item
  - …
- Prediction heuristic:
  - Given user profile x and item profile i, estimate u(x, i) = cos(x, i) = (x · i) / (|x| · |i|)
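A minimal sketch (not from the deck) of this prediction heuristic, with user and item profiles as sparse feature-weight dicts; names and values are illustrative.

import math

def cosine(x, i):
    """cos(x, i) = (x . i) / (|x| * |i|) for sparse feature->weight dicts."""
    dot = sum(x[f] * i[f] for f in x if f in i)
    norm = math.sqrt(sum(v * v for v in x.values())) * math.sqrt(sum(v * v for v in i.values()))
    return dot / norm if norm else 0.0

# User profile = weighted average of the item profiles the user rated highly
user_profile = {"scifi": 0.9, "space": 0.7, "romance": 0.1}
item_profile = {"scifi": 1.0, "space": 0.5}

print(round(cosine(user_profile, item_profile), 3))  # estimated utility u(x, i) ~ 0.977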

Pros: Content-based Approach

- +: No need for data on other users
  - No cold-start or sparsity problems
- +: Able to recommend to users with unique tastes
- +: Able to recommend new & unpopular items
  - No first-rater problem
- +: Able to provide explanations
  - Can explain a recommendation by listing the content features that caused the item to be recommended

Cons: Content-based Approach

- –: Finding the appropriate features is hard
  - E.g., images, movies, music
- –: Recommendations for new users
  - How to build a user profile?
- –: Overspecialization
  - Never recommends items outside the user's content profile
  - People might have multiple interests
  - Unable to exploit quality judgments of other users

Collaborative Filtering

Harnessing the judgment of other users

Collaborative Filtering

- Consider user x
- Find a set N of other users whose ratings are "similar" to x's ratings
- Estimate x's ratings based on the ratings of the users in N

[Diagram: user x and its neighborhood N of similar users]

Option 1: Jaccard Similarity

Option 2: Cosine Similarity

Option 3: Centered Cosine

Centered Cosine Similarity (2)

Rating Predictions
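A minimal sketch of the three similarity options and of a similarity-weighted rating prediction, following the standard formulations (an assumption rather than the deck's exact formulas); all names and data are illustrative.

import math

def jaccard(a, b):
    """Option 1: Jaccard similarity of the sets of items two users rated."""
    A, B = set(a), set(b)
    return len(A & B) / len(A | B) if A | B else 0.0

def cosine(a, b):
    """Option 2: cosine similarity of two sparse item->rating dicts."""
    dot = sum(a[i] * b[i] for i in a if i in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centered_cosine(a, b):
    """Option 3: centered cosine (Pearson): subtract each user's mean rating first."""
    ma = sum(a.values()) / len(a)
    mb = sum(b.values()) / len(b)
    return cosine({i: r - ma for i, r in a.items()},
                  {i: r - mb for i, r in b.items()})

def predict(x, others, item, k=2, sim=centered_cosine):
    """Similarity-weighted average over the k most similar users who rated the item
    (only positively correlated neighbours are used)."""
    scored = [(sim(x, y), y[item]) for y in others if item in y]
    neighbours = sorted((sr for sr in scored if sr[0] > 0), reverse=True)[:k]
    num = sum(s * r for s, r in neighbours)
    den = sum(s for s, _ in neighbours)
    return num / den if den else None

x = {"Avatar": 5, "Matrix": 4, "LOTR": 1}
others = [{"Avatar": 5, "Matrix": 5, "LOTR": 2, "Pirates": 2},
          {"Avatar": 1, "LOTR": 5, "Pirates": 5}]
print(predict(x, others, "Pirates"))   # 2.0 (only the first user is positively similar)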

Item-Item Collaborative Filtering

- So far: user-user collaborative filtering
- Another view: item-item
  - For item i, find other similar items
  - Estimate the rating for item i based on the ratings for similar items
  - Can use the same similarity metrics and prediction functions as in the user-user model

r_xi = ( Σ_{j ∈ N(i;x)} s_ij · r_xj ) / ( Σ_{j ∈ N(i;x)} s_ij )

s_ij … similarity of items i and j
r_xj … rating of user x on item j
N(i;x) … set of items rated by x similar to i

Item-Item CF (|N|=2)

users 1–12 (columns) × movies 1–6 (rows); "." = unknown rating, numbers are ratings between 1 and 5

           1  2  3  4  5  6  7  8  9 10 11 12
movie 1    1  .  3  .  .  5  .  .  5  .  4  .
movie 2    .  .  5  4  .  .  4  .  .  2  1  3
movie 3    2  4  .  1  2  .  3  .  4  3  5  .
movie 4    .  2  4  .  5  .  .  4  .  .  2  .
movie 5    .  .  4  3  4  2  .  .  .  .  2  5
movie 6    1  .  3  .  3  .  .  2  .  .  4  .

Item-Item CF (|N|=2)

(same utility matrix as above, with the unknown rating of movie 1 by user 5 marked "?")

Goal: estimate the rating of movie 1 by user 5

Item-Item CF (|N|=2)

(same utility matrix as above)

Neighbor selection: identify movies similar to movie 1 that were rated by user 5

Here we use Pearson correlation as similarity:
1) Subtract the mean rating m_i from each movie i
   m_1 = (1+3+5+5+4)/5 = 3.6
   row 1: [-2.6, 0, -0.6, 0, 0, 1.4, 0, 0, 1.4, 0, 0.4, 0]
2) Compute cosine similarities between rows

sim(1, m) for m = 1…6: 1.00, -0.18, 0.41, -0.10, -0.31, 0.59

Item-Item CF (|N|=2)

(same utility matrix and sim(1, m) values as above)

Compute similarity weights: s_1,3 = 0.41, s_1,6 = 0.59

Item-Item CF (|N|=2)

(same utility matrix as above, with the predicted value 2.6 filled in for movie 1, user 5)

Predict by taking a weighted average:

r_1,5 = (0.41·2 + 0.59·3) / (0.41 + 0.59) = 2.6
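A minimal sketch (not part of the deck) that reproduces this worked example: centered-cosine similarities between movie 1 and the other movies, then a weighted average over the two most similar movies rated by user 5; 0 encodes an unknown rating.

import math

# Rows = movies 1..6, columns = users 1..12; 0 means "unknown rating".
R = [
    [1, 0, 3, 0, 0, 5, 0, 0, 5, 0, 4, 0],
    [0, 0, 5, 4, 0, 0, 4, 0, 0, 2, 1, 3],
    [2, 4, 0, 1, 2, 0, 3, 0, 4, 3, 5, 0],
    [0, 2, 4, 0, 5, 0, 0, 4, 0, 0, 2, 0],
    [0, 0, 4, 3, 4, 2, 0, 0, 0, 0, 2, 5],
    [1, 0, 3, 0, 3, 0, 0, 2, 0, 0, 4, 0],
]

def centered(row):
    """Subtract the movie's mean rating from its known ratings (unknowns stay 0)."""
    rated = [r for r in row if r]
    mean = sum(rated) / len(rated)
    return [r - mean if r else 0.0 for r in row]

def sim(a, b):
    ca, cb = centered(a), centered(b)
    dot = sum(x * y for x, y in zip(ca, cb))
    return dot / (math.sqrt(sum(x * x for x in ca)) * math.sqrt(sum(y * y for y in cb)))

user = 4                                                           # user 5 (0-based)
sims = [(sim(R[0], R[m]), m) for m in range(1, 6) if R[m][user]]   # movies rated by user 5
top2 = sorted(sims, reverse=True)[:2]                              # movies 6 and 3
pred = sum(s * R[m][user] for s, m in top2) / sum(s for s, _ in top2)
print([round(s, 2) for s, _ in top2], round(pred, 1))              # [0.59, 0.41] 2.6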

Item-Item vs. User-User

[Example utility matrix with users Alice, Bob, Carol, David and movies Avatar, LOTR, Matrix, Pirates]

- In practice, it has been observed that item-item often works better than user-user
- Why? Items are simpler; users have multiple tastes

Implementing Collaborative Filtering

Collaborative Filtering: Complexity

Pros/Cons of Collaborative Filtering

- + Works for any kind of item
  - No feature selection needed
- – Cold start:
  - Need enough users in the system to find a match
- – Sparsity:
  - The user/ratings matrix is sparse
  - Hard to find users that have rated the same items
- – First rater:
  - Cannot recommend an item that has not been previously rated
  - New items, esoteric items
- – Popularity bias:
  - Cannot recommend items to someone with unique taste
  - Tends to recommend popular items

Hybrid Methods

- Implement two or more different recommenders and combine their predictions
  - Perhaps using a linear model
- Add content-based methods to collaborative filtering
  - Item profiles for the new-item problem
  - Demographics to deal with the new-user problem

Global Baseline Estimate

Combining Global Baseline Estimate with CF
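A minimal sketch of a global baseline estimate combined with item-item CF, following the common formulation b_xi = μ + b_x + b_i (overall mean plus user and item deviations); this is an assumption about the standard approach rather than the deck's exact formulas, and all names are illustrative.

# Global baseline estimate b_xi = mu + b_x + b_i, combined with item-item CF.
# ratings: {(user, item): rating}.
def baseline(ratings, x, i):
    mu = sum(ratings.values()) / len(ratings)
    user_r = [r for (u, _), r in ratings.items() if u == x]
    item_r = [r for (_, j), r in ratings.items() if j == i]
    b_x = (sum(user_r) / len(user_r) - mu) if user_r else 0.0   # user deviation
    b_i = (sum(item_r) / len(item_r) - mu) if item_r else 0.0   # item deviation
    return mu + b_x + b_i

def predict(ratings, x, i, neighbours, sim):
    """neighbours: items j rated by x that are similar to i; sim: {(i, j): s_ij}.
    r_xi = b_xi + sum_j s_ij * (r_xj - b_xj) / sum_j s_ij"""
    num = sum(sim[(i, j)] * (ratings[(x, j)] - baseline(ratings, x, j)) for j in neighbours)
    den = sum(sim[(i, j)] for j in neighbours)
    return baseline(ratings, x, i) + (num / den if den else 0.0)

ratings = {("x", "A"): 5, ("x", "B"): 3, ("y", "A"): 4, ("y", "C"): 2}
sim = {("C", "A"): 0.8, ("C", "B"): 0.4}
print(round(predict(ratings, "x", "C", ["A", "B"], sim), 2))   # 2.33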

Evaluation

[Figure: utility matrix R for 480,000 users × 17,700 movies, with a handful of known ratings (1–5) filled in]

Evaluation

[Figure: the same 480,000 users × 17,700 movies matrix R, split into a training data set (the known ratings) and a test data set of withheld ratings such as r_3,6, shown as "?"]

RMSE = √( (1/|T|) · Σ_{(x,i)∈T} ( r̂_xi − r_xi )² )

where r̂_xi is the predicted rating, r_xi is the true rating of user x on item i, and T is the test set.
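A minimal sketch (not from the deck) of computing RMSE over a held-out test set of (user, item, true rating) triples; names are illustrative.

import math

def rmse(test_set, predict):
    """test_set: list of (user, item, true_rating); predict(user, item) -> r_hat."""
    errors = [(predict(x, i) - r) ** 2 for x, i, r in test_set]
    return math.sqrt(sum(errors) / len(errors))

# Toy usage with a constant predictor
test = [("u1", "m1", 4), ("u1", "m2", 2), ("u2", "m1", 5)]
print(round(rmse(test, lambda x, i: 3.5), 3))   # 1.258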

Problems with Error Measures

- A narrow focus on accuracy sometimes misses the point
  - Prediction diversity
  - Prediction context
  - Order of predictions
- In practice, we care only about predicting high ratings:
  - RMSE might penalize a method that does well for high ratings and badly for others

Collaborative Filtering: Complexity

- The expensive step is finding the k most similar customers: O(|X|)
- Too expensive to do at runtime
  - Could pre-compute
- Naïve pre-computation takes time O(k · |X|)
  - X … set of customers
- We already know how to do this!
  - Near-neighbor search in high dimensions (LSH)
  - Clustering
  - Dimensionality reduction

Tip: Add Data

- Leverage all the data
  - Don't try to reduce data size in an effort to make fancy algorithms work
  - Simple methods on large data do best
- Add more data
  - E.g., add IMDB data on genres
- More data beats better algorithms: http://anand.typepad.com/datawocky/2008/03/more-data-usual.html

Performance of Various Methods (RMSE on the Netflix data; lower is better)

Global average: 1.1296
User average: 1.0651
Movie average: 1.0533
Netflix: 0.9514

Basic Collaborative Filtering: 0.94
Collaborative Filtering++: 0.91
Latent factors: 0.90
Latent factors + Biases: 0.89
Latent factors + Biases + Time: 0.876

Grand Prize: 0.8563

When no prize… Getting desperate. Try a "kitchen sink" approach!

Dimensionality Reduction

Dimensionality Reduction

- Assumption: data lies on or near a low d-dimensional subspace
- Axes of this subspace are an effective representation of the data

Dimensionality Reduction

- Compress / reduce dimensionality:
  - 10^6 rows; 10^3 columns; no updates
  - Random access to any cell(s); small error: OK

[Example matrix not shown] The matrix is really "2-dimensional": all rows can be reconstructed by scaling [1 1 1 0 0] or [0 0 0 1 1]

Rank is “Dimensionality”

- Cloud of points in 3D space:
  - Think of the point positions as a matrix, 1 row per point: A, B, C
- We can rewrite the coordinates more efficiently!
  - Old basis vectors: [1 0 0], [0 1 0], [0 0 1]
  - New basis vectors: [1 2 1], [-2 -3 1]
  - Then A has new coordinates [1 0], B: [0 1], C: [1 1]
    - Notice: we reduced the number of coordinates!
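A small numerical check (not from the deck) of this change of basis, with the points reconstructed from their new coordinates and the two new basis vectors.

import numpy as np

new_basis = np.array([[1, 2, 1],
                      [-2, -3, 1]])           # the two new basis vectors

coords = np.array([[1, 0],                    # A = 1*[1 2 1] + 0*[-2 -3 1]
                   [0, 1],                    # B
                   [1, 1]])                   # C = A + B

points = coords @ new_basis                   # back to the old 3D coordinates
print(points)                                 # rows are A, B, C in the original basis
print(np.linalg.matrix_rank(points))          # rank 2: the points lie in a plane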

Dimensionality Reduction

- The goal of dimensionality reduction is to discover the axis of the data!

Rather than representing every point with 2 coordinates, we represent each point with 1 coordinate (corresponding to the position of the point on the red line). By doing this we incur a bit of error, as the points do not exactly lie on the line.

Why Reduce Dimensions?

- Discover hidden correlations/topics
  - Words that occur commonly together
- Remove redundant and noisy features
  - Not all words are useful
- Interpretation and visualization
- Easier storage and processing of the data

SVD - Definition

A_[m×n] = U_[m×r] S_[r×r] (V_[n×r])^T

- A: input data matrix
  - m × n matrix (e.g., m documents, n terms)
- U: left singular vectors
  - m × r matrix (m documents, r concepts)
- S: singular values
  - r × r diagonal matrix (strength of each "concept")
  - (r: rank of the matrix A)
- V: right singular vectors
  - n × r matrix (n terms, r concepts)

SVD - Properties

It is always possible to decompose a real matrix A into A = U S V^T, where:

- U, S, V: unique
- U, V: column orthonormal
  - U^T U = I; V^T V = I (I: identity matrix)
  - (Columns are orthogonal unit vectors)
- S: diagonal
  - Entries (singular values) are positive and sorted in decreasing order (σ1 ≥ σ2 ≥ … ≥ 0)

SVD – Example: Users-to-Movies

A = U S V^T - example: Users to Movies

Columns of A: Matrix, Alien, Serenity, Casablanca, Amelie; the first four rows are SciFi fans, the last three are Romance fans.

A =
  1 1 1 0 0
  3 3 3 0 0
  4 4 4 0 0
  5 5 5 0 0
  0 2 0 4 4
  0 0 0 5 5
  0 1 0 2 2

U =
  0.13  0.02 -0.01
  0.41  0.07 -0.03
  0.55  0.09 -0.04
  0.68  0.11 -0.05
  0.15 -0.59  0.65
  0.07 -0.73 -0.67
  0.07 -0.29  0.32

S =
  12.4  0    0
   0    9.5  0
   0    0    1.3

V^T =
  0.56  0.59  0.56  0.09  0.09
  0.12 -0.02  0.12 -0.69 -0.69
  0.40 -0.80  0.40  0.09  0.09
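As a quick check (not part of the deck), NumPy's SVD recovers this decomposition up to sign and rounding.

import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.round(s, 1))            # ~[12.4  9.5  1.3  0.  0.] -- three non-zero concepts
print(np.round(U[:, :3], 2))     # user-to-concept matrix (signs may be flipped)
print(np.round(Vt[:3], 2))       # movie-to-concept matrix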

SVD – Example: Users-to-Movies

A = U S V^T - example: Users to Movies

(same decomposition as above, annotated with a "SciFi-concept" and a "Romance-concept")

SVD – Example: Users-to-Movies

A = U S V^T - example:

(same decomposition as above)

U is the "user-to-concept" similarity matrix (columns: SciFi-concept, Romance-concept)

SVD – Example: Users-to-Movies

A = U S V^T - example:

(same decomposition as above)

The first singular value (12.4) is the "strength" of the SciFi-concept

SVD – Example: Users-to-Movies

A = U S V^T - example:

(same decomposition as above)

V is the "movie-to-concept" similarity matrix (first row of V^T: SciFi-concept)

Dimensionality Reduction with SVD

SVD – Dimensionality Reduction

[Figure: ratings plotted in the (Movie 1 rating, Movie 2 rating) plane; v1, the first right singular vector, is the axis to project onto]

- Goal: minimize the sum of reconstruction errors
    Σ_ij (x_ij − z_ij)²
  where x_ij are the "old" and z_ij are the "new" coordinates
- SVD gives the 'best' axis to project on:
  - 'best' = minimizing the reconstruction errors
- In other words, SVD gives the minimum reconstruction error

SVD - Interpretation #2

More details
- Q: How exactly is dim. reduction done?

(decomposition as above)

SVD - Interpretation #2

More details
- Q: How exactly is dim. reduction done?
- A: Set the smallest singular values to zero

(decomposition as above)

SVD - Interpretation #2

More details
- Q: How exactly is dim. reduction done?
- A: Set the smallest singular values to zero

A ≈ U S V^T (same matrices as above)

SVD - Interpretation #2

More details
- Q: How exactly is dim. reduction done?
- A: Set the smallest singular values to zero

A ≈ U S V^T (same matrices as above, with the smallest singular value zeroed out)

SVD - Interpretation #2

More details
- Q: How exactly is dim. reduction done?
- A: Set the smallest singular values to zero

A ≈ U' S' V'^T, where

U' (first two columns of U) =
  0.13  0.02
  0.41  0.07
  0.55  0.09
  0.68  0.11
  0.15 -0.59
  0.07 -0.73
  0.07 -0.29

S' = diag(12.4, 9.5)

V'^T (first two rows of V^T) =
  0.56  0.59  0.56  0.09  0.09
  0.12 -0.02  0.12 -0.69 -0.69
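A short sketch (not part of the deck) of the same truncation in NumPy: keep the two strongest concepts and measure the reconstruction error, which equals the dropped singular value.

import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                     # keep the two strongest concepts
A2 = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # rank-2 approximation of A

print(np.round(A2, 1))
print("error:", round(np.linalg.norm(A - A2), 2))   # equals the dropped sigma_3 ~ 1.3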