Multiverse Recommendation: N-dimensional Tensor Factorization for Context-aware Collaborative Filtering


DESCRIPTION

Slides from the RecSys 2010 presentation. Context has been recognized as an important factor to consider in personalized Recommender Systems. However, most model-based Collaborative Filtering approaches such as Matrix Factorization do not provide a straightforward way of integrating context information into the model. In this work, we introduce a Collaborative Filtering method based on Tensor Factorization, a generalization of Matrix Factorization that allows for a flexible and generic integration of contextual information by modeling the data as a User-Item-Context N-dimensional tensor instead of the traditional 2D User-Item matrix. In the proposed model, called Multiverse Recommendation, different types of context are considered as additional dimensions in the representation of the data as a tensor.


Multiverse Recommendation: N-dimensional Tensor Factorization for Context-aware Collaborative Filtering

Alexandros Karatzoglou (1), Xavier Amatriain (1), Linas Baltrunas (2), Nuria Oliver (1)

(1) Telefonica Research, Barcelona, Spain

(2) Free University of Bolzano, Bolzano, Italy

November 4, 2010


Context in Recommender Systems

Context is an important factor to consider in personalized Recommendation


Current State of the Art in Context-aware Recommendation

- Pre-Filtering Techniques
- Post-Filtering Techniques
- Contextual Modeling

The approach presented here fits in the Contextual Modeling category


Collaborative Filtering problem setting

Typical data sizes, e.g. Netflix data: n = 5 × 10^5 users, m = 17 × 10^3 movies


Standard Matrix Factorization

Find U ∈ R^(n×d) and M ∈ R^(d×m) so that F = UM

minimize_{U,M} L(F, Y) + λ Ω(U, M)

[Figure: the Users × Movies rating matrix approximated by the product of user factors U and movie factors M]

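As a minimal sketch of this setting (not code from the talk), assuming numpy, an illustrative latent dimension d = 10 and random initialization, the factor matrices and a single-rating prediction F_ij = U_i* M_*j look like:

    import numpy as np

    n, m, d = 500_000, 17_000, 10      # users, movies, latent dimension (d chosen for illustration)
    U = 0.1 * np.random.randn(n, d)    # user factors, U in R^(n x d)
    M = 0.1 * np.random.randn(d, m)    # movie factors, M in R^(d x m)

    def predict(i, j):
        # F_ij = (UM)_ij = U_i* . M_*j
        return U[i] @ M[:, j]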

Multiverse Recommendation: Tensors for Context-Aware Collaborative Filtering

[Figure: the Users × Movies rating data extended with context as an additional tensor dimension]

F_ijk = S ×_U U_i* ×_M M_j* ×_C C_k*

R[U, M, C, S] := L(F, Y) + Ω[U, M, C] + Ω[S]

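A minimal numpy sketch of this multilinear prediction, under assumed (illustrative) sizes for the data and the latent dimensions; the three mode products with the core tensor S collapse into a single einsum:

    import numpy as np

    n, m, c = 100, 50, 3                 # users, movies, context conditions (illustrative sizes)
    dU, dM, dC = 5, 5, 2                 # latent dimensions per mode (illustrative)
    U = np.random.randn(n, dU)           # user factors
    M = np.random.randn(m, dM)           # movie factors
    C = np.random.randn(c, dC)           # context factors
    S = np.random.randn(dU, dM, dC)      # core tensor

    # F_ijk = S x_U U_i* x_M M_j* x_C C_k*, computed for all (i, j, k) at once
    F = np.einsum('abc,ia,jb,kc->ijk', S, U, M, C)

For a single entry, np.einsum('abc,a,b,c->', S, U[i], M[j], C[k]) gives F_ijk directly.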

Tensors for Context-Aware Collaborative Filtering

[Figure: the Users × Movies × Context rating tensor decomposed into factor matrices U, M, C and core tensor S]

F_ijk = S ×_U U_i* ×_M M_j* ×_C C_k*

R[U, M, C, S] := L(F, Y) + Ω[U, M, C] + Ω[S]


Regularization

Ω[U, M, C] = λ_M ‖M‖_F^2 + λ_U ‖U‖_F^2 + λ_C ‖C‖_F^2

Ω[S] := λ_S ‖S‖_F^2

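A direct transcription of these penalties as a sketch; the λ values below are illustrative placeholders, not the values used in the experiments:

    import numpy as np

    def omega(U, M, C, S, lam_U=0.01, lam_M=0.01, lam_C=0.01, lam_S=0.001):
        # squared Frobenius norms of the factor matrices and the core tensor
        return (lam_U * np.sum(U ** 2) + lam_M * np.sum(M ** 2)
                + lam_C * np.sum(C ** 2) + lam_S * np.sum(S ** 2))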

Squared Error Loss Function

Many implementations of MF use a simple squared error regression loss function

l(f, y) = (1/2) (f − y)^2

thus the loss over all users and items is:

L(F, Y) = Σ_{i}^{n} Σ_{j}^{m} l(f_ij, y_ij)

Note that this loss provides an estimate of the conditional mean

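In code, a sketch of the pointwise loss and of its derivative with respect to the prediction (the quantity ∂_F l needed later for the SGD gradients):

    def squared_loss(f, y):
        # l(f, y) = 1/2 (f - y)^2
        return 0.5 * (f - y) ** 2

    def squared_loss_grad(f, y):
        # derivative of l with respect to the prediction f
        return f - y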

Absolute Error Loss Function

Alternatively one can use the absolute error loss function

l(f, y) = |f − y|

thus the loss over all users and items is:

L(F, Y) = Σ_{i}^{n} Σ_{j}^{m} l(f_ij, y_ij)

which provides an estimate of the conditional median

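The corresponding sketch for the absolute error loss; a subgradient (the sign) is used, since the loss is not differentiable at f = y:

    import numpy as np

    def absolute_loss(f, y):
        # l(f, y) = |f - y|
        return abs(f - y)

    def absolute_loss_grad(f, y):
        # subgradient with respect to f; sign() returns 0 at f == y
        return np.sign(f - y)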

Optimization - Stochastic Gradient Descent for TF

The partial gradients with respect to U, M, C and S can then be written as:

∂_{U_i*} l(F_ijk, Y_ijk) = ∂_{F_ijk} l(F_ijk, Y_ijk) S ×_M M_j* ×_C C_k*

∂_{M_j*} l(F_ijk, Y_ijk) = ∂_{F_ijk} l(F_ijk, Y_ijk) S ×_U U_i* ×_C C_k*

∂_{C_k*} l(F_ijk, Y_ijk) = ∂_{F_ijk} l(F_ijk, Y_ijk) S ×_U U_i* ×_M M_j*

∂_S l(F_ijk, Y_ijk) = ∂_{F_ijk} l(F_ijk, Y_ijk) U_i* ⊗ M_j* ⊗ C_k*

We then iteratively update the parameter matrices and tensors using the following update rules:

U_i*^(t+1) = U_i*^(t) − η ∂_U L − η λ_U U_i*

M_j*^(t+1) = M_j*^(t) − η ∂_M L − η λ_M M_j*

C_k*^(t+1) = C_k*^(t) − η ∂_C L − η λ_C C_k*

S^(t+1) = S^(t) − η ∂_S l(F_ijk, Y_ijk) − η λ_S S

where η is the learning rate.

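A minimal numpy sketch of one stochastic gradient step on a single observed rating (i, j, k), assuming the squared error loss and, for brevity, a single regularization weight lam in place of the separate λ_U, λ_M, λ_C, λ_S; the learning rate eta is illustrative:

    import numpy as np

    def sgd_step(S, U, M, C, i, j, k, y, eta=0.01, lam=0.01):
        u, m, c = U[i], M[j], C[k]
        f = np.einsum('abc,a,b,c->', S, u, m, c)        # prediction F_ijk
        dF = f - y                                      # dl/dF_ijk for the squared error loss
        # partial gradients: the core tensor contracted with the other two factor rows
        grad_u = dF * np.einsum('abc,b,c->a', S, m, c)
        grad_m = dF * np.einsum('abc,a,c->b', S, u, c)
        grad_c = dF * np.einsum('abc,a,b->c', S, u, m)
        grad_S = dF * np.einsum('a,b,c->abc', u, m, c)  # outer product U_i* x M_j* x C_k*
        # update rules with learning rate eta and regularization weight lam
        U[i] = u - eta * grad_u - eta * lam * u
        M[j] = m - eta * grad_m - eta * lam * m
        C[k] = c - eta * grad_c - eta * lam * c
        S   -= eta * grad_S + eta * lam * S

Sweeping this step over all observed (i, j, k, y) tuples for several epochs gives the full training loop.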


Optimization - Stochastic Gradient Descent for TF

[Figure: the SGD updates illustrated on the Users × Movies × Context tensor and the factors U, M, C and core tensor S]


Experimental evaluation

We evaluate our model on contextual rating data by computing the Mean Absolute Error (MAE), using 5-fold cross validation, defined as follows:

MAE = (1/K) Σ_{ijk}^{n,m,c} D_ijk |Y_ijk − F_ijk|

where D_ijk indicates whether rating Y_ijk is observed in the test set and K is the number of such ratings

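A sketch of this computation, assuming Y and F are the observed and predicted rating tensors, D is a 0/1 indicator marking the held-out test ratings, and K is the number of such ratings:

    import numpy as np

    def mae(Y, F, D):
        # MAE = (1/K) * sum_ijk D_ijk * |Y_ijk - F_ijk|
        K = D.sum()
        return np.sum(D * np.abs(Y - F)) / K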

Data

Data set   Users   Movies   Context Dim.   Ratings   Scale
Yahoo!     7642    11915    2              221K      1-5
Adom.      84      192      5              1464      1-13
Food       212     20       2              6360      1-5

Table: Data set statistics


Context-Aware Methods

- Pre-filtering based approach (G. Adomavicius et al.), which computes recommendations using only the ratings made in the same context as the target one
- Item splitting method (L. Baltrunas, F. Ricci), which identifies items that have significant differences in their ratings under different context situations


Results: Context vs. No Context

[Figure: two MAE bar charts, panels (a) and (b), comparing No Context against Tensor Factorization]

Figure: Comparison of matrix (no context) and tensor (context) factorization on the Adom and Food data.


Yahoo Artificial Data

[Figure: three MAE bar charts for α = 0.1, α = 0.5 and α = 0.9, comparing No Context, Reduction, Item-Split and Tensor Factorization]

Figure: Comparison of context-aware methods on the Yahoo! artificial data.


Yahoo Artificial Data

[Figure: MAE as a function of the probability of contextual influence (0.0 to 0.9) for No Context, Reduction, Item-Split and Tensor Factorization]

Figure: Evolution of MAE values for different methods with increasing influence of the context variable on the Yahoo! data.


Tensor Factorization

[Figure: MAE bar chart comparing Reduction, Item-Split and Tensor Factorization]

Figure: Comparison of context-aware methods on the Adom data.


Tensor Factorization

[Figure: MAE bar chart comparing No Context, Reduction and Tensor Factorization]

Figure: Comparison of context-aware methods on the Food data.


Conclusions

- Tensor Factorization methods seem to be promising for CARS (Context-Aware Recommender Systems)
- Many different TF methods exist
- Future work: extend to implicit taste data
- Tensor representation of context data seems promising


Thank You!

