ACM Conference on Recommender Systems 2011 – Doctoral Symposium
October 23, Chicago, USA IRGIR Group @ UAM
Predicting Performance in
Recommender Systems
Alejandro Bellogín
Supervised by Pablo Castells and Iván Cantador
Escuela Politécnica Superior
Universidad Autónoma de Madrid
@abellogin
Motivation
Is it possible to predict the accuracy of a recommendation?
Hypothesis
Data that are commonly available to a
Recommender System could contain
signals that enable an a priori estimation
of the success of the recommendation
Research Questions
1. Is it possible to define a performance prediction theory for
recommender systems in a sound, formal way?
2. Is it possible to adapt query performance techniques (from
IR) to the recommendation task?
3. What kind of evaluation should be performed? Is IR
evaluation still valid in our problem?
4. What kind of recommendation problems can these models
be applied to?
Predicting Performance in Recommender Systems
RQ1. Is it possible to define a performance prediction theory
for recommender systems in a sound, formal way?
a) Define a predictor of performance γ = γ(u, i, r, …)
b) Agree on a performance metric μ = μ(u, i, r, …)
c) Check predictive power by measuring correlation
corr([γ(x1), …, γ(xn)], [μ(x1), …, μ(xn)])
d) Evaluate final performance: dynamic vs. static
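Step (c) can be sketched in Python; the function name and the toy per-user arrays are illustrative assumptions, not part of the methodology above:

```python
import numpy as np

def predictive_power(predictor_values, metric_values):
    """Pearson correlation between per-user predictor scores gamma(x_k)
    and per-user performance values mu(x_k) (step c above)."""
    g = np.asarray(predictor_values, dtype=float)
    m = np.asarray(metric_values, dtype=float)
    return float(np.corrcoef(g, m)[0, 1])

# A predictor that is a linear function of the metric correlates perfectly.
metric = [0.2, 0.5, 0.9, 0.4]
predictor = [2 * v + 1 for v in metric]
print(round(predictive_power(predictor, metric), 4))  # 1.0
```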
Predicting Performance in Recommender Systems
RQ2. Is it possible to adapt query performance techniques
(from IR) to the recommendation task?
In IR: “Estimation of the system’s performance in response
to a specific query”
Several predictors proposed
We focus on query clarity → user clarity
User clarity
It captures uncertainty in user’s data
• Distance between the user’s and the system’s probability model
• X may be: users, items, ratings, or a combination
clarity(u) = \sum_{x \in X} p(x|u) \log_2 \frac{p(x|u)}{p_c(x)}

p(x|u): user's model; p_c(x): system's model
User clarity
Three user clarity formulations:
clarity(u) = \sum_{x \in X} p(x|u) \log_2 \frac{p(x|u)}{p_c(x)}

p(x|u): user model; p_c(x): background model

Name                  | Vocabulary              | User model | Background model
Rating-based          | Ratings                 | p(r|u)     | p_c(r)
Item-based            | Items                   | p(i|u)     | p_c(i)
Item-and-rating-based | Items rated by the user | p(r|i,u)   | p_ml(r|i)
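As an illustration, the rating-based formulation can be sketched as a KL divergence between a smoothed user model and the background model; the Jelinek-Mercer parameter lam and the function name are assumptions for this sketch, not details taken from the slides:

```python
import math

def rating_clarity(user_ratings, all_ratings, lam=0.6):
    """Rating-based user clarity: KL divergence between the (smoothed)
    user model p(r|u) and the background model p_c(r)."""
    n_bg, n_u = len(all_ratings), len(user_ratings)
    clarity = 0.0
    for r in set(all_ratings):
        p_c = all_ratings.count(r) / n_bg   # background model p_c(r)
        p_ml = user_ratings.count(r) / n_u  # maximum-likelihood user model
        p_u = lam * p_ml + (1 - lam) * p_c  # Jelinek-Mercer smoothing
        if p_u > 0:
            clarity += p_u * math.log2(p_u / p_c)
    return clarity

# A user matching the background distribution has zero clarity;
# a focused (more distinctive) profile yields higher clarity.
background = [1, 2, 3, 4, 5] * 10
print(rating_clarity([1, 2, 3, 4, 5], background))   # 0.0
print(rating_clarity([5, 5, 5, 5], background) > 0)  # True
```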
User clarity
Seven user clarity models implemented:
Name       | Formulation           | User model              | Background model
RatUser    | Rating-based          | p_U(r|i,u)              | p_c(r)
RatItem    | Rating-based          | p_I(r|i,u)              | p_c(r)
ItemSimple | Item-based            | p_R(i|u)                | p_c(i)
ItemUser   | Item-based            | p_{UR}(i|u)             | p_c(i)
IRUser     | Item-and-rating-based | p_U(r|i,u); p_{UR}(i|u) | p_ml(r|i)
IRItem     | Item-and-rating-based | p_I(r|i,u); p_{UR}(i|u) | p_ml(r|i)
IRUserItem | Item-and-rating-based | p_{UI}(r|i,u)           | p_ml(r|i)
User clarity
Predictor that captures uncertainty in user’s data
Different formulations capture different nuances
More dimensions in RS than in IR: user, items, ratings, features, …
[Figure: RatItem example — bar chart of the background model p_c(x) and two user models p(x|u1), p(x|u2) over the rating values 1–5]
Predicting Performance in Recommender Systems
RQ3. What kind of evaluation should be performed? Is IR evaluation still valid in our problem?
In IR: Mean Average Precision + correlation
50 points (queries) vs 1000+ points (users)
Performance metric is not clear: error-based, precision-based?
What is performance?
It may depend on the final application
Possible bias
E.g., towards users or items with larger profiles
r ~ 0.57
Predicting Performance in Recommender Systems
RQ4. What kind of recommendation problems can these
models be applied to?
Whenever a combination of strategies is available
Example 1: dynamic neighbor weighting
Example 2: dynamic ensemble recommendation
Dynamic neighbor weighting
The user’s neighbors are weighted according to their similarity
Can we take into account the uncertainty in neighbor’s data?
User neighbor weighting [1]
• Static: g(u,i) = C \sum_{v \in N[u]} sim(u,v) \, rat(v,i)
• Dynamic: g(u,i) = C \sum_{v \in N[u]} \gamma(v) \, sim(u,v) \, rat(v,i)
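The two scoring functions can be sketched as follows; the dictionary-based inputs and the name gamma for the neighbor-quality predictor (e.g. a clarity score) are illustrative assumptions:

```python
def predict_static(u, i, neighbors, sim, rat, C=1.0):
    """Static: g(u,i) = C * sum over v in N[u] of sim(u,v) * rat(v,i)."""
    return C * sum(sim[(u, v)] * rat[(v, i)] for v in neighbors[u])

def predict_dynamic(u, i, neighbors, sim, rat, gamma, C=1.0):
    """Dynamic: g(u,i) = C * sum over v in N[u] of
    gamma(v) * sim(u,v) * rat(v,i), where gamma(v) predicts
    the neighbor's expected goodness."""
    return C * sum(gamma[v] * sim[(u, v)] * rat[(v, i)] for v in neighbors[u])

neighbors = {'u': ['a', 'b']}
sim = {('u', 'a'): 0.5, ('u', 'b'): 0.25}
rat = {('a', 'i'): 4.0, ('b', 'i'): 2.0}
gamma = {'a': 1.2, 'b': 0.8}  # hypothetical per-neighbor quality scores
print(predict_static('u', 'i', neighbors, sim, rat))                   # 2.5
print(round(predict_dynamic('u', 'i', neighbors, sim, rat, gamma), 3)) # 2.8
```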
Dynamic hybrid recommendation
Weight is the same for every item and user (learnt from training)
What about boosting those users predicted to perform better for
some recommender?
Hybrid recommendation [3]
• Static: g(u,i) = \lambda \, g_{R1}(u,i) + (1 - \lambda) \, g_{R2}(u,i)
• Dynamic: g(u,i) = \gamma(u) \, g_{R1}(u,i) + (1 - \gamma(u)) \, g_{R2}(u,i)
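A sketch of both combination schemes; the component recommenders g1, g2 and the per-user weight function gamma are placeholders, not the recommenders actually used in the experiments:

```python
def hybrid_static(g1, g2, lam):
    """g(u,i) = lam*g_R1(u,i) + (1-lam)*g_R2(u,i); lam learnt on training data,
    the same for every user and item."""
    return lambda u, i: lam * g1(u, i) + (1 - lam) * g2(u, i)

def hybrid_dynamic(g1, g2, gamma):
    """g(u,i) = gamma(u)*g_R1(u,i) + (1-gamma(u))*g_R2(u,i); gamma(u) boosts
    the recommender predicted to perform better for user u."""
    return lambda u, i: gamma(u) * g1(u, i) + (1 - gamma(u)) * g2(u, i)

g1 = lambda u, i: 4.0  # toy score from recommender R1
g2 = lambda u, i: 2.0  # toy score from recommender R2
print(hybrid_static(g1, g2, 0.5)('u', 'i'))             # 3.0
print(hybrid_dynamic(g1, g2, lambda u: 1.0)('u', 'i'))  # 4.0
```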
Results – Neighbour weighting

Correlation analysis [1]
• With respect to the Neighbour Goodness metric: “how good a neighbour is to her vicinity”

Performance [1] (MAE = Mean Absolute Error; the lower, the better)

[Figure a: MAE vs. % of ratings used for training (neighbourhood size: 500) — Standard CF vs. Clarity-enhanced CF]
[Figure b: MAE vs. neighbourhood size (80% training) — Standard CF vs. Clarity-enhanced CF]

Improvement of over 5% w.r.t. the baseline
Plus, it does not degrade performance
Positive, although not very strong, correlations
Results – Hybrid recommendation

Correlation analysis [2]
• With respect to nDCG@50 (nDCG: normalized Discounted Cumulative Gain)

Performance [3]

[Figure: MAP@50 for hybrids H1–H4 — Adaptive vs. Static]
[Figure: nDCG@50 for hybrids H1–H4 — Adaptive vs. Static]

On average, most of the predictors obtain strong positive correlations
The dynamic strategy outperforms the static one for different combinations of recommenders
Summary
Inferring user’s performance in a recommender system
Different adaptations of query clarity techniques
Building dynamic recommendation strategies
• Dynamic neighbor weighting: according to expected goodness of neighbor
• Dynamic hybrid recommendation: based on predicted performance
Encouraging results
• Positive predictive power (good correlations between predictors and metrics)
• Dynamic strategies obtain results that are better than (or equal to) the static ones
Related publications
[1] A Performance Prediction Approach to Enhance Collaborative
Filtering Performance. A. Bellogín and P. Castells. In ECIR 2010.
[2] Predicting the Performance of Recommender Systems: An
Information Theoretic Approach. A. Bellogín, P. Castells, and I.
Cantador. In ICTIR 2011.
[3] Performance Prediction for Dynamic Ensemble Recommender
Systems. A. Bellogín, P. Castells, and I. Cantador. In press.
Future Work
Explore other input sources
• Item predictors
• Social links
• Implicit data (with time)
We need a theoretical background
• Why do some predictors work better?
Larger datasets
FW – Other input sources
Item predictors
Social links
Implicit data (with time)
Item predictors could be very useful:
• Different recommender behavior depending on item attributes
• They would allow capturing popularity, diversity, etc.
FW – Other input sources
Item predictors
Social links
Implicit data (with time)
First results using social-based predictors
• Combination of social and CF
• Graph-based measures as predictors
• “Indicators” of the user strength
FW – Theoretical background
We need a theoretical background
• Why do some predictors work better?
Thank you!
Predicting Performance in
Recommender Systems
Alejandro Bellogín
Supervised by Pablo Castells and Iván Cantador
Escuela Politécnica Superior
Universidad Autónoma de Madrid
@abellogin
Acknowledgements to the National Science Foundation
for the funding to attend the conference.
Reviewer’s comments: Confidence
Other methods to measure self-performance of RS
o Confidence
These methods capture the performance of the RS,
not the user's performance
Reviewer’s comments: Neighbor’s goodness
Neighbor goodness seems to be a little bit ad-hoc
We need a measurable definition of neighbor performance
NG(u) ~ “total MAE reduction by u” ~ “MAE without u” – “MAE with u”
Some attempts in trust research: sign and error deviation [Rafter et al. 2009]
One possible formalization:

NG(u) = \frac{1}{|U \setminus \{u\}|} \sum_{v \in U \setminus \{u\}} \frac{C_u(v) - E_u(v)}{|R_v|}

R_v: items rated by v; C_u(v), E_u(v): numbers of predictions \hat{r}(v,i) that u respectively improves or degrades w.r.t. the actual ratings r(v,i)
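The "total MAE reduction by u" reading can be operationalised as a leave-one-neighbour-out MAE delta; compute_mae is a placeholder for whatever evaluation routine the CF system exposes:

```python
def neighbor_goodness(u, all_users, compute_mae):
    """NG(u) ~ 'MAE without u' - 'MAE with u'. Since lower MAE is better,
    a positive value means u is a helpful neighbour overall."""
    return compute_mae(all_users - {u}) - compute_mae(all_users)

# Toy evaluation routine: the system's MAE drops from 1.2 to 1.0
# whenever user 'u' is available as a neighbour.
compute_mae = lambda users: 1.0 if 'u' in users else 1.2
print(round(neighbor_goodness('u', {'u', 'v'}, compute_mae), 2))  # 0.2
```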
Reviewer’s comments: Neighbor’s weighting issues
Neighbor size vs dynamic neighborhood weighting
So far, only dynamic weighting
• Same training time as static weighting
Future work: dynamic size
Apply this method for larger datasets
Current work
Apply this method for other CF methods (e.g., latent factor models, SVD)
More difficult to identify the combination
Future work
Reviewer’s comments: Dynamic hybrid issues
Other methods to combine recommenders
o Stacking
o Multi-linear weighting
We focus on linear weighted hybrid recommendation
Future work: cascade, stacking