ACM Conference on Recommender Systems 2011 – Doctoral Symposium
October 23, Chicago, USA IRGIR Group @ UAM
Predicting Performance in
Recommender Systems
Alejandro Bellogín
Supervised by Pablo Castells and Iván Cantador
Escuela Politécnica Superior
Universidad Autónoma de Madrid
@abellogin
Motivation
Is it possible to predict the accuracy of a recommendation?
Hypothesis
Data that are commonly available to a
Recommender System could contain
signals that enable an a priori estimation
of the success of the recommendation
Research Questions
1. Is it possible to define a performance prediction theory for
recommender systems in a sound, formal way?
2. Is it possible to adapt query performance techniques (from
IR) to the recommendation task?
3. What kind of evaluation should be performed? Is IR
evaluation still valid in our problem?
4. What kind of recommendation problems can these models
be applied to?
Predicting Performance in Recommender Systems
RQ1. Is it possible to define a performance prediction theory
for recommender systems in a sound, formal way?
a) Define a predictor of performance γ = γ(u, i, r, …)
b) Agree on a performance metric μ = μ(u, i, r, …)
c) Check predictive power by measuring correlation
corr([γ(x1), …, γ(xn)], [μ(x1), …, μ(xn)])
d) Evaluate final performance: dynamic vs. static
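Step (c) can be sketched in Python; the function name and the toy per-user arrays are illustrative assumptions, not part of the methodology above:

```python
import numpy as np

def predictive_power(predictor_values, metric_values):
    """Pearson correlation between per-user predictor scores gamma(x_k)
    and per-user performance values mu(x_k) (step c above)."""
    g = np.asarray(predictor_values, dtype=float)
    m = np.asarray(metric_values, dtype=float)
    return float(np.corrcoef(g, m)[0, 1])

# A predictor that is a linear function of the metric correlates perfectly.
metric = [0.2, 0.5, 0.9, 0.4]
predictor = [2 * v + 1 for v in metric]
print(round(predictive_power(predictor, metric), 4))  # 1.0
```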
Predicting Performance in Recommender Systems
RQ2. Is it possible to adapt query performance techniques
(from IR) to the recommendation task?
In IR: “Estimation of the system’s performance in response
to a specific query”
Several predictors proposed
We focus on query clarity → user clarity
User clarity
It captures uncertainty in user’s data
• Distance between the user’s and the system’s probability model
• X may be: users, items, ratings, or a combination
clarity(u) = \sum_{x \in X} p(x|u) \log_2 \frac{p(x|u)}{p_c(x)}

p(x|u): user's model; p_c(x): system's model
User clarity
Three user clarity formulations:
clarity(u) = \sum_{x \in X} p(x|u) \log_2 \frac{p(x|u)}{p_c(x)}

p(x|u): user model; p_c(x): background model

Name                  | Vocabulary              | User model | Background model
Rating-based          | Ratings                 | p(r|u)     | p_c(r)
Item-based            | Items                   | p(i|u)     | p_c(i)
Item-and-rating-based | Items rated by the user | p(r|i,u)   | p_ml(r|i)
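As an illustration, the rating-based formulation can be sketched as a KL divergence between a smoothed user model and the background model; the Jelinek-Mercer parameter lam and the function name are assumptions for this sketch, not details taken from the slides:

```python
import math

def rating_clarity(user_ratings, all_ratings, lam=0.6):
    """Rating-based user clarity: KL divergence between the (smoothed)
    user model p(r|u) and the background model p_c(r)."""
    n_bg, n_u = len(all_ratings), len(user_ratings)
    clarity = 0.0
    for r in set(all_ratings):
        p_c = all_ratings.count(r) / n_bg   # background model p_c(r)
        p_ml = user_ratings.count(r) / n_u  # maximum-likelihood user model
        p_u = lam * p_ml + (1 - lam) * p_c  # Jelinek-Mercer smoothing
        if p_u > 0:
            clarity += p_u * math.log2(p_u / p_c)
    return clarity

# A user matching the background distribution has zero clarity;
# a focused (more distinctive) profile yields higher clarity.
background = [1, 2, 3, 4, 5] * 10
print(rating_clarity([1, 2, 3, 4, 5], background))   # 0.0
print(rating_clarity([5, 5, 5, 5], background) > 0)  # True
```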
User clarity
Seven user clarity models implemented:
Name       | Formulation           | User model              | Background model
RatUser    | Rating-based          | p_U(r|i,u)              | p_c(r)
RatItem    | Rating-based          | p_I(r|i,u)              | p_c(r)
ItemSimple | Item-based            | p_R(i|u)                | p_c(i)
ItemUser   | Item-based            | p_{UR}(i|u)             | p_c(i)
IRUser     | Item-and-rating-based | p_U(r|i,u); p_{UR}(i|u) | p_ml(r|i)
IRItem     | Item-and-rating-based | p_I(r|i,u); p_{UR}(i|u) | p_ml(r|i)
IRUserItem | Item-and-rating-based | p_{UI}(r|i,u)           | p_ml(r|i)
User clarity
Predictor that captures uncertainty in user’s data
Different formulations capture different nuances
More dimensions in RS than in IR: user, items, ratings, features, …
[Figure: RatItem example — bar chart of the background model p_c(x) and two user models p(x|u1), p(x|u2) over the rating values 1–5]
Predicting Performance in Recommender Systems
RQ3. What kind of evaluation should be performed? Is IR evaluation still valid in our problem?
In IR: Mean Average Precision + correlation
50 points (queries) vs 1000+ points (users)
Performance metric is not clear: error-based, precision-based?
What is performance?
It may depend on the final application
Possible bias
E.g., towards users or items with larger profiles
r ~ 0.57
Predicting Performance in Recommender Systems
RQ4. What kind of recommendation problems can these
models be applied to?
Whenever a combination of strategies is available
Example 1: dynamic neighbor weighting
Example 2: dynamic ensemble recommendation
Dynamic neighbor weighting
The user’s neighbors are weighted according to their similarity
Can we take into account the uncertainty in neighbor’s data?
User neighbor weighting [1]
• Static: g(u,i) = C \sum_{v \in N[u]} sim(u,v) \, rat(v,i)
• Dynamic: g(u,i) = C \sum_{v \in N[u]} \gamma(v) \, sim(u,v) \, rat(v,i)
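The two scoring functions can be sketched as follows; the dictionary-based inputs and the name gamma for the neighbor-quality predictor (e.g. a clarity score) are illustrative assumptions:

```python
def predict_static(u, i, neighbors, sim, rat, C=1.0):
    """Static: g(u,i) = C * sum over v in N[u] of sim(u,v) * rat(v,i)."""
    return C * sum(sim[(u, v)] * rat[(v, i)] for v in neighbors[u])

def predict_dynamic(u, i, neighbors, sim, rat, gamma, C=1.0):
    """Dynamic: g(u,i) = C * sum over v in N[u] of
    gamma(v) * sim(u,v) * rat(v,i), where gamma(v) predicts
    the neighbor's expected goodness."""
    return C * sum(gamma[v] * sim[(u, v)] * rat[(v, i)] for v in neighbors[u])

neighbors = {'u': ['a', 'b']}
sim = {('u', 'a'): 0.5, ('u', 'b'): 0.25}
rat = {('a', 'i'): 4.0, ('b', 'i'): 2.0}
gamma = {'a': 1.2, 'b': 0.8}  # hypothetical per-neighbor quality scores
print(predict_static('u', 'i', neighbors, sim, rat))                   # 2.5
print(round(predict_dynamic('u', 'i', neighbors, sim, rat, gamma), 3)) # 2.8
```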
Dynamic hybrid recommendation
Weight is the same for every item and user (learnt from training)
What about boosting those users predicted to perform better for
some recommender?
Hybrid recommendation [3]
• Static: g(u,i) = \lambda \, g_{R1}(u,i) + (1 - \lambda) \, g_{R2}(u,i)
• Dynamic: g(u,i) = \gamma(u) \, g_{R1}(u,i) + (1 - \gamma(u)) \, g_{R2}(u,i)
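A sketch of both combination schemes; the component recommenders g1, g2 and the per-user weight function gamma are placeholders, not the recommenders actually used in the experiments:

```python
def hybrid_static(g1, g2, lam):
    """g(u,i) = lam*g_R1(u,i) + (1-lam)*g_R2(u,i); lam learnt on training data,
    the same for every user and item."""
    return lambda u, i: lam * g1(u, i) + (1 - lam) * g2(u, i)

def hybrid_dynamic(g1, g2, gamma):
    """g(u,i) = gamma(u)*g_R1(u,i) + (1-gamma(u))*g_R2(u,i); gamma(u) boosts
    the recommender predicted to perform better for user u."""
    return lambda u, i: gamma(u) * g1(u, i) + (1 - gamma(u)) * g2(u, i)

g1 = lambda u, i: 4.0  # toy score from recommender R1
g2 = lambda u, i: 2.0  # toy score from recommender R2
print(hybrid_static(g1, g2, 0.5)('u', 'i'))             # 3.0
print(hybrid_dynamic(g1, g2, lambda u: 1.0)('u', 'i'))  # 4.0
```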
Results – Neighbour weighting

Correlation analysis [1]
• With respect to the Neighbour Goodness metric: “how good a neighbour is to her vicinity”

Performance [1] (MAE = Mean Absolute Error; the lower, the better)

[Figure a: MAE vs. % of ratings used for training (neighbourhood size: 500) — Standard CF vs. Clarity-enhanced CF]
[Figure b: MAE vs. neighbourhood size (80% training) — Standard CF vs. Clarity-enhanced CF]

Improvement of over 5% w.r.t. the baseline
Plus, it does not degrade performance
Positive, although not very strong, correlations
Results – Hybrid recommendation

Correlation analysis [2]
• With respect to nDCG@50 (nDCG: normalized Discounted Cumulative Gain)

Performance [3]

[Figure: MAP@50 for hybrids H1–H4 — Adaptive vs. Static]
[Figure: nDCG@50 for hybrids H1–H4 — Adaptive vs. Static]

On average, most of the predictors obtain strong positive correlations
The dynamic strategy outperforms the static one for different combinations of recommenders
Summary
Inferring user’s performance in a recommender system
Different adaptations of query clarity techniques
Building dynamic recommendation strategies
• Dynamic neighbor weighting: according to expected goodness of neighbor
• Dynamic hybrid recommendation: based on predicted performance
Encouraging results
• Positive predictive power (good correlations between predictors and metrics)
• Dynamic strategies obtain results that are better than (or equal to) the static ones
Related publications
[1] A Performance Prediction Approach to Enhance Collaborative
Filtering Performance. A. Bellogín and P. Castells. In ECIR 2010.
[2] Predicting the Performance of Recommender Systems: An
Information Theoretic Approach. A. Bellogín, P. Castells, and I.
Cantador. In ICTIR 2011.
[3] Performance Prediction for Dynamic Ensemble Recommender
Systems. A. Bellogín, P. Castells, and I. Cantador. In press.
Future Work
Explore other input sources
• Item predictors
• Social links
• Implicit data (with time)
We need a theoretical background
• Why do some predictors work better?
Larger datasets
FW – Other input sources
Item predictors
Social links
Implicit data (with time)
Item predictors could be very useful:
• Different recommender behavior depending on item attributes
• They would allow capturing popularity, diversity, etc.
FW – Other input sources
Item predictors
Social links
Implicit data (with time)
First results using social-based predictors
• Combination of social and CF
• Graph-based measures as predictors
• “Indicators” of the user strength
FW – Theoretical background
We need a theoretical background
• Why do some predictors work better?
Thank you!
Predicting Performance in
Recommender Systems
Alejandro Bellogín
Supervised by Pablo Castells and Iván Cantador
Escuela Politécnica Superior
Universidad Autónoma de Madrid
@abellogin
Acknowledgements to the National Science Foundation
for the funding to attend the conference.
Reviewer’s comments: Confidence
Other methods to measure self-performance of RS
o Confidence
These methods capture the performance of the RS,
not the user's performance
Reviewer’s comments: Neighbor’s goodness
Neighbor goodness seems to be a little bit ad-hoc
We need a measurable definition of neighbor performance
NG(u) ~ “total MAE reduction by u” ~ “MAE without u” – “MAE with u”
Some attempts in trust research: sign and error deviation [Rafter et al. 2009]
One possible formalization:

NG(u) = \frac{1}{|U \setminus \{u\}|} \sum_{v \in U \setminus \{u\}} \frac{C_u(v) - E_u(v)}{|R_v|}

R_v: items rated by v; C_u(v), E_u(v): numbers of predictions \hat{r}(v,i) that u respectively improves or degrades w.r.t. the actual ratings r(v,i)
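The "total MAE reduction by u" reading can be operationalised as a leave-one-neighbour-out MAE delta; compute_mae is a placeholder for whatever evaluation routine the CF system exposes:

```python
def neighbor_goodness(u, all_users, compute_mae):
    """NG(u) ~ 'MAE without u' - 'MAE with u'. Since lower MAE is better,
    a positive value means u is a helpful neighbour overall."""
    return compute_mae(all_users - {u}) - compute_mae(all_users)

# Toy evaluation routine: the system's MAE drops from 1.2 to 1.0
# whenever user 'u' is available as a neighbour.
compute_mae = lambda users: 1.0 if 'u' in users else 1.2
print(round(neighbor_goodness('u', {'u', 'v'}, compute_mae), 2))  # 0.2
```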
Reviewer’s comments: Neighbor’s weighting issues
Neighbor size vs dynamic neighborhood weighting
So far, only dynamic weighting
• Same training time as static weighting
Future work: dynamic size
Apply this method for larger datasets
Current work
Apply this method for other CF methods (e.g., latent factor models, SVD)
More difficult to identify the combination
Future work
Reviewer’s comments: Dynamic hybrid issues
Other methods to combine recommenders
o Stacking
o Multi-linear weighting
We focus on linear weighted hybrid recommendation
Future work: cascade, stacking