Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recommendation [ECIR '16 Slides]

ECIR 2016, PADUA, ITALY

EFFICIENT PSEUDO-RELEVANCE FEEDBACK METHODS FOR COLLABORATIVE FILTERING RECOMMENDATION

Daniel Valcarce, Javier Parapar, Álvaro Barreiro
@dvalcarce @jparapar @AlvaroBarreiroG

Information Retrieval Lab
@IRLab_UDC

University of A Coruña
Spain

http://ecir2016.dei.unipd.it

http://link.springer.com/chapter/10.1007/978-3-319-30671-1_44



http://www.dc.fi.udc.es/~dvalcarce

http://www.dc.fi.udc.es/~parapar

http://www.dc.fi.udc.es/~barreiro

https://twitter.com/dvalcarce

https://twitter.com/jparapar

https://twitter.com/AlvaroBarreiroG

http://irlab.org

https://twitter.com/IRLab_UDC

http://www.udc.gal

Outline

1. Pseudo-Relevance Feedback (PRF)

2. Collaborative Filtering (CF)

3. PRF Methods for CF

4. Experiments

5. Conclusions and Future Work

PSEUDO-RELEVANCE FEEDBACK (PRF)

Pseudo-Relevance Feedback (I)

Pseudo-Relevance Feedback provides an automatic method for query expansion:

# It assumes that the top documents retrieved with the original query are relevant (the pseudo-relevant set).

# The query is expanded with the most representative terms from this set.

# The expanded query is expected to yield better results than the original one.
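
The three steps above can be sketched in a few lines of Python. This is only an illustration, not one of the PRF models discussed later: the toy corpus, the stopword list, the term-count scoring and the names `retrieve` and `expand_query` are all assumptions made for the example, and expansion terms are chosen by raw frequency in the pseudo-relevant set.

```python
from collections import Counter

# Toy corpus and stopword list (hypothetical data for illustration only).
DOCS = {
    "d1": "california is the most populated state",
    "d2": "the most populated state is california with many cities",
    "d3": "texas is a large state",
    "d4": "the matrix is a film",
}
STOPWORDS = {"is", "the", "a", "with"}

def retrieve(query_terms, docs, top_n):
    """Rank documents by query-term occurrences (a stand-in for a real retrieval model)."""
    scores = {}
    for doc_id, text in docs.items():
        tokens = text.split()
        score = sum(tokens.count(t) for t in query_terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]

def expand_query(query_terms, docs, top_n=2, n_terms=2):
    """Add the most frequent new terms of the pseudo-relevant set to the query."""
    pseudo_relevant = retrieve(query_terms, docs, top_n)
    counts = Counter()
    for doc_id in pseudo_relevant:
        counts.update(
            t for t in docs[doc_id].split()
            if t not in query_terms and t not in STOPWORDS
        )
    best = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n_terms]
    return list(query_terms) + [term for term, _ in best]

expanded = expand_query(["populated", "state"], DOCS)
print(expanded)
```

A second call to `retrieve` with `expanded` would then produce the final ranking.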

Pseudo-Relevance Feedback (II)

[Diagram, built up over several slides: information need → query → Retrieval System → Query Expansion → expanded query.]

Pseudo-Relevance Feedback (III)

Some popular PRF approaches:

# Based on Rocchio’s model (Rocchio, 1971 & Carpineto et al., ACM TOIS 2001)

# Relevance-Based Language Models (Lavrenko & Croft, SIGIR 2001)

# Divergence Minimization Model (Zhai & Lafferty, SIGIR 2006)

# Mixture Models (Tao & Zhai, SIGIR 2006)

COLLABORATIVE FILTERING (CF)

Recommender Systems

Notation:

# The set of users $U$

# The set of items $I$

# The rating that user $u$ gave to item $i$ is $r_{u,i}$

# The set of items rated by user $u$ is denoted by $I_u$

# The set of users that rated item $i$ is denoted by $U_i$

# The neighbourhood of user $u$ is denoted by $V_u$

Top-N recommendation: create a ranked list containing relevant and unknown items for each user $u \in U$.
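
As a concrete reading of this notation, the sketch below builds the sets of rated items and raters from a list of (user, item, rating) triples. The toy ratings and the names `ratings`, `users_of`, `items_of` and `unknown_items` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical (user, item, rating) triples.
triples = [
    ("u1", "Titanic", 2), ("u1", "Avatar", 3), ("u1", "Matrix", 5),
    ("u2", "Matrix", 4), ("u2", "Avatar", 1),
    ("u3", "Titanic", 5),
]

ratings = defaultdict(dict)   # ratings[u][i] holds the rating of user u for item i
users_of = defaultdict(set)   # users_of[i] holds the set of users that rated item i
for u, i, r in triples:
    ratings[u][i] = r
    users_of[i].add(u)

def items_of(u):
    """The set of items rated by user u."""
    return set(ratings[u])

def unknown_items(u):
    """Candidates for top-N recommendation: items the user has not rated yet."""
    return set(users_of) - items_of(u)

print(items_of("u2"), unknown_items("u3"))
```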

Collaborative Filtering (I)

Collaborative Filtering (CF) employs the past interactions between users and items to generate recommendations.

Idea: if a user who is similar to you likes an item, maybe you will also like it.

Different input data:

# Explicit feedback: ratings, reviews...

# Implicit feedback: clicks, purchases...

Perhaps the most popular approach to recommendation, given the increasing amount of information about users.

Collaborative Filtering (II)

Collaborative Filtering (CF) techniques can be classified into:

# Model-based methods: learn a predictive model from the user-item ratings.
  ◦ Matrix factorisation (e.g., SVD)

# Neighbourhood-based (or memory-based) methods: compute recommendations directly from part of the ratings.
  ◦ k-NN approaches

PRF METHODS FOR CF

PRF for CF

PRF                             CF
User’s query                    User’s profile
most^1, populated^2, state^2    Titanic^2, Avatar^3, Matrix^5
Documents                       Neighbours
Terms                           Items

Previous Work on Adapting PRF Methods to CF

Relevance-Based Language Models

# Originally devised for PRF (Lavrenko & Croft, SIGIR 2001).

# Adapted to CF (Parapar et al., Inf. Process. Manage. 2013).

# Two models: RM1 and RM2.

# High precision figures in recommendation.

# ... but high computational cost!

RM1: $p(i \mid R_u) \propto \sum_{v \in V_u} p(v)\, p(i \mid v) \prod_{j \in I_u} p(j \mid v)$

RM2: $p(i \mid R_u) \propto p(i) \prod_{j \in I_u} \sum_{v \in V_u} \frac{p(i \mid v)\, p(v)}{p(i)}\, p(j \mid v)$
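
To make the cost of these formulas concrete, here is a rough sketch of RM1 scoring. It is not the authors' implementation: the toy ratings, the uniform prior over neighbours, the additive smoothing constant `MU` and the function names are assumptions made for the example. The nested loops over candidates, neighbours and the target profile are what make the approach expensive.

```python
import math

# Hypothetical neighbour ratings and item catalogue.
ratings = {
    "v1": {"A": 5, "B": 4, "X": 5},
    "v2": {"A": 4, "X": 1, "Y": 5},
}
ITEMS = ["A", "B", "X", "Y"]
MU = 0.1  # additive smoothing constant (an assumption, avoids zero probabilities)

def p_item_given_user(i, v):
    """Smoothed estimate of p(i|v) from the ratings of user v."""
    total = sum(ratings[v].values()) + MU * len(ITEMS)
    return (ratings[v].get(i, 0) + MU) / total

def rm1(candidates, target_items, neighbours):
    """Score each candidate i with sum_v p(v) p(i|v) prod_{j in I_u} p(j|v)."""
    p_v = 1.0 / len(neighbours)  # uniform prior over neighbours (an assumption)
    scores = {}
    for i in candidates:
        score = 0.0
        for v in neighbours:
            # Likelihood of the whole target profile under neighbour v,
            # recomputed per candidate here to mirror the formula directly.
            profile_lik = math.prod(p_item_given_user(j, v) for j in target_items)
            score += p_v * p_item_given_user(i, v) * profile_lik
        scores[i] = score
    return scores

scores = rm1(["B", "X", "Y"], target_items=["A"], neighbours=["v1", "v2"])
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

RM2 follows the same pattern, but the sum over neighbours sits inside the product over the target profile, so nothing can be shared across candidates.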

Our Proposals based on Rocchio’s Framework

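Rocchio’s Weights

$p_{Rocchio}(i \mid u) = \sum_{v \in V_u} \frac{r_{v,i}}{|V_u|}$

Robertson Selection Value

$p_{RSV}(i \mid u) = \sum_{v \in V_u} \frac{r_{v,i}}{|V_u|}\, p(i \mid V_u)$

CHI-2

$p_{CHI-2}(i \mid u) = \frac{\left(p(i \mid V_u) - p(i \mid C)\right)^2}{p(i \mid C)}$

Kullback–Leibler Divergence

$p_{KLD}(i \mid u) = p(i \mid V_u) \log \frac{p(i \mid V_u)}{p(i \mid C)}$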
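
The four scoring functions are cheap to compute once the neighbourhood and the two item distributions are available. The sketch below is a direct transcription under some assumptions: the toy ratings and probabilities are hypothetical, missing ratings count as 0, and the KLD score is taken to be 0 when the item has zero probability in the neighbourhood.

```python
import math

def rocchio_weight(i, neighbours, ratings):
    """Average rating of item i over the neighbourhood (missing ratings count as 0)."""
    return sum(ratings[v].get(i, 0) for v in neighbours) / len(neighbours)

def rsv(i, neighbours, ratings, p_neigh):
    """Rocchio's weight multiplied by the probability of i in the neighbourhood."""
    return rocchio_weight(i, neighbours, ratings) * p_neigh[i]

def chi2(i, p_neigh, p_coll):
    """Squared difference between neighbourhood and collection probability, normalised."""
    return (p_neigh[i] - p_coll[i]) ** 2 / p_coll[i]

def kld(i, p_neigh, p_coll):
    """Contribution of item i to the KL divergence between neighbourhood and collection."""
    if p_neigh[i] == 0:
        return 0.0  # convention 0 * log 0 = 0 (an assumption)
    return p_neigh[i] * math.log(p_neigh[i] / p_coll[i])

# Hypothetical neighbourhood of two users and hypothetical item distributions.
ratings = {"v1": {"A": 4, "B": 2}, "v2": {"A": 2}}
neighbours = ["v1", "v2"]
p_neigh = {"A": 0.75, "B": 0.25}
p_coll = {"A": 0.5, "B": 0.5}

for item in ("A", "B"):
    print(item,
          rocchio_weight(item, neighbours, ratings),
          rsv(item, neighbours, ratings, p_neigh),
          chi2(item, p_neigh, p_coll),
          kld(item, p_neigh, p_coll))
```

In this toy case CHI-2 gives items A and B the same score because it ignores the sign of the difference, while KLD is negative for the under-represented item B.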

Probability Estimation

Maximum Likelihood Estimate under a Multinomial Distribution over the ratings:

$p_{mle}(i \mid V_u) = \frac{\sum_{v \in V_u} r_{v,i}}{\sum_{v \in V_u,\, j \in I} r_{v,j}}$

$p_{mle}(i \mid C) = \frac{\sum_{u \in U} r_{u,i}}{\sum_{u \in U,\, j \in I} r_{u,j}}$
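
Both estimates are the same computation applied to different sets of users, as this small sketch shows. The toy ratings and the function name `p_mle` are assumptions made for the example.

```python
# Hypothetical ratings for three users.
ratings = {
    "u1": {"A": 4, "B": 2},
    "u2": {"A": 2},
    "u3": {"B": 5, "C": 3},
}

def p_mle(i, users):
    """Sum of the ratings of item i divided by the sum of all ratings, over the given users."""
    item_mass = sum(ratings[v].get(i, 0) for v in users)
    total_mass = sum(sum(ratings[v].values()) for v in users)
    return item_mass / total_mass

neighbourhood = ["u1", "u2"]   # plays the role of the neighbourhood of the target user
collection = list(ratings)     # all users

print(p_mle("A", neighbourhood))  # 6 / 8
print(p_mle("A", collection))     # 6 / 16
```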

Neighbourhood Length Normalisation (I)

Neighbourhoods are computed using clustering algorithms:

# Hard clustering: every user is in only one cluster. Clusters may have different sizes. Example: k-means.

# Soft clustering: each user has its own neighbours. When we set k to a high value, we may find different numbers of neighbours. Example: k-NN.

Idea: consider the variability of the neighbourhood lengths:

# A big neighbourhood is equivalent to a query with a lot of results: the collection model is close to the target user.

# A small neighbourhood implies that the neighbours are highly specific: the collection is very different from the target user.
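
A small sketch of a k-NN neighbourhood may help to see why neighbourhood sizes can vary. It is an illustration under assumptions, not the procedure used in the experiments: similarity is Pearson's correlation over co-rated items, users with non-positive similarity are discarded, and the toy ratings and function names are hypothetical.

```python
import math

def pearson(ru, rv):
    """Pearson's correlation between two users over their co-rated items."""
    common = sorted(set(ru) & set(rv))
    if len(common) < 2:
        return 0.0
    mean_u = sum(ru[i] for i in common) / len(common)
    mean_v = sum(rv[i] for i in common) / len(common)
    num = sum((ru[i] - mean_u) * (rv[i] - mean_v) for i in common)
    den = (math.sqrt(sum((ru[i] - mean_u) ** 2 for i in common))
           * math.sqrt(sum((rv[i] - mean_v) ** 2 for i in common)))
    return num / den if den else 0.0

def knn(u, ratings, k):
    """Up to k most similar users; fewer are returned if few have positive similarity."""
    sims = [(pearson(ratings[u], ratings[v]), v) for v in ratings if v != u]
    sims = [(s, v) for s, v in sims if s > 0]
    sims.sort(key=lambda sv: (-sv[0], sv[1]))
    return [v for _, v in sims[:k]]

# Hypothetical ratings.
ratings = {
    "u1": {"A": 5, "B": 3, "C": 1},
    "u2": {"A": 4, "B": 3, "C": 2},   # agrees with u1
    "u3": {"A": 1, "B": 3, "C": 5},   # opposite taste
    "u4": {"A": 2, "D": 4},           # only one co-rated item
}

print(knn("u1", ratings, k=3))
```

Even with k = 3, the target user ends up with a single neighbour here, which is the kind of variability that the normalisation is meant to account for.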

Neighbourhood Length Normalisation (II)

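We bias the MLE to perform neighbourhood length normalisation:

$p_{nmle}(i \mid V_u) \overset{rank}{=} \frac{1}{|V_u|} \frac{\sum_{v \in V_u} r_{v,i}}{\sum_{v \in V_u,\, j \in I} r_{v,j}}$

$p_{nmle}(i \mid C) \overset{rank}{=} \frac{1}{|U|} \frac{\sum_{u \in U} r_{u,i}}{\sum_{u \in U,\, j \in I} r_{u,j}}$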
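
The normalised estimate only divides the MLE by the number of users it was computed from. The sketch below, with hypothetical ratings and function names, shows the effect on the neighbourhood-to-collection ratio that KLD relies on: it is scaled by the number of users in the collection divided by the neighbourhood size, so small neighbourhoods are boosted.

```python
# Hypothetical ratings for three users.
ratings = {
    "u1": {"A": 4, "B": 2},
    "u2": {"A": 2},
    "u3": {"B": 5, "C": 3},
}

def p_mle(i, users):
    """Maximum likelihood estimate of item i over the given users."""
    item_mass = sum(ratings[v].get(i, 0) for v in users)
    total_mass = sum(sum(ratings[v].values()) for v in users)
    return item_mass / total_mass

def p_nmle(i, users):
    """MLE divided by the number of users (no longer sums to 1 over items)."""
    return p_mle(i, users) / len(users)

neighbourhood = ["u1", "u2"]
collection = list(ratings)

ratio_mle = p_mle("A", neighbourhood) / p_mle("A", collection)
ratio_nmle = p_nmle("A", neighbourhood) / p_nmle("A", collection)
print(ratio_mle, ratio_nmle)  # the second is the first times 3 / 2
```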

EXPERIMENTS

Experimental settings

Baselines:

# UB: traditional user-based neighbourhood approach.

# SVD: matrix factorisation.

# UIR-Item: probabilistic approach.

# RM1 and RM2: Relevance-Based Language Models.

Our algorithms:

# Rocchio’s Weights (RW)

# Robertson Selection Value (RSV)

# CHI-2

# Kullback–Leibler Divergence (KLD)

Efficiency

[Figure: recommendation time per user (s), with axis ticks at 0.01, 0.1, 1 and 10, on the ML 100k, ML 1M and ML 10M datasets. Series: UIR, RM1, RM2, SVD++, RSV, UB, RW, CHI-2 and KLD.]

Accuracy (nDCG@10)

Algorithm            ML 100k        ML 1M         R3-Yahoo!      LibraryThing
UB                   0.0468         0.0313        0.0108         0.0055^b
SVD                  0.0936^a       0.0608^a      0.0101         0.0015
UIR-Item             0.2188^ab      0.1795^abd    0.0174^abd     0.0673^abd
RM1                  0.2473^abc     0.1402^ab     0.0146^ab      0.0444^ab
RM2                  0.3323^abcd    0.1992^abd    0.0207^abcd    0.0957^abcd
Rocchio’s Weights    0.2604^abcd    0.1557^abd    0.0194^abcd    0.0892^abcd
RSV                  0.2604^abcd    0.1557^abd    0.0194^abcd    0.0892^abcd
KLD (MLE)            0.2693^abcd    0.1264^ab     0.0197^abcd    0.1576^abcde
KLD (NMLE)           0.3120^abcd    0.1546^ab     0.0201^abcd    0.1101^abcde
CHI-2 (MLE)          0.0777^a       0.0709^ab     0.0149^ab      0.0939^abcd
CHI-2 (NMLE)         0.3220^abcd    0.1419^ab     0.0204^abcd    0.1459^abcde

Table: Values of nDCG@10. Pink = best algorithm. Blue = not significantly different to the best (Wilcoxon two-sided p < 0.01).
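
For readers unfamiliar with the metric, one common formulation of nDCG at a cut-off can be sketched as follows. The exact variant used in the experiments is not stated on the slides, so the linear gain (the test rating itself), the base-2 logarithmic discount and the function names are assumptions.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: the gain at rank p (1-based) is divided by log2(p + 1)."""
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains))

def ndcg_at_k(ranked_items, test_ratings, k=10):
    """DCG of the top-k recommendations divided by the DCG of the ideal ranking."""
    gains = [test_ratings.get(i, 0) for i in ranked_items[:k]]
    ideal = sorted(test_ratings.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical held-out ratings of one user.
test_ratings = {"A": 5, "B": 3}

print(ndcg_at_k(["A", "B", "C"], test_ratings))  # ideal ordering
print(ndcg_at_k(["C", "A", "B"], test_ratings))  # relevant items pushed down
print(ndcg_at_k(["C"], test_ratings))            # nothing relevant recommended
```

The value reported for an algorithm would be the average of this quantity over all test users.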

Diversity (Gini@10)


Algorithm       ML 100k    ML 1M     R3-Yahoo!    LibraryThing
UIR-Item        0.0124     0.0050    0.0137       0.0005
RM2             0.0256     0.0069    0.0207       0.0019
CHI-2 (NMLE)    0.0450     0.0106    0.0506       0.0539

Table: Values of the complement of Gini index at 10. Pink = best algorithm.

Novelty (MSI@10)


Algorithm       ML 100k      ML 1M         R3-Yahoo!    LibraryThing
UIR-Item        5.2337^e     8.3713^e      3.7186^e     17.1229^e
RM2             6.8273^c     8.9481^c      4.9618^c     19.27343^c
CHI-2 (NMLE)    8.1711^ec    10.0043^ec    7.5555^ec    8.8563

Table: Values of Mean Self-Information at 10. Pink = best algorithm.

Trade-off Accuracy-Diversity

[Plot: G(Gini, nDCG), from 0.06 to 0.13, against the number of neighbours k, from 200 to 900. Series: RM2 and CHI-2 NMLE.]

Figure: G-measure of nDCG@10 and Gini@10 on MovieLens 100k varying the number of neighbours k using Pearson’s correlation similarity.

Trade-off Accuracy-Novelty

[Plot: G(MSI, nDCG), from 0.9 to 2.0, against the number of neighbours k, from 200 to 900. Series: RM2 and CHI-2 NMLE.]

Figure: G-measure of nDCG@10 and MSI@10 on MovieLens 100k varying the number of neighbours k using Pearson’s correlation similarity.
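
The trade-off figures combine two metrics into a single number. Assuming the G-measure here denotes the geometric mean of the two metrics (the slides do not spell out the definition), it can be computed as below. The inputs are the ML 100k values reported for CHI-2 NMLE in the tables above, used only as a worked example; they are not read from the plots.

```python
import math

def g_measure(a, b):
    """Geometric mean of two metric values (assumed definition of the G-measure)."""
    return math.sqrt(a * b)

# ML 100k values for CHI-2 NMLE taken from the accuracy, diversity and novelty tables.
ndcg, gini, msi = 0.3220, 0.0450, 8.1711

g_diversity = g_measure(gini, ndcg)
g_novelty = g_measure(msi, ndcg)
print(round(g_diversity, 4), round(g_novelty, 4))
```

Both results fall inside the ranges of the corresponding vertical axes (0.06 to 0.13 and 0.9 to 2.0), which is consistent with the geometric-mean reading.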

CONCLUSIONS AND FUTURE WORK

Conclusions

We proposed to use fast PRF methods (Rocchio’s Weights, RSV, KLD and CHI-2):

# They are orders of magnitude faster than the Relevance Models (up to 200x).

# They generate quite accurate recommendations.

# Good novelty and diversity figures with a better trade-off than RM2.

# They have no parameters of their own (only the clustering parameters).

Future Work

Other approaches for computing neighbourhoods:

# Posterior Probability Clustering (a non-negative matrix factorisation).

# Normalised Cut (spectral clustering).

Explore other PRF methods:

# Divergence Minimization Models.

# Mixture Models.

THANK YOU!

@dvalcarce
http://www.dc.fi.udc.es/~dvalcarce
