October 8, 2014

Unifying Nearest Neighbors Collaborative Filtering


Page 1: Unifying Nearest Neighbors Collaborative Filtering

October  8,  2014  

Page 2: Unifying Nearest Neighbors Collaborative Filtering

Unifying Nearest Neighbors Collaborative Filtering

Koen Verstrepen, Bart Goethals

Page 3: Unifying Nearest Neighbors Collaborative Filtering

Data: Binary, Positive-only

3  

Page 4: Unifying Nearest Neighbors Collaborative Filtering

Task: Find the best recommendation for

4  

Page 5: Unifying Nearest Neighbors Collaborative Filtering

5  

By computing scores

Page 6: Unifying Nearest Neighbors Collaborative Filtering

6  

Focus on computing

Page 7: Unifying Nearest Neighbors Collaborative Filtering

7  

Multiple solutions for computing s(   ,   ):
Matrix factorization
Nearest Neighbors methods
…

Page 8: Unifying Nearest Neighbors Collaborative Filtering

8  

User-Based Nearest Neighbors:

Item-Based Nearest Neighbors:

Two solutions for computing s(   ,   )

Page 9: Unifying Nearest Neighbors Collaborative Filtering

9  

User-Based Nearest Neighbors:

Item-Based Nearest Neighbors:

Two solutions for computing s(   ,   )

Page 10: Unifying Nearest Neighbors Collaborative Filtering

Two solutions for computing

10

User-Based Nearest Neighbors:

Item-Based Nearest Neighbors:

s(   ,   )
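The two formulas themselves appear only as images on the slides. As a rough, hedged sketch (not taken from the paper or its released code), user-based and item-based nearest-neighbors scoring with cosine similarity on binary, positive-only data could look like the following; the toy matrix R, the neighborhood size k and all function names are assumptions made for this illustration.

```python
import numpy as np

def cosine_sim(M):
    """Pairwise cosine similarity between the rows of a binary matrix M."""
    norms = np.sqrt(M.sum(axis=1, keepdims=True))   # sqrt of row counts
    norms[norms == 0] = 1.0
    S = (M @ M.T) / (norms * norms.T)
    np.fill_diagonal(S, 0.0)                         # exclude self-similarity
    return S

def user_based_scores(R, k=2):
    """s_UB(u, i): sum of similarities of u's k nearest users that prefer i."""
    S = cosine_sim(R)                                # user-user similarities
    scores = np.zeros_like(R, dtype=float)
    for u in range(R.shape[0]):
        knn = np.argsort(S[u])[::-1][:k]             # k most similar users
        scores[u] = S[u, knn] @ R[knn]
    return scores

def item_based_scores(R, k=2):
    """s_IB(u, i): sum of similarities between i's k nearest items and u's known items."""
    S = cosine_sim(R.T)                              # item-item similarities
    scores = np.zeros_like(R, dtype=float)
    for i in range(R.shape[1]):
        knn = np.argsort(S[i])[::-1][:k]             # k most similar items
        scores[:, i] = R[:, knn] @ S[i, knn]
    return scores

# Toy binary, positive-only preference matrix (rows = users, columns = items).
R = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 1]])
print(user_based_scores(R))
print(item_based_scores(R))
```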

Page 11: Unifying Nearest Neighbors Collaborative Filtering

We observed

Different scores, because different information is used.
Both are valuable; each ignores the other type of information.

11  

≠  

Page 12: Unifying Nearest Neighbors Collaborative Filtering

Research Question

Can we clearly characterize how both algorithms ignore information?

Can we avoid ignoring information?

12  

Page 13: Unifying Nearest Neighbors Collaborative Filtering

Contribution 1 & 2

13  

Page 14: Unifying Nearest Neighbors Collaborative Filtering

Contribution 1 & 2: Unifying Reformulation

14

Instead of the traditional perspective:

A novel, unifying reformulation:

Page 15: Unifying Nearest Neighbors Collaborative Filtering

Contribution 1 & 2: Unifying Reformulation

15

User-based and Item-based have different w(   ,   )
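A hedged sketch, in LaTeX, of what such a unifying reformulation can look like: both algorithms are written as one weighted sum over all known preferences (v, j), and differ only in the weight function w. The notation below is assumed for illustration; the precise definition is in the paper.

```latex
% Sketch of the unifying form: the score of item i for user u is a weighted
% sum over all known preferences (v, j), i.e. all pairs with R_{vj} = 1.
s(u,i) \;=\; \sum_{v \in \mathcal{U}} \sum_{j \in \mathcal{I}} R_{vj}\; w\big((u,i),(v,j)\big)
```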

Page 16: Unifying Nearest Neighbors Collaborative Filtering

16  

Contribution 1 & 2: A novel w(   ,   )

KUNN: Unified Nearest Neighbors

Page 17: Unifying Nearest Neighbors Collaborative Filtering

Details  

17  

Page 18: Unifying Nearest Neighbors Collaborative Filtering

18  

Computation of w(   ,   )

User-based (reformulation of existing alg.)

Item-based (reformulation of existing alg.)
KUNN (novel alg.)
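The LaTeX block below is a sketch of how the three weight functions might be written, inferred from the walkthrough on the following slides (the two indicator factors, the +1 per sufficiently similar neighbor, and the 1/√(...) normalization). Here c(x) denotes the number of preferences of user or item x and KNN(·) a k-nearest-neighbors set; these are reconstructions for illustration, not formulas copied from the paper.

```latex
% Reconstructed sketch of the weight of a known preference (v, j) for the
% pair (u, i); consult the paper for the exact definitions.
\begin{align*}
w_{\mathrm{UB}}\big((u,i),(v,j)\big)   &= R_{vi}\,R_{uj}\;\frac{[v \in \mathrm{KNN}(u)]}{\sqrt{c(u)\,c(v)}}\\
w_{\mathrm{IB}}\big((u,i),(v,j)\big)   &= R_{vi}\,R_{uj}\;\frac{[j \in \mathrm{KNN}(i)]}{\sqrt{c(i)\,c(j)}}\\
w_{\mathrm{KUNN}}\big((u,i),(v,j)\big) &= R_{vi}\,R_{uj}\;\frac{[v \in \mathrm{KNN}(u)] + [j \in \mathrm{KNN}(i)]}{\sqrt{c(u)\,c(v)\,c(i)\,c(j)}}
\end{align*}
```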

Page 19: Unifying Nearest Neighbors Collaborative Filtering

19  

If                                                    ,  we  have  evidence  to  work  with  

1234 : UB - IB - KUNN

Page 20: Unifying Nearest Neighbors Collaborative Filtering

20  

If                                                    ,  we  have  evidence  to  work  with  

=  1  

1234 : UB - IB - KUNN

Page 21: Unifying Nearest Neighbors Collaborative Filtering

                                                   only  relevant  to                if  also                                                            only  relevant  to                  if  also      

21  

&  

=  1  

1234 : UB - IB - KUNN

Page 22: Unifying Nearest Neighbors Collaborative Filtering

                                                   only  relevant  to                if  also                                                            only  relevant  to                  if  also      

22  

&  

=  1  ×  1  

1234 : UB - IB - KUNN

Page 23: Unifying Nearest Neighbors Collaborative Filtering

           needs  to  be  sufficiently  similar  to                    

23  

=  1  ×  1  

1234 : UB - IB - KUNN

Page 24: Unifying Nearest Neighbors Collaborative Filtering

           needs  to  be  sufficiently  similar  to                    

24  

=  1  ×  1  ×  1  

1234 : UB - IB - KUNN

Page 25: Unifying Nearest Neighbors Collaborative Filtering

             needs  to  be  sufficiently  similar  to                  

25  

=  1  ×  1  

1234 : UB - IB - KUNN

Page 26: Unifying Nearest Neighbors Collaborative Filtering

             needs  to  be  sufficiently  similar  to                  

26  

=  1  ×  1  ×  1  

1234 : UB - IB - KUNN

Page 27: Unifying Nearest Neighbors Collaborative Filtering

sufficiently similar to     → +1
sufficiently similar to     → +1
otherwise → 0

27  

=  1  ×  1  

1234 : UB - IB - KUNN

Page 28: Unifying Nearest Neighbors Collaborative Filtering

sufficiently similar to     → +1
sufficiently similar to     → +1
otherwise → 0

28  

=  1  ×  1  ×  2  

1234 : UB - IB - KUNN

Page 29: Unifying Nearest Neighbors Collaborative Filtering

's are less informative if     and     like many things

=  1  ×  1  ×  1  

1234 : UB - IB - KUNN

29  

Page 30: Unifying Nearest Neighbors Collaborative Filtering

's are less informative if     and     like many things

=  1  ×  1  ×  1  ×    

Unifying Nearest Neighbors Collaborative Filtering

Koen Verstrepen Bart Goethals

University of Antwerp

Antwerp, Belgium

{koen.verstrepen,bart.goethals}@uantwerp.be

ABSTRACT
We study collaborative filtering for applications in which there exists for every user a set of items about which the user has given binary, positive-only feedback (one-class collaborative filtering). Take for example an on-line store that knows all past purchases of every customer. An important class of algorithms for one-class collaborative filtering are the nearest neighbors algorithms, typically divided into user-based and item-based algorithms. We introduce a reformulation that unifies user- and item-based nearest neighbors algorithms and use this reformulation to propose a novel algorithm that incorporates the best of both worlds and outperforms state-of-the-art algorithms. Additionally, we propose a method for naturally explaining the recommendations made by our algorithm and show that this method is also applicable to existing user-based nearest neighbors methods.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information filtering

Keywords: One-class collaborative filtering, nearest neighbors, explaining recommendations, top-N recommendation, recommender systems.

1. INTRODUCTION

1/√(2·3)   1/√(1·2)   1/√(2·3·1·2)


u is a user

i is an item

u, v ∈ U are users

i, j ∈ I are items

The score for recommending item i to user u

Sum over all user-item-pairs (v, j)

Typically, the training data for collaborative filtering is represented by a matrix in which the rows represent the users and the columns represent the items. A value in this matrix can be unknown or can reflect the preference of the respective user for the respective item. Here, we consider the specific setting of binary, positive-only preference feedback. Hence, every value in the preference matrix is 1 or 0, with 1 representing a known preference and 0 representing the unknown. Pan et al. [10] call this setting one-class collaborative filtering (OCCF). Applications that correspond to this version of the collaborative filtering problem are likes on social networking sites, tags for photos, websites visited during a surfing session, articles bought by a customer, etc. In this setting, identifying the best recommendations can be formalized as ranking all user-item pairs (u, i) according to the probability that u prefers i.

One class of algorithms for solving OCCF are the nearest neighbors algorithms. Sarwar et al. [13] proposed the well known user-based algorithm that uses cosine similarity and Deshpande et al. [2] proposed several widely used item-based algorithms.

Another class of algorithms for solving OCCF are matrix factorization algorithms. The state-of-the-art algorithms in this class are Weighted Rating Matrix Factorization [7, 10], Bayesian Personalized Ranking Matrix Factorization [12] and Collaborative Less is More Filtering [14].

This work builds upon the different item-based algorithms by Deshpande et al. [2] and the user-based algorithm by Sarwar et al. [13] that uses cosine similarity. We introduce a reformulation of these algorithms that unifies their existing formulations. From this reformulation, it becomes clear that the existing user- and item-based algorithms unnecessarily discard important parts of the available information. Therefore, we propose a novel algorithm that combines both user-based and item-based information.

1234 : UB - IB - KUNN

30  

Page 31: Unifying Nearest Neighbors Collaborative Filtering

31  

's are less informative if     and     are popular

=  1  ×  1  ×  1  

1234 : UB - IB - KUNN

Page 32: Unifying Nearest Neighbors Collaborative Filtering

32  

's are less informative if     and     are popular

=  1  ×  1  ×  1  ×    


1234 : UB - IB - KUNN

Page 33: Unifying Nearest Neighbors Collaborative Filtering

's are less informative if     and     like many things
's are less informative if     and     are popular

33  

=  1  ×  1  ×  2  

1234 : UB - IB - KUNN

Page 34: Unifying Nearest Neighbors Collaborative Filtering

's are less informative if     and     like many things
's are less informative if     and     are popular

34  

=  1  ×  1  ×  2  ×    


1234 : UB - IB - KUNN

Page 35: Unifying Nearest Neighbors Collaborative Filtering

In  summary  

NN algorithms reformulated as a sum over user-item pairs; one such pair is (   ,   ).
For computing the contribution of the pair (   ,   ):
Existing user-based methods ignore info about the target item.
Existing item-based methods ignore info about the target user.
KUNN combines info about both the user and the item.

35  
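Putting the walkthrough together, here is a compact and deliberately naive sketch of how a KUNN-style score could be computed for one user-item pair. It mirrors the factors walked through on slides 19 to 34 (the pair (v, j) only counts if v likes i and u likes j, +1 per sufficiently similar neighbor, normalization by the four preference counts); the helper names, the neighborhood choice and the toy matrix are assumptions, and the released source code linked in the paper is the authoritative reference.

```python
import numpy as np

def knn_users(R, u, k=2):
    """Indices of the k users most cosine-similar to user u (excluding u)."""
    c = R.sum(axis=1)
    sims = (R @ R[u]) / np.sqrt(np.maximum(c * c[u], 1))
    sims[u] = -np.inf
    return set(np.argsort(sims)[::-1][:k])

def knn_items(R, i, k=2):
    """Indices of the k items most cosine-similar to item i (excluding i)."""
    return knn_users(R.T, i, k)

def kunn_score(R, u, i, k=2):
    """KUNN-style score s(u, i): sum over all known preferences (v, j)."""
    c_user = R.sum(axis=1)          # c(u): number of preferences per user
    c_item = R.sum(axis=0)          # c(i): number of preferences per item
    nn_u, nn_i = knn_users(R, u, k), knn_items(R, i, k)
    score = 0.0
    for v, j in zip(*np.nonzero(R)):            # every known preference (v, j)
        if R[v, i] == 0 or R[u, j] == 0:        # (v, j) only counts if v likes i and u likes j
            continue
        evidence = (v in nn_u) + (j in nn_i)    # +1 per sufficiently similar neighbor
        score += evidence / np.sqrt(c_user[u] * c_user[v] * c_item[i] * c_item[j])
    return score

# Toy binary, positive-only preference matrix (rows = users, columns = items).
R = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 1]])
print(kunn_score(R, u=0, i=1))
```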

Page 36: Unifying Nearest Neighbors Collaborative Filtering

One  More  Thing  

36  

Page 37: Unifying Nearest Neighbors Collaborative Filtering

Contribution 3

37  

Page 38: Unifying Nearest Neighbors Collaborative Filtering

Explaining recommendations

38  

Page 39: Unifying Nearest Neighbors Collaborative Filtering

Explaining recommendations

39  

Page 40: Unifying Nearest Neighbors Collaborative Filtering

Explaining recommendations

40  

Page 41: Unifying Nearest Neighbors Collaborative Filtering

Explaining recommendations

41  

Page 42: Unifying Nearest Neighbors Collaborative Filtering

Read our paper because:

1. Our reformulation could inspire you

2. You might want to benchmark with KUNN (link to source code in the paper)

3. You want to explain NN recommendations (incl. user-based)

 

42  

Page 43: Unifying Nearest Neighbors Collaborative Filtering
Page 44: Unifying Nearest Neighbors Collaborative Filtering

Accuracy Results Relative to POP

44  

experimental setups. In the user selected setup (Sec. 5.1), the users selected which items they rated. In the random selected setup (Sec. 5.2), users were asked to rate randomly chosen items.

5.1 User Selected Setup
This experimental setup is applicable to the Movielens and the Yahoo!Music_user datasets. In both datasets, users chose themselves which items they rated. Following Deshpande et al. [2] and Rendle et al. [12], one preference of every user is randomly chosen to be the test preference for that user. If a user has only one preference, no test preference is chosen. The remaining preferences are represented as a 1 in the training matrix R. All other entries of R are zero. We define the hit set H_u of a user u as the set containing all test preferences of that user. For this setup, this is a singleton, denoted as {h_u}, or the empty set if no test preference is chosen. Furthermore, we define U_t as the set of users with a test preference, i.e. U_t = {u ∈ U | |H_u| > 0}. Table 2 summarizes some characteristics of the datasets.

For every user u ∈ U_t, every algorithm ranks the items {i ∈ I | R_ui = 0} based on R. We denote the rank of the test preference h_u in such a ranking as r(h_u).

Then, every set of rankings is evaluated using three measures. Following Deshpande et al. [2] we use hit rate at 10 and average reciprocal hit rate at 10. In general, 10 can be replaced by any natural number N ≤ |I|. We follow Deshpande et al. [2] and choose N = 10.

Hit rate at 10 is given by

HR@10 = \frac{1}{|U_t|} \sum_{u \in U_t} |H_u \cap \mathrm{top10}(u)|,

with top10(u) the 10 highest ranked items for user u. Hence, HR@10 gives the percentage of test users for which the test preference is in the top 10 recommendations.

Average reciprocal hit rate at 10 is given by

ARHR@10 = \frac{1}{|U_t|} \sum_{u \in U_t} |H_u \cap \mathrm{top10}(u)| \cdot \frac{1}{r(h_u)}.

Unlike hit rate, average reciprocal hit rate takes into account the rank of the test preference in the top 10 of a user.

Following Rendle et al. [12], we also use the AMAN version of area under the curve, which is for this experimental setup given by

AUC_{AMAN} = \frac{1}{|U_t|} \sum_{u \in U_t} \frac{|I| - r(h_u)}{|I| - 1}.

AMAN stands for All Missing As Negative, meaning that a missing preference is evaluated as a dislike. Like average reciprocal hit rate, the area under the curve takes into account the rank of the test preference in the recommendation list for a user. However, AUC_AMAN decreases slower than ARHR@10 when r(h_u) increases.

We repeat all experiments five times, drawing a different random sample of test preferences every time.
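A minimal sketch of the three measures defined above, assuming the ranks r(h_u) have already been computed for every test user; the function name, variable names and the toy input are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np

def user_selected_metrics(ranks, n_items, n=10):
    """HR@n, ARHR@n and AUC_AMAN for a leave-one-out ('user selected') setup.

    ranks: for every test user u in U_t, the rank r(h_u) of the single test
           preference among the items with R_ui = 0 (1 = best).
    """
    ranks = np.asarray(ranks, dtype=float)
    hits = ranks <= n
    hr = hits.mean()                                   # HR@n
    arhr = np.where(hits, 1.0 / ranks, 0.0).mean()     # ARHR@n
    auc_aman = ((n_items - ranks) / (n_items - 1)).mean()
    return hr, arhr, auc_aman

# Example: three test users whose test preferences were ranked 3rd, 12th and
# 1st out of 100 candidate items.
print(user_selected_metrics([3, 12, 1], n_items=100))
```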

5.2 Random Selected Setup
The user selected experimental setup introduces two biases in the evaluation. Firstly, popular items get more ratings. Secondly, the majority of the ratings is positive. These two biases can have strong influences on the results and are thoroughly discussed by Pradel et al. [11]. The random selected test setup avoids these biases.

Following Pradel et al. [11], the training dataset is constructed in the user selected way: users chose to rate a number of items they selected themselves. The test dataset, however, is the result of a voluntary survey in which random items were presented to the users and a rating was asked. In this way, both the popularity and the positivity bias are avoided.

This experimental setup is applicable to the dataset Yahoo!Music_random. The training data, R, of this dataset is identical to the full Yahoo!Music_user dataset. Additionally, this dataset includes a test dataset in which the rated items were randomly selected. For a given user u, the hit set H_u contains all preferences of this user which are present in the test dataset. The set of dislikes M_u contains all dislikes of this user which are present in the test dataset. We define U_t = {u ∈ U | |H_u| > 0, |M_u| > 0}, i.e. all users with both preferences and dislikes in the test dataset. Table 2 summarizes some characteristics of the dataset.

For every user u ∈ U_t, every algorithm ranks the items in H_u ∪ M_u based on the training data R. The rank of an item i in such a ranking is denoted r(i).

Then, following Pradel et al. [11], we evaluate every set of rankings with the AMAU version of area under the curve, which is given by

AUC_{AMAU} = \frac{1}{|U_t|} \sum_{u \in U_t} AUC_{AMAU}(u),

with

AUC_{AMAU}(u) = \sum_{h \in H_u} \frac{|\{m \in M_u \mid r(m) > r(h)\}|}{|H_u|\,|M_u|}.

AMAU stands for All Missing As Unknown, meaning that a missing preference does not influence the evaluation. Hence, AUC_AMAU(u) measures, for a user u, the fraction of dislikes that is, on average, ranked behind the preferences. A big advantage of this measure is that it only relies on known preferences and known dislikes. Unlike the other measures, it does not make the bold assumption that items not preferred by u in the past are disliked by u.

Because of the natural split in a test and training dataset, repeating the experiment with different test preferences is not applicable.
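A minimal sketch of AUC_AMAU as defined above, assuming the ranks of the test preferences and dislikes within H_u ∪ M_u are given; the function names and the toy input are illustrative assumptions.

```python
import numpy as np

def auc_amau_user(pref_ranks, dislike_ranks):
    """AUC_AMAU(u): fraction of dislikes ranked behind the preferences, on average."""
    pref_ranks = np.asarray(pref_ranks)
    dislike_ranks = np.asarray(dislike_ranks)
    behind = sum((dislike_ranks > r).sum() for r in pref_ranks)
    return behind / (len(pref_ranks) * len(dislike_ranks))

def auc_amau(per_user):
    """Average AUC_AMAU(u) over all test users; per_user is a list of
    (preference ranks, dislike ranks) pairs for the items in H_u ∪ M_u."""
    return np.mean([auc_amau_user(h, m) for h, m in per_user])

# Example: one user whose two test preferences are ranked 1 and 3, and whose
# dislikes are ranked 2, 4 and 5 within H_u ∪ M_u.
print(auc_amau([([1, 3], [2, 4, 5])]))   # (3 + 2) / (2 * 3) ≈ 0.83
```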

5.3 Parameter Selection
Every personalization algorithm in the experimental evaluation has at least one parameter. For every experiment we try to find the best set of parameters using grid search. An experiment is defined by (1) the outer training dataset R, (2) the outer hit set ∪_{u∈U_t} H_u, (3) the outer dislikes ∪_{u∈U_t} M_u, (4) the evaluation measure, and (5) the algorithm. Applying grid search, we first choose a finite number of parameter sets. Secondly, from the training data R, we create five inner data splits, defined by R^k, ∪_{u∈U_t} H^k_u, and ∪_{u∈U_t} M_u, for k ∈ {1, 2, 3, 4, 5}. Then, for every parameter set, we rerun the algorithm on all five inner training datasets R^k and evaluate them on the corresponding inner test datasets with the chosen evaluation measure. Finally, the parameter set with the best average score over the five inner data splits is chosen to be the best parameter set for the outer training dataset R.
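A schematic sketch of the grid-search procedure described above; the five inner splits, the callback signatures and all names are assumptions about the general shape of such a procedure, not the authors' code.

```python
import itertools
import numpy as np

def grid_search(R, param_grid, make_inner_split, train_and_score, n_splits=5):
    """Pick the parameter set with the best average score over inner splits.

    param_grid:       dict mapping parameter name -> list of candidate values.
    make_inner_split: function (R, k) -> (R_k, test_k) producing the k-th inner
                      training matrix and inner test data.
    train_and_score:  function (R_k, test_k, params) -> evaluation score.
    """
    candidates = [dict(zip(param_grid, values))
                  for values in itertools.product(*param_grid.values())]
    splits = [make_inner_split(R, k) for k in range(n_splits)]
    best_params, best_score = None, -np.inf
    for params in candidates:
        score = float(np.mean([train_and_score(R_k, test_k, params)
                               for R_k, test_k in splits]))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Dummy usage with stand-in functions:
R = np.ones((4, 4))
best = grid_search(
    R,
    param_grid={"k": [1, 2, 3]},
    make_inner_split=lambda R, k: (R, None),
    train_and_score=lambda R_k, test_k, p: -abs(p["k"] - 2),  # pretend k=2 is best
)
print(best)   # ({'k': 2}, 0.0)
```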


[Results chart on Movielens 1M and Yahoo!Music; algorithms: KUNN, WRMF, BPRMF, UB+IB, UB, IB, POP]