On Top-k Recommendation using Social Networks


Xiwang Yang, Harald Steck*+, Yang Guo* and Yong Liu

Polytechnic Institute of NYU, *Bell Labs, +Netflix Inc.


Outline

• Background & Motivation
  – Social network based top-k recommendation
• Related Work: AllRank, SoRec, STE, SocialMF, Trust-cf
• Top-k recommender using social networks
  – Top-k MF using Social Networks
  – Nearest Neighbor Methods
• Evaluation
• Conclusion


Social Recommenders Everywhere


Social network based top-k recommendation

[Figure: Target Customer → Recommender → List of Top Movies?]

Social network based top-k recommendation is not well studied

Social Top-K Recommendation

• Top-k recommendation:
  – A more realistic RS task
  – Integrates social network information into RS
• Matrix Factorization (MF)
  – SoRec, STE, SocialMF: optimize RMSE
  – AllRank: no social network information
  – Our approach directly optimizes social network based top-k recommendation
• Nearest Neighbor (NN)
  – Trust-cf (RecSys'09): combines the CF neighborhood with the social neighborhood; items rated by the combined neighborhood are considered, their ratings are averaged, and items are ranked by predicted rating to form the top-k recommendation
  – Our approach employs a new neighborhood construction and a voting mechanism

AllRank (Steck, KDD'10)

• Use AllRank to optimize top-k recommendation
• Users' selection bias causes the observed feedback (e.g. ratings, purchases, clicks) in the data to be missing not at random (MNAR) (RecSys'09)
  – Lower ratings are missing with higher probability
  – Missing ratings tend to indicate that a user does not like the item
• Prediction:

$$\hat{R}_{u,i} = r_m + Q_u P_i^T$$

• Objective:

$$\sum_{\text{all } u} \sum_{\text{all } i} W_{u,i} \left( R^{o\&i}_{u,i} - \hat{R}_{u,i} \right)^2 + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 \right)$$

$$W_{u,i} = \begin{cases} 1 & \text{if } R_{u,i} \text{ observed} \\ w_m & \text{otherwise} \end{cases} \qquad R^{o\&i}_{u,i} = \begin{cases} R_{u,i} & \text{if } R_{u,i} \text{ observed} \\ r_m & \text{otherwise} \end{cases}$$

• w_m > 0: training on all items
• BaseMF: w_m = 0, training on observed ratings only
• Rank items based on predicted rating to form the top-k list
• Tailor existing social-trust enhanced MF models for top-k recommendation
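The prediction and objective on this slide can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `allrank_loss`, the dense-matrix layout, and the boolean `observed` mask are assumptions made for the example, and the regularizer is applied once outside the sum, as written on the slide.

```python
import numpy as np

def allrank_loss(R, observed, Q, P, r_m, w_m, lam):
    """Weighted squared error over ALL (user, item) pairs plus an L2 penalty.

    R        : (n_users, n_items) ratings; values where `observed` is False are ignored
    observed : boolean mask with the same shape as R
    Q, P     : user and item latent features, shapes (n_users, d) and (n_items, d)
    """
    R_hat = r_m + Q @ P.T                # prediction: r_m + Q_u P_i^T
    R_oi = np.where(observed, R, r_m)    # impute r_m for missing ratings
    W = np.where(observed, 1.0, w_m)     # weight 1 if observed, w_m otherwise
    fit = np.sum(W * (R_oi - R_hat) ** 2)
    reg = lam * (np.sum(P ** 2) + np.sum(Q ** 2))
    return fit + reg
```

Setting `w_m = 0` drops every unobserved pair from the sum, which corresponds to the BaseMF case; any `w_m > 0` makes the missing entries pull predictions toward `r_m`.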


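SoRec

• Prediction:

$$\hat{S}^*_{u,v} = s_m + Q_u Z_v^T$$

• Objective (optimizes RMSE):

$$\sum_{(u,i)\ \text{obs.}} \left( R_{u,i} - \hat{R}_{u,i} \right)^2 + \gamma \sum_{(u,v)\ \text{obs.}} \left( S^*_{u,v} - \hat{S}^*_{u,v} \right)^2 + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 + \|Z\|_F^2 \right)$$

• Modified objective (optimizes top-k hit rate):

$$\sum_{\text{all } u} \sum_{\text{all } i} W_{u,i} \left( R^{o\&i}_{u,i} - \hat{R}_{u,i} \right)^2 + \sum_{\text{all } u} \sum_{\text{all } v} W^{(S)}_{u,v} \left( S^{*(o\&i)}_{u,v} - \hat{S}^*_{u,v} \right)^2 + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 + \|Z\|_F^2 \right)$$

$$W_{u,i} = \begin{cases} 1 & \text{if } R_{u,i} \text{ observed} \\ w_m > 0 & \text{otherwise} \end{cases} \qquad W^{(S)}_{u,v} = \gamma \begin{cases} 1 & \text{if } S^*_{u,v} \text{ observed} \\ w^{(S)}_m > 0 & \text{otherwise} \end{cases}$$

$$R^{o\&i}_{u,i} = \begin{cases} R_{u,i} & \text{if } R_{u,i} \text{ observed} \\ r_m & \text{otherwise} \end{cases} \qquad S^{*(o\&i)}_{u,v} = \begin{cases} S^*_{u,v} & \text{if } S^*_{u,v} \text{ observed} \\ s_m & \text{otherwise} \end{cases}$$

• Top-k list is generated by ranking the predicted ratings of all items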
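Forming the top-k list from predicted ratings can be sketched as below, using the prediction r_m + Q_u P_i^T from the AllRank slide. The function name `top_k_items` and the optional `exclude` argument (for example, items the user has already rated) are assumptions added for the illustration; the slides do not say whether such items are filtered out.

```python
import numpy as np

def top_k_items(Q_u, P, r_m, k, exclude=()):
    """Return the indices of the k items with the highest predicted rating.

    Q_u     : (d,) latent features of the target user
    P       : (n_items, d) item latent features
    exclude : item indices to leave out of the list (hypothetical option)
    """
    scores = (r_m + P @ Q_u).astype(float)   # predicted rating for every item
    for i in exclude:
        scores[i] = -np.inf                  # push excluded items to the bottom
    order = np.argsort(-scores, kind="stable")
    return order[:k].tolist()
```

Because r_m is a constant offset it does not change the ranking; it is kept here only so the scores match the predicted ratings.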

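STE: Modified objective (optimizes top-k hit rate)

$$\hat{R}_{u,i} = r_m + \alpha\, Q_u P_i^T + (1 - \alpha) \sum_{v} S_{u,v}\, Q_v P_i^T$$

$$\sum_{\text{all } u} \sum_{\text{all } i} W_{u,i} \left( R^{o\&i}_{u,i} - \hat{R}_{u,i} \right)^2 + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 \right)$$

SocialMF: Modified objective (optimizes top-k hit rate)

$$\sum_{\text{all } u} \sum_{\text{all } i} W_{u,i} \left( R^{o\&i}_{u,i} - \hat{R}_{u,i} \right)^2 + \beta \sum_{\text{all } u} \Big( Q_u - \sum_{v} S^*_{u,v} Q_v \Big) \Big( Q_u - \sum_{v} S^*_{u,v} Q_v \Big)^T + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 \right)$$

In both objectives:

$$W_{u,i} = \begin{cases} 1 & \text{if } R_{u,i} \text{ observed} \\ w_m > 0 & \text{otherwise} \end{cases} \qquad R^{o\&i}_{u,i} = \begin{cases} R_{u,i} & \text{if } R_{u,i} \text{ observed} \\ r_m & \text{otherwise} \end{cases}$$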
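The two social terms on this slide, the STE prediction and the SocialMF regularizer, can be sketched in NumPy as follows. The function names and the dense trust matrices `S` and `S_star` (row u holds the trust weights from user u to every other user) are assumptions made for the example.

```python
import numpy as np

def ste_predict(Q, P, S, r_m, alpha):
    """STE prediction: blend a user's own taste with that of trusted users.

    Q : (n_users, d) user features, P : (n_items, d) item features
    S : (n_users, n_users) trust weights S[u, v]
    """
    own = Q @ P.T              # Q_u P_i^T
    friends = S @ Q @ P.T      # sum_v S[u, v] Q_v P_i^T
    return r_m + alpha * own + (1.0 - alpha) * friends

def socialmf_penalty(Q, S_star, beta):
    """SocialMF term: beta * sum_u ||Q_u - sum_v S*[u, v] Q_v||^2."""
    diff = Q - S_star @ Q
    return beta * np.sum(diff ** 2)
```

With `alpha = 1` the STE prediction reduces to the plain MF prediction, and with `beta = 0` the SocialMF objective reduces to the AllRank objective.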

Nearest Neighbor Methods: CF-ULF approach

• Use AllRank to obtain user latent features
• Cluster users by PCC in the latent feature space
• Select the k1 nearest neighbors of target user u
• Relevant items of these nearest neighbors are voted to the target user; the voting weight is the PCC similarity
• Top-k list is generated based on voting value

$$Vote_{u,i} = \sum_{v \in N_u} sim(u, v)\, \delta_{i \in I_v}$$
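The voting step can be sketched as below. This is an illustration under stated assumptions: `relevant` is a hypothetical dict mapping each user to the set of items that user found relevant, neighbors are found by a brute-force PCC scan rather than a clustering step, and the latent features `Q` are taken as given (on the slide they come from AllRank).

```python
import numpy as np

def pcc(a, b):
    """Pearson correlation coefficient between two latent feature vectors."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def cf_ulf_votes(Q, relevant, u, k1):
    """Vote[i] = sum of sim(u, v) over the k1 nearest neighbors v with i in I_v."""
    sims = [(pcc(Q[u], Q[v]), v) for v in range(len(Q)) if v != u]
    neighbors = sorted(sims, reverse=True)[:k1]   # k1 most similar users
    votes = {}
    for sim, v in neighbors:
        for i in relevant.get(v, ()):
            votes[i] = votes.get(i, 0.0) + sim
    return votes
```

The top-k list is then `sorted(votes, key=votes.get, reverse=True)[:k]`.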

Nearest Neighbor Methods: PureTrust approach

• Breadth-first search (BFS) in the social network finds the k2 users most trusted by the target user u
• Relevant items of these trusted users are voted to the target user; the voting weight is proportional to 1/d_v

$$Vote_{u,i} = \sum_{v \in N^t_u} w_t(u, v)\, \delta_{i \in I_v}, \qquad w_t(u, v) = \frac{1}{d_v}$$

• N^t_u is the set of trusted users of u
• w_t(u, v) is the voting weight from user v
• d_v is the depth of user v in the BFS tree rooted at user u
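A minimal sketch of the BFS neighborhood and the 1/d_v voting follows. The adjacency dict `trust` (user to list of directly trusted users) and the `relevant` dict (user to set of relevant items) are hypothetical data structures chosen for the example.

```python
from collections import deque

def pure_trust_votes(trust, relevant, u, k2):
    """Collect up to k2 trusted users by BFS from u and let them vote with weight 1/depth."""
    depth = {u: 0}
    queue = deque([u])
    trusted = []                      # users in BFS order, closest first
    while queue and len(trusted) < k2:
        x = queue.popleft()
        for v in trust.get(x, ()):
            if v not in depth:        # first visit fixes the BFS depth d_v
                depth[v] = depth[x] + 1
                trusted.append(v)
                queue.append(v)
                if len(trusted) == k2:
                    break
    votes = {}
    for v in trusted:
        for i in relevant.get(v, ()):
            votes[i] = votes.get(i, 0.0) + 1.0 / depth[v]
    return votes
```

Direct friends (depth 1) vote with weight 1, friends of friends with weight 1/2, and so on, so closer users dominate the ranking.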

Nearest Neighbor Methods: Trust-CF-ULF approach

• Combination of the CF-ULF approach and PureTrust
  – Find k1 nearest neighbors from the CF-ULF neighborhood
  – Find k2 nearest neighbors from the trust neighborhood which are not in the k1 set (k2 = k1)
  – Relevant items of these users are voted to the target user
  – Top-k list is generated based on voting value
• Trust-CF-ULF-best approach
  – Given the total neighborhood size, dynamically tune the values of k1 and k2 to obtain the best recall result
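The neighborhood combination and the k1/k2 tuning can be sketched as below. Both functions are illustrative assumptions: `cf_ranked` and `trust_ranked` stand for candidate users already ordered from nearest to farthest, and `recall_fn` is a hypothetical callback that evaluates recall for a given split; the slides do not specify how the tuning is carried out.

```python
def combined_neighborhood(cf_ranked, trust_ranked, k1, k2):
    """Take k1 CF-ULF neighbors, then k2 trust neighbors not already chosen."""
    cf_part = list(cf_ranked[:k1])
    chosen = set(cf_part)
    trust_part = [v for v in trust_ranked if v not in chosen][:k2]
    return cf_part, trust_part

def best_split(total, recall_fn):
    """Try every (k1, k2) with k1 + k2 == total and keep the best recall."""
    best = None
    for k1 in range(total + 1):
        k2 = total - k1
        score = recall_fn(k1, k2)
        if best is None or score > best[0]:
            best = (score, k1, k2)
    return best[1], best[2]
```

Skipping trust neighbors that are already in the CF set keeps the combined neighborhood at k1 + k2 distinct users whenever enough candidates exist.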


Evaluation Metrics

• Top-k hit rate (Recall): the fraction of relevant items in the test set that are in the top-k of the ranking list
• RMSE:

$$RMSE = \sqrt{\frac{\sum_{(u,i) \in R_{test}} \left( R_{u,i} - \hat{R}_{u,i} \right)^2}{|R_{test}|}}$$
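Both metrics can be computed in a few lines. This sketch evaluates recall for a single user's ranked list; how per-user values are aggregated over the test set is not stated on the slide, so that step is left out. The function names and argument layout are assumptions for the example.

```python
import math

def recall_at_k(ranked, relevant_test, k):
    """Fraction of the relevant test items that appear in the top-k of `ranked`."""
    if not relevant_test:
        return 0.0
    hits = len(set(ranked[:k]) & set(relevant_test))
    return hits / len(relevant_test)

def rmse(pairs):
    """Root mean squared error over (true rating, predicted rating) pairs."""
    pairs = list(pairs)
    return math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))
```

Recall depends only on the order of the items, while RMSE depends only on the predicted values for rated test pairs, so a model can do well on one metric and poorly on the other.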

Top-k hit rate on Epinions Dataset

• 71K users, 104K items, 571K item reviews, 509K trust statements
• Up to ~10× increase compared with training on observed ratings only
• The social network is very helpful for top-k recommendation, especially for cold-start users
• Modified SoRec outperforms modified No Trust (AllRank) by 23.1% in overall recall and by 101.8% in cold user recall
• Recall of cold users in SoRec is better than recall of all users
  – An item rated by a cold user has received 102 ratings on average
  – An item rated by all users has received 93 ratings on average

RMSE on Epinions Dataset

• Set j_0 = 10, λ = 0.1, r_m = 4.0, w_m = 0
  – RMSE = 1.174 for BaseMF
  – RMSE = 1.095 for SocialMF (β = 20)
  – RMSE = 1.157 for STE (α = 0.5)
  – RMSE = 1.117 for SoRec (γ = 50 and w_m^(S) = 0)
• Consistent with RMSE results in the published literature
• SocialMF performs best in terms of RMSE but worst in terms of top-k hit rate

Experiments on Epinions Dataset: NN

• Greatly outperforms existing work (Trust-cf)
• Trust-cf predicts the target user's rating as the average of the ratings of the user's neighbors, which is based on the observed ratings only
• Our CF neighbors are derived from user latent features obtained from AllRank, which accounts for MNAR data by training on all items
• Voting is the simplest possible way of accounting for all ratings, i.e. counting 0 for an absent rating and 1 for an observed relevant rating

Experiments on Flixster Dataset

• ~1M users, 49K movies, 8.2M ratings, 26.7M connections
• Results are similar

Impact of Dimensionality and Top-k

• Top-k hit rate on the Flixster data is much better than on the Epinions data
  – The Epinions dataset has about twice as many items as the Flixster dataset, while recall on Flixster is more than twice that on Epinions for top-5 to top-500 recommendations
  – Epinions is multi-category data (cars, movies, books, etc.)
  – Users in the Flixster dataset have more social connections and item ratings on average

Conclusion

• Comprehensive study on improving the accuracy of top-k recommendation using social networks
  – Tailored existing social-trust enhanced MF models for top-k recommendation by considering missing ratings
  – Proposed an NN-based top-k recommendation method that combines users' neighborhoods in the trust network with their neighborhoods in the latent feature space, and uses voting instead of average rating to consider all ratings
• Among social recommenders that consider missing feedback, the one that works best for minimizing RMSE works worst for maximizing the hit rate, and vice versa
  – First developing a good RMSE approach and then modifying the training for top-k is not necessarily a viable strategy for obtaining a good top-k approach

Thanks!

Q & A
