Hands-on Recommender System Experiments with MyMediaLite
Zeno Gantner
[email protected]
Nokia gate5 GmbH, Berlin
13 September 2012

Hands-on Recommender System Experiments ... - RecSys Challenge2012.recsyschallenge.com/wp-content/uploads/2012/... · Left out from this presentation I parallelization I--cross-validation=K

MyMediaLite Recommendation Algorithm Library

Major features:

- scalable implementations of many state-of-the-art recommendation methods, tested on up to 700M events
- evaluation framework for reproducible research
- ready to use: command-line tools, no programming necessary

MyMediaLite

- rating prediction
- item recommendation
- group recommendation

features
- command-line tools
- evaluation framework
- usable from C#, Python, Ruby, F#
- Java ports available

development
- written in C#, runs on Mono
- regular releases (ca. 1 every 2 months)

- simple
- free
- scalable
- well-documented
- well-tested
- choice

http://ismll.de/mymedialite

Methods in MyMediaLite

State-of-the-art recommendation methods in MyMediaLite:

- kNN variants
- Online-Updating Regularized Kernel Matrix Factorization [Rendle and Schmidt-Thieme, RecSys 2009]
- SocialMF [Jamali and Ester, RecSys 2010]
- Asymmetric Factor Models (AFM) [Paterek, KDD Cup 2007]
- SVD++ [Koren, KDD 2008]
- Weighted Regularized Matrix Factorization (WR-MF) [Hu and Koren, ICDM 2008], [Pan et al., ICDM 2008]
- BPR-MF [Rendle et al., UAI 2009]

Simplified Architecture

[Figure: simplified architecture diagram. Labels: recommender, model, interaction data, user/item attributes and relations, serialized model, predictions, hyperparameters.]

File Format: MovieLens

user ID   item ID   rating   timestamp
196       242       3        881250949
186       302       3        891717742
22        377       1        878887116
244       51        2        880606923

Remarks

- user and item IDs can be (almost) arbitrary strings
- separator: whitespace, tab, comma, ::
- alternative date/time format: yyyy-mm-dd
- rating and date/time fields are optional
- import script; Unix tools, Perl, Python ...
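To make the format rules above concrete, here is a minimal Python sketch of a reader for such lines. It is not MyMediaLite's own parser; the names `Event` and `parse_line` are hypothetical, and the timestamp is kept as a plain string so that both Unix timestamps and yyyy-mm-dd dates pass through unchanged.

```python
import re
from typing import NamedTuple, Optional


class Event(NamedTuple):
    """One line of interaction data (hypothetical helper type)."""
    user: str
    item: str
    rating: Optional[float]
    timestamp: Optional[str]


# Separators named on the slide: whitespace, tab, comma, "::"
_SEPARATOR = re.compile(r"::|[,\s]+")


def parse_line(line: str) -> Event:
    """Split one line into user ID, item ID, optional rating, optional timestamp."""
    fields = [f for f in _SEPARATOR.split(line.strip()) if f]
    if len(fields) < 2:
        raise ValueError("expected at least a user ID and an item ID")
    rating = float(fields[2]) if len(fields) > 2 else None
    timestamp = fields[3] if len(fields) > 3 else None
    return Event(fields[0], fields[1], rating, timestamp)


if __name__ == "__main__":
    print(parse_line("196\t242\t3\t881250949"))
    print(parse_line("alice::item42"))
```

A script like this can be a starting point for converting other datasets into the four-column layout before handing them to the command-line tools.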

Explicit Feedback

Getting Help: Usage Information

rating_prediction --help

Data

rating_prediction --training-file=u1.base --test-file=u1.test

Recommender Options

rating_prediction --training-file=u.data --test-ratio=0.2
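Conceptually, a test ratio of 0.2 holds out a random 20% of the events for evaluation and trains on the rest. The Python sketch below illustrates that idea only; how MyMediaLite draws its split internally is not described in these slides, and `split_by_ratio` and its `seed` parameter are hypothetical names.

```python
import random


def split_by_ratio(events, test_ratio=0.2, seed=1):
    """Shuffle the events and hold out a fraction of them as the test set."""
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    shuffled = list(events)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    train, test = split_by_ratio(range(100), test_ratio=0.2)
    print(len(train), len(test))
```

Running the function twice with the same seed yields the same split, which is what makes results comparable across experiments.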

Fixing the Random Seed

rating_prediction ... --random-seed=1

Choosing a Recommender

rating_prediction ... --recommender=UserAverage
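As the name suggests, a user-average baseline predicts each user's mean training rating. The following Python sketch shows the idea together with an RMSE computation. It is an illustration, not MyMediaLite's code; the function names are hypothetical, and falling back to the global mean for unseen users is an assumption made here.

```python
import math
from collections import defaultdict


def fit_user_average(train):
    """train: iterable of (user, item, rating) triples."""
    sums, counts = defaultdict(float), defaultdict(int)
    total, n = 0.0, 0
    for user, _item, rating in train:
        sums[user] += rating
        counts[user] += 1
        total += rating
        n += 1
    user_mean = {u: sums[u] / counts[u] for u in sums}
    return total / n, user_mean


def predict(model, user):
    """Predict the user's mean rating; unseen users get the global mean (assumption)."""
    global_mean, user_mean = model
    return user_mean.get(user, global_mean)


def rmse(model, test):
    """Root mean squared error over (user, item, rating) triples."""
    squared_error = sum((predict(model, u) - r) ** 2 for u, _i, r in test)
    return math.sqrt(squared_error / len(test))


if __name__ == "__main__":
    model = fit_user_average([("a", "x", 4), ("a", "y", 2), ("b", "x", 5)])
    print(predict(model, "a"), rmse(model, [("a", "z", 5), ("b", "z", 5)]))
```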

Choosing a Recommender

rating_prediction ... --recommender=UserItemBaseline
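A user-item baseline typically predicts the global mean plus a user bias and an item bias. The Python sketch below estimates the biases with one pass of regularized averaging, which is one common approach; MyMediaLite's implementation may differ in detail. The function names and the default regularization values are assumptions chosen for illustration.

```python
from collections import defaultdict


def fit_baseline(train, reg_u=15.0, reg_i=10.0):
    """Estimate global mean, user biases, and item biases from (user, item, rating)."""
    train = list(train)
    mu = sum(r for _u, _i, r in train) / len(train)

    # Item biases: regularized mean deviation from the global mean
    item_sum, item_n = defaultdict(float), defaultdict(int)
    for _u, i, r in train:
        item_sum[i] += r - mu
        item_n[i] += 1
    b_i = {i: item_sum[i] / (reg_i + item_n[i]) for i in item_sum}

    # User biases: regularized mean of what the item biases leave unexplained
    user_sum, user_n = defaultdict(float), defaultdict(int)
    for u, i, r in train:
        user_sum[u] += r - mu - b_i[i]
        user_n[u] += 1
    b_u = {u: user_sum[u] / (reg_u + user_n[u]) for u in user_sum}
    return mu, b_u, b_i


def predict_baseline(model, user, item):
    """Unknown users or items contribute a bias of zero."""
    mu, b_u, b_i = model
    return mu + b_u.get(user, 0.0) + b_i.get(item, 0.0)


if __name__ == "__main__":
    model = fit_baseline([("a", "x", 4.0), ("b", "x", 2.0)])
    print(predict_baseline(model, "a", "x"))
```

Larger regularization values shrink the biases of users and items with few ratings toward zero, which keeps sparse observations from dominating the prediction.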

Iterative Recommenders

rating_prediction ... --recommender=BiasedMatrixFactorization --find-iter=1 --max-iter=30
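The idea behind evaluating an iterative recommender is to train for up to a maximum number of iterations and report test error at regular intervals. The Python sketch below does this for a plain SGD-trained biased matrix factorization. It is a simplified stand-in, not MyMediaLite's BiasedMatrixFactorization; the function name is hypothetical, and the parameter names `num_factors`, `reg`, `learn_rate`, `max_iter`, and `find_iter` merely mirror the command-line options for readability.

```python
import math
import random


def train_biased_mf(train, test, num_factors=5, reg=0.05, learn_rate=0.01,
                    max_iter=30, find_iter=1, seed=1):
    """Train with SGD; record test RMSE every find_iter iterations."""
    rng = random.Random(seed)
    train = list(train)
    mu = sum(r for _u, _i, r in train) / len(train)
    users = sorted({u for u, _i, _r in train})
    items = sorted({i for _u, i, _r in train})
    P = {u: [rng.gauss(0, 0.1) for _ in range(num_factors)] for u in users}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(num_factors)] for i in items}
    bu = dict.fromkeys(users, 0.0)
    bi = dict.fromkeys(items, 0.0)

    def predict(u, i):
        score = mu + bu.get(u, 0.0) + bi.get(i, 0.0)
        if u in P and i in Q:
            score += sum(a * b for a, b in zip(P[u], Q[i]))
        return score

    history = []
    for it in range(1, max_iter + 1):
        rng.shuffle(train)
        for u, i, r in train:
            err = r - predict(u, i)
            bu[u] += learn_rate * (err - reg * bu[u])
            bi[i] += learn_rate * (err - reg * bi[i])
            for f in range(num_factors):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += learn_rate * (err * qi - reg * pu)
                Q[i][f] += learn_rate * (err * pu - reg * qi)
        if it % find_iter == 0:
            se = sum((r - predict(u, i)) ** 2 for u, i, r in test)
            history.append((it, math.sqrt(se / len(test))))
    return predict, history


if __name__ == "__main__":
    data = [("a", "x", 5.0), ("a", "y", 1.0), ("b", "x", 5.0), ("b", "y", 1.0)]
    _predict, history = train_biased_mf(data, data)
    for iteration, error in history:
        print(iteration, round(error, 4))
```

Watching the per-iteration error is a simple way to spot the point where further training stops helping.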

Recommender Options (Hyperparameters)

rating_prediction ... --recommender-options="num_factors=5"

Recommender Options (Hyperparameters)

rating_prediction ... --recommender-options="num_factors=5 reg=0.05"

SVD++

rating_prediction ... --recommender=SVDPlusPlus --recommender-options="num_factors=5 reg=0.1 learn_rate=0.01"

Personalized Item Recommendation

Implicit Feedback

Behavior that is not an immediate expression of preference

- views
- clicks
- purchases

Advantages over explicit feedback:
- easy to collect
- available in abundance

positive-only feedback

Item Recommendation Tool: Very Similar Usage

item_recommendation --training-file=u.data --test-ratio=0.2

Item Recommendation Tool

item_recommendation ... --recommender=UserKNN
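For positive-only feedback, a user-based kNN recommender scores items by how similar the users who interacted with them are to the target user. The Python sketch below is a minimal illustration using cosine similarity on item sets. It is not MyMediaLite's UserKNN; the function names are hypothetical, and the values of `k` and `n` are arbitrary.

```python
import math
from collections import defaultdict


def build_user_items(events):
    """events: iterable of (user, item) pairs, i.e. positive-only feedback."""
    user_items = defaultdict(set)
    for user, item in events:
        user_items[user].add(item)
    return dict(user_items)


def cosine(a, b):
    """Cosine similarity between two item sets (binary vectors)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(user_items, user, k=80, n=10, similarity=cosine):
    """Rank unseen items by the summed similarity of the k nearest users."""
    own = user_items.get(user, set())
    neighbors = sorted(
        ((similarity(own, items), other)
         for other, items in user_items.items() if other != user),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, other in neighbors:
        if sim <= 0:
            continue
        for item in user_items[other] - own:
            scores[item] += sim
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


if __name__ == "__main__":
    data = build_user_items([("a", "x"), ("a", "y"), ("b", "x"),
                             ("b", "y"), ("b", "z"), ("c", "w")])
    print(recommend(data, "a"))
```

The `similarity` argument is left pluggable so that other set-based measures can be swapped in without touching the ranking logic.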

Choosing a Different Correlation/Similarity

item_recommendation ... --recommender-options="correlation=Jaccard"
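For reference, the Jaccard index of two sets is the size of their intersection divided by the size of their union. A minimal Python sketch follows; the function name is hypothetical, and returning 0.0 for two empty sets is a convention chosen here.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two item (or user) sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


if __name__ == "__main__":
    # Two users who share one of three distinct items
    print(jaccard({"x", "y"}, {"y", "z"}))
```

Unlike cosine similarity, the Jaccard index penalizes every item that only one of the two users has interacted with, so it tends to favor neighbors with similarly sized histories.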

Option Shortcuts

item_recommendation ... --recommender-options="cor=Cosine w=true q=1.5"

Iterative Recommenders / Save Predictions to Disk

item_recommendation ... --recommender=WRMF --find-iter=1 --max-iter=10 --prediction-file=pred.txt

Left out from this presentation

- parallelization
- --cross-validation=K --chronological-split=2012-01-01
- limiting the test users/candidate items
- attribute- and relation-aware recommenders
- user-to-item recommendation
- top-n evaluation of rating predictors: rating-based ranking
- --online-evaluation
- --repeated-items
- --save-model=FILE --load-model=FILE
- --cutoff=1.05 --measure=RMSE --epsilon=0.001
- tricks to save memory, e.g. --no-id-mapping --rating-type=byte
- ...
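Of the options listed above, K-fold cross-validation is easy to sketch: the data is divided into K folds, and each fold serves once as the test set while the remaining folds form the training set. The Python code below illustrates the general technique only; it does not reproduce how MyMediaLite assigns events to folds, and `k_fold_splits` is a hypothetical name.

```python
import random


def k_fold_splits(events, k=5, seed=1):
    """Yield (train, test) pairs; every event appears in exactly one test fold."""
    rng = random.Random(seed)
    shuffled = list(events)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield train, folds[i]


if __name__ == "__main__":
    for train, test in k_fold_splits(range(10), k=5):
        print(len(train), len(test))
```

Averaging the evaluation measure over the K test folds gives a more stable estimate than a single random split.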

Instead of a Conclusion: 2-Hour Projects

Parallel processing
- similarity computation
- BPR matrix factorization

Correlations
- Dice, Tversky
- Jaccard index for {1, −1, ?}

Algorithms
- SGD learning for WRMF
- ALS learning for MF
- your favorite algorithm

Evaluation
- expected reciprocal rank (ERR)
- Kendall's Tau; Spearman
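As a starting point for the ERR project idea, here is a minimal Python sketch of expected reciprocal rank in its usual cascade-model form: each ranked item satisfies the user with probability (2^g − 1) / 2^g_max, where g is its relevance grade, and the measure is the expected reciprocal of the rank at which the user stops. The function name and argument layout are hypothetical and not part of MyMediaLite.

```python
def expected_reciprocal_rank(grades, max_grade):
    """grades: relevance grades of the ranked items, best-ranked first."""
    err = 0.0
    p_continue = 1.0  # probability that the user reaches the current rank
    for rank, grade in enumerate(grades, start=1):
        p_satisfied = (2 ** grade - 1) / 2 ** max_grade
        err += p_continue * p_satisfied / rank
        p_continue *= 1 - p_satisfied
    return err


if __name__ == "__main__":
    # Binary relevance: the only relevant item sits at rank 2
    print(expected_reciprocal_rank([0, 1], max_grade=1))
```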

http://recsyswiki.com/wiki/MyMediaLite/Workshop_projects

Want to work in Berlin?

Nokia is hiring!

[email protected]
