Mahout Tutorial and Hands-on (version 2015)

Apache Mahout – Tutorial (2015) Cataldo Musto, Ph.D.

Course: Intelligent Information Access and Natural Language Processing (Accesso Intelligente all’Informazione ed Elaborazione del Linguaggio Naturale)

Università degli Studi di Bari – Department of Computer Science – Academic Year 2014/2015

07/01/2015


Outline

• What is Mahout?
  – Overview
• How to use Mahout?
  – Hands-on session

Part 1

What is Mahout?


What is (a) Mahout?

A mahout is an elephant driver.

• Mahout is a Java library
  – Implementing Machine Learning techniques
    • Clustering
    • Classification
    • Recommendation
    • Frequent ItemSet (removed)

What can we do?

• Currently Mahout mainly supports three use cases:
  – Recommendation: takes users' behavior and tries to find items users might like.
  – Clustering: takes, e.g., text documents and groups them into sets of topically related documents.
  – Classification: learns from existing categorized documents what documents of a specific category look like, and assigns unlabelled documents to the (hopefully) correct category.

Why Mahout?

• Mahout is not the only ML framework
  – Weka (http://www.cs.waikato.ac.nz/ml/weka/)
  – R (http://www.r-project.org/)
• Why do we prefer Mahout?
  – Apache License
  – Good Community
  – Good Documentation
  – Scalable
    • Based on Hadoop (not mandatory!)

Why do we need a scalable framework?

Big Data!

Use Cases

• Recommendation Engine on Foursquare
• User Interest Modeling on Twitter
• Pattern Mining on Yahoo! (as anti-spam)

Algorithms

• Recommendation
  – User-based Collaborative Filtering
  – Item-based Collaborative Filtering
  – Matrix Factorization-based CF
    • Several factorization techniques
• Clustering
  – K-Means
  – Fuzzy K-Means
  – Streaming K-Means
  – etc.
• Topic Modeling
  – LDA (Latent Dirichlet Allocation)
• Classification
  – Logistic Regression
  – Bayes (only on Hadoop, from release 0.9)
  – Random Forests (only on Hadoop, from release 0.9)
  – Hidden Markov Models
  – Perceptrons
• Other
  – Text processing
    • Creation of sparse vectors from text
  – Dimensionality Reduction techniques
    • Principal Component Analysis (PCA)
    • Singular Value Decomposition (SVD)
  – (and much more)

Mahout in the Apache Software Foundation

• Original Mahout Project
• Taste: collaborative filtering framework
• Lucene: information retrieval software library
• Hadoop: framework for distributed storage and programming based on MapReduce

Next Releases: Watch out!

In the next Mahout releases, Hadoop MapReduce is going to be replaced by Apache Spark (http://spark.apache.org)

General Architecture

Three-tier architecture (Application, Algorithms and Shared Libraries)

• Business Logic
• Data Storage and Shared Libraries
• External Applications invoking Mahout APIs

In this tutorial we will focus on Recommendation

Recommendation

• Mahout implements a Collaborative Filtering framework
  – Uses historical data (ratings, clicks, and purchases) to provide recommendations
    • User-based: recommend items by finding similar users. This is often harder to scale because of the dynamic nature of users.
    • Item-based: calculate similarity between items and make recommendations. Items usually don't change much, so this can often be computed offline.
      – Popularized by Amazon and others
    • Matrix factorization-based: split the original user-item matrix into smaller matrices in order to analyze item rating patterns and learn latent factors explaining users’ behavior and items’ characteristics.
      – Popularized by the Netflix Prize

Recommendation - Architecture


Recommendation Workflow

Basic idea: a Java/J2EE application invokes a Mahout Recommender whose DataModel is based on a set of user preferences that are built on top of a physical data store.

Layers, from the bottom up:

• Physical Storage (database, files, etc.)
• Data Model
• Recommender
• External Application

Recommendation in Mahout

• Input: raw data (user preferences)
• Output: preference estimates
• Step 1: mapping raw data into a Mahout-compliant DataModel
• Step 2: tuning recommender components
  – Similarity measure, neighborhood, etc.
• Step 3: computing rating estimates
• Step 4: evaluating the recommendations

Recommendation Components

• Mahout's key abstractions are implemented through Java interfaces:
  – DataModel interface
    • Methods for mapping raw data to a Mahout-compliant form
  – UserSimilarity interface
    • Methods to calculate the degree of correlation between two users
  – ItemSimilarity interface
    • Methods to calculate the degree of correlation between two items
  – UserNeighborhood interface
    • Methods to define the concept of ‘neighborhood’
  – Recommender interface
    • Methods to implement the recommendation step itself
• Example: DataModel interface
  – Each way of mapping raw data to a Mahout-compliant form is an implementation of the generic interface
  – e.g. MySQLJDBCDataModel feeds a DataModel from a MySQL database
  – (and so on)

Components: DataModel

• A DataModel is the interface used to access information about user preferences.
• Which sources can it draw from?
  – Databases
    • MySQLJDBCDataModel, PostgreSQLJDBCDataModel (JDBC driver, standard database structure)
    • NoSQL databases supported: MongoDBDataModel, CassandraDataModel
  – External files
    • FileDataModel: CSV (Comma Separated Values)
  – Generic (preferences fed directly through Java calls)
    • GenericDataModel

(They are all implementations of the DataModel interface)

FileDataModel – CSV input
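The CSV layout can be pictured with a short plain-Java sketch. It assumes one preference per line in the form userID,itemID,value; the class name CsvPreferences and the sample rows are made up for illustration, and the nested map only stands in for Mahout's own data structures.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CsvPreferences {

    // Parses lines of the assumed form "userID,itemID,value"
    // into a map: user -> (item -> score).
    static Map<Long, Map<Long, Float>> parse(List<String> lines) {
        Map<Long, Map<Long, Float>> prefs = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comment lines
            }
            String[] fields = trimmed.split(",");
            long userId = Long.parseLong(fields[0].trim());
            long itemId = Long.parseLong(fields[1].trim());
            float value = Float.parseFloat(fields[2].trim());
            prefs.computeIfAbsent(userId, k -> new LinkedHashMap<>()).put(itemId, value);
        }
        return prefs;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows, not taken from the tutorial's intro.csv
        List<String> sample = Arrays.asList("1,101,5.0", "1,102,3.0", "2,101,2.0");
        System.out.println(parse(sample));
    }
}
```

In the exercises that follow, this parsing step is handled by FileDataModel itself; the sketch is only meant to make the expected file layout concrete.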


Components: DataModel

• Regardless of the source, they all share a common representation.
• Basic object: Preference
  – A Preference is a triple (user, item, score)
  – Stored in a PreferenceArray
• Two implementations
  – GenericUserPreferenceArray
    • Stores the numerical preference values as well.
  – BooleanUserPreferenceArray
    • Skips the numerical preference values.

Components: UserSimilarity

• UserSimilarity defines a notion of similarity between two users.
  – Likewise, ItemSimilarity defines a notion of similarity between two items.
• Which definitions of similarity are available?
  – Pearson Correlation
  – Spearman Correlation
  – Euclidean Distance
  – Tanimoto Coefficient
  – LogLikelihood Similarity
  – Already implemented!

Example: TanimotoDistance
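As a rough illustration of the idea, the Tanimoto coefficient between two users can be computed from the sets of items each of them has expressed a preference for, ignoring the rating values. The sketch below is plain Java, not Mahout's implementation; the class name TanimotoSketch and the item IDs are hypothetical. The corresponding distance would be 1 minus the coefficient.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class TanimotoSketch {

    // Tanimoto coefficient: |A ∩ B| / |A ∪ B| over the two users' item sets.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return Double.NaN; // undefined when neither user has any preference
        }
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        Set<Long> user1 = new HashSet<>(Arrays.asList(101L, 102L, 103L));
        Set<Long> user2 = new HashSet<>(Arrays.asList(102L, 103L, 104L));
        // Two shared items out of four distinct items
        System.out.println(tanimoto(user1, user2)); // prints 0.5
    }
}
```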


Example: CosineSimilarity
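A minimal plain-Java sketch of cosine similarity, assuming the two users' ratings on their co-rated items are given as equally long arrays. The class name CosineSketch and the sample vectors are illustrative assumptions; this is the textbook formula rather than Mahout's own code.

```java
final class CosineSketch {

    // cos(a, b) = (a · b) / (|a| * |b|)
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return Double.NaN; // undefined for an all-zero vector
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Ratings pointing in the same direction give a similarity close to 1
        System.out.println(cosine(new double[] {1, 2, 3}, new double[] {2, 4, 6}));
        // Orthogonal rating vectors give 0
        System.out.println(cosine(new double[] {1, 0}, new double[] {0, 1}));
    }
}
```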

Different similarity definitions influence neighborhood formation

Pearson’s vs. Euclidean distance
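The difference between the two measures can be sketched in plain Java. The example assumes two users who rate the same three items with the same trend but a constant offset: Pearson correlation ignores the offset, while a Euclidean-based similarity penalizes it. The class name SimilaritySketch is hypothetical, and the conversion 1 / (1 + d) from distance to similarity is one common choice assumed here, not necessarily the formula Mahout uses.

```java
final class SimilaritySketch {

    // Pearson correlation between two equally long rating vectors.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        // NaN when one of the users gave the same rating to every item
        return sxy / Math.sqrt(sxx * syy);
    }

    // Similarity derived from Euclidean distance d as 1 / (1 + d) (assumed conversion).
    static double euclideanSimilarity(double[] x, double[] y) {
        double sumSquares = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sumSquares += d * d;
        }
        return 1.0 / (1.0 + Math.sqrt(sumSquares));
    }

    public static void main(String[] args) {
        double[] user1 = {1, 2, 3};
        double[] user2 = {3, 4, 5}; // same trend, every rating shifted by 2
        System.out.println("Pearson:   " + pearson(user1, user2));
        System.out.println("Euclidean: " + euclideanSimilarity(user1, user2));
    }
}
```

Under these assumptions the two users are perfectly correlated according to Pearson, yet only weakly similar according to the Euclidean-based measure, which is why the choice of similarity changes who ends up in a neighborhood.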


Components: UserNeighborhood

• Which definitions of neighborhood are available?
  – Nearest N users
    • The N users with the highest similarity are labeled as ‘neighbors’
  – Threshold
    • Users whose similarity is above a threshold are labeled as ‘neighbors’
  – Already implemented!
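The two neighborhood definitions can be sketched in plain Java, given the similarity of every other user to the target user. The class name NeighborhoodSketch, the user IDs and the similarity values are made up for illustration; treating "above a threshold" as "greater than or equal to" is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class NeighborhoodSketch {

    // Nearest-N: the n users with the highest similarity to the target user.
    static List<Long> nearestN(Map<Long, Double> similarities, int n) {
        return similarities.entrySet().stream()
                .sorted(Map.Entry.<Long, Double>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Threshold: every user whose similarity reaches the threshold, sorted by ID.
    static List<Long> aboveThreshold(Map<Long, Double> similarities, double threshold) {
        return similarities.entrySet().stream()
                .filter(e -> e.getValue() >= threshold)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical similarities of users 2..5 to user 1
        Map<Long, Double> sims = new LinkedHashMap<>();
        sims.put(2L, 0.9);
        sims.put(3L, 0.1);
        sims.put(4L, 0.7);
        sims.put(5L, -0.3);
        System.out.println(nearestN(sims, 2));         // prints [2, 4]
        System.out.println(aboveThreshold(sims, 0.0)); // prints [2, 3, 4]
    }
}
```

A fixed N always yields neighbors, even poor ones, while a threshold can yield an empty neighborhood for users with unusual tastes; this trade-off is one of the parameters explored in the exercises.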


Components: Recommender

• Given a DataModel, a definition of similarity between users (or items) and a definition of neighborhood, a recommender outputs an estimate of relevance for each unseen item.
• Which recommendation algorithms are implemented?
  – User-based CF
  – Item-based CF
  – SVD-based CF
  – (and many more…)
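To make the user-based case concrete, one common way to estimate a rating is a similarity-weighted average of the neighbors' ratings for the item. The plain-Java sketch below assumes that formulation; the class name UserBasedEstimateSketch and the numbers are illustrative, and the actual Mahout recommenders may differ in details.

```java
final class UserBasedEstimateSketch {

    // Estimated rating = sum(similarity_i * rating_i) / sum(|similarity_i|),
    // taken over the neighbors who rated the item.
    static double estimate(double[] similarities, double[] neighborRatings) {
        if (similarities.length != neighborRatings.length) {
            throw new IllegalArgumentException("one rating per neighbor is required");
        }
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            weightedSum += similarities[i] * neighborRatings[i];
            totalWeight += Math.abs(similarities[i]);
        }
        if (totalWeight == 0.0) {
            return Double.NaN; // no usable neighbor, so no estimate
        }
        return weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        // Two hypothetical neighbors: similarity 0.9 rated the item 4.0,
        // similarity 0.7 rated it 2.0
        double estimate = estimate(new double[] {0.9, 0.7}, new double[] {4.0, 2.0});
        System.out.println(estimate); // about 3.125, closer to the more similar neighbor
    }
}
```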


Recap

• Many implementations of a CF-based recommender!
  – Different recommendation algorithms
  – Different neighborhood definitions
  – Different similarity definitions
• Evaluating the different implementations is very time-consuming
  – A strength of Mahout is that it saves time when evaluating the different combinations of parameters
  – Standard interface for the evaluation of a recommender system

Evaluation

• Mahout provides classes for the evaluation of a recommender system
  – Prediction-based measures
    • MAE (Mean Absolute Error)
    • RMSE (Root Mean Square Error)
  – IR-based measures
    • Precision, Recall, F1-measure, F1@n
    • NDCG (ranking measure)
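The two prediction-based measures can be written down in a few lines of plain Java. The sketch assumes predicted and actual ratings are given as parallel arrays; the class name ErrorMetricsSketch and the sample values are hypothetical.

```java
final class ErrorMetricsSketch {

    // MAE: mean of |predicted - actual|
    static double mae(double[] predicted, double[] actual) {
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            sum += Math.abs(predicted[i] - actual[i]);
        }
        return sum / predicted.length;
    }

    // RMSE: square root of the mean of (predicted - actual)^2
    static double rmse(double[] predicted, double[] actual) {
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double diff = predicted[i] - actual[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / predicted.length);
    }

    public static void main(String[] args) {
        double[] predicted = {3.5, 4.0, 2.0};
        double[] actual = {4.0, 4.0, 1.0};
        System.out.println("MAE:  " + mae(predicted, actual)); // prints MAE:  0.5
        System.out.println("RMSE: " + rmse(predicted, actual));
    }
}
```

RMSE squares each error before averaging, so a few large mistakes weigh more than many small ones; for both measures, lower is better.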


• Prediction-based measures
  – Class: AverageAbsoluteDifferenceRecommenderEvaluator
  – Method: evaluate()
  – Parameters:
    • Recommender implementation
    • DataModel implementation
    • Training set size (e.g. 70%)
    • % of the data to use in the evaluation (smaller % for fast prototyping)

• IR-based measures
  – Class: GenericRecommenderIRStatsEvaluator
  – Method: evaluate()
  – Parameters:
    • Recommender implementation
    • DataModel implementation
    • Relevance threshold (mean + standard deviation)
    • % of the data to use in the evaluation (smaller % for fast prototyping)
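A plain-Java sketch of how the IR-based measures relate to the relevance threshold: items a user rated at or above mean + standard deviation are treated as relevant, and precision and recall are computed against a recommendation list. The class name IrStatsSketch, the ratings and the use of the population standard deviation are assumptions made for illustration; this only computes the metrics and does not reproduce the full evaluation procedure.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class IrStatsSketch {

    // Relevance threshold: mean + (population) standard deviation of a user's ratings.
    static double relevanceThreshold(Map<Long, Double> ratings) {
        double mean = 0.0;
        for (double r : ratings.values()) {
            mean += r;
        }
        mean /= ratings.size();
        double variance = 0.0;
        for (double r : ratings.values()) {
            variance += (r - mean) * (r - mean);
        }
        variance /= ratings.size();
        return mean + Math.sqrt(variance);
    }

    // Items whose rating reaches the threshold count as relevant.
    static Set<Long> relevantItems(Map<Long, Double> ratings, double threshold) {
        Set<Long> relevant = new TreeSet<>();
        for (Map.Entry<Long, Double> e : ratings.entrySet()) {
            if (e.getValue() >= threshold) {
                relevant.add(e.getKey());
            }
        }
        return relevant;
    }

    private static int hits(List<Long> recommended, Set<Long> relevant) {
        int hits = 0;
        for (Long item : recommended) {
            if (relevant.contains(item)) {
                hits++;
            }
        }
        return hits;
    }

    // Precision: fraction of recommended items that are relevant.
    static double precision(List<Long> recommended, Set<Long> relevant) {
        return (double) hits(recommended, relevant) / recommended.size();
    }

    // Recall: fraction of relevant items that were recommended.
    static double recall(List<Long> recommended, Set<Long> relevant) {
        return (double) hits(recommended, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        Map<Long, Double> ratings = new LinkedHashMap<>();
        ratings.put(101L, 5.0);
        ratings.put(102L, 3.0);
        ratings.put(103L, 1.0);
        ratings.put(104L, 3.0);
        ratings.put(105L, 3.0);
        Set<Long> relevant = relevantItems(ratings, relevanceThreshold(ratings));
        List<Long> top2 = Arrays.asList(101L, 103L); // hypothetical top-2 recommendations
        System.out.println("Relevant:  " + relevant);
        System.out.println("Precision: " + precision(top2, relevant));
        System.out.println("Recall:    " + recall(top2, relevant));
    }
}
```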


Part 2

How to use Mahout? Hands-on


Download Mahout

• Download
  – The latest Mahout release is 0.9
  – Available at: http://archive.apache.org/dist/mahout/0.9/mahout-distribution-0.9.zip
  – Extract all the libraries and include them in a new NetBeans (or Eclipse) project
• Requirement: Java 1.6.x or greater.
• Hadoop is not mandatory!

Exercise 1

• Create a Preference object
• Set preferences through some simple Java calls
• Print some statistics about the preferences
  – How many preferences? On which items?
  – Whether a user has expressed a preference for a certain item
  – Which item has the highest score?

Hints

• Hints about objects to be used:
  – Preference / PreferenceArray
    • Methods: setUserID, setItemID, setValue (called on the PreferenceArray with an index)
  – GenericUserPreferenceArray
    • Dimension = number of preferences to be defined
    • Methods: getIDs(), sortByValueReversed(), hasPrefWithItemID(id)

Exercise 1: preferences

import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.model.Preference;
import org.apache.mahout.cf.taste.model.PreferenceArray;

class CreatePreferenceArray {

    private CreatePreferenceArray() {
    }

    public static void main(String[] args) {
        // Array holding two preferences of a single user
        PreferenceArray user1Prefs = new GenericUserPreferenceArray(2);
        user1Prefs.setUserID(0, 1L);
        user1Prefs.setItemID(0, 101L);
        user1Prefs.setValue(0, 2.0f);   // score 2 for item 101
        user1Prefs.setItemID(1, 102L);
        user1Prefs.setValue(1, 3.0f);   // score 3 for item 102
        // Retrieve and print the second preference (item 102)
        Preference pref = user1Prefs.get(1);
        System.out.println(pref);
    }
}

Exercise 2

• Create a DataModel

• Feed the DataModel through some simple Java calls

• Print some statistics about data (how many users, how many items, maximum ratings, etc.)


• Hints about objects to be used:
  – FastByIDMap
    • A PreferenceArray stores the preferences of a single user
    • Where are the preferences of all the users stored?
      – A HashMap? No.
      – Mahout introduces data structures optimized for recommendation tasks
      – HashMap is replaced by FastByIDMap
  – DataModel
    • Methods: getNumItems(), getNumUsers(), getMaxPreference()
    • General statistics about the model

Exercise 2: data model

import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;

class CreateGenericDataModel {

    private CreateGenericDataModel() {
    }

    public static void main(String[] args) {
        // Maps each user ID to that user's preferences
        FastByIDMap<PreferenceArray> preferences = new FastByIDMap<PreferenceArray>();
        // Two preferences for user 1 (array sized to match the entries set below)
        PreferenceArray prefsForUser1 = new GenericUserPreferenceArray(2);
        prefsForUser1.setUserID(0, 1L);
        prefsForUser1.setItemID(0, 101L);
        prefsForUser1.setValue(0, 3.0f);
        prefsForUser1.setItemID(1, 102L);
        prefsForUser1.setValue(1, 4.5f);
        preferences.put(1L, prefsForUser1);
        // Build the in-memory DataModel from the map
        DataModel model = new GenericDataModel(preferences);
        System.out.println(model);
    }
}

Exercise 3

• Create a DataModel

• Feed the DataModel through a CSV file

• Calculate similarities between users

– CSV file should contain enough data!


• Hints about objects to be used:

– FileDataModel

• Argument: new File with the path of the CSV

– PearsonCorrelationSimilarity, TanimotoCoefficientSimilarity, etc.

• Argument: the model


Exercise 3: similarity

import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

class Example3_Similarity {

    public static void main(String[] args) throws Exception {
        // FileDataModel: instantiate the DataModel from the CSV file
        DataModel model = new FileDataModel(new File("intro.csv"));
        // Similarity definitions
        UserSimilarity pearson = new PearsonCorrelationSimilarity(model);
        UserSimilarity euclidean = new EuclideanDistanceSimilarity(model);
        // Output: similarity of user 1 with users 2 and 3
        System.out.println("Pearson:" + pearson.userSimilarity(1, 2));
        System.out.println("Euclidean:" + euclidean.userSimilarity(1, 2));
        System.out.println("Pearson:" + pearson.userSimilarity(1, 3));
        System.out.println("Euclidean:" + euclidean.userSimilarity(1, 3));
    }
}

Exercise 4

• Create a DataModel
• Feed the DataModel through a CSV file
• Calculate similarities between users
  – CSV file should contain enough data!
• Generate neighborhoods
• Generate recommendations
  – Compare different combinations of parameters!

• Hints about objects to be used:
  – NearestNUserNeighborhood
  – GenericUserBasedRecommender
    • Parameters:
      – data model (already shown)
      – neighborhood
        » Class: NearestNUserNeighborhood(n, similarity, model)
        » Class: ThresholdUserNeighborhood(thr, similarity, model)
      – similarity measure (already shown)

Exercise 4: First Recommender

import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;

class RecommenderIntro {

    private RecommenderIntro() {
    }

    public static void main(String[] args) throws Exception {
        // FileDataModel: load the preferences from the CSV file
        DataModel model = new FileDataModel(new File("intro.csv"));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // Neighborhood made of the 2 most similar users
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(
            model, neighborhood, similarity);
        // Top-1 recommendation for user 1
        List<RecommendedItem> recommendations = recommender.recommend(1, 1);
        for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
        }
    }
}

Exercise 5: MovieLens Recommender

• Download the GroupLens MovieLens dataset (100k)
  – Its format is already Mahout-compliant
  – http://files.grouplens.org/datasets/movielens/ml-100k.zip
• Preparatory exercise: repeat Exercise 3 (similarity calculations) with a bigger dataset
• Next: now we can run the recommendation framework against a widely used benchmark dataset

import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;

class RecommenderIntro {

    private RecommenderIntro() {
    }

    public static void main(String[] args) throws Exception {
        // Training split shipped with the MovieLens 100k archive
        DataModel model = new FileDataModel(new File("ua.base"));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // Neighborhood made of the 100 most similar users
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(
            model, neighborhood, similarity);
        // Top-20 recommendations for user 1.
        // We can play with parameters, e.g. recommender.recommend(10, 50)
        List<RecommendedItem> recommendations = recommender.recommend(1, 20);
        for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
        }
    }
}

Exercise 5: MovieLens Recommender

• Analyze the Recommender's behavior with different combinations of parameters
  – Do the recommendations change with a different similarity measure?
  – Do the recommendations change with different neighborhood sizes?
  – Which configuration is the best one?
• … Let’s go to the next exercise!

Exercise 6: Recommender Evaluation

• Evaluate different CF recommender configurations on MovieLens data
• Metrics: RMSE, MAE, Precision
• Hints: useful classes
  – Implementations of the RecommenderEvaluator interface
    • AverageAbsoluteDifferenceRecommenderEvaluator
    • RMSRecommenderEvaluator

• Further hints:
  – Use RandomUtils.useTestSeed() to ensure consistency among different evaluation runs
  – Invoke the evaluate() method
    • Parameters
      – RecommenderBuilder: builds the recommender instance (as in previous exercises)
      – DataModelBuilder: specific criterion for training
      – DataModel: the data to evaluate on
      – Training/test split: double value (e.g. 0.7 for 70%)
      – Amount of data to use in the evaluation: double value (e.g. 1.0 for 100%)
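The role of the split parameter and of the fixed seed can be sketched in plain Java. The sketch assumes a simple per-preference random assignment to training or test data; the class name SplitSketch, the seed values and the integer "preferences" are hypothetical, and the evaluator's internal procedure may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class SplitSketch {

    // Returns [training, test]: each element goes to the training list
    // with probability trainingFraction, otherwise to the test list.
    static <T> List<List<T>> split(List<T> preferences, double trainingFraction, long seed) {
        Random random = new Random(seed); // fixed seed makes the split repeatable
        List<T> training = new ArrayList<>();
        List<T> test = new ArrayList<>();
        for (T preference : preferences) {
            if (random.nextDouble() < trainingFraction) {
                training.add(preference);
            } else {
                test.add(preference);
            }
        }
        List<List<T>> result = new ArrayList<>();
        result.add(training);
        result.add(test);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> preferences = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            preferences.add(i);
        }
        List<List<Integer>> parts = split(preferences, 0.7, 42L);
        System.out.println("training: " + parts.get(0).size() + ", test: " + parts.get(1).size());
    }
}
```

Because the seed is fixed, repeated runs produce the same split, so different recommender configurations are scored on the same test data.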


Exercise 6: evaluation

import java.io.File;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.common.RandomUtils;

class EvaluatorIntro {

    private EvaluatorIntro() {
    }

    public static void main(String[] args) throws Exception {
        // Ensures the consistency between different evaluation runs
        RandomUtils.useTestSeed();
        DataModel model = new FileDataModel(new File("ua.base"));
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        // Build the same recommender for testing that we did last time:
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };
        // Train on 70% of each user's preferences, using 100% of the data
        double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
        System.out.println(score);
    }
}

Exercise 6: evaluation class EvaluatorIntro {

private EvaluatorIntro() {

}

public static void main(String[] args) throws Exception {

RandomUtils.useTestSeed();

DataModel model = new FileDataModel(new File("ua.base"));

RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();

// Build the same recommender for testing that we did last time:

RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {

@Override

public Recommender buildRecommender(DataModel model) throws TasteException {

UserSimilarity similarity = new PearsonCorrelationSimilarity(model);

UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);

return new GenericUserBasedRecommender(model, neighborhood, similarity);

}

};

double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);

System.out.println(score);

}

}

70%training (whole dataset evaluation)

95

Page 96: Mahout Tutorial and Hands-on (version 2015)

Exercise 6: evaluation class EvaluatorIntro {

private EvaluatorIntro() {

}

public static void main(String[] args) throws Exception {

RandomUtils.useTestSeed();

DataModel model = new FileDataModel(new File("ua.base"));

RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();

// Build the same recommender for testing that we did last time:

RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {

@Override

public Recommender buildRecommender(DataModel model) throws TasteException {

UserSimilarity similarity = new PearsonCorrelationSimilarity(model);

UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);

return new GenericUserBasedRecommender(model, neighborhood, similarity);

}

};

double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);

System.out.println(score);

}

}

Recommendation Engine

96

Exercise 6: evaluation (adding more measures)

import java.io.File;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.common.RandomUtils;

class EvaluatorIntro {

    private EvaluatorIntro() {
    }

    public static void main(String[] args) throws Exception {
        RandomUtils.useTestSeed();
        DataModel model = new FileDataModel(new File("ua.base"));
        // We can add more measures: one evaluator per metric.
        RecommenderEvaluator maeEvaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        RecommenderEvaluator rmseEvaluator = new RMSRecommenderEvaluator();
        // Build the same recommender for testing that we did last time:
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };
        // Each metric is computed by its own evaluator.
        double mae = maeEvaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
        double rmse = rmseEvaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
        System.out.println(mae);
        System.out.println(rmse);
    }
}

Exercise 7: item-based recommender

• Mahout provides Java classes for building an item-based recommender system

– Amazon-like

– Recommendations are based on similarities among items (generally pre-computed offline)

– Evaluate it with the MovieLens dataset!
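The idea behind item-based recommendation can be sketched in plain Java. This is an illustration only, not Mahout's implementation: the class ItemBasedSketch, the convention that 0 means "not rated", and the choice to ignore non-positive similarities are all assumptions made here. Similarities are computed between item columns, and a prediction is the similarity-weighted average of the ratings the user gave to other items.

```java
// Illustrative sketch only; not Mahout's implementation.
final class ItemBasedSketch {

    private ItemBasedSketch() {
    }

    /** Pearson correlation between items a and b over users who rated both (0 = not rated). */
    static double pearson(double[][] ratings, int a, int b) {
        int n = 0;
        double sumA = 0.0;
        double sumB = 0.0;
        for (double[] user : ratings) {
            if (user[a] > 0 && user[b] > 0) {
                n++;
                sumA += user[a];
                sumB += user[b];
            }
        }
        if (n < 2) {
            return 0.0; // not enough co-ratings
        }
        double meanA = sumA / n;
        double meanB = sumB / n;
        double cov = 0.0;
        double varA = 0.0;
        double varB = 0.0;
        for (double[] user : ratings) {
            if (user[a] > 0 && user[b] > 0) {
                double da = user[a] - meanA;
                double db = user[b] - meanB;
                cov += da * db;
                varA += da * da;
                varB += db * db;
            }
        }
        if (varA == 0.0 || varB == 0.0) {
            return 0.0; // constant ratings: correlation undefined
        }
        return cov / Math.sqrt(varA * varB);
    }

    /** Similarity-weighted average of the user's ratings on the other items. */
    static double predict(double[][] ratings, int user, int item) {
        double weighted = 0.0;
        double totalWeight = 0.0;
        for (int other = 0; other < ratings[user].length; other++) {
            if (other == item || ratings[user][other] <= 0) {
                continue;
            }
            double sim = pearson(ratings, item, other);
            if (sim <= 0.0) {
                continue; // simplification: ignore non-positive similarities
            }
            weighted += sim * ratings[user][other];
            totalWeight += sim;
        }
        return totalWeight == 0.0 ? Double.NaN : weighted / totalWeight;
    }

    public static void main(String[] args) {
        // Made-up ratings: rows are users, columns are items.
        double[][] ratings = {{5, 4, 1}, {4, 5, 2}, {1, 2, 5}, {0, 4, 2}};
        System.out.println("sim(item0, item1) = " + pearson(ratings, 0, 1));
        System.out.println("prediction for user 3, item 0 = " + predict(ratings, 3, 0));
    }
}
```

Because the similarities depend only on the items, they can be computed once offline and reused for every user.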

Exercise 7: item-based recommender

import java.io.File;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
import org.apache.mahout.common.RandomUtils;

class IREvaluatorIntro {

    private IREvaluatorIntro() {
    }

    public static void main(String[] args) throws Exception {
        RandomUtils.useTestSeed();
        DataModel model = new FileDataModel(new File("ua.base"));
        RecommenderEvaluator maeEvaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        RecommenderEvaluator rmseEvaluator = new RMSRecommenderEvaluator();
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                // ItemSimilarity instead of UserSimilarity.
                ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
                // No Neighborhood definition for item-based recommenders.
                return new GenericItemBasedRecommender(model, similarity);
            }
        };
        double mae = maeEvaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
        double rmse = rmseEvaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
        System.out.println(mae);
        System.out.println(rmse);
    }
}

Exercise 8: MF-based recommender

• Mahout provides Java classes for building a CF recommender system based on state-of-the-art matrix factorization techniques

– Class: SVDRecommender

• Parameters: DataModel, Factorizer

• Factorizer: a factorization algorithm

– Alternating Least Squares (ALSWRFactorizer)

– SVD++ (SVDPlusPlusFactorizer)

– Stochastic Gradient Descent (ParallelSGDFactorizer)… etc.

– Several parameters to tune!

– Evaluate it with the MovieLens dataset!
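The role of the typical hyperparameters (number of latent factors, regularization lambda, number of iterations) can be seen in a tiny plain-Java sketch. Note the assumptions: this toy uses plain stochastic gradient descent rather than the ALS-WR algorithm of ALSWRFactorizer, and the class MatrixFactorizationSketch, the learning rate and the sample ratings are hypothetical. Each user and item gets a vector of latent factors, and a rating is predicted as their dot product.

```java
import java.util.Random;

// Illustrative sketch only: toy SGD matrix factorization, not a Mahout Factorizer.
final class MatrixFactorizationSketch {

    final double[][] userFactors;
    final double[][] itemFactors;

    MatrixFactorizationSketch(int numUsers, int numItems, int numFactors, long seed) {
        Random random = new Random(seed);
        userFactors = randomMatrix(numUsers, numFactors, random);
        itemFactors = randomMatrix(numItems, numFactors, random);
    }

    private static double[][] randomMatrix(int rows, int cols, Random random) {
        double[][] matrix = new double[rows][cols];
        for (double[] row : matrix) {
            for (int f = 0; f < cols; f++) {
                row[f] = 0.1 * random.nextDouble(); // small random start
            }
        }
        return matrix;
    }

    /** Predicted rating: dot product of the user and item factor vectors. */
    double predict(int user, int item) {
        double dot = 0.0;
        for (int f = 0; f < userFactors[user].length; f++) {
            dot += userFactors[user][f] * itemFactors[item][f];
        }
        return dot;
    }

    /** Each row of ratings is {user, item, rating}. */
    void train(double[][] ratings, double learningRate, double lambda, int iterations) {
        for (int it = 0; it < iterations; it++) {
            for (double[] r : ratings) {
                int user = (int) r[0];
                int item = (int) r[1];
                double error = r[2] - predict(user, item);
                for (int f = 0; f < userFactors[user].length; f++) {
                    double pu = userFactors[user][f];
                    double qi = itemFactors[item][f];
                    // Gradient step with L2 regularization (lambda).
                    userFactors[user][f] += learningRate * (error * qi - lambda * pu);
                    itemFactors[item][f] += learningRate * (error * pu - lambda * qi);
                }
            }
        }
    }

    /** RMSE of the current factors on the given ratings. */
    double rmse(double[][] ratings) {
        double sum = 0.0;
        for (double[] r : ratings) {
            double error = r[2] - predict((int) r[0], (int) r[1]);
            sum += error * error;
        }
        return Math.sqrt(sum / ratings.length);
    }

    public static void main(String[] args) {
        // Made-up ratings: {user, item, rating}.
        double[][] ratings = {{0, 0, 5}, {0, 1, 3}, {1, 0, 4}, {1, 2, 1}, {2, 1, 2}, {2, 2, 5}};
        MatrixFactorizationSketch mf = new MatrixFactorizationSketch(3, 3, 2, 42L);
        System.out.println("RMSE before training: " + mf.rmse(ratings));
        mf.train(ratings, 0.01, 0.05, 500);
        System.out.println("RMSE after training:  " + mf.rmse(ratings));
    }
}
```

More factors and iterations fit the training data more closely, while a larger lambda pulls the factors towards zero to limit overfitting; this is why the parameters need tuning on held-out data.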

Exercise 8: MF-based recommender

import java.io.File;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.svd.ALSWRFactorizer;
import org.apache.mahout.cf.taste.impl.recommender.svd.Factorizer;
import org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.common.RandomUtils;

class IREvaluatorIntro {

    private IREvaluatorIntro() {
    }

    public static void main(String[] args) throws Exception {
        RandomUtils.useTestSeed();
        DataModel model = new FileDataModel(new File("ua.base"));
        RecommenderEvaluator maeEvaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        RecommenderEvaluator rmseEvaluator = new RMSRecommenderEvaluator();
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                // Hyperparameters: latent factors (10), lambda (0.065), iterations (60).
                Factorizer factorizer = new ALSWRFactorizer(model, 10, 0.065, 60);
                return new SVDRecommender(model, factorizer);
            }
        };
        double mae = maeEvaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
        double rmse = rmseEvaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
        System.out.println(mae);
        System.out.println(rmse);
    }
}

Mahout Strengths

• Fast prototyping and evaluation

– To evaluate a different configuration of the same algorithm, we just need to update a parameter and run again.

– Example

• Different Neighborhood Size

• Different similarity measures, etc.


5 minutes to look for the best configuration


Exercise 9: Recommender Evaluation

• Evaluation of CF algorithms through IR measures

• Metrics: Precision, Recall

• Hints: useful classes

– GenericRecommenderIRStatsEvaluator

– evaluate() method

• Parameters similar to those of exercises 6 and 7 (RecommenderBuilder, DataModelBuilder, DataModel), plus an IDRescorer, the cutoff N for Precision@N and Recall@N, a relevance threshold, and the percentage of data to use in the evaluation
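Precision@N and Recall@N can be illustrated in plain Java. This sketch is not Mahout's evaluator: the class IrMetricsSketch is hypothetical, and the set of relevant items is simply given here, whereas an evaluator has to derive it from the held-out data (for example through a relevance threshold).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; not Mahout's GenericRecommenderIRStatsEvaluator.
final class IrMetricsSketch {

    private IrMetricsSketch() {
    }

    /** Number of relevant items among the top-k recommendations. */
    static int hits(List<Long> recommended, Set<Long> relevant, int k) {
        int limit = Math.min(k, recommended.size());
        int hits = 0;
        for (int i = 0; i < limit; i++) {
            if (relevant.contains(recommended.get(i))) {
                hits++;
            }
        }
        return hits;
    }

    /** Precision@k: fraction of the top-k recommendations that are relevant. */
    static double precisionAt(List<Long> recommended, Set<Long> relevant, int k) {
        int limit = Math.min(k, recommended.size());
        return limit == 0 ? 0.0 : (double) hits(recommended, relevant, k) / limit;
    }

    /** Recall@k: fraction of the relevant items that appear in the top k. */
    static double recallAt(List<Long> recommended, Set<Long> relevant, int k) {
        return relevant.isEmpty() ? 0.0 : (double) hits(recommended, relevant, k) / relevant.size();
    }

    public static void main(String[] args) {
        // Made-up item IDs, for illustration only.
        List<Long> recommended = Arrays.asList(10L, 20L, 30L, 40L, 50L);
        Set<Long> relevant = new HashSet<>(Arrays.asList(20L, 50L, 70L));
        System.out.println("Precision@5 = " + precisionAt(recommended, relevant, 5));
        System.out.println("Recall@5    = " + recallAt(recommended, relevant, 5));
    }
}
```

In this example two of the five recommended items are relevant, and two of the three relevant items were recommended.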

Exercise 9: IR-based evaluation

import java.io.File;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.common.RandomUtils;

class IREvaluatorIntro {

    private IREvaluatorIntro() {
    }

    public static void main(String[] args) throws Exception {
        RandomUtils.useTestSeed();
        DataModel model = new FileDataModel(new File("ua.base"));
        RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
        // Build the same recommender for testing that we did last time:
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };
        // 5 = cutoff: Precision@5, Recall@5, etc.; 1.0 = evaluate on the whole dataset.
        IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,
            GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
        System.out.println(stats.getPrecision());
        System.out.println(stats.getRecall());
        System.out.println(stats.getF1Measure());
    }
}

Exercise 9: IR-based evaluation (neighborhood size 500)

Same listing as on the previous slide, with one line changed to set the neighborhood size to 500:

UserNeighborhood neighborhood = new NearestNUserNeighborhood(500, similarity, model);

Exercise 9: IR-based evaluation (Euclidean distance)

Same listing again, keeping the neighborhood size at 500 and replacing Pearson correlation with Euclidean distance (class org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity):

UserSimilarity similarity = new EuclideanDistanceSimilarity(model);

• Write a class that automatically runs evaluation with different parameters

– e.g. fixed neighborhood sizes from an Array of values

– Print the best scores and the configuration

Exercise 10: Recommender Evaluation
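One possible shape for such a class is sketched below, assuming an error metric such as MAE where lower is better. To keep the sketch runnable without Mahout, the evaluation itself is abstracted behind an IntToDoubleFunction: the class NeighborhoodSearch and the stand-in scoring function are hypothetical, and in the real exercise the function would build a recommender with the given neighborhood size and return the score from evaluate().

```java
import java.util.function.IntToDoubleFunction;

// Illustrative sketch only: the scoring function stands in for a Mahout evaluation run.
final class NeighborhoodSearch {

    private NeighborhoodSearch() {
    }

    /** Returns the size with the lowest error, or -1 if sizes is empty. */
    static int bestSize(int[] sizes, IntToDoubleFunction errorForSize) {
        int best = -1;
        double bestError = Double.POSITIVE_INFINITY;
        for (int size : sizes) {
            double error = errorForSize.applyAsDouble(size);
            System.out.println("neighborhood size " + size + " -> error " + error);
            if (error < bestError) {
                bestError = error;
                best = size;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] sizes = {10, 50, 100, 500};
        // Made-up stand-in for evaluator.evaluate(...); replace with a real evaluation run.
        IntToDoubleFunction fakeError = n -> 0.8 + Math.abs(n - 100) / 1000.0;
        System.out.println("best size: " + bestSize(sizes, fakeError));
    }
}
```

The same loop extends to other parameters (similarity measure, number of latent factors) by iterating over their values as well.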


• Find the best configuration for several datasets

– Download datasets from http://mahout.apache.org/users/basics/collections.html

– Write classes to transform input data into a Mahout-compliant form

– Extend exercise 10!

Exercise 11: more datasets!
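As an example of such a transformation, the sketch below converts lines of the form userID::itemID::rating::timestamp (the layout used by the MovieLens 1M ratings file) into comma-separated userID,itemID,rating lines, assuming that is the form FileDataModel should read. The class RatingsConverter and its method names are hypothetical, and other datasets will need their own parsing rules.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: converts "::"-separated ratings to comma-separated lines.
final class RatingsConverter {

    private RatingsConverter() {
    }

    /** Converts "user::item::rating[::timestamp]" into "user,item,rating". */
    static String toMahoutLine(String line) {
        String[] fields = line.trim().split("::");
        if (fields.length < 3) {
            throw new IllegalArgumentException("expected at least 3 fields: " + line);
        }
        return fields[0] + "," + fields[1] + "," + fields[2];
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: RatingsConverter <input file> <output file>");
            return;
        }
        List<String> converted = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            if (!line.trim().isEmpty()) {
                converted.add(toMahoutLine(line));
            }
        }
        Files.write(Paths.get(args[1]), converted, StandardCharsets.UTF_8);
    }
}
```

The output file can then be passed to new FileDataModel(new File(...)) exactly as ua.base is in the previous exercises.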


End. Do you want more?


• Recommendation

– Deployment of a Mahout-based web recommender

– Integration with Hadoop/Spark

– Integration of content-based information

– Custom similarities, Custom recommenders, Re-scoring functions

• Content-based Recommender Systems through Classification Algorithms!
