A Hybrid Recommendation system

Analysis and details of the various recommendation systems. Authored by: Arnaud De Bruyn

Recommender systems

Arnaud De Bruyn

Doctoral student in Marketing

The Pennsylvania State University

The Smeal College of Business Administration

701-L Business Administration Building

University Park, PA 16802

Phone: (814) 865-5944

Fax: (814) 865-3015

Email: [email protected]

The articles

• E-Commerce Recommendation Applications
– J. Ben Schafer
– Joseph A. Konstan
– John Riedl

• Recommender Systems
– Paul Resnick
– Hal R. Varian

• Scope:
– Introducing recommender systems
– Classifying along several dimensions

Outline

• Types of recommendation systems
– Search-based recommendations
– Category-based recommendations
– Collaborative filtering
– Clustering
– (Horting)
– Association rules
– Information filtering
– Classifiers

• Taxonomy of recommender systems
– Targeted customer inputs
– Community inputs
– Outputs delivered

Part I

Types of recommendation systems

Search-based recommendations

• The visitor types a search query, e.g. « data mining customer »

• The system retrieves all the items that match that query, e.g. 6 books

• The system recommends some of these books based on a general, non-personalized ranking (sales rank, popularity, etc.)
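A minimal sketch of this behaviour, assuming a hypothetical in-memory catalog in which each book has a title and a sales rank (1 = best-selling). A book is retrieved when its title contains every query word (a plain substring match, which is an assumption), and the matches are ordered by the non-personalized sales rank. The `Book` class, the catalog entries, the ranks and the `top_n` parameter are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    sales_rank: int  # hypothetical: 1 = best-selling, larger = sells less


def search_recommend(catalog: list[Book], query: str, top_n: int = 3) -> list[str]:
    """Retrieve books whose title contains every query word, ranked by sales rank."""
    words = query.lower().split()
    matches = [b for b in catalog if all(w in b.title.lower() for w in words)]
    # Non-personalized ranking: every visitor typing this query gets the same list.
    matches.sort(key=lambda b: b.sales_rank)
    return [b.title for b in matches[:top_n]]


# Hypothetical catalog and sales ranks.
catalog = [
    Book("Data Mining Your Website", 120),
    Book("Mastering Data Mining", 45),
    Book("Building Data Mining Applications for CRM", 300),
    Book("Introduction to Marketing", 10),
]
print(search_recommend(catalog, "data mining"))
```

Every visitor who types « data mining » gets the same three titles in the same order, which is the limitation listed under the cons.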

Search-based recommendations

• Pros:
– Simple to implement

• Cons:
– Not very powerful
– Which criteria should be used to rank recommendations?
– Are these really « recommendations »?
– The user only gets what he asked for

Category-based recommendations

• Each item belongs to one or more categories

• Explicit / implicit choice:
– The customer selects a category of interest (refines the search, opts in to category-based recommendations, etc.)
  • « Subjects > Computers & Internet > Databases > Data Storage & Management > Data Mining »
– The system selects categories of interest on behalf of the customer, based on the item currently viewed, past purchases, etc.

• Certain items in these categories (bestsellers, new items) may then be recommended
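A minimal sketch of the implicit variant, assuming hypothetical data: each item maps to the set of categories it belongs to, the categories of interest are inferred from the customer's past purchases, and the best-selling unbought items in those categories are recommended. The catalog, category names, sales ranks and the `top_n` parameter are invented for illustration.

```python
# Hypothetical catalog: each item may belong to one or more categories.
ITEM_CATEGORIES = {
    "Mastering Data Mining": {"Data Mining", "CRM"},
    "Data Mining Your Website": {"Data Mining"},
    "Accelerating Customer Relationships": {"CRM"},
    "Consumer Behavior": {"Marketing"},
    "Marketing Research": {"Marketing"},
}
# Hypothetical non-personalized ranking (1 = bestseller).
SALES_RANK = {
    "Mastering Data Mining": 3,
    "Data Mining Your Website": 7,
    "Accelerating Customer Relationships": 5,
    "Consumer Behavior": 1,
    "Marketing Research": 2,
}


def category_recommend(purchases: set[str], top_n: int = 2) -> list[str]:
    # Implicit choice: categories of interest are inferred from past purchases.
    interests = {cat for item in purchases for cat in ITEM_CATEGORIES[item]}
    candidates = [
        item
        for item, cats in ITEM_CATEGORIES.items()
        if item not in purchases and cats & interests
    ]
    # Within the selected categories, bestsellers come first.
    candidates.sort(key=lambda item: SALES_RANK[item])
    return candidates[:top_n]


print(category_recommend({"Mastering Data Mining"}))
```

Because the ranking inside a category is the same for everyone, the quality of the result depends almost entirely on how well the categories are designed.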

Category-based recommendations

• Pros:
– Still simple to implement

• Cons:
– Again: not very powerful; which criteria should be used to order recommendations? Are these really « recommendations »?
– Performance depends heavily upon the kind of categories implemented
  • Too specific: not efficient
  • Not specific enough: no relevant recommendations

Collaborative filtering

• Collaborative filtering techniques « compare » customers, based on their previous purchases, to make recommendations to « similar » customers

• It is also called « social » filtering

• It follows these steps:
– 1. Find customers who are similar (« nearest neighbors ») in terms of tastes, preferences and past behaviors
– 2. Aggregate the weighted preferences of these neighbors
– 3. Make recommendations based on these aggregated, weighted preferences (most preferred items not yet bought)

Collaborative filtering

• Example: the system needs to make recommendations to customer C

• Customer B is very close to C (he has bought all the books C has bought). Book 5 is highly recommended

• Customer D is somewhat close. Book 6 is recommended to a lower extent

• Customers A and E are not similar at all. Weight=0

Book 1 Book 2 Book 3 Book 4 Book 5 Book 6Customer A X XCustomer B X X XCustomer C X XCustomer D X XCustomer E X X
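A minimal sketch of the three steps on this example. The purchase sets are an assumption reconstructed from the slides' descriptions, because the extracted table lost its column positions: C = {2, 3}, B = {2, 3, 5}, D = {2, 6}; A and E are taken as {1, 4} and {1, 5}, and they share no book with C either way. Jaccard similarity (shared books divided by books bought by either customer) is used as the neighbor weight; the slides do not name a similarity measure, so that choice is ours.

```python
# Purchases reconstructed from the slide descriptions (an assumption, see text).
PURCHASES = {
    "A": {1, 4},
    "B": {2, 3, 5},
    "C": {2, 3},
    "D": {2, 6},
    "E": {1, 5},
}


def jaccard(a: set[int], b: set[int]) -> float:
    """Similarity between two customers: shared books / books bought by either."""
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend(target: str, purchases: dict[str, set[int]]) -> list[tuple[int, float]]:
    mine = purchases[target]
    scores: dict[int, float] = {}
    for other, theirs in purchases.items():
        if other == target:
            continue
        # Step 1: weight each potential neighbor by similarity (0 = not a neighbor).
        weight = jaccard(mine, theirs)
        if weight == 0:
            continue
        # Step 2: aggregate the neighbors' weighted preferences for unbought books.
        for book in theirs - mine:
            scores[book] = scores.get(book, 0.0) + weight
    # Step 3: most preferred books first (ties broken by book number).
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(recommend("C", PURCHASES))
```

Book 5 (weight 2/3, coming from B) ranks above Book 6 (weight 1/3, coming from D), and A and E contribute nothing, as on the slide.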

Collaborative filtering

• Pros:
– Extremely powerful and efficient
– Very relevant recommendations
– (1) The bigger the database and (2) the longer the history of past behaviors, the better the recommendations

• Cons:
– Difficult to implement; resource- and time-consuming
– What about a new item that has never been purchased? It cannot be recommended
– What about a new customer who has never bought anything? He cannot be compared to other customers, so no items can be recommended

Clustering

• Another way to make recommendations based on past purchases of other customers is to cluster customers into categories

• Each cluster will be assigned « typical » preferences, based on preferences of customers who belong to the cluster

• Customers within each cluster will receive recommendations computed at the cluster level

Clustering

• Customers B, C and D are « clustered » together. Customers A and E are clustered into another separate group

• « Typical » preferences for CLUSTER are:
– Book 2, very high
– Book 3, high
– Books 5 and 6, may be recommended
– Books 1 and 4, not recommended at all

Book 1 Book 2 Book 3 Book 4 Book 5 Book 6Customer A X XCustomer B X X XCustomer C X XCustomer D X XCustomer E X X

Clustering

• How does it work?

• Any customer classified as a member of CLUSTER will receive recommendations based on the preferences of the group:
– Book 2 will be highly recommended to Customer F
– Book 6 will also be recommended to some extent

Book 1 Book 2 Book 3 Book 4 Book 5 Book 6Customer A X XCustomer B X X XCustomer C X XCustomer D X XCustomer E X X

Book 1 Book 2 Book 3 Book 4 Book 5 Book 6Customer A X XCustomer B X X XCustomer C X XCustomer D X XCustomer E X XCustomer F X X
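A minimal sketch of cluster-level recommendations for CLUSTER = {B, C, D}. The purchase sets are reconstructed from the slides' descriptions and are an assumption (B = {2, 3, 5}, C = {2, 3}, D = {2, 6}, and Customer F bought Books 3 and 5). A book's « typical » preference is taken to be the share of cluster members who bought it, which is one simple choice among several.

```python
from collections import Counter

# Reconstructed purchases (an assumption: the extracted tables lost their column positions).
CLUSTER = {"B": {2, 3, 5}, "C": {2, 3}, "D": {2, 6}}


def cluster_profile(members: dict[str, set[int]]) -> dict[int, float]:
    """« Typical » preference of each book: share of cluster members who bought it."""
    counts = Counter(book for books in members.values() for book in books)
    return {book: n / len(members) for book, n in counts.items()}


def recommend_from_cluster(owned: set[int], profile: dict[int, float]) -> list[tuple[int, float]]:
    # Every member of the cluster gets the same ranking, minus books already owned.
    ranked = [(book, score) for book, score in profile.items() if book not in owned]
    return sorted(ranked, key=lambda kv: (-kv[1], kv[0]))


profile = cluster_profile(CLUSTER)
print(recommend_from_cluster({3, 5}, profile))  # Customer F
```

Book 2 comes first and Book 6 second, as on the slide; Books 1 and 4 never appear because no cluster member bought them.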

Clustering

• Problem: customers may belong to more than one cluster; clusters may overlap

• Predictions are then averaged across the clusters, weighted by participation

Book 1 Book 2 Book 3 Book 4 Book 5 Book 6Customer A X XCustomer B X X XCustomer C X XCustomer D X XCustomer E X XCustomer F X X

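A minimal sketch of that weighted average. The two groups are CLUSTER = {B, C, D} and the group {A, E}; their purchase sets are reconstructed from the slides and are an assumption (B = {2, 3, 5}, C = {2, 3}, D = {2, 6}; A and E each bought Book 1 plus one of Books 4 and 5, which gives the same group profile whichever way round). The customer, the participation values 0.75 / 0.25 and the group label OTHER are hypothetical.

```python
from collections import Counter

# Reconstructed purchases (an assumption; see the lead-in).
CLUSTERS = {
    "CLUSTER": {"B": {2, 3, 5}, "C": {2, 3}, "D": {2, 6}},
    "OTHER": {"A": {1, 4}, "E": {1, 5}},
}


def profile(members: dict[str, set[int]]) -> dict[int, float]:
    """Share of the group's members who bought each book."""
    counts = Counter(book for books in members.values() for book in books)
    return {book: n / len(members) for book, n in counts.items()}


def soft_recommend(participation: dict[str, float], owned: set[int]) -> list[tuple[int, float]]:
    """Average the group profiles, weighted by the customer's participation in each."""
    total = sum(participation.values())
    scores: dict[int, float] = {}
    for name, weight in participation.items():
        for book, pref in profile(CLUSTERS[name]).items():
            scores[book] = scores.get(book, 0.0) + weight / total * pref
    ranked = [(b, s) for b, s in scores.items() if b not in owned]
    return sorted(ranked, key=lambda kv: (-kv[1], kv[0]))


# Hypothetical customer: mostly like CLUSTER, partly like the other group; owns Book 3.
print(soft_recommend({"CLUSTER": 0.75, "OTHER": 0.25}, owned={3}))
```

Book 2 stays on top (0.75), while Book 5 moves up to second (0.375) because both groups contribute to it.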

Clustering

• Pros:
– Clustering techniques work on aggregated data: faster
– Clustering can also be applied as a « first step » to shrink the selection of relevant neighbors in a collaborative filtering algorithm

• Cons:
– Recommendations (per cluster) are less relevant than those of collaborative filtering (per individual)

Association rules

• Clustering works at the group (cluster) level

• Collaborative filtering works at the customer level

• Association rules work at the item level

Association rules

• Past purchases are transformed into relationships of common purchases

Customers who bought… (rows) also bought… (columns):

        Book 1  Book 2  Book 3  Book 4  Book 5  Book 6
Book 1       -       0       0       1       1       0
Book 2       0       -       2       0       1       1
Book 3       0       2       -       0       2       0
Book 4       1       0       0       -       0       0
Book 5       1       1       2       0       -       0
Book 6       0       1       0       0       0       -

Book 1 Book 2 Book 3 Book 4 Book 5 Book 6Customer A X XCustomer B X X XCustomer C X XCustomer D X XCustomer E X XCustomer F X X
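A minimal sketch of turning purchases into « customers who bought… also bought… » counts. The purchase sets are reconstructed from the slides and are an assumption: in particular, the extracted table does not show whether A or E bought Book 4 (the other one bought Book 5), but the pair counts come out the same either way.

```python
from collections import Counter
from itertools import combinations

# Reconstructed purchases (an assumption; column positions were lost in extraction).
PURCHASES = {
    "A": {1, 4},
    "B": {2, 3, 5},
    "C": {2, 3},
    "D": {2, 6},
    "E": {1, 5},
    "F": {3, 5},
}


def co_purchases(purchases: dict[str, set[int]]) -> Counter:
    """Count, for every ordered pair of books (x, y), the customers who bought both."""
    counts = Counter()
    for books in purchases.values():
        for x, y in combinations(sorted(books), 2):
            counts[(x, y)] += 1
            counts[(y, x)] += 1  # the matrix is symmetric
    return counts


matrix = co_purchases(PURCHASES)
for row in range(1, 7):
    print(f"Book {row}:", [matrix[(row, col)] for col in range(1, 7)])
```

The printed rows match the « customers who bought… also bought… » matrix, with 0 on the diagonal.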

Association rules

• These association rules are then used to make recommendations

• If a visitor shows some interest in Book 5, he will be recommended to buy Book 3 as well

• Of course, recommendations are subject to a minimum level of confidence

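Confidence can be sketched as the share of a book's buyers who also bought the other book: conf(X → Y) = (customers who bought X and Y) / (customers who bought X). The purchase sets below are reconstructed from the slides and are an assumption; the 0.5 threshold and the function names are hypothetical choices.

```python
# Reconstructed purchases (an assumption; column positions were lost in extraction).
PURCHASES = {
    "A": {1, 4}, "B": {2, 3, 5}, "C": {2, 3},
    "D": {2, 6}, "E": {1, 5}, "F": {3, 5},
}


def confidence(x: int, y: int, purchases: dict[str, set[int]]) -> float:
    """Share of customers who bought book x that also bought book y."""
    buyers_x = [books for books in purchases.values() if x in books]
    if not buyers_x:
        return 0.0
    return sum(y in books for books in buyers_x) / len(buyers_x)


def rules_for(x: int, purchases: dict[str, set[int]], min_confidence: float = 0.5) -> list[tuple[int, float]]:
    """Rules x -> y that pass the minimum confidence, strongest first."""
    catalog = set().union(*purchases.values())
    rules = [(y, confidence(x, y, purchases)) for y in catalog if y != x]
    kept = [(y, c) for y, c in rules if c >= min_confidence]
    return sorted(kept, key=lambda kv: (-kv[1], kv[0]))


print(rules_for(5, PURCHASES))
```

With a 0.5 threshold only Book 5 → Book 3 (confidence 2/3) survives, matching the slide; lowering the threshold to 0.3 also lets Books 1 and 2 through on the strength of a single customer each, the kind of weak rule a restrictive confidence level is meant to filter out.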

Association rules

• What if recommendations can be made using more than one piece of information?

• Recommendations are aggregated

• If a visitor is interested in Books 3 and 5, he will be recommended to buy Book 2, then Book 3
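Aggregation can be sketched by summing the co-occurrence rows of every book the visitor is interested in. The counts are reconstructed from the « customers who bought… also bought… » matrix; the optional `owned` parameter for skipping books already bought is a hypothetical addition, and breaking ties by book number is an arbitrary choice.

```python
# Co-occurrence counts from the slide's matrix: CO[x][y] = customers who bought x and y.
CO = {
    1: {4: 1, 5: 1},
    2: {3: 2, 5: 1, 6: 1},
    3: {2: 2, 5: 2},
    4: {1: 1},
    5: {1: 1, 2: 1, 3: 2},
    6: {2: 1},
}


def aggregate(interests: set[int], owned: frozenset[int] = frozenset()) -> list[tuple[int, int]]:
    """Sum the rows of all books of interest; highest total first."""
    totals: dict[int, int] = {}
    for book in interests:
        for other, count in CO[book].items():
            if other not in owned:
                totals[other] = totals.get(other, 0) + count
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


print(aggregate({3, 5}))
```

For interest in Books 3 and 5, Book 2 totals 3 and comes first; Book 3 (scored 2 from Book 5's row) ties with Book 5 (scored 2 from Book 3's row), then Book 1 follows with 1. Passing `owned=frozenset({3, 5})` skips the two books of interest.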

Association rules

• Pros:
– Fast to implement
– Fast to execute
– Not much storage space required
– Not « individual » specific
– Very successful in broad applications for large populations, such as shelf layout in retail stores

• Cons:
– Not suitable if preferences change rapidly
– It is tempting not to apply restrictive confidence rules, which may lead to literally stupid recommendations

Information filtering

• Association rules compare items based on past purchases

• Information filtering compares items based on their content

• Also called « content-based filtering » or « content-based recommendations »

Information filtering

• What is the « content » of an item?

• It can be explicit « attributes » or « characteristics » of the item. For example, for a film:
– Action / adventure
– Features Bruce Willis
– Year 1995

• It can also be « textual content » (title, description, table of contents, etc.)
– Several techniques exist to compute the distance between two textual documents

Information filtering

• How does it work?
– A textual document is scanned and parsed
– Word occurrences are counted (words may be stemmed)
– Some words or « tokens » are not taken into account. That includes « stop words » (the, a, for) and words that do not appear often enough in the documents
– Each document is transformed into a normed TFIDF vector of size N (Term Frequency / Inverse Document Frequency)
– The distance between every pair of vectors is computed

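$$\mathrm{TFIDF}_i = \frac{\mathrm{TF}_i \times \mathrm{IDF}_i}{\sqrt{\sum_{j=1}^{N} \left(\mathrm{TF}_j \times \mathrm{IDF}_j\right)^2}}$$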

Information filtering

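$$\mathrm{TF}_i = \log(\text{count}_i + 1)$$

$$\mathrm{IDF}_i = \log\left(\frac{\#\text{docs} + 1}{\#\text{docs in which term } i \text{ occurs}}\right)$$

$$\mathrm{TFIDF}_i = \frac{\mathrm{TF}_i \times \mathrm{IDF}_i}{\sqrt{\sum_{j=1}^{N} \left(\mathrm{TF}_j \times \mathrm{IDF}_j\right)^2}}$$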
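These definitions can be checked against the example that follows with a short script. It is a sketch under assumptions: tokens are lower-cased alphabetic runs, a crude stand-in for the slide's stemming simply drops a trailing « s » from words longer than three letters (so « technologies » becomes « technologie » rather than « technology », which does not change its weight), and no stop words are removed, since the example vectors keep « a », « for » and « the ». Natural logarithms are used; the base does not matter once the vectors are normed.

```python
import math
import re
from collections import Counter

TITLES = [
    "building data mining applications for crm",
    "Accelerating Customer Relationships: Using CRM and Relationship Technologies",
    "Mastering Data Mining: The Art and Science of Customer Relationship Management",
    "Data Mining Your Website",
    "Introduction to marketing",
    "consumer behavior",
    "marketing research, a handbook",
    "customer knowledge management",
]


def tokenize(text: str) -> list[str]:
    """Lower-case alphabetic tokens; dropping a final 's' is a crude stand-in for stemming."""
    tokens = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if len(word) > 3 and word.endswith("s"):
            word = word[:-1]
        tokens.append(word)
    return tokens


def normed_tfidf(docs: list[str]) -> list[dict[str, float]]:
    counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents each term occurs in.
    df = Counter(term for c in counts for term in c)
    vectors = []
    for c in counts:
        # TF = log(count + 1), IDF = log((#docs + 1) / #docs the term occurs in)
        raw = {t: math.log(n + 1) * math.log((n_docs + 1) / df[t]) for t, n in c.items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors.append({t: w / norm for t, w in raw.items()})
    return vectors


vectors = normed_tfidf(TITLES)
print({t: round(w, 3) for t, w in vectors[3].items()})
```

The printed weights for « Data Mining Your Website » (0.316 for « data » and « mining », 0.632 for « your » and « website ») agree with the normed TFIDF table; a term that appears in fewer titles gets a larger IDF and therefore a larger share of the vector.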

Information filtering

• An (unrealistic) example: how can recommendations be computed among 8 books based only on their titles?

• Books selected:
– Building data mining applications for CRM

– Accelerating Customer Relationships: Using CRM and Relationship Technologies

– Mastering Data Mining: The Art and Science of Customer Relationship Management

– Data Mining Your Website

– Introduction to marketing

– Consumer behavior

– marketing research, a handbook

– Customer knowledge management

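COUNT

(1) building data mining applications for crm
(2) Accelerating Customer Relationships: Using CRM and Relationship Technologies
(3) Mastering Data Mining: The Art and Science of Customer Relationship Management
(4) Data Mining Your Website
(5) Introduction to marketing
(6) consumer behavior
(7) marketing research, a handbook
(8) customer knowledge management

term          (1) (2) (3) (4) (5) (6) (7) (8)
a              0   0   0   0   0   0   1   0
accelerating   0   1   0   0   0   0   0   0
and            0   1   1   0   0   0   0   0
application    1   0   0   0   0   0   0   0
art            0   0   1   0   0   0   0   0
behavior       0   0   0   0   0   1   0   0
building       1   0   0   0   0   0   0   0
consumer       0   0   0   0   0   1   0   0
crm            1   1   0   0   0   0   0   0
customer       0   1   1   0   0   0   0   1
data           1   0   1   1   0   0   0   0
for            1   0   0   0   0   0   0   0
handbook       0   0   0   0   0   0   1   0
introduction   0   0   0   0   1   0   0   0
knowledge      0   0   0   0   0   0   0   1
management     0   0   1   0   0   0   0   1
marketing      0   0   0   0   1   0   1   0
mastering      0   0   1   0   0   0   0   0
mining         1   0   1   1   0   0   0   0
of             0   0   1   0   0   0   0   0
relationship   0   2   1   0   0   0   0   0
research       0   0   0   0   0   0   1   0
science        0   0   1   0   0   0   0   0
technology     0   1   0   0   0   0   0   0
the            0   0   1   0   0   0   0   0
to             0   0   0   0   1   0   0   0
using          0   1   0   0   0   0   0   0
website        0   0   0   1   0   0   0   0
your           0   0   0   1   0   0   0   0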

TFIDF Normed Vectors

(1) building data mining applications for crm
(2) Accelerating Customer Relationships: Using CRM and Relationship Technologies
(3) Mastering Data Mining: The Art and Science of Customer Relationship Management
(4) Data Mining Your Website
(5) Introduction to marketing
(6) consumer behavior
(7) marketing research, a handbook
(8) customer knowledge management

term           (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)
a             0.000  0.000  0.000  0.000  0.000  0.000  0.537  0.000
accelerating  0.000  0.432  0.000  0.000  0.000  0.000  0.000  0.000
and           0.000  0.296  0.256  0.000  0.000  0.000  0.000  0.000
application   0.502  0.000  0.000  0.000  0.000  0.000  0.000  0.000
art           0.000  0.000  0.374  0.000  0.000  0.000  0.000  0.000
behavior      0.000  0.000  0.000  0.000  0.000  0.707  0.000  0.000
building      0.502  0.000  0.000  0.000  0.000  0.000  0.000  0.000
consumer      0.000  0.000  0.000  0.000  0.000  0.707  0.000  0.000
crm           0.344  0.296  0.000  0.000  0.000  0.000  0.000  0.000
customer      0.000  0.216  0.187  0.000  0.000  0.000  0.000  0.381
data          0.251  0.000  0.187  0.316  0.000  0.000  0.000  0.000
for           0.502  0.000  0.000  0.000  0.000  0.000  0.000  0.000
handbook      0.000  0.000  0.000  0.000  0.000  0.000  0.537  0.000
introduction  0.000  0.000  0.000  0.000  0.636  0.000  0.000  0.000
knowledge     0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.763
management    0.000  0.000  0.256  0.000  0.000  0.000  0.000  0.522
marketing     0.000  0.000  0.000  0.000  0.436  0.000  0.368  0.000
mastering     0.000  0.000  0.374  0.000  0.000  0.000  0.000  0.000
mining        0.251  0.000  0.187  0.316  0.000  0.000  0.000  0.000
of            0.000  0.000  0.374  0.000  0.000  0.000  0.000  0.000
relationship  0.000  0.468  0.256  0.000  0.000  0.000  0.000  0.000
research      0.000  0.000  0.000  0.000  0.000  0.000  0.537  0.000
science       0.000  0.000  0.374  0.000  0.000  0.000  0.000  0.000
technology    0.000  0.432  0.000  0.000  0.000  0.000  0.000  0.000
the           0.000  0.000  0.374  0.000  0.000  0.000  0.000  0.000
to            0.000  0.000  0.000  0.000  0.636  0.000  0.000  0.000
using         0.000  0.432  0.000  0.000  0.000  0.000  0.000  0.000
website       0.000  0.000  0.000  0.632  0.000  0.000  0.000  0.000
your          0.000  0.000  0.000  0.632  0.000  0.000  0.000  0.000

Example: the term « data » has a weight of 0.187 in « Mastering Data Mining: The Art and Science of Customer Relationship Management » but 0.316 in « Data Mining Your Website ».

Information filtering

• A customer is interested in the following book: « Building data mining applications for CRM »

• The system computes the distances between this book and the 7 others

• The « closest » books are recommended:
– #1: Data Mining Your Website
– #2: Accelerating Customer Relationships: Using CRM and Relationship Technologies
– #3: Mastering Data Mining: The Art and Science of Customer Relationship Management
– Not recommended: Introduction to marketing
– Not recommended: Consumer behavior
– Not recommended: marketing research, a handbook
– Not recommended: Customer knowledge management
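The ranking can be reproduced from the TFIDF table. Because the vectors are normed, their dot product is the cosine similarity, and ranking by highest cosine gives the same order as ranking by smallest Euclidean distance; the slides do not say which distance they use, so this choice is an assumption. Only the non-zero weights from the table are kept, and the `min_similarity` cut-off is a hypothetical parameter.

```python
# Non-zero normed TFIDF weights, copied from the table (rounded to 3 decimals).
VECTORS = {
    "Building data mining applications for CRM": {
        "application": 0.502, "building": 0.502, "crm": 0.344,
        "data": 0.251, "for": 0.502, "mining": 0.251},
    "Accelerating Customer Relationships: Using CRM and Relationship Technologies": {
        "accelerating": 0.432, "and": 0.296, "crm": 0.296, "customer": 0.216,
        "relationship": 0.468, "technology": 0.432, "using": 0.432},
    "Mastering Data Mining: The Art and Science of Customer Relationship Management": {
        "and": 0.256, "art": 0.374, "customer": 0.187, "data": 0.187,
        "management": 0.256, "mastering": 0.374, "mining": 0.187, "of": 0.374,
        "relationship": 0.256, "science": 0.374, "the": 0.374},
    "Data Mining Your Website": {
        "data": 0.316, "mining": 0.316, "website": 0.632, "your": 0.632},
    "Introduction to marketing": {
        "introduction": 0.636, "marketing": 0.436, "to": 0.636},
    "Consumer behavior": {"behavior": 0.707, "consumer": 0.707},
    "marketing research, a handbook": {
        "a": 0.537, "handbook": 0.537, "marketing": 0.368, "research": 0.537},
    "Customer knowledge management": {
        "customer": 0.381, "knowledge": 0.763, "management": 0.522},
}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    # The vectors are (approximately) unit length, so the dot product is the cosine.
    return sum(w * v.get(term, 0.0) for term, w in u.items())


def closest(title: str, min_similarity: float = 0.0) -> list[tuple[str, float]]:
    target = VECTORS[title]
    scored = [(other, cosine(target, vec)) for other, vec in VECTORS.items() if other != title]
    kept = [(other, s) for other, s in scored if s > min_similarity]
    return sorted(kept, key=lambda kv: -kv[1])


for other, score in closest("Building data mining applications for CRM"):
    print(f"{score:.3f}  {other}")
```

The three titles sharing « data », « mining » or « crm » come out in the slide's order (about 0.159, 0.102 and 0.094); the four titles with no shared term score 0 and are dropped, which corresponds to « Not recommended ».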

Information filtering

• Pros:
– No need for past purchase history
– Not extremely difficult to implement

• Cons:
– « Static » recommendations
– Not efficient if content is not very informative, e.g. information filtering is better suited to recommending technical books than novels or movies

Classifiers

• Classifiers are general computational models

• They may take as inputs:
– A vector of item features (action / adventure, Bruce Willis)
– Customer preferences (likes action / adventure)
– Relations among items

• They may give as outputs:
– A classification
– A rank
– A preference estimate

• The classifier can be a neural network, a Bayesian network, a rule induction model, etc.

• The classifier is trained on a training set
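As one instance of a classifier, here is a minimal logistic-regression sketch (our choice; the slides also mention neural networks, Bayesian networks and rule induction). It is trained on a hypothetical training set of films described by three binary features, and it outputs a preference estimate between 0 and 1. The feature names, labels, learning rate and number of epochs are invented for illustration.

```python
import math

# Hypothetical item features: (action / adventure, features Bruce Willis, romance)
# and whether this customer liked the film (1) or not (0).
TRAINING_SET = [
    ((1, 1, 0), 1),
    ((1, 0, 0), 1),
    ((0, 0, 1), 0),
    ((0, 1, 1), 0),
    ((0, 0, 0), 0),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(data, lr: float = 0.5, epochs: int = 2000):
    """Full-batch gradient descent on the logistic (log) loss."""
    n_features = len(data[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in data:
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias


def preference(x, weights, bias) -> float:
    """Estimated probability that the customer likes an item with features x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


weights, bias = train(TRAINING_SET)
print(round(preference((1, 1, 0), weights, bias), 2), round(preference((0, 0, 1), weights, bias), 2))
```

The learned weights can be inspected directly: a large positive weight on the action feature means the model has picked up this customer's taste for action films, which is also why the model is only as good as its training set.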

Classifiers

• Pros:
– Versatile
– Can be combined with other methods to improve the accuracy of recommendations

• Cons:
– Need a relevant training set

Part II

Taxonomy of recommendation systems

Taxonomy

• How can we classify recommender systems?
– Targeted customer inputs
– Community inputs
– Recommendation method
– Outputs
– Delivery
– Degree of personalization

Targeted customer inputs

• Implicit navigation
– Implicit navigation gives the recommender system the information it needs to make recommendations (e.g. « the page you’ve just made »)

• Explicit navigation
– Customers need to explicitly visit a recommendations page

• Keyword / item attributes
– Queries, « …have also bought », « other action films », etc.

• Attribute ratings
– Explicit inputs

• Purchase history

Community inputs

• Item attributes
– Film genre, book categories

• External item popularity
– Top 50, bestsellers, etc.

• Community purchase history

• Ratings
– Customers’ average ratings

• Text comments
– Customer comments

Recommendation method

• Raw retrieval
– Queries

• Manually selected
– E.g. category-based browsing

• Statistical summaries
– Within-community popularity measures, aggregate or summary ratings

• Attribute-based recommendations

• Item-to-item correlation
– Matching items or sets of items, co-purchase data, preference by common customers

• User-to-user correlation
– Collaborative filtering, clustering

Outputs

• Suggestion

• Prediction

• Ratings

• Reviews

Delivery

• Push
– Proactive

• Pull
– Allows customers to control when recommendations are displayed

• Passive
– Such as displaying recommendations for products related to the current product

Degree of personalization

• Non-personalized
– General recommendations

• Ephemeral
– Using current / recent behaviors only

• Persistent
– Using stored, past purchase behaviors

That’s all folks!

Thank you!