A recommendation engine for your applications - M.Orselli - Codemotion Rome 17

A recommendation engine for your apps

Definition: a system that helps people find things when finding what you need is challenging because there are a lot of choices/alternatives

So… it’s a search engine!

Search Engines

Document base is (almost) static

Queries are dynamic

Create an index by analysing the documents

Calculate relevance for a query: tf-idf (term frequency × inverse document frequency)
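A minimal Python sketch of tf-idf relevance scoring, assuming raw term counts for tf and idf = log(N / df), where N is the number of documents and df the number of documents containing the term. Real engines such as Lucene (behind Solr and Elasticsearch) use refined variants; the three documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf_scores(query, docs):
    """Score each document for the query by summing tf * idf over query terms.

    tf  = raw count of the term in the document
    idf = log(N / df), df = number of documents containing the term
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if df[term]:  # terms that appear nowhere contribute nothing
                score += tf[term] * math.log(n_docs / df[term])
        scores.append(score)
    return scores


docs = [
    "action movie with vin diesel",
    "romantic comedy",
    "sci fi action movie",
]
print(tf_idf_scores("vin diesel action", docs))
```

Rare terms ("vin", "diesel") weigh more than a term shared by several documents ("action"), so the first document ranks highest.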

Recommender systems

Document base is growing (e.g. Netflix)

Query is static: find something I like

Classification

Domain: news, products, …

Helps define what can be suggested

Purpose: sales, information, education, build a community

What is TripAdvisor's purpose?

Personalisation levels

• Non-personalised: best sellers

• Demographic: age, location

• Ephemeral: based on current activities

• Persistent: based on long-term interests

Types of input

• Explicit: ask the user to rate something

• Implicit: inferred from user behaviour

Output

• Prediction: predicted rating, evaluation

• Recommendations: suggestion lists, top-n, offers, promotions

• Filtering: email filters, news articles

A model for comparison

Users: people with preferences

Items: the things being rated

Ratings: expressions of opinion

(Community: the space in which opinions make sense)

Non-personalised

Best seller

Most popular

Trending

Summary of community ratings: e.g. best hotel in town

Visitor   Hotel A   Hotel B   Hotel C
John      3         5         -
Jane      -         3         -
Fred      -         1         0
Tom       4         -         -
AVG       3.5       3         0
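The AVG row is a plain mean over the visitors who rated each hotel. A minimal Python sketch, assuming the ratings are assigned to hotels as in the dictionary below (the only assignment consistent with the AVG row):

```python
# Hotel ratings per visitor; a missing key means "not rated"
ratings = {
    "John": {"Hotel A": 3, "Hotel B": 5},
    "Jane": {"Hotel B": 3},
    "Fred": {"Hotel B": 1, "Hotel C": 0},
    "Tom": {"Hotel A": 4},
}


def average_ratings(ratings):
    """Average each item's ratings, counting only the users who rated it."""
    totals, counts = {}, {}
    for user_ratings in ratings.values():
        for item, value in user_ratings.items():
            totals[item] = totals.get(item, 0) + value
            counts[item] = counts.get(item, 0) + 1
    return {item: totals[item] / counts[item] for item in totals}


avg = average_ratings(ratings)
best = max(avg, key=avg.get)  # the non-personalised "best hotel in town"
print(avg)   # {'Hotel A': 3.5, 'Hotel B': 3.0, 'Hotel C': 0.0}
print(best)  # Hotel A
```

A plain mean treats a hotel with a single rating (Hotel C) as confidently as one with several; requiring a minimum number of ratings is a common refinement.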

Content based

Users rate items

We build a model of user preference

Look for similar items based on the model

Action 0.7

Sci Fi 3.2

Vin Diesel 1.2

… …
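One simple way to use a preference model like the one above (feature weights such as Action 0.7, Sci Fi 3.2, Vin Diesel 1.2) is to score each item by the weights of the features it has. A minimal sketch, assuming items are described by sets of features and scored by summing the matching profile weights; the movie titles and their features are hypothetical, and the elided profile rows (…) are left out.

```python
# User profile: feature -> weight (the three values shown on the slide)
PROFILE = {"Action": 0.7, "Sci Fi": 3.2, "Vin Diesel": 1.2}

# Hypothetical catalogue: each item is described by a set of features
ITEMS = {
    "Movie 1": {"Action", "Vin Diesel"},
    "Movie 2": {"Sci Fi"},
    "Movie 3": {"Romance"},
}


def score(item_features, profile):
    """Sum the profile weights of the features the item has (unknown features count 0)."""
    return sum(profile.get(feature, 0.0) for feature in item_features)


def recommend(profile, items, n=2):
    """Return the n items whose features best match the profile."""
    ranked = sorted(items, key=lambda name: score(items[name], profile), reverse=True)
    return ranked[:n]


print(recommend(PROFILE, ITEMS))  # ['Movie 2', 'Movie 1']
```

Note that "Movie 3" can never surface here, which illustrates the lack of serendipity listed among the limitations of content-based systems.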

https://www.amazon.com/Relevant-Search-applications-Solr-Elasticsearch/dp/161729277X

http://www.slideshare.net/treygrainger/building-a-real-time-solrpowered-recommendation-engine

Problems/Limitations

Need to know the items' content

User cold start: it takes time to learn which features matter to the user

What if the user's interests change?

Lack of serendipity: little chance to accidentally discover something you like

Collaborative filtering

No need to analyse (index) content

Can capture more subtle things

Serendipity

User-User

Select the people in my neighbourhood with similar tastes: if other people share my taste, I want their opinions combined

[Figure: user-item rating matrix with rows for Joe and Tom and columns for movies such as E.T.; one of Joe's ratings is unknown (?)]

User-User: which users have similar tastes?

Item-Item

Find the items I have expressed an opinion on and look at how other people felt about them. Precompute the similarities between items

[Figure: user-item rating matrix with rows for Joe and Tom and columns for movies such as E.T.; one of Joe's ratings is unknown (?)]

Item-Item: which items are similar?
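Item-item filtering splits into an offline step (precompute item similarities) and an online step (predict from the user's own ratings). A minimal sketch, assuming cosine similarity computed only over users who rated both items; the ratings and movie names are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical ratings: user -> {item: rating}
RATINGS = {
    "Joe": {"Movie A": 2, "Movie B": 4, "Movie C": 3},
    "Tom": {"Movie A": 3, "Movie B": 5, "Movie C": 2, "Movie D": 4},
    "Ann": {"Movie A": 1, "Movie B": 2, "Movie D": 5},
}


def by_item(ratings):
    """Transpose user -> item ratings into item -> {user: rating}."""
    items = {}
    for user, user_ratings in ratings.items():
        for item, value in user_ratings.items():
            items.setdefault(item, {})[user] = value
    return items


def cosine(a, b):
    """Cosine similarity of two items, over the users who rated both."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = math.sqrt(sum(a[u] ** 2 for u in common))
    norm_b = math.sqrt(sum(b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)


def precompute_similarities(ratings):
    """Offline step: similarity for every pair of items, stored both ways."""
    items = by_item(ratings)
    sims = {}
    for i, j in combinations(sorted(items), 2):
        sims[(i, j)] = sims[(j, i)] = cosine(items[i], items[j])
    return sims


def predict(user, item, ratings, sims):
    """Online step: the user's own ratings, weighted by similarity to `item`."""
    num = den = 0.0
    for other, value in ratings[user].items():
        weight = sims.get((item, other), 0.0)
        if other != item and weight > 0:
            num += weight * value
            den += weight
    return num / den if den else None


SIMS = precompute_similarities(RATINGS)
print(round(predict("Joe", "Movie D", RATINGS, SIMS), 2))
```

Movie C and Movie D share a single co-rater (Tom), so their similarity is exactly 1.0; that is the sparsity problem listed next in miniature.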

Problems/Limitations

Sparsity

When recommending from a large item set, users will have rated only some of the items

User Cold start

Not enough is known about a new user to decide who is similar

Item cold start

Cannot predict ratings for a new item until some similar users have rated it (not a problem for content-based systems)

Scalability

With millions of ratings, computations become slow

Dimensionality reduction FTW!

An example

User   Item1  Item2  Item3  Item4  Item5
Joe    8      1      ?      2      7
Tom    2      ?      5      7      5
Alice  5      4      7      4      7
Bob    7      1      7      3      8

How similar are Joe and Tom? How similar are Joe and Bob?

Only consider items both users have rated

For each of these items, compute the absolute difference between the two users' ratings, then take the average of these differences. The result is a distance d: the smaller it is, the more similar the users

d(Joe, Tom) = (|8-2| + |2-7| + |7-5|) / 3 = 13/3 ≈ 4.33

d(Joe, Alice) = (|8-5| + |1-4| + |2-4| + |7-7|) / 4 = 8/4 = 2

d(Joe, Bob) = (|8-7| + |1-1| + |2-3| + |7-8|) / 4 = 3/4 = 0.75

Distance d

Bob 0.75

Alice 2

Tom 4.33

Similarity = 1 / (1 + d)

Bob 0.57

Alice 0.33

Tom 0.19

Recommend what similar users have rated highly

To predict the rating of an item to recommend, weight each user's rating by how similar they are to you, then divide by the sum of the weights.

Rating(Joe, Item3) = (0.57 * 7 + 0.33 * 7 + 0.19 * 5) / (0.57 + 0.33 + 0.19)

= (3.99 + 2.31 + 0.95) / 1.09 = 7.25 / 1.09 ≈ 6.7

use the entire matrix, or

use a k-NN algorithm: only the k people who historically have had the most similar tastes to mine

aggregate using a weighted sum

weights depend on similarity
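The worked example and both aggregation options (entire matrix or k-NN) fit in a short Python sketch. It uses the mean absolute difference and the 1 / (1 + d) conversion from the example, and normalises the weighted sum by the sum of the weights so the prediction stays on the rating scale; the function names and the `k` parameter are choices made for this sketch.

```python
import math

# Ratings from the Joe/Tom/Alice/Bob example; None marks an unrated item
RATINGS = {
    "Joe":   [8, 1, None, 2, 7],
    "Tom":   [2, None, 5, 7, 5],
    "Alice": [5, 4, 7, 4, 7],
    "Bob":   [7, 1, 7, 3, 8],
}


def distance(a, b):
    """Mean absolute rating difference over the items both users rated."""
    diffs = [abs(x - y) for x, y in zip(a, b) if x is not None and y is not None]
    if not diffs:
        return math.inf  # nothing in common: infinitely far apart
    return sum(diffs) / len(diffs)


def similarity(a, b):
    """Turn a distance into a similarity in [0, 1]: identical users score 1."""
    return 1 / (1 + distance(a, b))


def predict(user, item, ratings, k=None):
    """Similarity-weighted average of other users' ratings for `item`.

    k=None uses every user who rated the item (the entire matrix);
    an integer k keeps only the k most similar users (k-NN).
    """
    neighbours = sorted(
        (
            (similarity(ratings[user], other_ratings), other_ratings[item])
            for other, other_ratings in ratings.items()
            if other != user and other_ratings[item] is not None
        ),
        reverse=True,
    )
    if k is not None:
        neighbours = neighbours[:k]
    den = sum(weight for weight, _ in neighbours)
    if den == 0:
        return None
    return sum(weight * rating for weight, rating in neighbours) / den


ITEM3 = 2  # Item3 is the third column (0-based index 2)
print(round(distance(RATINGS["Joe"], RATINGS["Tom"]), 2))  # 4.33
print(round(predict("Joe", ITEM3, RATINGS), 2))            # 6.66
print(round(predict("Joe", ITEM3, RATINGS, k=2), 2))       # 7.0
```

With unrounded weights (4/7, 1/3, 3/16) the full-matrix prediction is about 6.66; with k=2 only Bob and Alice remain, and both rated Item3 a 7.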

Cosine similarity

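[Figure: rating vectors [3,5] and [2,7] drawn from the origin [0,0]; cosine similarity is the cosine of the angle between them]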
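Cosine similarity compares the direction of two rating vectors, not their length: cos(θ) = (a · b) / (|a| |b|). A minimal Python sketch using the two vectors from the figure:

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|) for two equal-length rating vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector, like the origin [0, 0], has no direction
        raise ValueError("cosine similarity is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)


print(round(cosine_similarity([3, 5], [2, 7]), 3))  # 0.966
```

Here a · b = 3·2 + 5·7 = 41 and |a| |b| = √34 · √53 = √1802, so the vectors point in almost the same direction.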

Our domain

Domain: online book shop, both paper and digital

Recommend titles, old and new

- Who bought this also bought

- You might like

Choosing the tool

PredictionIO

Under the Apache umbrella

Based on a solid open-source stack

Customisable engine templates

SDK for PHP

Installation

http://actionml.com/docs/pio_by_actionml

Pre-baked Amazon AMIs

Installation via source code

http://predictionio.incubator.apache.org/install/install-sourcecode/

You can choose the storage backend:

MySQL/PostgreSQL vs Elasticsearch + HBase

The event server

Pattern: user -- action -- item

User 1 purchased product X

User 2 viewed product Y

User 1 added product Z to the cart

$ pio app new MyApp1

[INFO] [App$] Initialized Event Store for this app ID: 1.
[INFO] [App$] Created new app:
[INFO] [App$] Name: MyApp1
[INFO] [App$] ID: 1
[INFO] [App$] Access Key: 3mZWDzci2D5YsqAnqNnXH9SB6Rg3dsTBs8iHkK6X2i54IQsIZI1eEeQQyMfs7b3F

$ pio eventserver

Server runs on port 7070 by default

$ curl -i -X GET http://localhost:7070

{"status":"alive"}

$ curl -i -X GET "http://localhost:7070/events.json?accessKey=$ACCESS_KEY"
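The curl calls above only read from the Event Server. Recording the user -- action -- item pattern means POSTing JSON to the same `/events.json` endpoint. The sketch below uses the Event API's field names (`event`, `entityType`, `entityId`, `targetEntityType`, `targetEntityId`, `properties`, `eventTime`); the helper names `build_event`/`post_event` and the `buy` example are assumptions for illustration, and it uses only the Python standard library rather than the PHP SDK.

```python
import json
import urllib.request
from datetime import datetime, timezone
from urllib.parse import urlencode


def build_event(action, user_id, item_id, properties=None, event_time=None):
    """Encode 'user -- action -- item' as an Event Server JSON event.

    event_time should be a timezone-aware datetime; defaults to now (UTC).
    """
    when = event_time or datetime.now(timezone.utc)
    return {
        "event": action,  # e.g. "buy" or "view"
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
        "properties": properties or {},
        # ISO 8601 timestamp with a timezone offset
        "eventTime": when.isoformat(timespec="milliseconds"),
    }


def post_event(event, access_key, host="http://localhost:7070"):
    """POST one event; requires a running `pio eventserver`."""
    url = f"{host}/events.json?{urlencode({'accessKey': access_key})}"
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# "User 1 purchased product X": built here but not sent
print(json.dumps(build_event("buy", "1", "X"), indent=2))
```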

Events modeling

what can/should we model?

rate, like, buy, view, depending on the algorithm

setUser($uid, array $properties=array(), $eventTime=null)

unsetUser($uid, array $properties, $eventTime=null)

deleteUser($uid, $eventTime=null)

setItem($iid, array $properties=array(), $eventTime=null)

unsetItem($iid, array $properties, $eventTime=null)

deleteItem($iid, $eventTime=null)

recordUserActionOnItem($event, $uid, $iid, array $properties=array(), $eventTime=null)

createEvent(array $data)

getEvent($eventId)

Engines

D.A.S.E Architecture

Data Source and Preparation

Algorithm

Serving

Evaluation
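To make the four DASE roles concrete, here is a hypothetical Python analogy with a toy popularity "algorithm". PredictionIO's own engine templates implement these components in Scala, with different classes and signatures; every name below is invented for this sketch.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user: str
    action: str
    item: str


class DataSource:
    """D: read the training data (here an in-memory list, not the Event Server)."""

    def __init__(self, events):
        self.events = events

    def read_training(self):
        return list(self.events)


class Preparator:
    """Preparation: keep only the event types the algorithm should learn from."""

    def __init__(self, event_names):
        self.event_names = set(event_names)

    def prepare(self, events):
        return [e for e in events if e.action in self.event_names]


class PopularityAlgorithm:
    """A: train a model (a toy one: how often each item occurs) and query it."""

    def train(self, events):
        return Counter(e.item for e in events)

    def predict(self, model, num):
        return [item for item, _ in model.most_common(num)]


class Serving:
    """S: post-process results before returning them (here: drop excluded items)."""

    def serve(self, items, exclude=()):
        return [item for item in items if item not in exclude]


def precision(predicted, actual):
    """E: a toy offline metric, the share of predicted items the user really bought."""
    return len(set(predicted) & set(actual)) / len(predicted) if predicted else 0.0


events = [
    Event("u1", "buy", "b1"),
    Event("u2", "buy", "b1"),
    Event("u2", "view", "b2"),
    Event("u3", "buy", "b3"),
]
prepared = Preparator(["buy"]).prepare(DataSource(events).read_training())
algorithm = PopularityAlgorithm()
model = algorithm.train(prepared)
top = Serving().serve(algorithm.predict(model, 2), exclude={"b1"})
print(top)  # ['b3']
```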

$ pio template get apache/incubator-predictionio-template-recommender MyRecommendation

$ cd MyRecommendation

engine.json

"datasource": { "params" : { "appName": "MyApp1", "eventNames": ["buy", "view"] } },

$ pio build --verbose

$ pio train

$ pio deploy

Getting recommendations

Implementation

Two kinds of suggestions

- who bought this also bought (recommendation)

- you may like (similarities)

View

Like (add to basket, add to wishlist)

Conversion (buy)

Recorded in batch

4 engines

2 for books, 2 for ebooks

(not needed now)

Retrained every night with new data

recordLike($user, array $item)

recordConversion($user, array $item)

recordView($user, array $item)

createUser($uid)

getRecommendation($uid, $itype, $n = self::N_SUGGESTION)

getSimilarity($iid, $itype, $n = self::N_SUGGESTION)

user cold start/item cold start

if we don't get enough suggestions, switch to non-personalised ones (also for users who are not logged in)
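The fallback can be sketched as topping up the personalised list with non-personalised items (e.g. best sellers) until N suggestions are reached. This is a Python sketch, not the project's PHP code; `suggestions`, `personalised` and `popular` are hypothetical names, and `n` plays the role of `N_SUGGESTION`.

```python
def suggestions(personalised, popular, n):
    """Return up to n items: personalised ones first, then non-personalised fill-ins.

    For a user who is not logged in (or a brand-new one), pass personalised=[].
    """
    result = []
    for item in list(personalised) + list(popular):
        if len(result) >= n:
            break
        if item not in result:  # avoid suggesting the same title twice
            result.append(item)
    return result


print(suggestions(["b7"], ["b1", "b7", "b2"], 3))  # ['b7', 'b1', 'b2']
print(suggestions([], ["b1", "b2"], 1))            # ['b1']
```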

Alternative approaches

https://github.com/grahamjenson/list_of_recommender_systems

Do it on your own

https://github.com/grahamjenson/ger

https://neo4j.com/developer/guide-build-a-recommendation-engine/

Michele Orselli CTO@Ideato

_orso_

micheleorselli / ideatosrl

mo@ideato.it

Links

• http://www.slideshare.net/NYCPredictiveAnalytics/building-a-recommendation-engine-an-example-of-a-product-recommendation-engine?next_slideshow=1

• https://www.coursera.org/learn/recommender-systems-introduction

• http://actionml.com/

• https://github.com/grahamjenson/ger