A recommendation engine for your applications - M.Orselli - Codemotion Rome 17

A recommendation engine for your apps

Definition: a system that helps people find things when finding what you need is challenging because there are many choices/alternatives

So… it’s a search engine!

Search Engines

Document base is (almost) static

Queries are dynamic

Search Engines

Create an index analysing the documents

Calculate relevance for a query: tf*idf
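As a rough illustration of the tf*idf idea, the sketch below scores a few made-up documents against a query. The function name `tf_idf_scores`, the whitespace tokenisation, and the sample documents are assumptions for this example; production search engines use more refined variants of this weighting.

```python
import math
from collections import Counter

def tf_idf_scores(query, documents):
    """Score each document against the query with a basic tf*idf sum."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in counts:
                continue
            tf = counts[term] / len(tokens)      # term frequency in this document
            idf = math.log(n_docs / df[term])    # rarer terms weigh more
            score += tf * idf
        scores.append(score)
    return scores

# Hypothetical document base
docs = [
    "the hobbit is a fantasy novel",
    "dune is a science fiction novel",
    "a cookbook of italian recipes",
]
scores = tf_idf_scores("fantasy novel", docs)
```

The first document matches both query terms and scores highest; the third matches none and scores zero.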

Recommender systems

Document base is growing (eg: Netflix)

Query is static: find something I like

Classification

Domain: news, products, …

Helps define what can be suggested

Purpose: sales, information, education, build a community

What is TripAdvisor's purpose?

Personalisation levels

• Non personalised: best sellers

• Demographic: age, location

• Ephemeral: based on current activities

• Persistent

Types of input

• Explicit: ask user to rate something

• Implicit: inferred from user behaviour

Output

• Prediction: predicted rating, evaluation

• Recommendations: suggestion list, top-n, offers, promotion

• Filtering: email filters, news articles

A model for comparison

Users: people with preferences

Items: subjects of ratings

Rating: expression of opinion

(Community: space where opinions make sense)

Non personalised

Best seller

Most popular

Trending

Summary of community ratings: eg best hotel in town

Visitor Hotel

Hotel A Hotel B Hotel C

John 3 5

Jane 3

Fred 1 0

AVG 3.5 3 0

Content based

Users rate items

We build a model of user preference

Look for similar items based on the model

Action 0.7

Sci Fi 3.2

Vin Diesel 1.2

… …
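One way to read the weights above is as a user profile over item features. The sketch below scores items by summing the profile weights of the features each item has. The catalogue, its feature sets, and the function name `score_item` are hypothetical and only illustrate the idea.

```python
def score_item(profile, item_features):
    """Sum the profile weights of the features the item has."""
    return sum(profile.get(feature, 0.0) for feature in item_features)

# Profile weights taken from the slide
profile = {"Action": 0.7, "Sci Fi": 3.2, "Vin Diesel": 1.2}

# Hypothetical catalogue: each item is described by a set of features
catalogue = {
    "Film A": {"Action", "Vin Diesel"},
    "Film B": {"Sci Fi"},
    "Film C": {"Romance"},
}

# Rank items from best to worst match for this profile
ranked = sorted(
    catalogue,
    key=lambda title: score_item(profile, catalogue[title]),
    reverse=True,
)
```

Items with no feature in the profile score zero, which is one face of the serendipity problem: the user is only ever shown more of what the model already knows.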

https://www.amazon.com/Relevant-Search-applications-Solr-Elasticsearch/dp/161729277X

http://www.slideshare.net/treygrainger/building-a-real-time-solrpowered-recommendation-engine

Problems/Limitations

Need to know the items' content

User cold start: it takes time to learn which features matter to the user

What if the user's interests change?

Lack of serendipity: you rarely discover by accident something you like

Collaborative filtering

No need to analyse (index) content

Can capture more subtle things

Serendipity

User-User

Select people in my neighbourhood with similar taste. If other people share my taste, I want their opinions combined

[Figure: user-item ratings matrix with one row per user (Joe, Tom, …); Joe's missing rating is marked ?]

User-User: which users have similar tastes?


Item-Item

Find an item on which I have expressed an opinion and look at how other people felt about it. Precompute similarities between items

[Figure: user-item ratings matrix with one row per user (Joe, Tom, …); Joe's missing rating is marked ?]

Item-Item: which items are similar?

Problems/Limitations

Sparsity

When recommending from a large item set, users will have rated only some of the items

User Cold start

Not enough is known about a new user to decide who is similar

Item cold start

Cannot predict ratings for a new item until some similar users have rated it (not a problem for content-based)

Scalability

With millions of ratings, computations become slow

Dimensionality reduction FTW!

An example

Item1 Item2 Item3 Item4 Item5

Joe 8 1 ? 2 7

Tom 2 ? 5 7 5

Alice 5 4 7 4 7

Bob 7 1 7 3 8

How similar are Joe and Tom? How similar are Joe and Bob?

Only consider items both users have rated

For each item

- Compute the absolute difference between the users' ratings

- Take the average of these differences over the items


Average difference d (lower means more similar)

d(Joe, Tom) = (|8-2| + |2-7| + |7-5|)/3 = 13/3 ≈ 4.3

d(Joe, Alice) = (|8-5| + |1-4| + |2-4| + |7-7|)/4 = 8/4 = 2

d(Joe, Bob) = (|8-7| + |1-1| + |2-3| + |7-8|)/4 = 3/4 = 0.75
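The average absolute difference computed above can be sketched in a few lines. The ratings are the ones from the example table; the dictionary layout and the function name `mean_abs_distance` are assumptions made for this illustration.

```python
# Ratings from the example table; missing ratings ("?") are simply absent.
ratings = {
    "Joe":   {"Item1": 8, "Item2": 1, "Item4": 2, "Item5": 7},
    "Tom":   {"Item1": 2, "Item3": 5, "Item4": 7, "Item5": 5},
    "Alice": {"Item1": 5, "Item2": 4, "Item3": 7, "Item4": 4, "Item5": 7},
    "Bob":   {"Item1": 7, "Item2": 1, "Item3": 7, "Item4": 3, "Item5": 8},
}

def mean_abs_distance(a, b):
    """Average absolute rating difference over items both users have rated."""
    common = ratings[a].keys() & ratings[b].keys()
    if not common:
        return None  # no overlap, nothing to compare
    total = sum(abs(ratings[a][item] - ratings[b][item]) for item in common)
    return total / len(common)

for other in ("Tom", "Alice", "Bob"):
    print(other, mean_abs_distance("Joe", other))
```

A lower value means the two users rate more alike, so Bob is the closest to Joe and Tom the farthest.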


Average difference d (lower means more similar)

Bob 0.75

Alice 2

Tom 4.3

Similarity = 1 / (1 + d)

Similarity

Bob 0.57

Alice 0.33

Tom 0.19

Recommend what similar users have rated highly

To calculate the rating of an item to recommend, weight each user's rating by how similar that user is to you, then divide by the sum of the weights so the result stays on the rating scale.

Rating(Joe, Item3) = (0.57 * 7 + 0.33 * 7 + 0.19 * 5) / (0.57 + 0.33 + 0.19)

= (3.99 + 2.31 + 0.95) / 1.09 = 7.25 / 1.09 ≈ 6.7
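A minimal sketch of the weighted prediction, assuming the similarity is 1 / (1 + d) with d the average absolute rating difference, and assuming the weighted sum is normalised by the sum of the weights so that the prediction stays on the rating scale. The function names and the dictionary layout are illustrative only.

```python
# Ratings from the example table; missing ratings ("?") are simply absent.
ratings = {
    "Joe":   {"Item1": 8, "Item2": 1, "Item4": 2, "Item5": 7},
    "Tom":   {"Item1": 2, "Item3": 5, "Item4": 7, "Item5": 5},
    "Alice": {"Item1": 5, "Item2": 4, "Item3": 7, "Item4": 4, "Item5": 7},
    "Bob":   {"Item1": 7, "Item2": 1, "Item3": 7, "Item4": 3, "Item5": 8},
}

def similarity(a, b):
    """1 / (1 + d), where d is the average absolute difference on shared items."""
    common = ratings[a].keys() & ratings[b].keys()
    if not common:
        return 0.0  # assumption: users with no shared items get zero weight
    d = sum(abs(ratings[a][i] - ratings[b][i]) for i in common) / len(common)
    return 1.0 / (1.0 + d)

def predict(user, item):
    """Similarity-weighted average of the other users' ratings for the item."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = similarity(user, other)
        weighted_sum += weight * other_ratings[item]
        weight_total += weight
    if weight_total == 0:
        return None  # nobody comparable has rated this item
    return weighted_sum / weight_total

prediction = predict("Joe", "Item3")
```

With these ratings the prediction for Joe on Item3 is roughly 6.7, pulled towards the 7 given by Bob and Alice, who are the users most similar to Joe.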


use entire matrix or

use a K-nn algorithm: people who historically have the same tastes as me

aggregate using weighted sum

weights depend on similarity

Cosine similarity
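Cosine similarity treats each user (or item) as a vector of ratings and measures the angle between two such vectors. The sketch below assumes ratings are stored as dictionaries and that a missing rating counts as zero, which is one common simplification rather than the only option.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse rating vectors (dicts)."""
    common = a.keys() & b.keys()
    dot = sum(a[key] * b[key] for key in common)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty or all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical rating vectors
same_direction = cosine_similarity({"x": 1, "y": 2}, {"x": 2, "y": 4})
no_overlap = cosine_similarity({"x": 1}, {"y": 1})
```

Unlike the average difference, a higher value here means more similar: vectors pointing the same way score 1 and vectors with no shared items score 0.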

Our domain

Domain: online book shop, both paper and digital

Recommend titles, old and new

- Who bought this also bought

- You might like

Choosing the tool

PredictionIO

Under the Apache umbrella

Based on solid open source stack

Customisable engine templates

SDK for PHP

Installation

http://actionml.com/docs/pio_by_actionml

Pre-baked Amazon AMIs

Installation via source code

http://predictionio.incubator.apache.org/install/install-sourcecode/

You can choose storage

MySQL/PostgreSQL vs Elasticsearch + HBase

The event server

Pattern: user -- action -- item

User 1 purchased product X

User 2 viewed product Y

User 1 added product Z to the cart

$ pio app new MyApp1

[INFO] [App$] Initialized Event Store for this app ID: 1.
[INFO] [App$] Created new app:
[INFO] [App$] Name: MyApp1
[INFO] [App$] ID: 1
[INFO] [App$] Access Key: 3mZWDzci2D5YsqAnqNnXH9SB6Rg3dsTBs8iHkK6X2i54IQsIZI1eEeQQyMfs7b3F

$ pio eventserver

Server runs on port 7070 by default

$ curl -i -X GET http://localhost:7070

{"status":"alive"}

$ curl -i -X GET "http://localhost:7070/events.json?accessKey=$ACCESS_KEY"
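The user -- action -- item pattern maps onto the JSON body the event server accepts on `events.json`. The sketch below only builds the HTTP request without sending it. The helper name `build_event_request`, the placeholder access key, and the choice of `"user"` and `"item"` as entity types are assumptions for this example.

```python
import json
from urllib import parse, request

def build_event_request(access_key, event, user_id, item_id,
                        base_url="http://localhost:7070"):
    """Build (but do not send) a POST that records 'user did event on item'."""
    payload = {
        "event": event,                # e.g. "buy" or "view"
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
    }
    query = parse.urlencode({"accessKey": access_key})
    return request.Request(
        f"{base_url}/events.json?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# "User 1 purchased product X"; the access key here is a placeholder.
req = build_event_request("YOUR_ACCESS_KEY", "buy", "1", "X")
# request.urlopen(req) would send it to a running event server.
```

Keeping request construction separate from sending makes it easy to batch events or to test the payload without a server.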

Events modeling

what can/should we model?

rate, like, buy, view, depending on the algorithm

setUser($uid, array $properties=array(), $eventTime=null)

unsetUser($uid, array $properties, $eventTime=null)

deleteUser($uid, $eventTime=null)

setItem($iid, array $properties=array(), $eventTime=null)

unsetItem($iid, array $properties, $eventTime=null)

deleteItem($iid, $eventTime=null)

recordUserActionOnItem($event, $uid, $iid, array $properties=array(), $eventTime=null)

createEvent(array $data)

getEvent($eventId)

Engines

D.A.S.E Architecture

Data Source and Preparation

Algorithm

Serving

Evaluation

$ pio template get apache/incubator-predictionio-template-recommender MyRecommendation

$ cd MyRecommendation

engine.json

"datasource": {
  "params": {
    "appName": "MyApp1",
    "eventNames": ["buy", "view"]
  }
},

$ pio build --verbose

$ pio train

$ pio deploy

Getting recommendations
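Once deployed, an engine answers queries over HTTP. The sketch below builds such a query without sending it, assuming the default deploy port 8000 and the `user`/`num` query fields used by the recommender template; the helper name `build_query_request` is hypothetical.

```python
import json
from urllib import request

def build_query_request(user_id, num, base_url="http://localhost:8000"):
    """Build (but do not send) a query asking for `num` items for a user."""
    payload = {"user": user_id, "num": num}
    return request.Request(
        f"{base_url}/queries.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

query = build_query_request("1", 4)
# request.urlopen(query) would return the engine's JSON response
# when an engine is deployed and listening on that port.
```

Other templates expect different query fields, so the payload should be checked against the template actually in use.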

Implementation

2 kinds of suggestions

- who bought this also bought (similarities)

- you may like (recommendation)

Like (add to basket, add to wishlist)

Conversion (buy)

Recorded in batch

4 engines

2 for books, 2 for ebooks

(not needed now)

Retrained every night with new data

recordLike($user, array $item)

recordConversion($user, array $item)

recordView($user, array $item)

createUser($uid)

getRecommendation($uid, $itype, $n = self::N_SUGGESTION)

getSimilarity($iid, $itype, $n = self::N_SUGGESTION)

user cold start/item cold start

if we don't get enough suggestions, fall back to non-personalised ones (also used for non-logged-in users)
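The fallback to non-personalised suggestions can be sketched as topping up a short list from a best-seller list. The function name `suggestions_with_fallback` and the sample data are hypothetical; the slides do not show how the talk's own implementation does this.

```python
def suggestions_with_fallback(personalised, best_sellers, n):
    """Return up to n suggestions, topping up with best sellers if needed."""
    result = list(personalised[:n])
    for item in best_sellers:
        if len(result) >= n:
            break
        if item not in result:  # avoid suggesting the same item twice
            result.append(item)
    return result

# A new user with a single personalised suggestion
topped_up = suggestions_with_fallback(["book-a"], ["book-b", "book-a", "book-c"], 3)

# An anonymous visitor with no personalised suggestions at all
anonymous = suggestions_with_fallback([], ["book-b", "book-a", "book-c"], 2)
```

Personalised results keep their position at the top, so the fallback only fills the gaps left by cold start.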

Alternative approaches

https://github.com/grahamjenson/list_of_recommender_systems

Do it on your own

https://github.com/grahamjenson/ger

https://neo4j.com/developer/guide-build-a-recommendation-engine/

Michele Orselli CTO@Ideato

_orso_

micheleorselli / ideatosrl

mo@ideato.it

Links

• http://www.slideshare.net/NYCPredictiveAnalytics/building-a-recommendation-engine-an-example-of-a-product-recommendation-engine?next_slideshow=1

• https://www.coursera.org/learn/recommender-systems-introduction

• http://actionml.com/

• https://github.com/grahamjenson/ger
