34
Spark Technology Center Oct / 27 / 16 Creating an end-to-end Recommender System with Apache Spark and Elasticsearch Jean-François Puget Nick Pentreath

Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

  • Upload
    sparktc

  • View
    1.785

  • Download
    2

Embed Size (px)

Citation preview

Page 1: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

Oct / 27 / 16

Creating an end-to-endRecommender System with Apache Spark and Elasticsearch

Jean-François PugetNick Pentreath

Page 2: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

§ @JFPuget§ Distinguished Engineer, IBM Machine

Learning & Optimization

§ @MLnick§ Principal Engineer, IBM Spark

Technology Center§ Apache Spark PMC

About

Page 3: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

§ Recommender systems & the machine learning workflow

§ Data modelling for recommender systems

§ Why Spark & Elasticsearch?§ Spark ML for collaborative filtering§ Deploying & scoring recommender

models§ Demo

Agenda

Page 4: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

Recommender Systems & the ML Workflow

Page 5: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

RecommenderSystems

Overview

Page 6: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

The Machine Learning Workflow

Perception

Data ??? MachineLearning ??? $$$

Page 7: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

The Machine Learning Workflow

Reality

Data

• Historical• Streaming

Ingest Data Processing

• Feature transformation & engineering

Model Training

• Model selection & evaluation

Deploy

• Pipelines, not just models

• Versioning

Live System

• Predict given new data

• Monitoring & live evaluation

Feedback Loop

Spark DataFrames

Spark ML

Various ???

Stream (Kafka)

Missing piece!

Page 8: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

The Machine Learning Workflow

Recommender Version

Data Ingest Data Processing

• Aggregation• Handle implicit

data

Model Training

• ALS• Ranking-style

evaluation

Deploy

• Model size & complexity

Live System

•User & item recommendations•Monitoring, filters

Feedback => another Event Type

Spark DataFrames

Spark ML

Elasticsearch

• User & Item Metadata

• Events

Elasticsearch

Stream (Kafka)

Page 9: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

Data Modeling for Recommender Systems

Page 10: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

Data modelUser and Item Metadata

! !

Page 11: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

System RequirementsUser and Item Metadata

! !

Filtering & Grouping

Business Rules

Page 12: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

User interactions

Implicit preference data

• Page view• eCommerce - cart, purchase• Media – preview, watch, listen

Intent data

• Search query

Anatomy of a User Event

Explicit preference data

• Rating• Review

Social network interactions

• Like• Share• Follow

User Interactions

!

!

!

!

!

!

!

!

Page 13: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

Data modelAnatomy of a User Event

!!

! !! !

!

Page 14: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

How to handle implicit feedback?Anatomy of a User Event

!!

! !! !

!!

Page 15: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

Why Spark & Elasticsearch?

Page 16: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

DataFrames§ Events & metadata are “lightly

structured” data§ Suited to DataFrames§ Pluggable external data source support

Spark ML§ Spark ML pipelines§ Scalable ALS algorithm, supporting

implicit feedback & NMF§ Cross-validation§ Custom transformers & algorithms

Why Spark?

Page 17: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

Storage§ Native JSON§ Scalable§ Good support for time-series / event data§ Kibana for data visualisation§ Integration with Spark DataFrames

Scoring§ Full-text search§ Filtering§ Aggregations (grouping)§ Search ~== recommendation (more

later)

Why Elasticsearch?

Page 18: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

Spark ML for Collaborative Filtering

Page 19: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

Matrix FactorizationCollaborative Filtering

3 415 2

1 32 1

!

!

−1.1 3.2 4.30.2 1.4 3.12.5 0.3 2.34.3 −2.4 0.53.6 0.3 1.2

0.2 1.7 2.31.9 0.4 0.81.5 −1.2 0.3−0.4 2.1 0.62.7 0.8 1.4

! !

Page 20: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

PredictionCollaborative Filtering

3 415 2

1 32 1

!

!

−1.1 3.2 4.30.2 1.4 3.12.5 0.3 2.34.3 −2.4 0.53.6 0.3 1.2

0.2 1.7 2.31.9 0.4 0.81.5 −1.2 0.3−0.4 2.1 0.62.7 0.8 1.4

! !

Page 21: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

Loading DataAlternating Least Squares

Page 22: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

Implicit Preference DataAlternating Least Squares

Page 23: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

Deploying & Scoring Recommendation Models

Page 24: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

Full-text Search & SimilarityPrelude: Search

“cat videos”

!

!cat videos

0 0 ⋯ 0 1 ⋯0 1 ⋯ 1 1 ⋯1 1 ⋯ 0 0 ⋯1 0 ⋯ 0 1 ⋯

Sort results

0 1 ⋯ 1 0 ⋯

Scoring RankingAnalysis Term vectors

Similarity

Page 25: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

Can we use the same machinery?Recommendation

!0 0 ⋯ 0 1 ⋯0 1 ⋯ 1 1 ⋯1 1 ⋯ 0 0 ⋯1 0 ⋯ 0 1 ⋯

Sort results

1.2 ⋯ −0.2 0.3

Dot product & cosine similarity… the same as we need for recommendations!

Scoring RankingAnalysis Term vectors

!

!!!

SimilarityUser (or item) vector

?

Page 26: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

Delimited Payload FilterElasticsearchTerm Vectors

Raw vector

1.2 ⋯ −0.2 0.3

Term vector with payloads

0|1.2 ⋯ 3|-0.2 4|0.3

Custom analyzer

Page 27: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

Custom scoring function

• Native script (Java), compiled for speed• Scoring function computes dot product by:

§ For each document vector index (“term”), retrieve payload

§ score += payload * query(i)

• Normalize with query vector norm and document vector norm for cosine similarity (“similar items”)

ElasticsearchScoring

Page 28: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

Can we use the same machinery?Recommendation

! Sort results

1.2 ⋯ −0.2 0.3

Scoring RankingAnalysis Term vectors!!

Custom scoring function

!!

Delimited payload filter

−1.1 1.3 ⋯ 0.41.2 −0.2 ⋯ 0.30.5 0.7 ⋯ −1.30.9 1.4 ⋯ −0.8

!User

(or item) vector

Page 29: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

We get search engine functionality for free!ElasticsearchScoring

Page 30: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

Deploying to ElasticsearchAlternating Least Squares

Page 31: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

Monitoring & Feedback

Page 32: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

Demo

Page 33: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

Elasticsearch

Elasticsearch Spark Integration

Spark ML ALS for Collaborative Filtering

Collaborative Filtering for Implicit Feedback

Datasets

Elasticsearch Term Vectors & Payloads

Delimited Payload Filter

Vector Scoring Plugin

Kibana

References

Page 34: Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Spark Technology Center

Thanks!https://github.com/MLnick/sseu16-meetuphttps://github.com/MLnick/elasticsearch-vector-scoring