Spark Summit EU talk by Nick Pentreath


SPARK SUMMIT EUROPE 2016

SCALING FACTORIZATION MACHINES ON APACHE SPARK WITH PARAMETER SERVERS

Nick Pentreath, Principal Engineer, IBM

About
• About me
  – @MLnick
  – Principal Engineer at IBM working on machine learning & Spark
  – Apache Spark PMC
  – Author of Machine Learning with Spark

Agenda
• Brief Intro to Factorization Machines
• Distributed FMs with Spark and Glint
• Results
• Challenges
• Future Work

FACTORIZATION MACHINES

Factorization Machines – Feature Interactions

Factorization Machines – Linear Models

\hat{y}(x) = w_0 + \sum_{i=1}^{d} w_i x_i

w_0: bias term; w_i: per-feature weights.

Not expressive enough!

Factorization Machines – Polynomial Regression

\hat{y}(x) = w_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=i+1}^{d} w_{ij}\, x_i x_j

w_0: bias term; w_{ij}: pairwise interaction terms.

O(d^2) interaction weights, and with sparse data (n << d non-zero features per example) most feature pairs are observed too rarely to estimate the w_{ij} reliably.

Factorization Machines – Factorization Machine

\hat{y}(x) = w_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle v_i, v_j \rangle\, x_i x_j

w_0: bias term; factorized interaction term with factor vectors v_i \in \mathbb{R}^k.

Naive evaluation of the interaction term: O(d^2 k).

Factorization Machines – Factorization Machine

Math hack (aka reformulation): the interaction term can be evaluated in O(dk) instead of O(d^2 k).

Not convex, but efficient to train using SGD, coordinate descent, or MCMC.
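For reference, the reformulation the slide alludes to is the standard identity from Rendle's FM paper (not spelled out on the slide):

\sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle v_i, v_j \rangle\, x_i x_j
  = \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{d} v_{i,f}\, x_i \right)^2 - \sum_{i=1}^{d} v_{i,f}^2\, x_i^2 \right]

Since the inner sums only range over the non-zero x_i, evaluation costs O(k · nnz(x)) per example rather than O(d^2 k).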

Factorization Machines – Factorization Machine

FMs subsume standard matrix factorization (with biases), and can also incorporate contextual and metadata features.

Model size can still be very large! e.g. video sharing, online ads, social networks.
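To make the matrix-factorization point concrete (an illustration following Rendle's paper rather than the slide itself), encode each example as a one-hot user block, a one-hot item block, and optional context/metadata features:

x = (\underbrace{0,\dots,1,\dots,0}_{\text{user } u},\ \underbrace{0,\dots,1,\dots,0}_{\text{item } i},\ \underbrace{\dots}_{\text{context / metadata}})

With only x_u = x_i = 1 non-zero, the FM prediction collapses to w_0 + w_u + w_i + \langle v_u, v_i \rangle, i.e. exactly matrix factorization with global, user, and item biases. High-cardinality blocks like these are also why d, and hence the model, gets so large.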

DISTRIBUTED FM MODELS ON SPARK

Linear Models on Spark
• Data parallel
  – Master sends (broadcasts) the latest full model to all workers
  – Each worker computes a partial gradient over its data partition using the latest full model
  – Partial gradients are tree-aggregated back to the master, which updates the model and broadcasts again
  – After the final iteration the master holds the final model
  – The per-iteration broadcast and tree aggregate of the full model: Bottleneck!
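A minimal sketch of that broadcast + treeAggregate pattern for a plain linear model with squared loss (function and variable names are illustrative, not MLlib's actual implementation):

import org.apache.spark.rdd.RDD
import breeze.linalg.{DenseVector => BDV}

// data: RDD of (label, dense feature vector); d: number of features; lr: learning rate
def trainLinear(data: RDD[(Double, BDV[Double])], d: Int, iters: Int, lr: Double): BDV[Double] = {
  var w = BDV.zeros[Double](d)
  for (_ <- 1 to iters) {
    val bcW = data.sparkContext.broadcast(w)              // send the full model to every worker
    val (gradSum, count) = data.treeAggregate((BDV.zeros[Double](d), 0L))(
      { case ((g, n), (label, x)) =>
          val err = bcW.value.dot(x) - label              // squared-loss residual on one example
          (g += x * err, n + 1) },                        // accumulate the partial gradient
      { case ((g1, n1), (g2, n2)) => (g1 += g2, n1 + n2) }
    )
    w -= gradSum * (lr / count.toDouble)                  // driver-side model update
    bcW.destroy()                                         // full-model broadcast + aggregate every iteration is the bottleneck
  }
  w
}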

Parameter Servers
• Model & data parallel
  – The model is partitioned across parameter servers; the master only schedules work
  – Each worker pulls the partial model it needs, computes a partial gradient using that partial model, and pushes the partial gradient back
  – Parameter servers apply the updates to their model partitions
  – Only push/pull required features

Distributed FMs
• spark-libFM
  – Uses old MLlib GradientDescent and LBFGS interfaces
• DiFacto
  – Async SGD implementation using a parameter server (ps-lite)
  – Adagrad, L1 regularization, frequency-adaptive model size
• Key point: most real-world datasets are highly sparse (especially high-cardinality categorical data), e.g. online ads, social networks, recommender systems
• Workers only need access to a small piece of the model

GlintFM
• Procedure:
  1. Construct Glint client
  2. Create distributed parameters
  3. Pre-compute required feature indices (per partition)
  4. Iterate:
     • Pull partial model (blocking)
     • Compute partial gradient & update
     • Push partial update to parameter servers (can be async)
  5. Done!
• Glint is a parameter server built using Akka. For more info, see Rolf's talk at 17:15!
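A rough sketch of steps 1-4 for the linear weights only (the factor matrix would follow the same pull/compute/push pattern via a distributed matrix). The glint.Client, vector, pull and push calls are written from memory of the Glint README and may differ in detail; trainData (an RDD of (label, SparseVector)), numFeatures and lr are assumed to exist. Treat this as pseudocode for the pattern, not the actual glint-fm code.

import scala.concurrent.Await
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import org.apache.spark.mllib.linalg.SparseVector

val client = glint.Client()                                  // 1. construct Glint client
val w = client.vector[Double](numFeatures)                   // 2. distributed linear weights on the servers

trainData.foreachPartition { iter =>
  val examples = iter.toArray
  // 3. the feature indices this partition actually touches
  val active: Array[Long] = examples.flatMap(_._2.indices).distinct.sorted.map(_.toLong)
  val localIdx = active.zipWithIndex.toMap                   // global feature index -> local position

  // 4a. pull only the partial model for these features (blocking)
  val weights = Await.result(w.pull(active), 1.minute)

  // 4b. compute a partial gradient update (plain SGD on squared loss, for illustration)
  val update = new Array[Double](active.length)
  examples.foreach { case (label, x) =>
    val pred = x.indices.zip(x.values).map { case (i, v) => weights(localIdx(i.toLong)) * v }.sum
    val err = pred - label
    x.indices.zip(x.values).foreach { case (i, v) => update(localIdx(i.toLong)) -= lr * err * v }
  }

  // 4c. push the partial update back to the parameter servers (returns a Future, can be async)
  w.push(active, update)
}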

RESULTS

Data
Criteo Display Advertising Challenge Dataset
• 45m examples, 34m unique features, 48 nnz / example

[Charts: unique values per feature (millions), and feature occurrence (%)]

Feature Extraction

Raw Data → StringIndexer → OneHotEncoder → VectorAssembler → OOM!
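A hedged sketch of that pipeline (df is an assumed DataFrame of the raw columns and the column names are illustrative): one StringIndexer + OneHotEncoder per categorical column, assembled into a single vector, which is what ran out of memory at this scale.

import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}

val catCols = Seq("c1", "c2", "c3")   // illustrative; the real data has many more columns
val indexers = catCols.map(c => new StringIndexer().setInputCol(c).setOutputCol(s"${c}_idx"))
val encoders = catCols.map(c => new OneHotEncoder().setInputCol(s"${c}_idx").setOutputCol(s"${c}_vec"))
val assembler = new VectorAssembler()
  .setInputCols(catCols.map(c => s"${c}_vec").toArray)
  .setOutputCol("features")

// One indexer/encoder stage per column: with tens of millions of distinct categorical
// values this pipeline ran out of memory (the OOM! on the slide).
val pipeline = new Pipeline().setStages((indexers ++ encoders :+ assembler).toArray[PipelineStage])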

Feature Extraction
Solution? "Stringify" + CountVectorizer

Row(i1=u'1', i2=u'1', i3=u'5', i4=u'0', i5=u'1382', ...)

Row(raw=[u'i1=1', u'i2=1', u'i3=5', u'i4=0', u'i5=1382', ...])

Convert set of String features into Seq[String]
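A sketch of the stringify step in Spark SQL/ML (shown in Scala; df is an assumed DataFrame of raw columns): build "name=value" tokens, pack them into one array<string> column, and let CountVectorizer learn a single vocabulary over all of them.

import org.apache.spark.ml.feature.CountVectorizer
import org.apache.spark.sql.functions.{array, col, concat, lit}

// Turn every raw column into a "name=value" token and collect them into one array column
val cols = df.columns.filter(_ != "label")
val stringified = df.withColumn(
  "raw",
  array(cols.map(c => concat(lit(c + "="), col(c).cast("string"))): _*)
)

// One CountVectorizer over all tokens replaces the per-column StringIndexer/OneHotEncoder stages
val cvModel = new CountVectorizer()
  .setInputCol("raw")
  .setOutputCol("features")
  .fit(stringified)
val featurized = cvModel.transform(stringified)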

Feature Extraction

Raw Data → Stringify → CountVectorizer

Feature Extraction

Raw Data → Stringify → HashingTF
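Alternatively, the same "raw" token column from the sketch above can be fed to HashingTF, which hashes tokens into a fixed number of buckets instead of building a vocabulary (the bucket count below is an illustrative choice, not from the slides); this is the hashing approach revisited under Challenges.

import org.apache.spark.ml.feature.HashingTF

val hashed = new HashingTF()
  .setInputCol("raw")
  .setOutputCol("features")
  .setNumFeatures(1 << 24)   // illustrative bucket count
  .transform(stringified)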

Performance

[Chart: total run time (s) for MLlib FM vs Glint FM at k = 0, 6, 16, 32, with N/A where MLlib FM could not run. Annotations: model sizes of 1.9 GB and 4.6 GB, and the 2 GB limit.]
*10 iterations, 48 partitions, fit intercept
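As a sanity check on the model sizes in the chart: a dense FM model is roughly d(k+1) double-precision values (linear weights plus one k-dimensional factor vector per feature), so for d ≈ 34M features

34 \times 10^6 \times (6+1) \times 8\ \text{bytes} \approx 1.9\ \text{GB}, \qquad 34 \times 10^6 \times (16+1) \times 8\ \text{bytes} \approx 4.6\ \text{GB}

which lines up with the annotated sizes, and shows why the broadcast-based MLlib FM runs up against the 2 GB limit at the larger ranks.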

Performance

[Chart: per-iteration timeline showing broadcast read, gradient computation (compute), and aggregation & collect phases]
*k = 6, 10 iterations, 48 partitions, fit intercept

Performance

[Charts: median time per iteration (s) for MLlib FM vs Glint FM, broken down into communication, compute, and other; and data transferred per iteration (MB) for MLlib FM vs Glint FM]
*k = 6, 10 iterations, 48 partitions, fit intercept

CHALLENGES & FUTURE WORK

Challenges
• Tuning configuration
  – Glint: models/server, message size, Akka frame size
  – Spark: data partitioning (can be seen as "mini-batch" size)
• Lack of server-side processing in Glint
  – For L1 regularization, adaptive sparsity, Adagrad
  – These result in better performance, faster execution
• Backpressure / concurrency control

Challenges
• Tuning models / server

[Charts: total runtime, and median & max iteration time, for 6, 12, and 24 models per parameter server]
*k = 16, 10 iterations, 48 partitions, fit intercept

Challenges
• Index partitioning for "hot features"
  – CountVectorizer for features leads to hot spots & straggler tasks due to sorting by occurrence
  – OneHotEncoder OOMed... but can also face this problem
  – Spreading out features is critical (used feature hashing)

[Chart: total run time (s), CV features vs hashed features]
*k = 16, 10 iterations, 48 partitions, fit intercept

Future Work
• Glint enhancements
  – Add features from DiFacto, i.e. L1 regularization, Adagrad & memory-adaptive k
  – Requires support for UDFs on the server
  – Built-in backpressure (Akka Artery / Streams?)
  – Key caching: 2x decrease in message size
• Mini-batch SGD within partitions
• Distributed solvers for ALS, MCMC, CD
• Relational data / block structure formulation
  – www.vldb.org/pvldb/vol6/p337-rendle.pdf

References
• Factorization Machines
  – http://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf
  – https://github.com/ibayer/fastFM
  – www.libfm.org
  – https://github.com/zhengruifeng/spark-libFM
  – https://github.com/scikit-learn-contrib/polylearn
• DiFacto
  – https://github.com/dmlc/difacto
  – www.cs.cmu.edu/~yuxiangw/docs/fm.pdf
• Glint / parameter servers
  – https://github.com/rjagerman/glint
  – https://github.com/dmlc/ps-lite

SPARK SUMMIT EUROPE 2016

THANK YOU.
@MLnick
github.com/MLnick/glint-fm
spark.tc
