Pragmatic Machine Learning
@louisdorard #MLSpain - 18 Jan 2016
I’M LAZY
“Programming is for lazy people who want to automate things
— AI is for lazier people who want to automate programming”
• Consider manual classification task
• Automate with ML model?
• Build PoC
• Deploy in production
• Maintain
• Monitor performance
• Update with new data
The Lazy MLer
Phrase problem as ML task
Engineer features
Prepare data (csv)
Learn model
Make predictions
Deploy model & integrate
Evaluate model
Measure impact
• Top companies (Facebook, Amazon, LinkedIn, Spotify…) have invested more than $5M in their ML production platforms (Florian Douetteau at PAPIs Connect in May 2015)
Cost of ML projects
• Real-world ML is/was complicated and costly (especially at web scale)
• Do I really need ML?
• How about Human API? (e.g. Amazon Mechanical Turk)
• → Back to Square 1 (but someone else’s problem!)
• → Baseline! (performance, time, cost)
The Lazy MLer
Performance evaluation
How do you evaluate the performance of an ML system?
Accuracy
Latency
Throughput
Performance measures
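All three measures above can be checked with a few lines of timing code. A minimal sketch, assuming a stand-in `predict` function (any real model call would slot into its place):

```python
import time

def predict(x):
    # stand-in for a real model call
    return sum(x) > 1.5

inputs = [[0.1 * i, 0.2 * i] for i in range(1000)]

start = time.perf_counter()
results = [predict(x) for x in inputs]
elapsed = time.perf_counter() - start

latency_ms = 1000 * elapsed / len(inputs)  # average time per prediction
throughput = len(inputs) / elapsed         # predictions per second
```

Accuracy would be computed separately against held-out labels; latency and throughput come straight from the timer.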
• Go beyond accuracy… example: recommendations
• Get clicks!
• → Simulate how many you’d get with your model
• → Need to learn accurately what people like — not what they dislike
• Better decisions with ML
• Revenue increase (A/B test)
• Decisions can have a cost (e.g. give a special offer/pricing to a customer)… ROI?
Domain-specific evaluation
Decisions from predictions
1. Descriptive
2. Predictive
3. Prescriptive
Types of analytics
1. Show churn rate against time
2. Predict which customers will churn next
3. Suggest what to do about each customer (e.g. propose to switch plan, send promotional offer, etc.)
Churn analysis
• Who: SaaS company selling monthly subscription
• Question asked: “Is this customer going to leave within 1 month?”
• Input: customer
• Output: no-churn or churn
• Data collection: history up until 1 month ago
Churn prediction
• #TP (we predict the customer churns and they do)
• #FP (we predict the customer churns but they don’t)
• #FN (we predict the customer doesn’t churn but they do)
Churn prediction accuracy
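From those counts, the standard metrics follow directly. A quick sketch with made-up counts (the numbers are assumptions, not from the talk):

```python
tp, fp, fn = 30, 10, 20  # illustrative counts of TP / FP / FN

precision = tp / (tp + fp)  # of predicted churners, how many really churned
recall = tp / (tp + fn)     # of real churners, how many we caught
f1 = 2 * precision * recall / (precision + recall)
```

With these counts: precision 0.75, recall 0.6 — already more informative than a single accuracy number, since the two error types have different costs here.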
Assume we know who’s going to churn. What do we do?
• Contact them (in which order?)
• Switch to different plan
• Give special offer
• No action?
Churn prevention
“3. Suggest what to do about each customer” → prioritised list of actions, based on…
• Customer representation + context (e.g. competition)
• Churn prediction (& action prediction?)
• Uncertainty in predictions
• Revenue brought by customer & cost of action
• Constraints on frequency of solicitations
Churn prevention
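One way to turn those ingredients into a prioritised list is to rank customers by the expected value of taking the action. A minimal sketch; the customers, success rate and cost figures are illustrative assumptions:

```python
def action_value(p_churn, revenue, success_rate, cost):
    # expected monthly revenue saved by acting, minus the action's cost
    return p_churn * success_rate * revenue - cost

customers = [
    {"id": "a", "p_churn": 0.9, "revenue": 10.0},
    {"id": "b", "p_churn": 0.2, "revenue": 50.0},
    {"id": "c", "p_churn": 0.8, "revenue": 5.0},
]

# prioritised list: highest expected value first
ranked = sorted(customers,
                key=lambda c: action_value(c["p_churn"], c["revenue"],
                                           success_rate=0.2, cost=2.0),
                reverse=True)
```

Note that the high-revenue customer with only moderate churn risk can outrank a near-certain churner with low revenue — which is exactly why prescriptive beats purely predictive here.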
• Taking action for each TP (and FP) has a cost
• For each TP we “gain”: (success rate of action) * (revenue /cust. /month)
• Imagine…
• perfect predictions
• revenue /cust. /month = 10€
• success rate of action = 20%
• cost of action = 2€
• What’s the ROI?
Churn prevention ROI
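Plugging the slide’s numbers in: per true positive, the expected first-month gain exactly matches the cost of the action, so the campaign only breaks even up front — retained revenue in later months, and any false positives, tip the balance either way. A quick check:

```python
revenue_per_month = 10.0  # € per customer per month (from the slide)
success_rate = 0.2        # probability the action prevents churn
cost_of_action = 2.0      # € per customer contacted

expected_gain = success_rate * revenue_per_month  # € saved per true positive
net = expected_gain - cost_of_action
roi = net / cost_of_action
```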
• We predicted customer would churn but they didn’t…
• That’s actually good! Prevention worked!
• Need to store which actions were taken
• Is ML really helping?
• Compare to baseline, e.g. if no usage for more than 15 days then predict churn
• Is the fancy model really improving the bottom line?
Churn prevention evaluation
1. Show past demand against calendar
2. Predict demand for [product] at [store] in next 2 days
3. Suggest how much to ship
• Trade-off: cost of storage vs risk of lost sales
• Constraints on order size, truck volume, capacity of people putting stuff into shelves
Replenishment
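The storage-vs-lost-sales trade-off in step 3 is the classic newsvendor problem. A minimal sketch using an empirical demand distribution; all numbers are illustrative, and the index-based quantile is a rough stand-in for a proper inverse CDF:

```python
# Past demand for one product at one store (illustrative numbers)
past_demand = [8, 10, 12, 9, 11, 10, 13, 10, 9, 12]

cost_overstock = 1.0   # € storage cost per unsold unit
cost_understock = 4.0  # € margin lost per unit of unmet demand

# critical fractile: ship enough to cover this share of demand outcomes
fractile = cost_understock / (cost_understock + cost_overstock)  # 0.8

# rough empirical quantile of past demand
demand_sorted = sorted(past_demand)
index = min(int(fractile * len(demand_sorted)), len(demand_sorted) - 1)
order_qty = demand_sorted[index]
```

The order size, truck volume and shelving constraints from the slide would then be applied on top, e.g. by rounding `order_qty` to the nearest feasible shipment.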
• Context
• Predictions
• Uncertainty in predictions
• Constraints
• Costs / benefits
• Competing objectives (⇒ trade-offs to make)
• Business rules
Decisions are based on…
APIs are key
Software components for automated decisions:
• Create training dataset from historical data (merge sources, aggregate…)
• Provide predictive model from given training set (i.e. learn)
• Provide prediction against model for given context
• Provide optimal decision from given contextual data, predictions, uncertainties, constraints, objectives, costs
• Apply given decision
Separation of concerns
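The five components above can be pinned down as interfaces, which is what makes the separation of concerns concrete. A sketch using Python protocols; the names and signatures are my own shorthand for the bullets, not an established API:

```python
from typing import Any, Protocol

class DatasetBuilder(Protocol):
    def build(self, sources: list) -> Any: ...      # merge sources, aggregate

class Learner(Protocol):
    def learn(self, training_set: Any) -> Any: ...  # returns a model

class Predictor(Protocol):
    def predict(self, model: Any, context: Any) -> Any: ...

class DecisionOptimizer(Protocol):
    def decide(self, context: Any, predictions: Any, uncertainty: Any,
               constraints: Any, objectives: Any, costs: Any) -> Any: ...

class Actuator(Protocol):
    def apply(self, decision: Any) -> None: ...     # apply given decision
```

Each interface can be implemented in-house or delegated to a third-party API without touching the others.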
Of the software components for automated decisions:
• Provide optimal decision from given contextual data, predictions, uncertainties, constraints, objectives, costs
Operations Research component
Of the software components for automated decisions:
• Provide predictive model from given training set (i.e. learn)
• Provide prediction against model for given context
Machine Learning components
Predictive APIs
The two methods of predictive APIs:
• model = create_model('training.csv')
• predicted_output = create_prediction(model, new_input)
Predictive APIs
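A hedged sketch of what those two calls could look like against a generic REST predictive API. The base URL, endpoint paths and JSON fields are invented placeholders, not the API of any provider listed below:

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # placeholder endpoint

def create_model(training_csv_url, api_key):
    """Ask the service to train a model on a CSV (hypothetical endpoint)."""
    body = json.dumps({"source": training_csv_url}).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/models", data=body, method="POST",
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["model_id"]

def create_prediction(model_id, new_input, api_key):
    """Request a prediction for one input against a trained model."""
    body = json.dumps({"model": model_id, "input": new_input}).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/predictions", data=body, method="POST",
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["output"]
```

Each real provider differs in endpoint names and payloads, but all reduce to these two calls — which is what makes drop-in comparison feasible.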
Amazon ML
BigML
Google Prediction
PredicSis
… or your own company!
Providers of REST/HTTP Predictive APIs
Experiment on “ScienceCluster”:
• Distributed jobs
• Collaborative workspace
• Serialize chosen model
Deploy model as API on “ScienceOps”:
• Load balancing
• Auto scaling
• Monitoring (API calls, accuracy)
• “Open source prediction server” in Scala
• Based on Spark, MLlib, Spray
• DASE framework: Data preparation, Algorithm, Serving, Evaluation
• Amazon CloudFormation template → cluster
• Manual up/down scaling
→ PAPI+
Interesting research problems
Concurrency for high-throughput ML APIs
Brian Gawalt (Senior Data Scientist at Upwork), talk at PAPIs ’15
upwork.com use case:
• predict freelancer availability
• huge web platform (millions of users) → need very high throughput and low latency
• things change quickly → need freshest data & predictions
Concurrency for high-throughput ML APIs
• event: invitation sent to freelancer
• steps to prediction:
• gather raw data from all sources
• featurize event
• make prediction
Concurrency for high-throughput ML APIs
• An actor…
• gets & sends messages
• makes computations
• Actors we need:
• “Historians”: one per data source
• “Featurizer”
• “Scorer”
Concurrency with Actor framework
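A toy version of that pipeline, with plain Python threads and queues standing in for actors (the talk used a real actor framework; the two data sources and the "model" here are invented for illustration):

```python
import queue
import threading

raw_q = queue.Queue()    # incoming events (one copy per Historian)
feat_q = queue.Queue()   # Historians -> Featurizer
score_q = queue.Queue()  # Featurizer/Scorer -> caller

SOURCES = ["profile", "activity"]  # one Historian per data source

def historian(source):
    # gather this source's raw data for one event (stubbed as string stats)
    event = raw_q.get()
    value = len(event) if source == "profile" else event.count("a")
    feat_q.put((source, value))

def featurizer_and_scorer():
    # wait for one message per source, assemble features, then score
    features = dict(feat_q.get() for _ in SOURCES)
    x = [features[s] for s in SOURCES]
    score_q.put(sum(x) / 10.0)  # stand-in "Scorer" model

threads = [threading.Thread(target=historian, args=(s,)) for s in SOURCES]
threads.append(threading.Thread(target=featurizer_and_scorer))
for t in threads:
    t.start()
for _ in SOURCES:                  # broadcast the event to each Historian
    raw_q.put("availability-request")
for t in threads:
    t.join()
prediction = score_q.get()
```

The point of the design: each Historian fetches its source concurrently, so total latency is the slowest source rather than the sum of all of them.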
Concurrency for high-throughput ML APIs
[diagrams: request flow before vs after the actor redesign]
• Python defacto standard: scikit-learn
• “Sparkit-learn aims to provide scikit-learn functionality and API on PySpark. The main goal of the library is to create an API that stays close to sklearn’s.”
• REST standard: PSI (Protocols & Structures for Inference)
• Pretty similar to BigML API!
• Implementation for scikit available
• Easier benchmarking! Ensembles!
API standards?
• “AzureML: Anatomy of a machine learning service”
• “Deploying high throughput predictive models with the actor framework”
• “Protocols and Structures for Inference: A RESTful API for Machine Learning”
• Coming soon… JMLR W&CP Volume 50
• Get updates: @papisdotio or papis.io/updates
PAPIs ’15 Proceedings
Simple MLaaS comparison
            Amazon   Google   PredicSis   BigML
Accuracy    0.862    0.743    0.858       0.790
Training    135s     76s      17s         5s
Test time   188s     369s     5s          1s
louisdorard.com/blog/machine-learning-apis-comparison
• With SKLL (SciKit Learn Laboratory)
• Wrap each service in a scikit estimator
• Specify evaluations to perform in a config file (datasets, metrics, eval procedure)
• Need to also measure time…
• See papiseval on GitHub
Automated Benchmark?
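The "wrap each service in a scikit estimator" idea can be sketched as a thin adapter exposing fit/predict and recording the timings the comparison table needs. The client interface and the toy in-memory "service" below are assumptions:

```python
import time

class MLaaSEstimator:
    """Scikit-style adapter around a prediction service so each provider
    can be benchmarked through the same fit/predict interface."""

    def __init__(self, client):
        self.client = client  # assumed to expose train() and predict()

    def fit(self, X, y):
        start = time.perf_counter()
        self.model_id_ = self.client.train(X, y)
        self.train_time_ = time.perf_counter() - start  # "Training" column
        return self

    def predict(self, X):
        start = time.perf_counter()
        preds = [self.client.predict(self.model_id_, x) for x in X]
        self.test_time_ = time.perf_counter() - start   # "Test time" column
        return preds

class MajorityClassClient:
    # toy in-memory stand-in for a real MLaaS provider
    def train(self, X, y):
        self.label = max(set(y), key=y.count)
        return "model-1"
    def predict(self, model_id, x):
        return self.label

est = MLaaSEstimator(MajorityClassClient()).fit([[0], [1], [2]], [0, 1, 1])
preds = est.predict([[3], [4]])
```

With one adapter per provider, the same evaluation config (datasets, metrics, procedure) runs unchanged across all of them.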
• Return of the Lazy MLer!
• Model selection
• Find optimal values for n (hyper-)parameters → optimisation problem (function in n dimensions)
• Search space of parameters, efficiently → explore vs exploit
• Bayesian optimization?
AutoML
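Before reaching for Bayesian optimization, plain random search already illustrates the setup: sample points of the n-dimensional parameter space, evaluate each, keep the best. A sketch with a made-up 2-D validation surface standing in for "train and evaluate":

```python
import random

random.seed(0)

def validation_score(c, gamma):
    # stand-in for "train a model with these parameters and evaluate it";
    # a made-up surface peaking near c=10, gamma=0.01
    return 1.0 / (1 + abs(c - 10) / 10 + abs(gamma - 0.01) * 50)

best_params, best_score = None, -1.0
for _ in range(100):  # sample the 2-D space on log scales
    c = 10 ** random.uniform(-2, 3)
    gamma = 10 ** random.uniform(-4, 0)
    s = validation_score(c, gamma)
    if s > best_score:
        best_params, best_score = (c, gamma), s
```

Random search is pure exploration; Bayesian optimization improves on it by using past evaluations to decide where to sample next — the explore-vs-exploit trade-off from the slide.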
Bayesian Optimization in 1 dimension
From CODE517E
• Building ensembles
• Decide to continue training existing model, or to train new one
• Explore vs exploit again!
• Reward is accuracy. Let’s estimate reward for all options.
• Choose option with highest expected reward + uncertainty? (i.e. upper confidence bound)
• Limited computational budget…
AutoML
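The "expected reward + uncertainty" rule above is an upper-confidence-bound (UCB) choice among training options. A small sketch; the reward estimates and counts are illustrative:

```python
import math

# expected-reward (accuracy) estimates and number of evaluations so far
# for each option under a limited budget
options = {
    "keep_training_model_A": {"mean": 0.84, "n": 20},
    "train_new_model_B":     {"mean": 0.80, "n": 5},
    "build_ensemble":        {"mean": 0.82, "n": 2},
}
total = sum(o["n"] for o in options.values())

def ucb(o):
    # estimated reward plus an exploration bonus that shrinks
    # as an option gets evaluated more often
    return o["mean"] + math.sqrt(2 * math.log(total) / o["n"])

choice = max(options, key=lambda k: ucb(options[k]))
```

Here the least-tried option wins despite a lower mean: its uncertainty bonus dominates, which is exactly the explore side of the trade-off.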
• Zoubin Ghahramani & James Lloyd @ Uni Cambridge
• Gaussian Processes: find (mixture of) kernel(s) that maximises data likelihood
• Also Bayesian!
Automatic Statistician
• Spearmint: “Bayesian optimization” for tuning parameters → Whetlab → Twitter
• Auto-sklearn: “automated machine learning toolkit and drop-in replacement for a scikit-learn estimator”
• See automl.org and challenge
Open Source AutoML libraries
Scikit
from sklearn import svm
model = svm.SVC(gamma=0.001, C=100.)

from sklearn import datasets
digits = datasets.load_digits()
model.fit(digits.data[:-1], digits.target[:-1])

model.predict(digits.data[-1:])  # note: predict expects a 2-D array
AutoML Scikit
import autosklearn
model = autosklearn.AutoSklearnClassifier()

from sklearn import datasets
digits = datasets.load_digits()
model.fit(digits.data[:-1], digits.target[:-1])

model.predict(digits.data[-1:])  # only the first two lines changed
• Before learning:
• Automatic feature extraction from text?
• After learning:
• Monitor new predictions and automatically retrain models when necessary?
• See panel discussion at PAPIs ’15
More automation ideas…
• Same as Azure ML?
• Scaling up? down?
Open Source Auto Scaling?
Tech talks:
• Intro to Spark
• Using ML to build an autonomous drone
• Demystifying Deep Learning (speaker needed!)
• Distributed Deep Learning with Spark on AWS
PAPIs Connect (14-15 March, Valencia)
Topics:
• Managing technology
• FinTech
• Enterprise, Retail, Operations
• AI for Society (Nuria Oliver, Scientific Director at Telefonica R&D)
• Future of AI (Ramon Lopez de Mantaras, Director AI Research at Spanish Research Council)
PAPIs Connect (14-15 March, Valencia)
• Dev? Bring your manager!
• Manager? Bring your devs!
• Discount code: MLSVLC20
• papis.io/connect
PAPIs Connect (14-15 March, Valencia)