Strategies and Tools for Parallel Machine Learning in Python

Preview:

Citation preview

Strategies & Tools for Parallel Machine Learning

in PythonPyConFR 2012 - Paris

Sunday, September 16, 2012

The Story

scikit-learn

The Cloud

stackoverflow

& kaggle

Sunday, September 16, 2012

...

Sunday, September 16, 2012

Parts of the Ecosystem

Multiple Machines with Multiple Cores ♥♥♥

Single Machine with Multiple Cores ♥♥♥

multiprocessing

Sunday, September 16, 2012

The Problem

Big CPU (Supercomputers - MPI)

Simulating stuff from models

Big Data (Google scale - MapReduce)

Counting stuff in logs / Indexing the Web

Machine Learning?

often somewhere in the middle

Sunday, September 16, 2012

Parallel ML Use Cases

• Model Evaluation with Cross Validation

• Model Selection with Grid Search

• Bagging Models: Random Forests

• Averaged Models

Sunday, September 16, 2012

Embarrassingly Parallel ML Use Cases

• Model Evaluation with Cross Validation

• Model Selection with Grid Search

• Bagging Models: Random Forests

• Averaged Models

Sunday, September 16, 2012

Inter-Process Comm. Use Cases

• Model Evaluation with Cross Validation

• Model Selection with Grid Search

• Bagging Models: Random Forests

• Averaged Models

Sunday, September 16, 2012

Cross Validation

Labels to Predict

Input Data

Sunday, September 16, 2012

Cross Validation

A B C

A B C

Sunday, September 16, 2012

Cross Validation

A B C

A B C

Subset of the data usedto train the model

Held-outtest set

for evaluation

Sunday, September 16, 2012

Cross Validation

A B C

A B C

A C B

A C B

B C A

B C A

Sunday, September 16, 2012

Model Selection

the Hyperparameters hell

p1 in [1, 10, 100]

p2 in [1e3, 1e4, 1e5]

Find the best combination of parameters that maximizes the Cross Validated Score

Sunday, September 16, 2012

Grid Search

(1, 1e3) (10, 1e3) (100, 1e3)

(1, 1e4) (10, 1e4) (100, 1e4)

(1, 1e5) (10, 1e5) (100, 1e5)

p1

p2

Sunday, September 16, 2012

(1, 1e3) (10, 1e3) (100, 1e3)

(1, 1e4) (10, 1e4) (100, 1e4)

(1, 1e5) (10, 1e5) (100, 1e5)

Sunday, September 16, 2012

Grid Search:Qualitative Results

Sunday, September 16, 2012

Grid Search:Cross Validated Scores

Sunday, September 16, 2012

Enough ML theory!

Lets go shopping^Wparallel computing!

Sunday, September 16, 2012

Single Machinewith

Multiple Cores

♥ ♥

♥ ♥

Sunday, September 16, 2012

multiprocessing>>> from multiprocessing import Pool>>> p = Pool(4)

>>> p.map(type, [1, 2., '3'])[int, float, str]

>>> r = p.map_async(type, [1, 2., '3'])>>> r.get()[int, float, str]

Sunday, September 16, 2012

multiprocessing

• Part of the standard lib

• Nice API

• Cross-Platform support (even Windows!)

• Some support for shared memory

• Support for synchronization (Lock)

Sunday, September 16, 2012

multiprocessing: limitations

• No docstrings in a stdlib module? WTF?

• Tricky / impossible to use the shared memory values with NumPy

• Bad support for KeyboardInterrupt

Sunday, September 16, 2012

Sunday, September 16, 2012

• transparent disk-caching of the output values and lazy re-evaluation (memoize pattern)

• easy simple parallel computing

• logging and tracing of the execution

Sunday, September 16, 2012

>>> from os.path.join>>> from joblib import Parallel, delayed

>>> Parallel(2)(... delayed(join)('/ect', s)... for s in 'abc')['/ect/a', '/ect/b', '/ect/c']

joblib.Parallel

Sunday, September 16, 2012

Usage in scikit-learn

• Cross Validationcross_val(model, X, y, n_jobs=4, cv=3)

• Grid SearchGridSearchCV(model, n_jobs=4, cv=3).fit(X, y)

• Random ForestsRandomForestClassifier(n_jobs=4).fit(X, y)

Sunday, September 16, 2012

>>> from joblib import Parallel, delayed>>> import numpy as np

>>> Parallel(2, max_nbytes=1e6)(... delayed(type)(np.zeros(int(i)))... for i in [1e4, 1e6])[<type 'numpy.ndarray'>, <class 'numpy.core.memmap.memmap'>]

joblib.Parallel:shared memory

Sunday, September 16, 2012

(1, 1e3) (10, 1e3) (100, 1e3)

(1, 1e4) (10, 1e4) (100, 1e4)

(1, 1e5) (10, 1e5) (100, 1e5)

Only 3 allocated datasets shared by all the concurrent workers performing

the grid search.

Sunday, September 16, 2012

Multiple Machineswith

Multiple Cores

♥ ♥

♥ ♥♥ ♥

♥ ♥♥ ♥

♥ ♥♥ ♥

♥ ♥

Sunday, September 16, 2012

MapReduce?

[ (k1, v1), (k2, v2), ...]

mapper

mapper

mapper

[ (k3, v3), (k4, v4), ...]

reducer

reducer

[ (k5, v6), (k6, v6), ...]

Sunday, September 16, 2012

Why MapReduce does not always work

Write a lot of stuff to disk for failover

Inefficient for small to medium problems

[(k, v)] mapper [(k, v)] reducer [(k, v)]

Data and model params as (k, v) pairs?

Complex to leverage for Iterative Algorithms

Sunday, September 16, 2012

• Parallel Processing Library

• Interactive Exploratory Shell

Multi Core & Distributed

IPython.parallel

Sunday, September 16, 2012

Demo!

Sunday, September 16, 2012

The AllReduce Pattern

• Compute an aggregate (average) of active node data

• Do not clog a single node with incoming data transfer

• Traditionally implemented in MPI systems

Sunday, September 16, 2012

AllReduce 0/3Initial State

Value: 2.0 Value: 0.5

Value: 1.1 Value: 3.2 Value: 0.9

Value: 1.0

Sunday, September 16, 2012

AllReduce 1/3Spanning Tree

Value: 2.0 Value: 0.5

Value: 1.1 Value: 3.2 Value: 0.9

Value: 1.0

Sunday, September 16, 2012

AllReduce 2/3Upward Averages

Value: 2.0 Value: 0.5

Value: 1.1(1.1, 1)

Value: 3.2(3.1, 1)

Value: 0.9(0.9, 1)

Value: 1.0

Sunday, September 16, 2012

AllReduce 2/3Upward Averages

Value: 2.0(2.1, 3)

Value: 0.5(0.7, 2)

Value: 1.1(1.1, 1)

Value: 3.2(3.1, 1)

Value: 0.9(0.9, 1)

Value: 1.0

Sunday, September 16, 2012

AllReduce 2/3Upward Averages

Value: 2.0(2.1, 3)

Value: 0.5(0.7, 2)

Value: 1.1(1.1, 1)

Value: 3.2(3.1, 1)

Value: 0.9(0.9, 1)

Value: 1.0(1.38, 6)

Sunday, September 16, 2012

AllReduce 3/3Downward Updates

Value: 2.0(2.1, 3)

Value: 0.5(0.7, 2)

Value: 1.1(1.1, 1)

Value: 3.2(3.1, 1)

Value: 0.9(0.9, 1)

Value: 1.38

Sunday, September 16, 2012

AllReduce 3/3Downward Updates

Value: 1.38 Value: 1.38

Value: 1.1(1.1, 1)

Value: 3.2(3.1, 1)

Value: 0.9(0.9, 1)

Value: 1.38

Sunday, September 16, 2012

AllReduce 3/3Downward Updates

Value: 1.38 Value: 1.38

Value: 1.38 Value: 1.38 Value: 1.38

Value: 1.38

Sunday, September 16, 2012

AllReduce Final State

Value: 1.38 Value: 1.38

Value: 1.38 Value: 1.38 Value: 1.38

Value: 1.38

Sunday, September 16, 2012

Working in the Cloud

• Launch a cluster of machines in one cmd: starcluster start mycluster -b 0.07

starcluster sshmaster mycluster

• Supports spotinstances!

• Ships blas, atlas, numpy, scipy!

• IPython plugin!

Sunday, September 16, 2012

Perspectives

Sunday, September 16, 2012

2012 results by Stanford / Google

Sunday, September 16, 2012

The YouTube Neuron

Sunday, September 16, 2012

Goodies

Super Ctrl-C to killall nosetests

Sunday, September 16, 2012

psutil nosetests killer script in Automator

Sunday, September 16, 2012

Bind killnose to Shift-Ctrl-C

Sunday, September 16, 2012

ps aux | grep IPython.parallel \ | grep -v grep \ | cut -d ' ' -f 2 | xargs kill

Or simpler:

Sunday, September 16, 2012

Recommended