Tools for fast numerical optimization in Python

Stephen Tu
SF Python

Goals

• Python is a fun language for prototyping.

• C/C++ is useful for optimizing performance.

• This talk: How far can we push the boundary?

Warmup

• Why is the RHS over 2x faster?

Warmup

• Also:

• The LHS has many code inefficiencies; e.g., it has to check whether each element is a double (lots of branches).

• The RHS can take advantage of optimized BLAS routines.
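The code compared on the original slide is not preserved in this transcript. As a stand-in, here is a minimal sketch of the kind of comparison the bullets describe, assuming the LHS is an element-by-element Python loop and the RHS is a single NumPy call. The function names `dot_loop` and `dot_vectorized` are hypothetical, and the measured ratio will vary by machine.

```python
import timeit

import numpy as np


def dot_loop(x, y):
    """Element-by-element dot product; every iteration runs in the interpreter."""
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total


def dot_vectorized(x, y):
    """One call into NumPy, which can dispatch to an optimized BLAS routine."""
    return float(np.dot(x, y))


rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
y = rng.standard_normal(10_000)

# Time a few repetitions of each version on the same inputs.
t_loop = timeit.timeit(lambda: dot_loop(x, y), number=5)
t_vec = timeit.timeit(lambda: dot_vectorized(x, y), number=5)
print(f"loop: {t_loop:.4f}s  vectorized: {t_vec:.4f}s")
```

Both functions compute the same quantity; only the amount of per-element interpreter work differs.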

Two important ideas

• Keep in mind the memory layout.

• Avoid unnecessary code overhead.

• Memory is especially important for numerical software.

• FLOPs are cheaper than cache misses.
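One way to keep the memory layout in mind is to inspect an array's strides. The sketch below is an illustration added here, not taken from the slides; it shows that the same logical matrix can be stored row-major (C order) or column-major (Fortran order), which decides whether walking a row or a column touches adjacent memory.

```python
import numpy as np

a = np.zeros((1000, 1000))       # float64, C order (row-major) by default
f = np.asfortranarray(a)         # same values, column-major storage

# Strides are the byte steps taken when moving one index along each axis.
print(a.strides)                 # (8000, 8): the next element in a row is 8 bytes away
print(f.strides)                 # (8, 8000): the next element in a column is 8 bytes away

row = a[0]                       # contiguous in C order
col = a[:, 0]                    # strided: consecutive elements are 8000 bytes apart
print(row.flags["C_CONTIGUOUS"], col.flags["C_CONTIGUOUS"])
```

Loops that follow the contiguous axis tend to make better use of the cache than loops that jump by a large stride.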

Latency numbers every programmer should know

http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html

Latency numbers every programmer should know

http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html

The domain of numerical programs is vastly different

• User-facing applications perform tasks such as sending an HTTP request, making a database query, or parsing a string.

• Numerical applications perform tasks such as multiplying matrices, taking an SVD, or solving Ax=b.

Applications

• Machine learning

• Recommendation systems

• Scientific computing

• Image processing

• Algorithmic trading

• List goes on and on…

Many great tools

Numba

This talk

• Cython + use cases.

• Numba + use cases.

• Cython/Numba in action on real examples motivated by machine learning and optimization.

Cython

Cython

• Glue between Python and C/C++.

• Two main purposes:

• Wrapping existing C/C++ libraries.

• Writing Python-like code that gets compiled down to C or C++.

Cython

• Wrapping libraries manually is a pain.

https://github.com/mblondel/svmlight-loader/blob/master/_svmlight_loader.cpp

Cython

• Helps reduce the amount of boilerplate when wrapping.

http://docs.cython.org/src/tutorial/clibraries.html

Cython

• Reducing boilerplate is a big deal!

• Even for professional software developers, wrapping libraries manually already feels painful.

• Imagine how much worse it would be if you were not an expert developer.

Cython

• Second purpose: a Python-like language for writing code that gets compiled.

http://docs.cython.org/src/tutorial/cython_tutorial.html

Cython

http://notes-on-cython.readthedocs.org/en/latest/std_dev.html

Cython

• Drawback: the workflow is no longer as transparent; you need to remember to re-compile (very easy to forget!).

• Drawback: software distribution can get annoying, for all the same reasons distributing binaries is annoying.

• pip install can be made to work, but only if the user has a compatible C/C++ compiler.

Numba

Numba

• Really cool project from Continuum.

• Write regular Python, but decorate functions with @jit.

• Watch magic happen.
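The benchmark shown in the talk is not preserved in this transcript, so the following is only a minimal sketch of the pattern: a loop-heavy function written in plain Python and decorated with Numba's `jit`. The function `pairwise_sq_dists` is a hypothetical example, `nopython=True` is passed explicitly as an assumption about how one would want it compiled, and the `ImportError` fallback exists only so the sketch still runs (without any speedup) when Numba is not installed.

```python
import numpy as np

try:
    from numba import jit
except ImportError:
    # Fallback for environments without Numba: a no-op decorator, no speedup.
    def jit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@jit(nopython=True)
def pairwise_sq_dists(X):
    """Squared Euclidean distance between every pair of rows of X."""
    n, d = X.shape
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(d):
                diff = X[i, k] - X[j, k]
                acc += diff * diff
            out[i, j] = acc
    return out


X = np.random.default_rng(0).standard_normal((200, 3))
D = pairwise_sq_dists(X)   # the first call includes compilation time
print(D.shape)
```

The triple loop is exactly the kind of code that is slow in the interpreter and that a JIT compiler can turn into tight machine code.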

Numba

Over 100x faster by adding 4 characters of source!

Numba

• Drawback: the code that can be JIT-ed is a somewhat limited subset of Python.

• Drawback: performance problems are hard to debug.

Applications

Hidden Markov models

• Hidden Markov model (HMM): latent (unknown) states Z_t and an observed sequence X_t, where the Z_t are assumed to form a Markov chain and each X_t depends only on Z_t.

http://blog.oliverparson.co.uk/2011/06/using-single-hidden-markov-model-to.html

Hidden Markov models

• Learning problem: Given an observed sequence (X_1, …, X_T), estimate both the transition and emission probabilities.

• Most common algorithm is expectation-maximization (EM).

http://www.cs.berkeley.edu/~stephentu/writeups/mixturemodels.pdf
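The E-step of EM for an HMM is built on the forward recursion, which also yields the log-likelihood of the observed sequence. Below is a minimal NumPy sketch for discrete emissions. The names `pi`, `A`, and `B` and the toy numbers are assumptions made for illustration; this is not the implementation shown in the talk.

```python
import numpy as np


def hmm_log_likelihood(pi, A, B, obs):
    """Scaled forward recursion for a discrete-emission HMM.

    pi:  (K,)   initial state distribution
    A:   (K, K) transitions, A[i, j] = P(Z_{t+1} = j | Z_t = i)
    B:   (K, M) emissions,   B[k, x] = P(X_t = x | Z_t = k)
    obs: sequence of observed symbols in {0, ..., M-1}
    """
    alpha = pi * B[:, obs[0]]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the chain, then weight by the emission.
            alpha = (alpha @ A) * B[:, obs[t]]
        # Rescale to avoid underflow; the scale factors accumulate the likelihood.
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = alpha / scale
    return log_lik


pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
obs = [0, 1, 0]
print(hmm_log_likelihood(pi, A, B, obs))
```

The loop over `t` is inherently sequential, which is why this kind of kernel is a natural candidate for Cython or Numba.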

Hidden Markov models

• Specialized to HMMs, the updates look like:

http://www.cs.berkeley.edu/~stephentu/writeups/hmm-baum-welch-derivation.pdf

Hidden Markov models

• The result: fairly fast implementation with a nice Pythonic interface, with minimal effort.

data-microscopes

• Project to bring non-parametric Bayesian models to Python land.

datamicroscopes.github.io

data-microscopes

• Core inference procedure is Gibbs sampling.

• In a nutshell: iteratively condition on all coordinates of your current estimate except one, and resample the one left out.

• For our mixture models, requires computing
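As a generic illustration of the condition-and-resample loop (on a toy target, not the mixture models used in data-microscopes), here is a minimal sketch of Gibbs sampling for a standard bivariate normal with correlation `rho`. The function name and parameters are hypothetical. It relies on the known fact that each coordinate, conditioned on the other, is normal with mean `rho` times the other coordinate and variance `1 - rho**2`.

```python
import numpy as np


def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = np.random.default_rng(seed)
    cond_sd = np.sqrt(1.0 - rho ** 2)
    x, y = 0.0, 0.0
    samples = np.empty((n_samples, 2))
    for t in range(n_samples):
        x = rng.normal(rho * y, cond_sd)   # resample x given the current y
        y = rng.normal(rho * x, cond_sd)   # resample y given the new x
        samples[t] = (x, y)
    return samples


samples = gibbs_bivariate_normal(rho=0.8, n_samples=20_000)
kept = samples[1000:]                      # discard an initial burn-in stretch
print(np.corrcoef(kept.T)[0, 1])           # empirical correlation of the draws
```

Each sweep is cheap but must run sequentially many times, so the per-iteration overhead of the host language matters a great deal.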

data-microscopes

• User-facing API written in Python (e.g. model specification, config options, etc.).

• Core inference engine written in C++.

• Glued together with Cython.

• Uses protobuf to pass complex data structures around (kind of hacky).

data-microscopes

• The result is a very nice Pythonic interface, backed by a powerful C++ implementation.

Low rank matrix recovery

• We focus now on a class of optimization problems with applications to recommendation systems.

$$\min_{X \in \mathbb{R}^{n_1 \times n_2}} \operatorname{rank}(X) \quad \text{subj. to} \quad \langle A_i, X \rangle = b_i, \quad i = 1, \dots, m.$$

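Low rank matrix recovery

• Predominant algorithm for solving these problems is the following iterative local search on the augmented Lagrangian L:

• Hold y, sigma fixed, and minimize L w.r.t. R.

• Hold R fixed, and update y, sigma accordingly.

$$L(R, y, \sigma) := \langle C, R R^T \rangle - \sum_{i=1}^{m} y_i \left( \langle A_i, R R^T \rangle - b_i \right) + \frac{\sigma}{2} \sum_{i=1}^{m} \left( \langle A_i, R R^T \rangle - b_i \right)^2$$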
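To make the definition of L concrete, here is a minimal NumPy sketch that evaluates it. The function name and the convention of stacking the constraint matrices A_i into one array of shape (m, n, n) are assumptions made for illustration, not details from the talk.

```python
import numpy as np


def augmented_lagrangian(R, y, sigma, C, A, b):
    """Evaluate L(R, y, sigma).

    R: (n, r) factor, C: (n, n) cost matrix,
    A: (m, n, n) stacked constraint matrices, b: (m,), y: (m,) multipliers.
    """
    X = R @ R.T
    # resid[i] = <A_i, R R^T> - b_i, using the trace inner product.
    resid = np.einsum("ijk,jk->i", A, X) - b
    return np.sum(C * X) - y @ resid + 0.5 * sigma * (resid @ resid)


# Tiny hand-checkable case: C = I, one constraint A_1 = I, R = I.
C = np.eye(2)
A = np.eye(2)[None, :, :]
b = np.array([1.0])
y = np.array([3.0])
value = augmented_lagrangian(np.eye(2), y, 4.0, C, A, b)
print(value)
```

In this tiny case the inner product is 2, the residual is 1, and the three terms are 2, -3, and 2.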

Low rank matrix recovery

• Minimizing w.r.t. R is a numerically intensive task, so we would like to offload as much work as possible.

• SciPy's fmin_l_bfgs_b is a good candidate, requiring only gradients and function evaluations.

• Hence, implement gradients / function evals in Numba.

$$\nabla_R L(R, y, \sigma) = 2 C R - 2 \sum_{i=1}^{m} y_i A_i R + 2 \sigma \sum_{i=1}^{m} \left( \langle A_i, R R^T \rangle - b_i \right) A_i R$$
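The sketch below shows one way the function value and this gradient could be handed to SciPy's `fmin_l_bfgs_b`. It is plain NumPy rather than Numba, and it is not the talk's implementation. It assumes that C and every A_i are symmetric (which the gradient formula requires), uses C = I so the toy objective is bounded below, and picks the problem sizes, the zero multipliers `y`, and `sigma = 10` arbitrarily.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b


def value_and_grad(r_flat, shape, y, sigma, C, A, b):
    """Return L(R, y, sigma) and its gradient w.r.t. R, flattened for SciPy."""
    R = r_flat.reshape(shape)
    X = R @ R.T
    resid = np.einsum("ijk,jk->i", A, X) - b          # <A_i, R R^T> - b_i
    value = np.sum(C * X) - y @ resid + 0.5 * sigma * (resid @ resid)
    # Gradient: 2 (C + sum_i (sigma * resid_i - y_i) A_i) R, for symmetric C, A_i.
    weights = sigma * resid - y
    S = C + np.einsum("i,ijk->jk", weights, A)
    grad = 2.0 * (S @ R)
    return value, grad.ravel()


rng = np.random.default_rng(0)
n, r, m = 6, 2, 4                                      # hypothetical toy sizes
C = np.eye(n)
A = rng.standard_normal((m, n, n))
A = 0.5 * (A + A.transpose(0, 2, 1))                   # symmetrize each A_i
R_true = rng.standard_normal((n, r))
b = np.einsum("ijk,jk->i", A, R_true @ R_true.T)       # feasible by construction
y, sigma = np.zeros(m), 10.0

R0 = rng.standard_normal((n, r))
# With fprime omitted, func is expected to return (value, gradient).
r_opt, f_opt, info = fmin_l_bfgs_b(
    value_and_grad, R0.ravel(), args=((n, r), y, sigma, C, A, b)
)
print(f_opt, info["warnflag"])
```

Flattening R is needed because `fmin_l_bfgs_b` works on one-dimensional parameter vectors; the `shape` argument restores the matrix inside the callback.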

To conclude

• Can get the best of both worlds, even for numerical software, by using great tools such as Cython and Numba.

• Often, only a small kernel needs to be fast, and it is usually not hard to identify.

• Many thanks to the great developers who build such useful, practical tools!

References

In case you were interested in the algorithmic aspects of the talk:

Mixture models, clustering, HMMs:

K. Murphy. Machine learning: a probabilistic perspective. MIT Press, 2012.

Dirichlet processes, non-parametric Bayes:

R. Neal. Markov chain sampling methods for Dirichlet process mixture models. Tech report, U of Toronto, 1998.

Y. W. Teh et al. Hierarchical Dirichlet processes. J. Am. Stat. Assoc., 2006.

Algorithms for low rank matrix recovery:

S. Burer and R. Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Mathematical Programming, 2001.

L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review, 1996.

S. Tu and J. Wang. Practical first order methods for large scale semidefinite programming. Unpublished, 2014.
