45
A Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018

A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

A Short History of Array Computingin PythonWolf Vollprecht, PyParis 2018

Page 2: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

TOC - Array computing in general- History up to NumPy- Libraries “after” NumPy

- Pure Python libraries- JIT / AOT compilers- Deep Learning

- NumPy extension proposal

Page 3: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

Arrays

- Used practically in all scientific domains- Physics, Controls, Biological System, Big Data, Deep

Learning, Autonomous Cars …

Page 4: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

Array computing

Generalize operations on scalars to … Arrays

C ← A + B

Page 5: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

What is an n-dimensional Array?

- memory region (buffer)- dimension- shape- Often strides

Layout Row Major (C) 0 1 2 3 0 1 2 3 4 5 6 7 8 9 10 11Shape 3, 4 4 5 6 7

Strides 4, 1 8 9 10 11

Layout Col Major (F) 0 1 2 3 0 4 8 1 5 9 2 6 10 3 7 11Shape 3, 4 4 5 6 7

Strides 1, 3 8 9 10 11

4 el’s

1 el

Page 6: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

1957 / 1977 Fortran 77

- One of the oldest languages for scientific computing- Still a reference in benchmarks- Original implementation of BLAS & LAPACK in Fortran- Maximum of 7 dimensions

Page 7: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

1966 APL: Honorable Mention

- Seriously dense language

→ Try it online: https://tryapl.org/

Page 8: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

1987 Matlab

- Proprietary software from Mathworks- Dynamic interface to Fortran- Pioneered interactive

computing + visualization

Page 9: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

1995 Numeric

- Python numerical computing package- Inspired additions to Python (indexing syntax)

Page 10: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

~2003 NumArray

- More flexible than Numeric- Slower for small arrays, better for large arrays- Split in the community:

- SciPy remained on Numeric...

Page 11: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

2006: NumPy

- “Merge” of Numeric and NumArray- Fast & flexible array computing in Python- Typed memory block- Notion of broadcasting- Vector Loops in C

Page 12: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

NumPy Broadcasting

- Broadcasting: what to do when dimensions don’t match up?

Page 13: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

NumPy ufunc

- Function that has specified input/output- np.sin:

- nin = 1, nout = 1- signature: f -> f, d -> d...

- np.add:- nin = 2, nout = 1- signature: ff -> f, dd -> d...

Page 14: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

NumPy as a Standard

- Computing needs have shifted- More specialized data containers needed- Parallelization, speed, GPU, data size …

NumPy interface de-facto standard!

Page 15: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

2007 numexpr

- Avoid temporaries- R = A + B + C

-> T1 = B + C-> T2 = A + T1-> R = T2

- Evaluate in chunks

Page 16: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

2007 numexpr

Page 17: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

2014 Dask

- Distributed array computing- Can handle large data- Execution of function

distributed

Page 18: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

2014 Dask

Page 19: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

2017 pydata/sparse

- Support for sparse ndarrays- Advantages

- Higher data compression- Faster computation

- Reuses scipy.sparse (but nD!)

Page 20: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

2017 pydata/sparse

- Store data in COO (coordinate list) model

Page 21: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

GPUs for computation

- Massively parallel- Great for large data- Cost of memory transfer from CPU → GPU- Other programming model

Page 22: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

2015 CuPy

- CUDA-aware NumPy implementation- Part of the Chainer DL framework

Page 23: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

2017 xnd

3 libraries:

- ndtypes: shape, type & memory- gumath: dispatch math functions on memory container- xnd: python bridge for typed memory

Page 24: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

JIT & AOT compilers

- Just in Time compilation for numeric code- Can give incredible speed ups

Page 25: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

2012 Pythran

- A Python/NumPy to C++ AOT compiler- Supports high level optimizations in Python- C++ implementation of NumPy with expression

templates- Cython integration

(Don’t miss the talk by Serge later today!)

Page 26: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

2012 Pythran

Page 27: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

2012 Numba

- A Python to LLVM JIT- Takes Python and compiles it to Machine Code- GPU support (Cuda + AMD)- For High Performance: need to write explicit “for” loops

Page 28: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

2012 Numba

Page 29: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

Numba + ufunc

Page 30: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

Numba + GPU

Page 31: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

The AI winter is over …

- Deep learning revolution- Python ecosystem benefits heavily- Lot’s of array computing

Page 32: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

Computation Graph

a = b = inputc = a + bd = b + 1e = c * d

Page 33: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

Computation Graph

- Abstraction of computation- Benefit: allows automatic differentiation- Optimization opportunities

- Common Subexpression Elimination- Algebraic simplifications: (y * x) / y → (x)- Constant folding (2 * 3 + a) → (6 + a)- Fuse ops

Page 34: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

2007 Theano

- One of the first “Deep Learning” libraries- Works on a computation graph- Lazy evaluation- Compiles kernels to C & CUDA

Page 35: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

2015 TensorFlow

- Big library from Google- Killed many others (including Theano)- Same principle as Theano- At the beginning: no compilation stage

Page 36: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

2015 TensorFlow

Page 37: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

2015 TensorFlow + XLA

- An experimental compiler for TensorFlow graphs- JIT + AOT modes- Uses LLVM under the hood

Page 38: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

2016 PyTorch

- Deep Learning Framework from Facebook- Computation Graph, but dynamic (no deferred graph

model)- Easier to have control flow

Page 39: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

PyTorch JIT & TorchScript

- Subset of Python that can be compiled- Generates CUDA & CPU code

Page 40: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

Conclusion

- NumPy is the best … API- Many NumPy implementations- Many downstream projects

- Pandas- xarray- scikit-..., scipy

Page 41: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

The array extension proposal

- 6 months ago started by M. Rocklin- Problem: it’s hard to write generic code- Already extension points: __array__, __array_ufunc__

Page 42: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

The array extension proposal

- E.g. CuPy input → CuPy output desired- Arguments allowed to overload __array_function__

NEP 18 numpy.org/neps/nep-0018-array-function-protocol.html

Page 43: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

Trends

● Ecosystem has become much richer in the past years● More compilation● More specialized NumPy implementations● __array_function__ will make it easy to write

implementation independent code

Page 44: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

Thanks

● Questions?

Check out xtensor & xtensor-python

NumPy for C++ ;)

Follow me on Twitter @wuoulf or GitHub @wolfv

Page 45: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array

NumPy ufunc

- Automatic broadcasting- ufunc supports:

- __call__- reduce- reduceat- accumulate- outer- inner