DEEP LEARNING w/ Apache SystemML
Mike Dusenberry
Engineer, Machine Learning & SystemML
IBM Spark Technology Center, SF
@dusenberrymw
Berkeley - 08.24.16
Agenda
1. Background
  a. Apache Spark
  b. Machine Learning
  c. Declarative Machine Learning
2. Apache SystemML
3. Deep Learning
  a. Overview
  b. Plans
  c. SystemML-NN
4. Demo
5. Questions
Apache Spark
● System for large-scale data processing on clusters.
● Combines ML, SQL, streaming, and other complex analytics.
● Extends Scala idioms, as well as R/Python DataFrame idioms, to cluster computing.
● APIs for Scala, Java, Python, R.
● Simple to use!
● Much more information at https://spark.apache.org/.
Machine Learning
● Data
  ○ Multiple “examples”
  ○ Multiple “features” per “example”
  ○ “Label(s)” for each “example” (supervised)
● Model
  ○ Construct/select a model that fits the problem.
  ○ Examples:
    ■ Linear/Logistic Regression
    ■ SVM
    ■ Neural Networks
● Loss
  ○ An “evaluation” of how well the model fits the data.
● Optimizer
  ○ Minimize “loss” by adjusting the model to better fit the data.
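The four pieces above (data, model, loss, optimizer) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy data set, the function names, the learning rate, and the step count are all made up for this example and are not taken from the talk.

```python
import math

# Data: four examples, two features each, with one binary label (toy values).
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0.0, 0.0, 0.0, 1.0]  # logical AND, which is linearly separable


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(w, b, x):
    """Model: logistic regression, the probability that the label is 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss(w, b):
    """Loss: mean log-loss of the model over the data set."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, t in zip(X, y):
        p = predict(w, b, x)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(X)


def train(steps=2000, lr=0.5):
    """Optimizer: plain batch gradient descent (hypothetical settings)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, t in zip(X, y):
            err = predict(w, b, x) - t  # gradient of log-loss w.r.t. the logit
            grad_w = [g + err * xi / len(X) for g, xi in zip(grad_w, x)]
            grad_b += err / len(X)
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


w, b = train()
```

After training, `loss(w, b)` is lower than the loss at the zero initialization, and rounding `predict(w, b, x)` recovers the labels for this toy data.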
- A Neural Algorithm of Artistic Style, L.A. Gatys, A.S. Ecker, M. Bethge
- https://github.com/jcjohnson/neural-style
Declarative Machine Learning
[Figure: Exploratory Data Analysis Today. Labels: Data Scientist, Laptop, Data, R, Python, Others.]
[Figure: Current Best Practice for Big Data Analysis. Labels: Data Scientist (R, Python, Others), Hadoop Engineer, Spark Engineer, MPI Engineer.]
[Figure: Vision: Declarative Machine Learning. Labels: Data Scientist (R, Python, Others), Query Optimization, Scale-up from Laptop to Cluster.]
Apache SystemML
● High-level language
  ○ DML -> R-like
  ○ PyDML -> Python-like
  ○ Focus is on matrices and linear algebra.
● Engine
  ○ Compiler/Optimizer
  ○ Lots of optimizations, such as rewrites.
● Runtime
  ○ Laptop
  ○ Spark
  ○ (also Hadoop)
SystemML - Example: Logistic Regression (DML)
SystemML - Example: Sigmoid Function (DML)
SystemML - Compilation Chain
SystemML - Architecture (APIs and runtime)
● APIs: Command Line, JMLC, Spark MLContext, Spark ML
● Compiler: Parser/Language, High-Level Operators (HOPs), Low-Level Operators (LOPs), cost-based optimizations
● Runtime: Control Program, Runtime Prog, Buffer Pool, ParFor Optimizer/Runtime, Recompiler
  ○ Instructions: CP Inst, Spark Inst, MR Inst
  ○ MatrixBlock Library (single/multi-threaded), Generic MR
  ○ I/O: Mem/FS IO, DFS IO
SystemML - Spark API (Python)
Deep Learning
● Subfield of machine learning.
● Essentially focused on creating large, complex, nonlinear functions to map from inputs to predictions, and in the process learn complex representations of the data.
● Key: These complex functions are built through a deep composition of simple, modular units.
● = Neural Networks
- A Neural Algorithm of Artistic Style, L.A. Gatys, A.S. Ecker, M. Bethge
- https://github.com/jcjohnson/neural-style
Deep Learning - Neural Networks
● Class of models
● Composition of simple, modular units, including nonlinear units.
● Example units:
  ○ Core:
    ■ Affine (fully-connected)
    ■ Convolution (1D, 2D, 3D)
    ■ Pooling (max, average)
  ○ Nonlinearity/Transfer:
    ■ Sigmoid, Tanh, Softmax, ReLU
  ○ Regularization:
    ■ Dropout, L1, L2
  ○ Loss:
    ■ Log-loss, Cross-entropy, L1, L2
http://cs231n.github.io/
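To make "composition of simple, modular units" concrete, here is a minimal pure-Python sketch that chains an affine unit, a ReLU, a second affine unit, and a softmax. The function names, the tiny weight values, and the input are assumptions chosen for illustration; they do not come from the slides.

```python
import math


def affine(x, W, b):
    """Fully-connected unit: out[j] = sum_i x[i] * W[i][j] + b[j]."""
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) + b[j]
            for j in range(len(b))]


def relu(x):
    """Nonlinearity: elementwise max(0, x)."""
    return [max(0.0, v) for v in x]


def softmax(x):
    """Turns scores into probabilities; subtracts the max for stability."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def network(x, params):
    """A two-layer network is just the units applied one after another."""
    W1, b1, W2, b2 = params
    hidden = relu(affine(x, W1, b1))
    return softmax(affine(hidden, W2, b2))


# Hypothetical parameters: 2 inputs -> 2 hidden units -> 2 classes.
params = (
    [[1.0, -1.0], [0.5, 0.5]],  # W1
    [0.0, 0.0],                 # b1
    [[1.0, 0.0], [0.0, 1.0]],   # W2
    [0.0, 0.0],                 # b2
)
hidden = relu(affine([2.0, 1.0], params[0], params[1]))
probs = network([2.0, 1.0], params)
```

Each unit is a small function with no knowledge of its neighbours, so swapping `relu` for another nonlinearity or stacking more affine units changes only `network`.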
Deep Learning - Convolutional Neural Networks
● State of the art model class for computer vision tasks.
  ○ Classification
  ○ Retrieval
  ○ Detection
  ○ Segmentation
  ○ Playing Go
● Architecture makes an assumption of images as inputs.
http://cs231n.github.io/
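The image assumption shows up in the core units: a convolution slides a small kernel over a 2D grid, and pooling shrinks the resulting feature map. The sketch below is a plain-Python illustration with stride 1 and no padding; the function names, the toy image, and the edge-detecting kernel are assumptions for this example only.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation (the operation deep learning libraries
    usually call convolution), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [[max(fmap[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


# Toy 5x5 "image": dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical 2x2 kernel that responds to a dark-to-bright vertical edge.
kernel = [[-1, 1],
          [-1, 1]]

features = conv2d(image, kernel)  # 4x4 feature map
pooled = max_pool(features)       # 2x2 after pooling
```

The feature map is non-zero only in the column where the edge sits, and pooling keeps that response while halving each spatial dimension.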
More Fun...
https://github.com/google/deepdream
Deep Learning - SystemML-NN Library
● Deep learning library written in DML.
● Multiple layers:
  ○ Core:
    ■ Affine, 2D Convolution, Max Pooling, RNN, LSTM
  ○ Nonlinearity/Transfer:
    ■ Sigmoid, Tanh, Softmax, ReLU
  ○ Regularization:
    ■ Dropout, L1, L2
  ○ Loss:
    ■ Log-loss, Cross-entropy, L1, L2
● Multiple optimizers:
  ○ SGD, SGD w/ momentum, SGD w/ Nesterov momentum, Adagrad, RMSprop, Adam
https://github.com/dusenberrymw/systemml-nn
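As a rough illustration of how the first two optimizers in that list differ, the sketch below writes SGD and SGD with momentum as small update functions in Python. It is not the library's DML code: the function names, argument order, and hyperparameter values are assumptions for this example, and the momentum form shown is the common "velocity" formulation.

```python
def sgd_update(x, dx, lr):
    """Plain SGD: step against the gradient."""
    return [xi - lr * di for xi, di in zip(x, dx)]


def sgd_momentum_update(x, dx, lr, mu, v):
    """SGD with momentum: keep a velocity v that accumulates past gradients."""
    v = [mu * vi - lr * di for vi, di in zip(v, dx)]
    x = [xi + vi for xi, vi in zip(x, v)]
    return x, v


def grad(x):
    """Gradient of the toy objective f(x) = sum(x_i ** 2)."""
    return [2.0 * xi for xi in x]


# Minimize the toy objective from the same start with both optimizers.
x_sgd = [1.0, -2.0]
x_mom, v = [1.0, -2.0], [0.0, 0.0]
for _ in range(200):
    x_sgd = sgd_update(x_sgd, grad(x_sgd), lr=0.1)
    x_mom, v = sgd_momentum_update(x_mom, grad(x_mom), lr=0.1, mu=0.9, v=v)
```

The momentum variant has to carry its velocity between calls, which is why it returns the updated state along with the parameters.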
Deep Learning - SystemML-NN Library (cont.)
https://github.com/dusenberrymw/systemml-nn
● Each layer type has a simple `forward(...)` and `backward(...)` API.
○ `forward(...)` computes the output of the function based on the inputs.
○ `backward(...)` computes the partial derivatives (gradients) of some function deeper in the network (usually the loss function at the end) w.r.t. the inputs to the function.
● Each optimizer has a simple `update(...)` API.
○ `update(...)` adjusts the given parameters based on their partial derivatives.
● Includes test code in DML.
  ○ Gradient checks, unit tests
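The `forward(...)`/`backward(...)` pattern and the idea of a gradient check can be sketched in Python for an affine layer. This is an illustration, not the library's DML source: the argument lists, the L2 loss helper, `numeric_dW`, and the toy values are all assumptions made for this example.

```python
def forward(X, W, b):
    """out = X W + b for X of shape (N, D), W of shape (D, M), b of length M."""
    return [[sum(x[d] * W[d][m] for d in range(len(W))) + b[m]
             for m in range(len(b))] for x in X]


def backward(dout, X, W, b):
    """Given dout = dLoss/dout, return dLoss/dX, dLoss/dW, dLoss/db."""
    N, D, M = len(X), len(W), len(b)
    dX = [[sum(dout[n][m] * W[d][m] for m in range(M)) for d in range(D)]
          for n in range(N)]
    dW = [[sum(X[n][d] * dout[n][m] for n in range(N)) for m in range(M)]
          for d in range(D)]
    db = [sum(dout[n][m] for n in range(N)) for m in range(M)]
    return dX, dW, db


def l2_loss(out, y):
    """0.5 * sum of squared errors, plus its gradient w.r.t. out."""
    loss = 0.5 * sum((o - t) ** 2
                     for ro, rt in zip(out, y) for o, t in zip(ro, rt))
    dout = [[o - t for o, t in zip(ro, rt)] for ro, rt in zip(out, y)]
    return loss, dout


def numeric_dW(X, W, b, y, d, m, h=1e-5):
    """Central finite difference of the loss w.r.t. W[d][m]."""
    old = W[d][m]
    W[d][m] = old + h
    loss_plus, _ = l2_loss(forward(X, W, b), y)
    W[d][m] = old - h
    loss_minus, _ = l2_loss(forward(X, W, b), y)
    W[d][m] = old  # restore the original weight
    return (loss_plus - loss_minus) / (2 * h)


# Toy values (hypothetical): 2 examples, 2 features, 2 outputs.
X = [[1.0, 2.0], [3.0, -1.0]]
W = [[0.5, -0.3], [0.1, 0.8]]
b = [0.0, 0.1]
y = [[1.0, 0.0], [0.0, 1.0]]

loss, dout = l2_loss(forward(X, W, b), y)
dX, dW, db = backward(dout, X, W, b)

# Gradient check: analytic dW should match the numerical estimate.
max_err = max(abs(dW[d][m] - numeric_dW(X, W, b, y, d, m))
              for d in range(2) for m in range(2))
```

A small `max_err` indicates that `backward` agrees with the finite-difference estimate, which is the kind of check a layer's tests can automate.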
Demo
https://github.com/dusenberrymw/systemml-nn/tree/master/examples
Deep Learning - SystemML-NN Library (cont.)
[Diagram labels: SystemML-NN, SystemML Engine; Keras, Torch, TensorFlow, Caffe]
Questions?
Links
● SystemML:
  ○ Website: systemml.apache.org
  ○ Code: github.com/apache/incubator-systemml
  ○ Deep Learning Library: github.com/dusenberrymw/systemml-nn
  ○ Email: [email protected]
● Contact:
  ○ Twitter: @dusenberrymw
  ○ GitHub: github.com/dusenberrymw
  ○ Email: [email protected]
Thanks!