
Common Design of Deep Learning Frameworks



Page 1: Common Design of Deep Learning Frameworks

Tutorial: Deep Learning Implementations and Frameworks
Seiya Tokui*, Kenta Oono*, Atsunori Kanemura+, Toshihiro Kamishima+

*Preferred Networks, Inc. (PFN), {tokui,oono}@preferred.jp
+National Institute of Advanced Industrial Science and Technology (AIST), [email protected], [email protected]

Page 2: Common Design of Deep Learning Frameworks

Overview of this tutorial

• 1st session (KO, 8:30 ‒ 10:00)
  • Introduction
  • Basics of neural networks
  • Common design of neural network implementations

• 2nd session (ST, 10:30 ‒ 12:30)
  • Differences of deep learning frameworks
  • Coding examples of frameworks
  • Conclusion

Page 3: Common Design of Deep Learning Frameworks

Common Design of Deep Learning Frameworks
Kenta Oono <[email protected]>, Preferred Networks, Inc.

2016/4/19, DLIF Tutorial @ PAKDD 2016

Page 4: Common Design of Deep Learning Frameworks

Objective of this part

• How deep learning frameworks represent various neural networks.

• How deep learning frameworks realize the training procedure of neural networks.

• The technology stack that is common to most deep learning frameworks.


Page 5: Common Design of Deep Learning Frameworks

Steps for training neural networks

1. Prepare the training dataset
2. Initialize the neural network (NN) parameters
3. Repeat until meeting some criterion:
   • Prepare the next (mini) batch
   • Define how to compute the loss of this batch
   • Compute the loss (forward prop)
   • Compute the gradient (backprop)
   • Update the NN parameters
4. Save the NN parameters
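A minimal, self-contained sketch of this loop in plain NumPy. A single logistic unit on toy data stands in for a real network; any actual framework replaces each step with its own machinery:

```python
import numpy as np

rng = np.random.RandomState(0)

# Toy stand-in dataset: 100 samples, 5 features, binary labels.
X = rng.randn(100, 5).astype(np.float32)
t = (X.sum(axis=1) > 0).astype(np.float32)

# Initialize the NN parameters (here just one logistic unit for brevity).
w = np.zeros(5, dtype=np.float32)
b = np.float32(0.0)
lr, batch_size = 0.1, 20

for epoch in range(10):                          # repeat until some criterion is met
    perm = rng.permutation(len(X))
    for i in range(0, len(X), batch_size):       # prepare the next mini-batch
        idx = perm[i:i + batch_size]
        x, y = X[idx], t[idx]
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))               # forward prop
        loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
        gw = x.T @ (p - y) / len(x)                           # backprop (analytic gradient)
        gb = np.mean(p - y)
        w -= lr * gw                                          # update the NN parameters
        b -= lr * gb

np.savez('model.npz', w=w, b=b)                  # save the NN parameters
```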


Page 6: Common Design of Deep Learning Frameworks

Technology stack of DL framework

name                                  | functions                                          | example
Graphical interface                   |                                                    | DIGITS, TensorBoard
Machine learning workflow management  | dataset management, training loop                  | Keras, Lasagne, Blocks, TF Learn
Computational graph management        | build computational graph, forward prop / backprop | Theano, TensorFlow, Torch.nn
Multi-dimensional array library       | linear algebra                                     | NumPy, CuPy, Eigen, torch (core)
Numerical computation package         | matrix operation, convolution                      | BLAS, cuBLAS, cuDNN
Hardware                              |                                                    | CPU, GPU

Page 7: Common Design of Deep Learning Frameworks

Technology stack of DL framework

(Technology stack table repeated; see Page 6.)

Page 8: Common Design of Deep Learning Frameworks

Neural Network as a Computational Graph

• In the simplest form, an NN is represented as a computational graph (CG): a bipartite directed acyclic graph (DAG) consisting of data nodes and operator nodes.

y = x1 * x2
z = y - x3

[Diagram: a computational graph with data nodes x1, x2, x3, y, z and operator nodes mul, sub; mul maps (x1, x2) to y, and sub maps (y, x3) to z.]
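A toy sketch of how such a graph could be represented; the DataNode/OperatorNode classes below are illustrative only and do not correspond to any particular framework's API:

```python
class DataNode:
    def __init__(self, name, value=None):
        self.name, self.value, self.grad = name, value, None
        self.creator = None          # operator node that produced this data node

class OperatorNode:
    def __init__(self, op, inputs, output):
        self.op, self.inputs, self.output = op, inputs, output
        output.creator = self

def apply_op(op, name, *inputs):
    out = DataNode(name)
    OperatorNode(op, list(inputs), out)
    return out

# Build the graph for y = x1 * x2, z = y - x3.
x1, x2, x3 = DataNode('x1', 2.0), DataNode('x2', 3.0), DataNode('x3', 1.0)
y = apply_op('mul', 'y', x1, x2)
z = apply_op('sub', 'z', y, x3)

def forward(node):
    """Evaluate a data node by recursively evaluating its creator's inputs."""
    if node.creator is None:
        return node.value
    vals = [forward(v) for v in node.creator.inputs]
    node.value = vals[0] * vals[1] if node.creator.op == 'mul' else vals[0] - vals[1]
    return node.value

print(forward(z))   # 5.0
```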

Page 9: Common Design of Deep Learning Frameworks

Example: Multi-layer Perceptron (MLP)

[Diagram: x → Affine(W1, b1) → h1 → ReLU → a1 → Affine(W2, b2) → h2 → ReLU → a2 → Softmax → y → cross-entropy loss against the label t.]

Whether the CG includes the weights and biases is a choice of the implementation.
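A NumPy sketch of the forward computation in this graph, treating the weights and biases as ordinary arrays (layer and variable names follow the diagram):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def mlp_loss(x, t, W1, b1, W2, b2):
    """Forward pass of the MLP above; x: (N, D), t: (N,) integer labels."""
    h1 = x @ W1 + b1                 # Affine
    a1 = relu(h1)                    # ReLU
    h2 = a1 @ W2 + b2                # Affine
    a2 = relu(h2)                    # ReLU
    y = softmax(a2)                  # Softmax
    # Cross-entropy loss against the labels t
    return -np.mean(np.log(y[np.arange(len(t)), t] + 1e-12))

# Example with random weights: 4-dim inputs, 8 hidden units, 3 classes.
rng = np.random.RandomState(0)
W1, b1 = rng.randn(4, 8), np.zeros(8)
W2, b2 = rng.randn(8, 3), np.zeros(3)
x, t = rng.randn(5, 4), rng.randint(0, 3, size=5)
print(mlp_loss(x, t, W1, b1, W2, b2))
```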


Page 10: Common Design of Deep Learning Frameworks

Example: Recurrent Neural Network (RNN)

[Diagram: an RNN unrolled over time. Starting from the initial state h0, the RNN unit consumes x1 to produce h1, then x2 to produce h2, and so on up to xT producing hT.]

The RNN unit can be:
• Affine + activation function
• LSTM (Long Short-Term Memory)
• GRU (Gated Recurrent Unit)

[Diagram: a single RNN unit with parameters W, b maps the input x_t and the previous state h_{t-1} to the new state h_t (generic form: inputs x and h, output y).]
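A NumPy sketch of the simplest RNN unit (affine + activation) and of unrolling it over a sequence; splitting the single W in the diagram into W_x and W_h is an equivalent rewriting of one affine transform over (x_t, h_{t-1}):

```python
import numpy as np

def rnn_unit(x_t, h_prev, W_x, W_h, b):
    """Simplest RNN unit from the diagram: affine + tanh activation."""
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

def rnn_forward(xs, h0, W_x, W_h, b):
    """Unroll the unit over a sequence xs = [x_1, ..., x_T]."""
    h, hs = h0, []
    for x_t in xs:
        h = rnn_unit(x_t, h, W_x, W_h, b)
        hs.append(h)
    return hs                        # h_1, ..., h_T

# Tiny usage with random parameters: 3-dim inputs, 4-dim hidden state.
rng = np.random.RandomState(0)
W_x, W_h, b = rng.randn(3, 4), rng.randn(4, 4), np.zeros(4)
hs = rnn_forward([rng.randn(3) for _ in range(5)], np.zeros(4), W_x, W_h, b)
```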


Page 11: Common Design of Deep Learning Frameworks

Example: Stacked RNN

[Diagram: a stacked RNN. The first RNN layer maps the inputs x1, ..., xT and the initial state h0 to hidden states h1, ..., hT; a second RNN layer maps those hidden states and its initial state z0 to z1, ..., zT; an Affine + Softmax layer on top produces the output y.]


Page 12: Common Design of Deep Learning Frameworks

Example: RNN with control flow nodes

[Diagram: an RNN expressed with control flow nodes. A loop-enter node receives the initial state (h0, the inputs x, and the parameters W, b); a predicate node computes pred from the loop counter i; while pred = True, a switch node feeds the current state s into the RNN unit, which produces the updated state s'; when pred = False, a loop-end node outputs y.]

• TensorFlow has control flow nodes (e.g. cond, switch, while).

• Because the CG now contains a loop, some mechanism is necessary that resolves the dependency between nodes and schedules the order of calculation (a rough sketch follows below).

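A rough sketch of how such a loop can be expressed with TensorFlow 1.x-style control flow ops; the RNN-unit form and the parameter names W_x, W_h, b are illustrative assumptions, not TensorFlow's built-in RNN machinery:

```python
import tensorflow as tf  # assumes the TensorFlow 1.x graph-mode API

def rnn_last_state(x, h0, W_x, W_h, b):
    """Run a simple RNN unit over the rows of x with a while loop.

    x: (T, in_dim), h0: (1, hid_dim); W_x, W_h, b are illustrative parameters.
    """
    T = tf.shape(x)[0]

    def cond(t, h):                      # predicate node: continue while t < T
        return t < T

    def body(t, h):                      # RNN unit: h' = tanh(x_t W_x + h W_h + b)
        x_t = tf.expand_dims(x[t], 0)
        h_new = tf.tanh(tf.matmul(x_t, W_x) + tf.matmul(h, W_h) + b)
        return t + 1, h_new

    _, h_T = tf.while_loop(cond, body, [tf.constant(0), h0])
    return h_T
```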


Page 13: Common Design of Deep Learning Frameworks

Automatic Differentiation

• Computes the gradient of some specified data node (e.g. the loss) with respect to each data node.

• Each operator node must have a backward operation that calculates the gradients w.r.t. its inputs from the gradients w.r.t. its outputs (a realization of the chain rule).

• e.g. the Function class of Chainer has a backward method.
• e.g. each layer class of Caffe has Backward_cpu and Backward_gpu methods.
• e.g. Autograd has a thin wrapper that adds a gradient method as a closure to most NumPy functions.
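A toy illustration of the idea (not the actual Chainer or Caffe interface): each operator keeps what it needs from the forward pass and maps the output gradient to input gradients.

```python
class Mul:
    """Toy operator node with a forward and a backward operation."""

    def forward(self, x1, x2):
        self.x1, self.x2 = x1, x2        # keep the inputs for the backward pass
        return x1 * x2

    def backward(self, gy):
        # Chain rule: gradients w.r.t. the inputs from the gradient w.r.t. the output.
        return gy * self.x2, gy * self.x1

op = Mul()
y = op.forward(2.0, 3.0)       # 6.0
gx1, gx2 = op.backward(1.0)    # (3.0, 2.0)
```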


Page 14: Common Design of Deep Learning Frameworks

Backprop through CG

y = x1 * x2
z = y - x3

Starting from ∇z z = 1 at the output, backprop walks the graph in reverse and attaches a gradient to every data node: ∇y z = 1, ∇x3 z = -1, ∇x1 z = x2, ∇x2 z = x1.

[Diagram: the same graph as before (x1, x2 → mul → y; y, x3 → sub → z), annotated with the gradients ∇z z, ∇y z, ∇x1 z flowing backward through it.]


Page 15: Common Design of Deep Learning Frameworks

Backprop as extended graphs

y = x1 * x2
z = y - x3

[Diagram: the forward graph (x1, x2 → mul → y; y, x3 → sub → z) is extended with backward nodes: dz passes through an id node to give dy, a neg node turns dz into dx3, and two mul nodes combine dy with x2 and x1 to give dx1 and dx2. Forward propagation runs through the original nodes; backward propagation runs through the added nodes.]


Page 16: Common Design of Deep Learning Frameworks

Example: Theano
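A minimal Theano sketch of the running example (y = x1 * x2, z = y - x3), using the standard theano.tensor API to build the graph, differentiate it, and compile it:

```python
import theano
import theano.tensor as T

# Symbolic data nodes
x1, x2, x3 = T.dscalars('x1', 'x2', 'x3')

# Operator nodes are added simply by writing expressions
y = x1 * x2
z = y - x3

# Automatic differentiation and compilation of the graph
gx1, gx2, gx3 = T.grad(z, [x1, x2, x3])
f = theano.function([x1, x2, x3], [z, gx1, gx2, gx3])

print(f(2.0, 3.0, 1.0))   # z = 5, dz/dx1 = 3, dz/dx2 = 2, dz/dx3 = -1
```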


Page 17: Common Design of Deep Learning Frameworks

Technology stack of DL framework

(Technology stack table repeated; see Page 6.)

Page 18: Common Design of Deep Learning Frameworks

Numerical optimizer

• Many gradient-based optimization algorithms are implemented.
• Stochastic Gradient Descent (SGD) is implemented in most DL frameworks.
• Which optimizer works best depends on the concrete task.

w: parameters of the neural network
θ: states of the optimizer
L: loss function
Γ: optimizer-specific function

initialize w, θ
until the criterion is met:
    get data (x, y)
    calculate ∇w L(x, y; w)
    w, θ ← Γ(w, θ, ∇w L)
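As an illustration, SGD with momentum fits this template: the optimizer state θ holds the velocity, and Γ is the update below (a sketch, not any specific framework's implementation):

```python
import numpy as np

def sgd_momentum(w, theta, grad, lr=0.01, mu=0.9):
    """One optimizer-specific update Γ(w, θ, ∇w L); θ holds the velocity."""
    v = theta.get('v', np.zeros_like(w))
    v = mu * v - lr * grad
    theta['v'] = v
    return w + v, theta

# usage inside the loop above:
#   w, theta = sgd_momentum(w, theta, grad_w)
```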


Page 19: Common Design of Deep Learning Frameworks

Serialization

• Save/load a snapshot of the training process in a specified format (e.g. HDF5, npz, protobuf)
  • Models to be trained (= architectures and parameters of NNs)
  • States of the training procedure (e.g. epoch, learning rate, momentum)

• Serialization enhances the portability of models.
  • Publish pre-trained models (e.g. Model Zoo of Caffe, MXNet, TensorFlow)
  • Import pre-trained models from other DL frameworks
    • e.g. Chainer supports the BVLC-official reference models of Caffe.
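A minimal sketch using NumPy's npz format (one of the formats listed above); the parameter shapes and state fields here are arbitrary examples:

```python
import numpy as np

params = {'W1': np.random.randn(784, 100), 'b1': np.zeros(100)}
state = {'epoch': 3, 'lr': 0.01}

# Save a snapshot of both the model parameters and the training state.
np.savez('snapshot.npz', **params, **{k: np.asarray(v) for k, v in state.items()})

# Load it back later to resume training or to ship a pre-trained model.
snap = np.load('snapshot.npz')
W1, epoch = snap['W1'], int(snap['epoch'])
```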


Page 20: Common Design of Deep Learning Frameworks

Computational optimizer

• Converts CGs into simplified, more efficient ones.

e.g. Theano: y = x1 * x2, z = y - x3
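A small Theano sketch of this: compiling a function runs the graph optimizer, and the resulting graph can be inspected; whether the elementwise mul and sub actually get fused into one composite op depends on the optimization settings:

```python
import theano
import theano.tensor as T

x1, x2, x3 = T.dvectors('x1', 'x2', 'x3')
z = x1 * x2 - x3

# Compiling the function triggers Theano's graph optimizer.
f = theano.function([x1, x2, x3], z)
theano.printing.debugprint(f)   # inspect the optimized graph
```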


Page 21: Common Design of Deep Learning Frameworks

Abstraction of ML workflow
• Offers typical training/validation/evaluation procedures as APIs.
• Users call a single API and do not have to write the procedure manually.
  • e.g. the fit and evaluate methods of the Model class in Keras (see the sketch after the workflow diagram below).


1. Prepare the training dataset
2. Initialize the neural network (NN) parameters
3. Repeat until meeting some criterion:
   • Prepare the next (mini) batch
   • Define how to compute the loss of this batch
   • Compute the loss (forward prop)
   • Compute the gradient (backprop)
   • Update the NN parameters
4. Save the NN parameters
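A sketch with the Keras 1-era API that was current at the time of this tutorial (Sequential, Dense, nb_epoch); the data below is a random stand-in, and the layer sizes are arbitrary:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Toy stand-in data: 784-dim inputs, 10 classes, one-hot targets.
X = np.random.randn(1000, 784).astype('float32')
Y = np.eye(10)[np.random.randint(0, 10, size=1000)].astype('float32')

model = Sequential()
model.add(Dense(100, input_dim=784, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

# The whole loop on this slide is hidden behind a single fit() call.
model.fit(X, Y, batch_size=32, nb_epoch=5)
loss, acc = model.evaluate(X, Y)
```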

Page 22: Common Design of Deep Learning Frameworks

Graphical interface

• Computational graph management
  • Editor, visualizer

• Visualization of the training procedure
  • Visualization of feature maps, outputs of NNs, etc.
  • Transition of error and accuracy

• Performance monitor
  • e.g. throughput, latency, memory usage


Page 23: Common Design of Deep Learning Frameworks

Technology stack of DL framework

(Technology stack table repeated; see Page 6.)

Page 24: Common Design of Deep Learning Frameworks

GPU support

• CUDA: computing platform for GPGPU on NVIDIA GPUs
  • language extension, compiler, libraries, etc.

• DL frameworks provide wrappers for CUDA.
  • GPU array libraries that utilize cuBLAS, cuRAND, etc.
  • Layer implementations with cuDNN (e.g. convolution, sigmoid, LSTM)

• Designed so that CPU and GPU can be switched easily.
  • e.g. users can write CPU/GPU-agnostic code.
  • e.g. switch CPU/GPU with environment variables.

• Some frameworks support OpenCL as a GPU environment, but CUDA is more popular for now.
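A sketch of the idea behind CPU/GPU-agnostic code: dispatch on the array's module instead of hard-coding NumPy. The get_array_module helper below is a simplified stand-in for helpers such as chainer.cuda.get_array_module:

```python
import numpy as np

def get_array_module(x):
    """Simplified stand-in: return cupy for GPU arrays, numpy otherwise."""
    try:
        import cupy
        return cupy if isinstance(x, cupy.ndarray) else np
    except ImportError:
        return np

def logsumexp(x):
    xp = get_array_module(x)          # the same code runs on CPU and GPU arrays
    m = xp.max(x)
    return m + xp.log(xp.sum(xp.exp(x - m)))
```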


Page 25: Common Design of Deep Learning Frameworks

Multi-dimensional array library (CPU / GPU)

• In charge of the concrete calculations on data nodes.
• Heavily depends on BLAS (CPU) or CUDA / the CUDA Toolkit libraries (GPU).

• CPU
  • Third-party library: Eigen::Tensor, NumPy
  • From scratch: ND4J (DL4J), mshadow (MXNet)

• GPU
  • Third-party library: Eigen::Tensor, PyCUDA, gpuarray
  • From scratch: ND4J (DL4J), mshadow (MXNet), CuPy (Chainer)


Page 26: Common Design of Deep Learning Frameworks

Which device to use?

• GPU is (by far) faster than CPU in most cases.
  • Most tensor calculations consist of element-wise operations, matrix multiplications, and convolutions.

• Exceptional cases:
  • When the mini-batch technique is difficult to apply
    • e.g. variable-length training data
    • e.g. the architecture of the NN depends on the training data
  • When GPU calculation cannot hide the cost of transferring data to the GPU
    • e.g. the mini-batch size is too small


Page 27: Common Design of Deep Learning Frameworks

Technology stack of Chainer

Machine learning workflow / computational graph management: Chainer
Multi-dimensional array library: NumPy (CPU), CuPy (GPU)
Numerical computation package: BLAS (CPU); cuBLAS, cuRAND, cuDNN (GPU)
Hardware: CPU, GPU

Page 28: Common Design of Deep Learning Frameworks

Technology stack of TensorFlow

Graphical interface: TensorBoard
Machine learning workflow management: TF Learn
Computational graph management: TensorFlow
Multi-dimensional array library: Eigen::Tensor
Numerical computation package: BLAS (CPU); cuBLAS, cuRAND, cuDNN (GPU)
Hardware: CPU, GPU

Page 29: Common Design of Deep Learning Frameworks

Technology stack of Theano

Computational graph management: Theano
Multi-dimensional array library: NumPy (CPU), libgpuarray (GPU)
Numerical computation package: BLAS (CPU); CUDA Toolkit, CUDA, OpenCL (GPU)
Hardware: CPU, GPU

Page 30: Common Design of Deep Learning Frameworks

Technology stack of Keras

Machine learning workflow management: Keras
Lower layers: TensorFlow or Theano, together with the technology stack of whichever backend is used.

Page 31: Common Design of Deep Learning Frameworks

Summary

• Most DL frameworks have many components in common and can be organized into a similar technology stack.

• At the upper layers of the stack, frameworks are designed to help users follow typical ML workflows.
• At the middle layers, manipulations of computational graphs are automated.
• At the lower layers, optimized tensor calculations are implemented.

• How these components are realized differs between frameworks, as we will see in the following part.


Page 32: Common Design of Deep Learning Frameworks

memorandum


Page 33: Common Design of Deep Learning Frameworks

Training of Neural Networks

argmin_w Σ_(x, y) L(x, y; w)
  w: parameters
  x: feature vector
  y: training label
  L: loss function
  (e.g. a classification problem)

• L is designed so that its value gets smaller as the prediction becomes more "accurate".
• In the deep learning context:
  • L is represented by a neural network
  • w are the parameters of the neural network
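For the classification example, one common concrete instance (an illustration, not stated on the slide) is the softmax cross-entropy loss:

```latex
% Training objective, with the softmax cross-entropy loss as a concrete example
\[
  \operatorname*{arg\,min}_{w} \; \sum_{(x, y)} L(x, y; w),
  \qquad
  L(x, y; w) = -\log p(y \mid x;\, w),
\]
% where p(y | x; w) is the probability the network assigns to the correct label y.
```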


Page 34: Common Design of Deep Learning Frameworks

Layer = function + data nodes

• Layers (e.g. a fully connected layer or a convolutional layer) can be considered as functions with parameters to be optimized.
• In most modern frameworks, the parameters of layers can be considered as data nodes in the computational graph.
• The framework needs to differentiate which data nodes are parameters to be optimized and which are data points.
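A toy sketch of a fully connected layer as a function plus its parameter data nodes; the params() method stands in for whatever mechanism a framework actually uses to mark parameters for optimization:

```python
import numpy as np

class Linear:
    """Toy fully connected layer: a function together with its parameters."""

    def __init__(self, in_dim, out_dim):
        # Parameters are data nodes that the optimizer will update.
        self.W = np.random.randn(in_dim, out_dim) * 0.01
        self.b = np.zeros(out_dim)

    def params(self):
        # The framework collects these to distinguish them from input data nodes.
        return [self.W, self.b]

    def __call__(self, x):
        return x @ self.W + self.b
```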


Page 35: Common Design of Deep Learning Frameworks

Execution Engine

• It resolves the dependencies between data nodes and schedules the execution of the parts of the computational graph (especially in multi-node or multi-GPU settings).
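A toy sketch of the scheduling idea: compute one valid execution order from the dependencies (a real engine would additionally dispatch independent nodes in parallel across devices or machines):

```python
from collections import deque

def schedule(nodes, deps):
    """Toy scheduler: run each node once all of its dependencies are done.

    deps maps a node to the list of nodes it depends on (a DAG).
    Returns one valid execution order (a topological sort).
    """
    remaining = {n: set(deps.get(n, ())) for n in nodes}
    ready = deque(n for n, d in remaining.items() if not d)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m, d in remaining.items():
            if n in d:
                d.remove(n)
                if not d:
                    ready.append(m)
    return order

# y depends on x1, x2; z depends on y, x3
print(schedule(['x1', 'x2', 'x3', 'y', 'z'],
               {'y': ['x1', 'x2'], 'z': ['y', 'x3']}))
```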
