Deep Learning in MATLAB: From Concept to Embedded Code

Alexander Schreiber, Principal Application Engineer, MathWorks Germany

MathWorks Automotive Conference 2018, Stuttgart, April 17th, 2018

© 2018 The MathWorks, Inc.




Example: Lane Detection

Processing pipeline:

1. Input: Image
2. Lane detection CNN (AlexNet, adapted via transfer learning)
3. CNN output: left lane coefficients, right lane coefficients
4. Post-processing (find left/right lane points)
5. Output: Image with marked lanes

The output of the CNN is the set of lane parabola coefficients according to: y = ax^2 + bx + c

GPU Coder generates code for the whole application
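
As a rough illustration of the post-processing step, the sketch below evaluates the parabola y = ax^2 + bx + c for a left and a right coefficient set. The function name `lane_points`, the coefficient values, and the sample x positions are hypothetical; the slides do not show the actual post-processing code or its coordinate convention.

```python
from typing import List, Sequence, Tuple


def lane_points(coeffs: Sequence[float], xs: Sequence[float]) -> List[Tuple[float, float]]:
    """Evaluate y = a*x^2 + b*x + c at each x and return (x, y) pairs."""
    a, b, c = coeffs
    return [(x, a * x * x + b * x + c) for x in xs]


# Hypothetical coefficient sets standing in for the two CNN outputs.
left_coeffs = (0.5, 1.0, -2.0)
right_coeffs = (0.5, 1.0, 2.0)

# Hypothetical sample positions along the lane.
xs = [0.0, 1.0, 2.0]

left = lane_points(left_coeffs, xs)
right = lane_points(right_coeffs, xs)
```

The resulting point lists could then be drawn onto the input image to produce the image with marked lanes.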


Example: Lane Detection

1. Import of Pre-Trained Network
2. Modification of Network Architecture
3. Transfer Learning
4. Verification
5. Automatic CUDA Code Generation
6. MEX Verification
7. Deployment to embedded GPU


MATLAB Deep Learning Framework

Access Data
▪ Manage large image sets
▪ Automate image labeling
▪ Easy access to models

Design + Train
▪ Acceleration with GPUs
▪ Scale to clusters

Deploy
▪ Automate compilation to GPUs and CPUs using GPU Coder:
– 11x faster than TensorFlow
– 4.5x faster than MXNet


Deep Learning Workflow

▪ ACCESS AND EXPLORE DATA
– Files
– Databases
– Sensors

▪ LABEL AND PREPROCESS DATA
– Data Augmentation/Transformation
– Labeling Automation
– Import Reference Models

▪ DEVELOP PREDICTIVE MODELS
– Hardware-Accelerated Training
– Hyperparameter Tuning
– Network Visualization

▪ INTEGRATE MODELS WITH SYSTEMS
– Desktop Apps
– Enterprise Scale Systems
– Embedded Devices and Hardware



Ground Truth Labeling

▪ Adding Ground Truth Information

▪ Semi-automated Labeling

– Object Detection

– Scene Classification

– Semantic Image Segmentation

▪ Solutions

– Ground Truth Labeler App

– Image Labeler App

LABEL AND PREPROCESS DATA


Importing Reference Models (e.g. AlexNet)

LABEL AND PREPROCESS DATA




Two Approaches for Deep Learning

1. Train a Deep Neural Network from Scratch

▪ Tailored and optimized to specific needs
▪ Requires
– Larger training data set
– Longer training time

2. Fine-tune a pre-trained model (transfer learning)

▪ Reusing existing feature extraction
▪ Adapting to specific needs
▪ Requires
– Smaller training data set
– Shorter training time

DEVELOP PREDICTIVE MODELS
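
To make the second approach concrete, here is a toy sketch of one common form of transfer learning: the reused feature extraction stays fixed and only a new final layer is trained on a small data set. Everything in it is an assumption for illustration (the `frozen_features` stand-in, the data, the learning rate); it is not the AlexNet workflow from the talk.

```python
from typing import List, Sequence


def frozen_features(x: float) -> List[float]:
    """Stand-in for the reused, pre-trained feature extraction (never updated)."""
    return [1.0, x, x * x]


def train_new_head(xs: Sequence[float], ys: Sequence[float],
                   lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit only the weights of a new final linear layer by gradient descent."""
    weights = [0.0, 0.0, 0.0]
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for x, y in zip(xs, ys):
            feats = frozen_features(x)
            err = sum(w * f for w, f in zip(weights, feats)) - y
            for i, f in enumerate(feats):
                # Gradient of the mean squared error with respect to weight i.
                grad[i] += 2.0 * err * f / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Hypothetical small data set for the new task: y = 2*x^2 + 1.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [2.0 * x * x + 1.0 for x in xs]

head = train_new_head(xs, ys)
```

Because only three weights are trained on top of fixed features, five samples are enough here, which mirrors the smaller data set and shorter training time listed for transfer learning.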


Transfer Learning

DEVELOP PREDICTIVE MODELS



Accelerating Training (CPU, GPU, multi-GPU, Clusters)

[Figure: training acceleration options; axes: More GPUs, More CPUs]

DEVELOP PREDICTIVE MODELS


Accelerating Training (CPU, GPU, multi-GPU, Clusters)

[Figure panels: Single GPU performance; Multiple GPU support (axis: More GPUs)]

DEVELOP PREDICTIVE MODELS


Hyperparameter Tuning (e.g. Bayesian Optimization)

▪ Goal
– Set of optimal hyperparameters for a training algorithm

▪ Algorithms
– Grid search
– Random search
– Bayesian optimization

▪ Benefits
– Faster training
– Better network performance

DEVELOP PREDICTIVE MODELS
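
The first two algorithms listed above can be sketched in a few lines. The example below is a minimal illustration, assuming a hypothetical `validation_loss` function that stands in for "train a network with these settings and return its validation loss"; the hyperparameter names and ranges are also assumptions. Bayesian optimization is not shown, since it needs a surrogate model that is beyond a short sketch.

```python
import itertools
import random
from typing import Sequence, Tuple

Trial = Tuple[float, float, float]  # (loss, learning_rate, momentum)


def validation_loss(learning_rate: float, momentum: float) -> float:
    """Hypothetical stand-in for training a network and measuring validation loss."""
    return (learning_rate - 0.01) ** 2 + (momentum - 0.9) ** 2


def grid_search(learning_rates: Sequence[float], momenta: Sequence[float]) -> Trial:
    """Evaluate every combination on a fixed grid and return the best trial."""
    return min(
        (validation_loss(lr, m), lr, m)
        for lr, m in itertools.product(learning_rates, momenta)
    )


def random_search(n_trials: int, seed: int = 0) -> Trial:
    """Sample hyperparameters uniformly from fixed ranges and return the best trial."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    trials = []
    for _ in range(n_trials):
        lr = rng.uniform(0.001, 0.1)
        m = rng.uniform(0.5, 0.99)
        trials.append((validation_loss(lr, m), lr, m))
    return min(trials)


best_grid = grid_search([0.001, 0.01, 0.1], [0.5, 0.9, 0.99])
best_random = random_search(50)
```

Grid search costs one training run per grid point, so its cost grows multiplicatively with each added hyperparameter, while random search lets the number of trials be chosen independently of the number of hyperparameters.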


Visualizing and Debugging Intermediate Results

[Figure panels: Training Accuracy Visualization; Deep Dream; Layer Activations; Feature Visualization (Filters, Activations, Deep Dream)]

• Many options for visualizations and debugging
• Examples to get started

DEVELOP PREDICTIVE MODELS



Algorithm Design to Embedded Deployment Workflow

1. Functional test: MATLAB algorithm (functional reference)
– Test in MATLAB on host

2. Deployment unit-test: Desktop GPU
– Build type: .mex
– Call CUDA from MATLAB directly
– Test generated code in MATLAB on host + GPU

3. Deployment integration-test: Desktop GPU
– Build type: .lib/.dll
– Call CUDA from (C++) hand-coded main()
– Test generated code within C/C++ app on host + GPU

4. Real-time test: Embedded GPU
– Build type: cross-compiled .lib
– Call CUDA from (C++) hand-coded main()
– Test generated code within C/C++ app on Tegra target

INTEGRATE MODELS WITH SYSTEMS
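
Each test stage above compares a build of the generated code against the functional reference. The sketch below shows that comparison in generic form; the function names, the stand-in algorithms, and the tolerance are all hypothetical and only illustrate the idea of checking a candidate implementation against a reference within a numerical tolerance.

```python
from typing import Callable, Iterable, List, Sequence

Vector = Sequence[float]


def max_abs_diff(a: Vector, b: Vector) -> float:
    """Largest element-wise absolute difference between two equal-length vectors."""
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def matches_reference(reference: Callable[[Vector], Vector],
                      candidate: Callable[[Vector], Vector],
                      test_inputs: Iterable[Vector],
                      tol: float = 1e-5) -> bool:
    """Return True if the candidate agrees with the reference on every test input."""
    for x in test_inputs:
        expected = reference(x)
        actual = candidate(x)
        if len(expected) != len(actual) or max_abs_diff(expected, actual) > tol:
            return False
    return True


def reference_algorithm(x: Vector) -> List[float]:
    """Hypothetical functional reference."""
    return [2.0 * v + 1.0 for v in x]


def generated_code_stub(x: Vector) -> List[float]:
    """Hypothetical stand-in for generated code with small rounding differences."""
    return [round(2.0 * v + 1.0, 6) for v in x]


def broken_stub(x: Vector) -> List[float]:
    """Hypothetical faulty build that drops the offset."""
    return [2.0 * v for v in x]


test_inputs = [[0.1, 0.2], [1.5, -3.25]]
ok = matches_reference(reference_algorithm, generated_code_stub, test_inputs)
bad = matches_reference(reference_algorithm, broken_stub, test_inputs)
```

A tolerance is used instead of exact equality because a GPU build may legitimately differ from the reference in the last digits, for example through reduced precision or a different order of floating-point operations.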


GPUs and CUDA

[Diagram: C/C++ code runs on the ARM Cortex CPU and uses the CPU memory space; CUDA kernels run on the GPU's CUDA cores and use the GPU memory space]

INTEGRATE MODELS WITH SYSTEMS


Challenges of Programming in CUDA for GPUs

▪ Learning to program in CUDA

– Need to rewrite algorithms for parallel processing paradigm

▪ Creating CUDA kernels

– Need to analyze algorithms to create CUDA kernels that maximize parallel processing

▪ Allocating memory

– Need to deal with memory allocation on both CPU and GPU memory spaces

▪ Minimizing data transfers

– Need to minimize data transfers while ensuring that the required transfers happen at the appropriate points in your algorithm

INTEGRATE MODELS WITH SYSTEMS


GPU Coder Compilation Flow

Benefits:

▪ MATLAB as single golden reference
▪ Much faster conversion from MATLAB to CUDA
▪ Elimination of manual coding errors
▪ No expert-level knowledge of parallel computing needed

GPU Coder

▪ CUDA kernel creation
• Library function mapping
• Loop optimizations
• Dependence analysis

▪ Memory allocation
• Data locality analysis
• GPU memory allocation

▪ Data transfer minimization
• Data-dependence analysis
• Dynamic memcpy reduction

INTEGRATE MODELS WITH SYSTEMS


GPU Coder Output

INTEGRATE MODELS WITH SYSTEMS


Deep Learning Network Support (with Neural Network Toolbox)

SeriesNetwork (GPU Coder: R2017b)
▪ Networks: MNIST, AlexNet, YOLO, VGG, Lane detection, Pedestrian detection

DAGNetwork (GPU Coder: R2018a)
▪ Networks: GoogLeNet, ResNet, SegNet, FCN, DeconvNet
▪ Semantic segmentation
▪ Object detection

INTEGRATE MODELS WITH SYSTEMS


Semantic Segmentation

[Figure panels: Running in MATLAB; Generated Code from GPU Coder]

INTEGRATE MODELS WITH SYSTEMS


Algorithm Design to Embedded Deployment

1. Functional test: MATLAB algorithm (functional reference)

2. Deployment unit-test: Tesla GPU
– Build type: .mex
– Call CUDA from MATLAB directly

3. Deployment integration-test: Tesla GPU
– Build type: .lib/.dll
– Call CUDA from (C++) hand-coded main()

4. Real-time test: Tegra GPU
– Build type: cross-compiled .lib
– Call CUDA from (C++) hand-coded main()
– Cross-compiled on host with Linaro toolchain

INTEGRATE MODELS WITH SYSTEMS


AlexNet Inference on NVIDIA Titan Xp

[Figure: frames per second vs. batch size. Series: GPU Coder + TensorRT (3.0.1, int8); GPU Coder + TensorRT (3.0.1); GPU Coder + cuDNN; MXNet (1.1.0); TensorFlow (1.6.0)]

Testing platform
▪ CPU: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
▪ GPU: Pascal Titan Xp
▪ cuDNN: v7

INTEGRATE MODELS WITH SYSTEMS


AlexNet Deployment to Tegra: Cross-Compiled with ‘lib’

Two small changes

1. Change build-type to ‘lib’

2. Select cross-compile toolchain

INTEGRATE MODELS WITH SYSTEMS


AlexNet Inference on Jetson TX2: Performance

[Figure: frames per second (0 to 400) vs. batch size (1, 16, 32, 64, 128, 256). Series: MATLAB GPU Coder (R2017b); C++ Caffe (1.0.0-rc5); TensorRT (2.1). Annotations: 2x, 0.85x]

INTEGRATE MODELS WITH SYSTEMS


Deploying to GPUs and CPUs

[Diagram: Deep Learning Networks → GPU Coder → target libraries]

▪ NVIDIA cuDNN & TensorRT Libraries
▪ ARM Compute Library (Raspberry Pi board)
▪ Intel MKL-DNN Library (Desktop CPU)

INTEGRATE MODELS WITH SYSTEMS


Deep Learning in MATLAB

▪ Integrated Deep Learning Framework

– Data Access and Preprocessing

– Deep Learning Network Design and Verification

– Integration within larger System

▪ Acceleration through GPU and Parallel Computing

– Training

– Inference

▪ Deployment through automatic CUDA Code Generation

– Desktop GPU

– Embedded GPU

ACCESS AND EXPLORE DATA

LABEL AND PREPROCESS DATA

DEVELOP PREDICTIVE MODELS

INTEGRATE MODELS WITH SYSTEMS


GPU Coder for Deployment

GPU Coder: Accelerated implementation of parallel algorithms on GPUs & CPUs

▪ Deep Neural Networks (targets 1, 2, 3): Deep Learning, machine learning
– 5x faster than TensorFlow
– 2x faster than MXNet

▪ Image Processing and Computer Vision (target 2): Image filtering, feature detection/extraction
– 60x faster than CPUs for stereo disparity

▪ Signal Processing and Communications (target 2): FFT, filtering, cross correlation
– 20x faster than CPUs for FFTs

Target key: 1 = Intel MKL-DNN Library, 2 = (label missing), 3 = ARM Compute Library


GPU Coder for Image Processing and Computer Vision

▪ Distance transform: 8x speedup
▪ Fog removal: 5x speedup
▪ SURF feature extraction: 700x speedup
▪ Ray tracing: 18x speedup
▪ Frangi filter: 3x speedup


Design Your DNNs in MATLAB, Deploy with GPU Coder

Access Data
▪ Manage large image sets
▪ Automate image labeling
▪ Easy access to models

Design + Train
▪ Acceleration with GPUs
▪ Scale to clusters

Deploy
▪ Automate compilation to GPUs and CPUs using GPU Coder:
– 11x faster than TensorFlow
– 4.5x faster than MXNet


Questions?


Thank You!