
Page 1: Deep Learning for Developers (expanded version, 12/2017)

Deep Learning for Developers16/12/2017

Julien Simon, AI Evangelist, EMEA

@julsimon

Page 2: Deep Learning for Developers (expanded version, 12/2017)

What to expect

• AI ?

• An introduction to Deep Learning

• Common neural network architectures and use cases

• An introduction to Apache MXNet

• Demos using Jupyter notebooks on Amazon SageMaker

• Resources

Page 3: Deep Learning for Developers (expanded version, 12/2017)

• Artificial Intelligence: design software applications which exhibit human-like behavior, e.g. speech, natural language processing, reasoning or intuition

• Machine Learning: teach machines to learn without being explicitly programmed

• Deep Learning: using neural networks, teach machines to learn from complex data where features cannot be explicitly expressed

Page 4: Deep Learning for Developers (expanded version, 12/2017)

Myth: AI is dark magic, aka « You’re not smart enough »

Page 5: Deep Learning for Developers (expanded version, 12/2017)

Fact: AI is math, code and chips. A bit of Science, a lot of Engineering.

Page 6: Deep Learning for Developers (expanded version, 12/2017)

Amazon AI is based on Deep Learning

Page 7: Deep Learning for Developers (expanded version, 12/2017)

An introduction to Deep Learning

Page 8: Deep Learning for Developers (expanded version, 12/2017)

The neuron: ”Multiply and Accumulate”

u = ∑ (i = 1 to l) xi ∗ wi

Activation functions

Source: Wikipedia
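As a rough illustration (not from the slides), here is what a single neuron computes, written in Python with NumPy; the sigmoid activation and the toy values are illustrative assumptions:

import numpy as np

def neuron(x, w, b=0.0):
    # "Multiply and Accumulate": u = sum_i x_i * w_i (+ optional bias)
    u = np.dot(x, w) + b
    # Apply a non-linear activation function, here a sigmoid
    return 1.0 / (1.0 + np.exp(-u))

x = np.array([0.5, -1.2, 3.0])   # one sample with 3 features
w = np.array([0.1, 0.4, -0.2])   # the neuron's weights
print(neuron(x, w))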

Page 9: Deep Learning for Developers (expanded version, 12/2017)

Neural networks

X = [ x11 x12 … x1I
      x21 x22 … x2I
      …
      xm1 xm2 … xmI ]   (m samples, I features)

Y = [ y1, y2, …, ym ]   (m labels, N2 outputs)

One-hot encoding, e.g.:
0,0,1,0,0,…,0
1,0,0,0,0,…,0
0,0,0,0,1,…,0

Page 10: Deep Learning for Developers (expanded version, 12/2017)

Neural networks

Same X (m samples, I features) and one-hot encoded Y (m labels, N2 categories) as above.

Accuracy = Number of correct predictions / Total number of predictions
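A small Python/NumPy illustration of one-hot encoding and the accuracy formula (the labels and predictions are made up for the example):

import numpy as np

labels = np.array([2, 0, 4])              # three integer labels
num_categories = 6                        # N2 categories
one_hot = np.eye(num_categories)[labels]  # rows like 0,0,1,0,0,0
print(one_hot)

# Accuracy = number of correct predictions / total number of predictions
predicted = np.array([2, 1, 4])
print(np.mean(predicted == labels))       # 2 correct out of 3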

Page 11: Deep Learning for Developers (expanded version, 12/2017)

Neural networks

Initially, the network will not predict correctly: f(X1) = Y’1

A loss function measures the difference between the real label Y1 and the predicted label Y’1: error = loss(Y1, Y’1)

For a batch of samples: ∑ (i = 1 to batch size) loss(Yi, Y’i) = batch error

The purpose of the training process is to minimize error by gradually adjusting weights.
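To make this concrete, here is a small NumPy sketch of a batch error; cross-entropy is assumed here as one common choice of loss function, and the labels and predictions are invented for illustration:

import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    # Per-sample loss between one-hot label Y and predicted probabilities Y'
    return -np.sum(y_true * np.log(y_pred + eps))

y_true = np.array([[0, 0, 1], [1, 0, 0]])              # one-hot labels
y_pred = np.array([[0.1, 0.2, 0.7], [0.6, 0.3, 0.1]])  # network predictions
# Batch error: sum of per-sample losses over the batch
print(sum(cross_entropy(t, p) for t, p in zip(y_true, y_pred)))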

Page 12: Deep Learning for Developers (expanded version, 12/2017)

Training

Training data set → Training (backpropagation) → Trained neural network

Hyperparameters: batch size, learning rate, number of epochs

Page 13: Deep Learning for Developers (expanded version, 12/2017)

Stochastic Gradient Descent (SGD)

"Imagine you stand on top of a mountain with skis strapped to your feet. You want to get down to the valley as quickly as possible, but there is fog and you can only see your immediate surroundings. How can you get down the mountain as quickly as possible? You look around and identify the steepest path down, go down that path for a bit, again look around and find the new steepest path, go down that path, and repeat. This is exactly what gradient descent does."

Tim Dettmers, University of Lugano, 2015
https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-history-training/

The « step size » is called the learning rate.

[Figure: loss surface z = f(x, y)]

Alternatives to SGD: rmsprop, adagrad, adadelta, adam, etc.
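A toy Python sketch of the idea (the quadratic loss surface, starting point and step count are assumptions, not from the talk):

import numpy as np

def grad(p):
    # Gradient of the toy loss surface z = f(x, y) = x^2 + y^2 at point p
    return 2 * p

p = np.array([3.0, -4.0])   # starting point on the "mountain"
learning_rate = 0.1         # the "step size"
for step in range(50):
    p = p - learning_rate * grad(p)   # follow the steepest path downhill
print(p)                    # ends up close to the minimum at (0, 0)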

Page 14: Deep Learning for Developers (expanded version, 12/2017)

Validation

Validation data set → Trained neural network → Validation accuracy

Prediction at the end of each epoch

Page 15: Deep Learning for Developers (expanded version, 12/2017)

Early stopping

[Plot: training accuracy, validation accuracy and loss vs. epochs; validation accuracy flattens and drops as overfitting sets in]

Save the model at the end of each epoch
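A minimal sketch of early stopping in Python; the accuracy numbers and the patience value are invented for illustration:

# Simulated validation accuracy per epoch (illustrative numbers only)
val_accuracy = [0.62, 0.71, 0.78, 0.81, 0.80, 0.79, 0.78]

best_acc, best_epoch, patience, bad_epochs = 0.0, 0, 2, 0
for epoch, acc in enumerate(val_accuracy):
    # In real training: run one epoch of backpropagation, validate, save a checkpoint
    if acc > best_acc:
        best_acc, best_epoch, bad_epochs = acc, epoch, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:   # validation accuracy has stopped improving
            break                    # stop training before overfitting gets worse
print("stopping at epoch", epoch, "- best model was saved at epoch", best_epoch)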

Page 16: Deep Learning for Developers (expanded version, 12/2017)

Common network architecturesand use cases

Page 17: Deep Learning for Developers (expanded version, 12/2017)

Convolutional Neural Networks (CNN)

LeCun, 1998: handwritten digit recognition, 32x32 pixels

Convolution and pooling reduce dimensionality

https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-core-concepts/
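A LeNet-style network can be written with the MXNet Symbol API shown later in this deck; this is a sketch with assumed layer sizes, not LeCun's exact architecture:

import mxnet as mx

data = mx.sym.Variable('data')                                   # e.g. 32x32 digit images
c1 = mx.sym.Convolution(data=data, kernel=(5, 5), num_filter=20)
a1 = mx.sym.Activation(data=c1, act_type='tanh')
p1 = mx.sym.Pooling(data=a1, pool_type='max', kernel=(2, 2), stride=(2, 2))
c2 = mx.sym.Convolution(data=p1, kernel=(5, 5), num_filter=50)
a2 = mx.sym.Activation(data=c2, act_type='tanh')
p2 = mx.sym.Pooling(data=a2, pool_type='max', kernel=(2, 2), stride=(2, 2))
f = mx.sym.Flatten(data=p2)
fc1 = mx.sym.FullyConnected(data=f, num_hidden=500)
a3 = mx.sym.Activation(data=fc1, act_type='tanh')
fc2 = mx.sym.FullyConnected(data=a3, num_hidden=10)              # 10 digit classes
lenet = mx.sym.SoftmaxOutput(data=fc2, name='softmax')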

Page 18: Deep Learning for Developers (expanded version, 12/2017)

https://news.developer.nvidia.com/expedia-ranking-hotel-images-with-deep-learning/

• Expedia have over 10 million images from 300,000 hotels

• Using great images boosts conversion

• Using Keras and EC2 GPU instances, they fine-tuned a pre-trained Convolutional Neural Network using 100,000 images

• Hotel descriptions now automatically feature the best available images

CNN: Object Classification

Page 19: Deep Learning for Developers (expanded version, 12/2017)

CNN: Object Detection

https://github.com/precedenceguo/mx-rcnn https://github.com/zhreshold/mxnet-yolo

Page 20: Deep Learning for Developers (expanded version, 12/2017)

• 17,000 images from Instagram

• 7 brands

• Deep Learning model pre-trained on ImageNet

• Fine-tuning with TensorFlow and EC2 GPU instances

• Additional work on color extraction

https://technology.condenast.com/story/handbag-brand-and-color-detection

CNN: Object Detection

Page 21: Deep Learning for Developers (expanded version, 12/2017)

CNN: Object Segmentation

https://github.com/TuSimple/mx-maskrcnn

Page 22: Deep Learning for Developers (expanded version, 12/2017)

https://www.oreilly.com/ideas/self-driving-trucks-enter-the-fast-lane-using-deep-learning

Last June, TuSimple drove an autonomous truck for 200 miles from Yuma, AZ to San Diego, CA.

Page 23: Deep Learning for Developers (expanded version, 12/2017)

CNN: Face Detection

https://github.com/tornadomeet/mxnet-face

Page 24: Deep Learning for Developers (expanded version, 12/2017)

Real-Time Pose Estimation

https://github.com/dragonfly90/mxnet_Realtime_Multi-Person_Pose_Estimation

Page 25: Deep Learning for Developers (expanded version, 12/2017)

Long Short Term Memory Networks (LSTM)

• An LSTM neuron computes the output based on the input and a previous state

• LSTM networks have memory

• They’re great at predicting sequences, e.g. machine translation
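A small sketch of an LSTM layer using MXNet's Gluon API; the hidden size, sequence length and batch size are arbitrary choices for the example:

import mxnet as mx

lstm = mx.gluon.rnn.LSTM(hidden_size=50, num_layers=1)  # default layout: (seq_len, batch, features)
lstm.initialize()

seq = mx.nd.random.uniform(shape=(10, 4, 8))  # sequence of 10 steps, batch of 4, 8 features each
state = lstm.begin_state(batch_size=4)        # the "memory" carried across time steps
output, state = lstm(seq, state)
print(output.shape)                           # (10, 4, 50)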

Page 26: Deep Learning for Developers (expanded version, 12/2017)

LSTM: Machine Translation

https://github.com/awslabs/sockeye

Page 27: Deep Learning for Developers (expanded version, 12/2017)

Generative Adversarial Networks (GAN)

• Goodfellow, 2014

• Dual-network architecture

• Detector network
  • Sees real samples
  • Learns how to detect real samples from fake ones created by the Generator network

• Generator network
  • Doesn’t see real samples
  • Creates images from random data
  • Is updated with gradients computed through the Detector network
  • Learns gradually how to generate better fake samples
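The following is a toy sketch of one GAN training step with MXNet Gluon; the single-layer networks, dimensions and learning rates are assumptions made only to keep the example short:

import mxnet as mx
from mxnet import gluon, autograd

G = gluon.nn.Dense(2)   # Generator: maps random noise to a fake 2-D "sample"
D = gluon.nn.Dense(1)   # Detector: scores a sample as real (1) or fake (0)
G.initialize(); D.initialize()
loss = gluon.loss.SigmoidBinaryCrossEntropyLoss()
trainer_G = gluon.Trainer(G.collect_params(), 'sgd', {'learning_rate': 0.05})
trainer_D = gluon.Trainer(D.collect_params(), 'sgd', {'learning_rate': 0.05})

real = mx.nd.random.normal(2, 0.5, shape=(8, 2))   # real samples
noise = mx.nd.random.normal(shape=(8, 4))          # random data for the Generator

# 1) The Detector sees real and fake samples and learns to tell them apart
with autograd.record():
    d_loss = loss(D(real), mx.nd.ones((8, 1))) + loss(D(G(noise)), mx.nd.zeros((8, 1)))
d_loss.backward()
trainer_D.step(8)

# 2) The Generator never sees real samples: it is updated through the Detector's output
with autograd.record():
    g_loss = loss(D(G(noise)), mx.nd.ones((8, 1)))  # try to make fakes look real
g_loss.backward()
trainer_G.step(8)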

Page 28: Deep Learning for Developers (expanded version, 12/2017)

GAN: Welcome to the (un)real world, Neo

Generating new ”celebrity” faces: https://github.com/tkarras/progressive_growing_of_gans

From semantic map to 2048x1024 picture https://tcwang0509.github.io/pix2pixHD/

Page 29: Deep Learning for Developers (expanded version, 12/2017)

Apache MXNet

Page 30: Deep Learning for Developers (expanded version, 12/2017)

Apache MXNet: Open Source library for Deep Learning

• Programmable: simple syntax, multiple languages

• Portable: highly efficient models for mobile and IoT

• High Performance: near-linear scaling across hundreds of GPUs

• Most Open: accepted into the Apache Incubator

• Best on AWS: optimized for Deep Learning on AWS

MXNet 1.0 released on December 4th

Page 31: Deep Learning for Developers (expanded version, 12/2017)

Anatomy of a Deep Learning Model

Building blocks (MXNet):

• Convolution: mx.sym.Convolution(data, kernel=(5,5), num_filter=20)

• Pooling: mx.sym.Pooling(data, pool_type="max", kernel=(2,2), stride=(2,2))
  (e.g. for the 2x2 block [4 2; 2 0], max pooling keeps 4 and average pooling keeps 2)

• Fully connected: mx.sym.FullyConnected(data, num_hidden=128)

• Recurrent (LSTM): lstm.lstm_unroll(num_lstm_layer, seq_len, len, num_hidden, num_embed)

• Embedding: mx.symbol.Embedding(data, input_dim, output_dim = k)
  maps a word such as "Queen" to a dense vector, e.g. [0.2, -0.1, …, 0.7], so that
  cos(w, queen) = cos(w, king) - cos(w, man) + cos(w, woman)

• Activation: mx.sym.Activation(data, act_type="xxxx") with "relu", "tanh", "sigmoid", "softrelu"

• Output: mx.sym.SoftmaxOutput

• Training: mx.model.FeedForward, model.fit

Inputs: image, video, speech, text

Outputs: image labels ("Bicycle, People, Road, Sport"), image captions ("People Riding Bikes"), machine translation ("People Riding Bikes" → "Οι άνθρωποι ιππασίας ποδήλατα"), Neural Art, Face Search, Image Segmentation, events

Page 32: Deep Learning for Developers (expanded version, 12/2017)

https://github.com/awslabs/mxnet-model-server/

Page 33: Deep Learning for Developers (expanded version, 12/2017)

https://aws.amazon.com/blogs/ai/announcing-onnx-support-for-apache-mxnet/

Page 34: Deep Learning for Developers (expanded version, 12/2017)

The Apache MXNet API

• NDArray API: storing and accessing data in multi-dimensional arrays

• Symbol API: building models (layers, weights, activation functions)

• Iterators: serving data during training and validation

• Module API: training and using models
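Putting the four pieces together, here is a minimal hedged example (random data, a single fully connected layer, arbitrary hyperparameters):

import numpy as np
import mxnet as mx

# NDArray API: multi-dimensional arrays
X = mx.nd.array(np.random.rand(100, 10))            # 100 samples, 10 features
y = mx.nd.array(np.random.randint(0, 2, (100,)))    # 2 classes

# Iterators: serve data in batches during training
train_iter = mx.io.NDArrayIter(X, y, batch_size=10)

# Symbol API: build the model
data = mx.sym.Variable('data')
fc = mx.sym.FullyConnected(data=data, num_hidden=2)
net = mx.sym.SoftmaxOutput(data=fc, name='softmax')

# Module API: train and use the model
mod = mx.mod.Module(symbol=net)
mod.fit(train_iter, num_epoch=5, optimizer='sgd',
        optimizer_params={'learning_rate': 0.1})
print(mod.predict(train_iter).shape)                 # (100, 2)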

Page 35: Deep Learning for Developers (expanded version, 12/2017)

Demos

https://github.com/juliensimon/dlnotebooks

1) Hello World: learn a synthetic data set

2) Classify images with pre-trained models

3) Learn and predict MNIST with a Multi-Layer Perceptron and the LeNet CNN

4) Train, host and deploy a model with Amazon SageMaker

5) Generate MNIST samples with a GAN

Page 36: Deep Learning for Developers (expanded version, 12/2017)

Amazon SageMaker

• Build: pre-built notebook instances, highly-optimized machine learning algorithms

• Train: one-click training for ML, DL, and custom algorithms; easier training with hyperparameter optimization

• Deploy: deployment without engineering effort; fully-managed hosting at scale

Page 37: Deep Learning for Developers (expanded version, 12/2017)


Resources

https://aws.amazon.com/machine-learning

https://aws.amazon.com/sagemaker

https://aws.amazon.com/blogs/ai

https://mxnet.incubator.apache.org

https://github.com/apache/incubator-mxnet

https://github.com/gluon-api

An overview of Amazon SageMaker https://www.youtube.com/watch?v=ym7NEYEx9x4

https://medium.com/@julsimon

Page 38: Deep Learning for Developers (expanded version, 12/2017)

Thank you!

Julien Simon, AI Evangelist, EMEA

@julsimon