© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Sunil Mallya
Solutions Architect, Deep Learning
A Deeper Dive into Apache MXNet on AWS
Agenda
• Apache MXNet introduction
• Distributed Deep Learning with AWS CloudFormation
• Deep Learning motivation and basics
• MXNet programming model overview
• Train our first neural network using MXNet
Deep Learning Applications
Significantly improve many applications across multiple domains:
image understanding, speech recognition, natural language processing, autonomy
AI Customers on AWS
• Netflix – Recommendation Engine
• FINRA – Anomaly detection, sequence matching
• TuSimple – Computer Vision for Autonomous Driving
• Pinterest – Image recognition search
• Mapillary – Computer vision for crowd-sourced maps
AI Services: Amazon Rekognition, Amazon Polly, Amazon Lex (more to come in 2017)
AI Platform: Amazon Machine Learning, Amazon Elastic MapReduce, Spark & SparkML (more to come in 2017)
AI Engines: Apache MXNet, TensorFlow, Caffe, Theano, Keras, Torch, CNTK
Hardware: P2, ECS, Lambda, EMR/Spark, Greengrass, FPGA (more to come in 2017)
Democratizing Artificial Intelligence
Apache MXNet
• Programmable: simple syntax, multiple languages
• Portable: highly efficient models for mobile and IoT (a 1024-layer ResNet is ~4 GB)
• High performance: near-linear scaling across hundreds of GPUs (88% efficiency on 256 GPUs)
Distributed Deep Learning
Scaling with MXNet
[Chart: training throughput vs. number of GPUs (1, 2, 4, 8, 16, 32, 64, 128, 256) for Inception v3, ResNet, and AlexNet against the ideal linear curve; roughly 88% efficiency at 256 GPUs]
• CloudFormation with Deep Learning AMI
• 16x P2.16xlarge instances, mounted on EFS
• Inception and ResNet: batch size 32; AlexNet: batch size 512
• ImageNet: 1.2M images, 1K classes
• 152-layer ResNet: 5.4 days on 4x K80s (1.2h per epoch), 0.22 top-1 error
Distributed Training Setup with CloudFormation
https://github.com/awslabs/deeplearning-cfn
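For orientation, here is a minimal, hedged sketch of what each worker launched by the CloudFormation template typically runs. The network, data iterator, GPU count, and hyperparameters are illustrative placeholders, not taken from the slides; the key piece is the distributed key-value store passed to Module.fit.

import mxnet as mx

# Distributed key-value store; the CloudFormation stack / launcher provides
# the scheduler and worker environment this call expects.
kv = mx.kvstore.create('dist_sync')

# Illustrative toy network and data (placeholders).
data = mx.symbol.Variable('data')
net = mx.symbol.FullyConnected(data=data, num_hidden=10)
net = mx.symbol.SoftmaxOutput(data=net, name='softmax')

train_iter = mx.io.NDArrayIter(mx.nd.ones((1000, 100)),
                               mx.nd.zeros((1000,)),
                               batch_size=32)

# One Module per worker, spread over that worker's 16 GPUs (P2.16xlarge).
mod = mx.module.Module(net, context=[mx.gpu(i) for i in range(16)])

# Passing the distributed kvstore is what synchronizes gradients across workers.
mod.fit(train_iter,
        kvstore=kv,
        optimizer='sgd',
        optimizer_params={'learning_rate': 0.1},
        num_epoch=10)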
Deep Learning basics
Biological Neuron
slide from http://cs231n.stanford.edu/
Artificial Neuron
[Diagram: inputs, synaptic weights, weighted sum, nonlinearity, output]
• Input: vector of training data x
• Output: linear function of the inputs
• Nonlinearity: transforms the output into a desired range of values, e.g. for classification we need probabilities [0, 1]
• Training: learn the weights w and bias b (see the sketch below)
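As a concrete illustration of those four pieces, here is a minimal NumPy sketch of one neuron with a sigmoid nonlinearity; all values are made up.

import numpy as np

x = np.array([0.5, -1.2, 3.0])    # input: vector of training data x
w = np.array([0.1, 0.8, -0.4])    # synaptic weights (learned during training)
b = 0.2                           # bias (learned during training)

z = np.dot(w, x) + b              # output: linear function of the inputs
y = 1.0 / (1.0 + np.exp(-z))      # nonlinearity: sigmoid squashes z into [0, 1]
print(y)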
Deep Neural Network
[Diagram: input layer → hidden layers → output]
The optimal size of a hidden layer (number of neurons) is usually between the size of the input layer and the size of the output layer (see the sketch below).
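To make that layer-sizing rule of thumb concrete, here is a hedged MXNet sketch of a small network whose hidden-layer widths (256, 64) sit between an assumed 784-dimensional input and a 10-class output; the sizes are illustrative only.

import mxnet as mx

data = mx.symbol.Variable('data')                         # 784-dim input (e.g. a 28x28 image)
h1 = mx.symbol.FullyConnected(data=data, num_hidden=256)  # hidden layer 1
h1 = mx.symbol.Activation(data=h1, act_type='relu')
h2 = mx.symbol.FullyConnected(data=h1, num_hidden=64)     # hidden layer 2
h2 = mx.symbol.Activation(data=h2, act_type='relu')
out = mx.symbol.FullyConnected(data=h2, num_hidden=10)    # 10-class output layer
out = mx.symbol.SoftmaxOutput(data=out, name='softmax')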
The “Learning” in Deep Learning
[Diagram: the network's current weights (e.g. 0.4, 0.3, 0.2, 0.9, ...) produce a prediction X1 from the input; when X1 != label X, back propagation (gradient descent) nudges each weight by a small delta to produce new weights]
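The weight-update step in that diagram can be written out in a few lines. This is a minimal, illustrative NumPy example of one gradient-descent update for a single linear neuron with squared-error loss; all values are invented.

import numpy as np

w = np.array([0.4, 0.3])     # current weights
x = np.array([1.0, 2.0])     # one training input
label = 1.0                  # desired output X

pred = w.dot(x)              # forward pass produces X1
error = pred - label         # X1 != X, so there is something to learn
grad = 2 * error * x         # back propagation: gradient of (error ** 2) w.r.t. w
lr = 0.1                     # learning rate
w = w - lr * grad            # new weights: each old weight nudged by a small delta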
Hidden Layer Visualization
MXNet Programming Model
Imperative Programming

import numpy as np
a = np.ones(10)
b = np.ones(10) * 2
c = b * a
d = c + 1

Easy to tweak with Python code

PROS
• Straightforward and flexible
• Takes advantage of language-native features (loops, conditionals, debugger)
• E.g. NumPy, Matlab, Torch, …

CONS
• Hard to optimize
Declarative Programming

A = Variable('A')
B = Variable('B')
C = B * A
D = C + 1
f = compile(D)
d = f(A=np.ones(10), B=np.ones(10)*2)

[Diagram: computation graph in which A and B feed a multiply node whose result feeds a "+1" node]

C can share memory with D because C is deleted later

PROS
• More chances for optimization
• Works across different languages
• E.g. TensorFlow, Theano, Caffe

CONS
• Less flexible
IMPERATIVE NDARRAY API

>>> import mxnet as mx
>>> a = mx.nd.zeros((100, 50))
>>> b = mx.nd.ones((100, 50))
>>> c = a + b
>>> c += 1
>>> print(c)

DECLARATIVE SYMBOLIC EXECUTOR

>>> import mxnet as mx
>>> net = mx.symbol.Variable('data')
>>> net = mx.symbol.FullyConnected(data=net, num_hidden=128)
>>> net = mx.symbol.SoftmaxOutput(data=net)
>>> texec = mx.module.Module(net)
>>> texec.forward(data=c)
>>> texec.backward()

NDArray can be set as input to the graph
MXNet: Mixed programming paradigm
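The slides do not include code for this point, so here is a hedged sketch of what "mixed" means in practice: build the graph declaratively with mx.symbol, then drive it and post-process its output imperatively with NDArrays. The tiny expression and shapes are illustrative.

import mxnet as mx

# Declarative part: define a computation graph symbolically.
a = mx.symbol.Variable('a')
b = a * 2 + 1                                 # graph: (a * 2) + 1

# Compile the graph into an executor for a concrete shape.
exe = b.simple_bind(ctx=mx.cpu(), a=(10,))

# Imperative part: feed NDArrays in, get NDArrays out, keep computing with them.
exe.arg_dict['a'][:] = mx.nd.ones(10)
exe.forward()
out = exe.outputs[0]                          # NDArray result of the graph
print((out + 5).asnumpy())                    # further imperative NDArray math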
Let's train our first model to classify handwritten digits
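The live demo is not reproduced in the slides; the notebooks linked below walk through it in full. As a rough, hedged outline of what MNIST training looks like with the Module API (assuming the mx.test_utils.get_mnist helper to fetch the data; hyperparameters are illustrative):

import mxnet as mx

# Load MNIST (helper assumed available; the notebooks download the data explicitly).
mnist = mx.test_utils.get_mnist()
train_iter = mx.io.NDArrayIter(mnist['train_data'], mnist['train_label'],
                               batch_size=100, shuffle=True)
val_iter = mx.io.NDArrayIter(mnist['test_data'], mnist['test_label'],
                             batch_size=100)

# A small multi-layer perceptron for the 10 digit classes.
data = mx.symbol.Variable('data')
net = mx.symbol.Flatten(data=data)
net = mx.symbol.FullyConnected(data=net, num_hidden=128)
net = mx.symbol.Activation(data=net, act_type='relu')
net = mx.symbol.FullyConnected(data=net, num_hidden=10)
net = mx.symbol.SoftmaxOutput(data=net, name='softmax')

# Train with the Module API and report validation accuracy each epoch.
mod = mx.module.Module(net, context=mx.cpu())
mod.fit(train_iter, eval_data=val_iter,
        optimizer='sgd', optimizer_params={'learning_rate': 0.1},
        eval_metric='acc', num_epoch=5)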
MXNet Overview
• Founded by: U. Washington, Carnegie Mellon U. (~1.5 years old)
• Recently accepted to the Apache Incubator
• State-of-the-art model support: Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM)
• Scalable: near-linear scaling means fastest time to model
• Multi-language: support for Scala, Python, R, etc. for legacy code leverage and easy integration with Spark
• Ecosystem: vibrant community from academia and industry

Open Source Project on GitHub | Apache-2 Licensed
Application Examples | Python notebooks
• https://github.com/dmlc/mxnet-notebooks
• Basic concepts
  • NDArray - multi-dimensional array computation
  • Symbol - symbolic expression for neural networks
  • Module - neural network training and inference
• Applications
  • MNIST: recognize handwritten digits
  • Check out the distributed training results
  • Predict with pre-trained models
  • LSTMs for sequence learning
  • Recommender systems
  • Train a state-of-the-art computer vision model (CNN)
  • Lots more...
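As one example from that list, here is a hedged sketch of the "predict with pre-trained models" idea: load a saved checkpoint and run a forward pass. The checkpoint prefix/epoch and the placeholder image are illustrative; the notebook covers the real image preprocessing.

import mxnet as mx
from collections import namedtuple

# Load a saved model checkpoint (prefix and epoch are illustrative placeholders).
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-152', 0)

mod = mx.module.Module(sym, label_names=None, context=mx.cpu())
mod.bind(data_shapes=[('data', (1, 3, 224, 224))], for_training=False)
mod.set_params(arg_params, aux_params)

# Stand-in for a preprocessed 224x224 RGB image.
img = mx.nd.ones((1, 3, 224, 224))

Batch = namedtuple('Batch', ['data'])
mod.forward(Batch([img]))
prob = mod.get_outputs()[0].asnumpy()
print(prob.argmax())                 # index of the most likely class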
Call to Action
MXNet Resources:
• MXNet Blog Post | AWS Endorsement
• Read up on MXNet and learn more: mxnet.io
• MXNet GitHub Repo
• MXNet Recommender Systems Talk | Leo Dirac
Developer Resources:
• Deep Learning AMI | Amazon Linux
• Deep Learning AMI | Ubuntu – NEW!!!
• P2 Instance Information
• CloudFormation Template Instructions
• Deep Learning Benchmark
• MXNet on Lambda
• MXNet on ECS/Docker
• MXNet on Raspberry Pi | Wine Detector
Thank You
smallya@amazon.com
sunilmallya