AWS China (Beijing) Region, operated by Sinnet (光环新网)
Build a Deep Learning Platform Quickly with Apache MXNet
Zhang Jiangshan, AWS Solutions Architect
May 23, 2017
Agenda
Introduction to MXNet
Using MXNet on AWS
Deep Learning Models and Applications

Building blocks of a model (layer diagrams from the original slide summarized):

- Fully connected layer, mapping an input vector to an output vector such as [0.2, -0.1, ..., 0.7]:
  mx.sym.FullyConnected(data, num_hidden=128)
- Pooling over a window, e.g. on the patch [[4, 2], [2, 0]] max pooling gives 4 and average pooling gives 2:
  mx.sym.Pooling(data, pool_type="max", kernel=(2,2), stride=(2,2))
- Unrolled LSTM for sequence data:
  lstm.lstm_unroll(num_lstm_layer, seq_len, len, num_hidden, num_embed)
- Word embedding, where analogies such as cos(w, queen) = cos(w, king) - cos(w, man) + cos(w, woman) hold:
  mx.symbol.Embedding(data, input_dim, output_dim=k)
- Activation, with act_type one of "relu", "tanh", "sigmoid", "softrelu":
  mx.sym.Activation(data, act_type="xxxx")
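For reference, the four act_type options correspond to these elementwise functions; the following is a plain-Python sketch of the math only, not the MXNet implementation:

```python
import math

# Elementwise activations matching the act_type options of mx.sym.Activation.
def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def softrelu(x):  # also known as softplus: log(1 + e^x)
    return math.log(1.0 + math.exp(x))

print(relu(-2.0), relu(3.0))    # 0.0 3.0
print(sigmoid(0.0))             # 0.5
print(round(softrelu(0.0), 4))  # 0.6931
```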
Applications: neural art, face search, image segmentation, image labels ("Bicycle, People, Road, Sport"), image captioning ("People Riding Bikes"), and machine translation ("People Riding Bikes" → "Οι άνθρωποι ιππασίας ποδήλατα"); inputs include images, video, speech, text, and events.

Anatomy of a Deep Learning Model

- Convolution layer: mx.sym.Convolution(data, kernel=(5,5), num_filter=20)
- Output layer: mx.sym.SoftmaxOutput
- Training: mx.model.FeedForward, model.fit
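The max/avg pooling example above (window [[4, 2], [2, 0]] → 4 for max, 2 for average) can be sketched in plain Python. This mirrors the semantics of mx.sym.Pooling with kernel=(2,2) and stride=(2,2), not its implementation:

```python
# 2x2 pooling with stride 2 over a 2-D grid; a pure-Python semantics sketch.
def pool2x2(data, pool_type="max"):
    out = []
    for i in range(0, len(data) - 1, 2):
        row = []
        for j in range(0, len(data[0]) - 1, 2):
            window = [data[i][j], data[i][j + 1],
                      data[i + 1][j], data[i + 1][j + 1]]
            row.append(max(window) if pool_type == "max"
                       else sum(window) / 4.0)
        out.append(row)
    return out

patch = [[4, 2],
         [2, 0]]
print(pool2x2(patch, "max"))  # [[4]]
print(pool2x2(patch, "avg"))  # [[2.0]]
```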
Introduction to MXNet
Network construction methods
Gradient computation methods
Programming interfaces
MXNet Background

"MXNet was born at Carnegie Mellon University and grew up there. It has become the most scalable deep learning framework I have seen, and a showcase of brilliant work in computer science, where multiple disciplines meet and spark off one another: imagine bringing linear algebra to large-scale distributed computing to build an entirely new kind of deep learning implementation. We are excited by Amazon's decision to invest in and support MXNet, and we look forward to seeing MXNet reach new heights."
— Andrew Moore, Dean of the School of Computer Science, Carnegie Mellon University
https://github.com/dmlc/mxnet
Organizations Supporting MXNet
Lightweight, portable, flexible distributed/mobile deep learning with a dynamic, mutation-aware dataflow dependency scheduler; for Python, R, Julia, Go, and more.
MXNet Highlights
• Flexible programming model
  • Supports both imperative and declarative (symbolic) network construction
• Broad device support
  • Runs on CPUs and GPUs, on cloud GPU instances as well as small mobile devices
• Multi-language support
  • C++, Python, R, Scala, Julia, MATLAB, JavaScript
• Distributed computing by design
  • Native support for multi-CPU/GPU and multi-node computation
• Optimized performance
  • An optimized C++ engine parallelizes both I/O and computation
• Openness
  • Apache incubator project: http://incubator.apache.org/projects/mxnet.html
Apache MXNet | The Basics
• NDArray: Manipulate multi-dimensional arrays in an imperative fashion.
• Symbol: Symbolic expression for neural networks (declarative).
• Module: Intermediate-level and high-level interface for neural network
training and inference.
• Loading Data: Feeding data into training/inference programs.
• Mixed Programming: Training algorithms developed using NDArrays in
concert with Symbols.
Imperative Programming

>>> import mxnet as mx
>>> a = mx.nd.zeros((100, 50))
>>> b = mx.nd.ones((100, 50))
>>> c = a + b
>>> c += 1
>>> print(c)

Easy to tweak in Python.

PROS:
• Straightforward and flexible.
• Takes advantage of language-native features (loops, conditionals, debuggers).
• E.g. NumPy, MATLAB, Torch, …

CONS:
• Hard to optimize.
Declarative Programming

A = Variable('A')
B = Variable('B')
C = B * A
D = C + 1
f = compile(D)
d = f(A=np.ones(10),
      B=np.ones(10)*2)

(Computation graph: A and B feed a multiply node, whose result plus 1 yields D.)

PROS:
• More chances for optimization: C can share memory with D because C is deleted later.
• Crosses different languages.
• E.g. TensorFlow, Theano, Caffe.

CONS:
• Less flexible.
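The compile-then-run flow above can be mimicked in a few lines of plain Python. Variable, mul, add_const, and compile_graph below are toy stand-ins, not MXNet APIs, but they show why the runtime sees the whole graph, and can therefore plan optimizations such as memory reuse, before anything executes:

```python
# Toy declarative graph: build nodes first, execute only after "compiling".
class Node:
    def __init__(self, op, inputs=(), name=None):
        self.op, self.inputs, self.name = op, inputs, name

def Variable(name):
    return Node("var", name=name)

def mul(a, b):
    return Node("mul", (a, b))

def add_const(a, c):
    return Node("addc", (a, c))

def compile_graph(output):
    # At this point the full dependency structure is known.
    def run(**feed):
        def ev(node):
            if node.op == "var":
                return feed[node.name]
            if node.op == "mul":
                return ev(node.inputs[0]) * ev(node.inputs[1])
            if node.op == "addc":
                return ev(node.inputs[0]) + node.inputs[1]
        return ev(output)
    return run

A, B = Variable("A"), Variable("B")
D = add_const(mul(B, A), 1)   # D = B * A + 1
f = compile_graph(D)
print(f(A=10, B=2))  # 21
```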
Mixed Programming Paradigm

IMPERATIVE NDARRAY API:
>>> import mxnet as mx
>>> a = mx.nd.zeros((100, 50))
>>> b = mx.nd.ones((100, 50))
>>> c = a + b
>>> c += 1
>>> print(c)

DECLARATIVE SYMBOLIC EXECUTOR:
>>> import mxnet as mx
>>> net = mx.symbol.Variable('data')
>>> net = mx.symbol.FullyConnected(data=net, num_hidden=128)
>>> net = mx.symbol.SoftmaxOutput(data=net)
>>> texec = mx.module.Module(net)
>>> texec.forward(data=c)
>>> texec.backward()

An NDArray can be set as input to the graph.

Embed symbolic expressions into imperative programming:

texec = mx.module.Module(net)
for batch in train_data:
    texec.forward(batch)
    texec.backward()
    for param, grad in zip(texec.get_params(), texec.get_grads()):
        param -= 0.2 * grad
Multi-Platform Deployment (Amalgamation)
• Fit the core library with all dependencies into a single C++ source file
• Easy to compile on any platform
Examples: BlindTool by Joseph Paul Cohen (demo on a Nexus 4); runs in the browser with JavaScript.
Jupyter Support in MXNet
Jupyter notebooks provide a friendly interactive interface.
Multi-Node Computation in MXNet
Native support for multi-GPU and multi-node computation.
Multi-GPU Computation in MXNet
Specify the GPUs to use directly with the --gpus argument, or via ctx.
Specify the parameter-update method with the --kv-store argument:
local
device
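Conceptually, a kv-store update step works like this: each GPU pushes its gradient for a key, the store aggregates, applies the update, and every GPU pulls back the same new value ("local" aggregates on the CPU, "device" on the GPUs). The toy sketch below averages the gradients for illustration; it is not MXNet code:

```python
# Toy single-key kv-store step: push per-GPU gradients, aggregate,
# update the weight, then every device pulls the same result.
weight = 1.0
lr = 0.1
device_grads = [0.2, 0.4, 0.6]                 # one gradient per GPU
agg = sum(device_grads) / len(device_grads)    # push + aggregate (averaged here)
weight -= lr * agg                             # apply the SGD update
pulled = [weight for _ in device_grads]        # pull: all GPUs see the same weight
print(round(weight, 4))  # 0.96
```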
Multi-Node Computation in MXNet
Submit jobs with launch.py:
hosts:
172.30.0.172
172.30.0.171
launch.py
launcher: ssh
Using MXNet on AWS
GPU Instance Types and Deep Learning Use Cases

g2.xlarge: 1 × GRID K520 (1,536 CUDA cores, 4 GB)
g2.8xlarge: 4 × GRID K520 (1,536 CUDA cores, 4 GB)
p2.xlarge: 1 × K80 (4,992 CUDA cores, 24 GB)
p2.8xlarge: 8 × K80 (4,992 CUDA cores, 24 GB)
p2.16xlarge: 16 × K80 (4,992 CUDA cores, 24 GB)

Frameworks: Caffe / TensorFlow / MXNet
Use cases: image processing, image recognition, video recognition, traditional classification, autonomous driving, speech recognition and synthesis, machine translation
AWS Deep Learning AMI; deep learning services: Polly, Lex, Rekognition
GPUs: Fast Memory Access
GPU cores ↔ GPU RAM: 240 GB/s
CPU ↔ GPU (PCIe): ~10 GB/s
Ethernet: ~2.5 GB/s
p2.16xlarge
16 GPUs attached to the CPU over PCIe (~10 GB/s); Ethernet: ~2.5 GB/s
Deep Learning AMI
Pre-installed:
• MXNet and other deep learning frameworks
• GPU drivers, CUDA, cuDNN
• Jupyter notebook and Python libraries
Installing MXNet on AWS
Prepare the environment: a G2 or P2 instance, Ubuntu 16.04, CUDA, cuDNN; disable the nouveau driver.
Clone mxnet, edit config.mk, and build.
To enable S3 support, install libssl-dev first.
Using MXNet from Jupyter
Start Jupyter:
jupyter notebook --port 9080 --ip '*' --no-browser
Defining a Neural Network
Training a Neural Network
Using a Neural Network
Build a Distributed Cluster Quickly
$DEEPLEARNING_WORKERS_PATH: the file path containing the list of workers
$DEEPLEARNING_WORKERS_COUNT: the total number of workers
$DEEPLEARNING_WORKER_GPU_COUNT: the number of GPUs on the instance
$EFS_MOUNT: the directory where Amazon EFS is mounted
https://github.com/awslabs/deeplearning-cfn
Training a Model
# Run the CIFAR-10 distributed training example
../../tools/launch.py -n $DEEPLEARNING_WORKERS_COUNT -H $DEEPLEARNING_WORKERS_PATH \
    python train_cifar10.py \
    --gpus $(seq -s , 0 1 $(($DEEPLEARNING_WORKER_GPU_COUNT - 1))) \
    --network resnet --num-layers 50 --kv-store dist_device_sync

This trains for 100 epochs in 25 minutes on two p2.8xlarge EC2 instances and achieves a training accuracy of 92%.
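A note on the --gpus value in the command above: seq expands $DEEPLEARNING_WORKER_GPU_COUNT into a comma-separated device list 0..N-1. For example, with the GPU count assumed to be 4:

```shell
# Build the comma-separated GPU list used by --gpus
# (GPU count hard-coded to 4 here for illustration).
DEEPLEARNING_WORKER_GPU_COUNT=4
GPUS=$(seq -s , 0 1 $(($DEEPLEARNING_WORKER_GPU_COUNT - 1)))
echo "$GPUS"   # 0,1,2,3
```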
Deep Learning on AWS Batch
- 作为一项完全托管的服务,AWS Batch 使开发人员、科学家和工程师能够运行任何规模的批处理计算工作负载。
- 用户关注于构建自己的任务,比如:深度学习的模型训练任务.
• Jobs
• Job definitions
• Job queue
• Compute environments
• Scheduler
https://aws.amazon.com/blogs/compute/deep-learning-on-aws-batch/
1. Create a custom AMI with the NVIDIA driver and the Amazon ECS agent.
2. Create the AWS Batch resources:
git clone https://github.com/awslabs/aws-batch-helpers
cd aws-batch-helpers/gpu-example
python create-batch-entities.py \
--subnets <subnet1,subnet2,…> \
--security-groups <sg1,sg2,…> \
--key-pair <ec2-key-pair> \
--instance-role <instance-role> \
--image-id <custom-AMI-image-id> \
--service-role <service-role-arn>
3. Submit the training job:
# cd aws-batch-helpers/gpu-example
python submit-job.py --wait
Submitted job [train_imagenet - e1bccebc-76d9-4cd1-885b-667ef93eb1f5] to the job queue [gpu_queue]
Job [train_imagenet - e1bccebc-76d9-4cd1-885b-667ef93eb1f5] is RUNNING.
Output [train_imagenet/e1bccebc-76d9-4cd1-885b-667ef93eb1f5/12030dd3-0734-42bf-a3d1-d99118b401eb]:
================================================================================
[2017-04-25T19:02:57.076Z] INFO:root:Epoch[0] Batch [100] Speed: 15554.63 samples/sec Train-accuracy=0.861077
[2017-04-25T19:02:57.428Z] INFO:root:Epoch[0] Batch [200] Speed: 18224.89 samples/sec Train-accuracy=0.954688
[2017-04-25T19:02:57.755Z] INFO:root:Epoch[0] Batch [300] Speed: 19551.42 samples/sec
[2017-04-25T19:02:59.713Z] INFO:root:Epoch[0] Batch [900] Speed: 19490.74 samples/sec Train-accuracy=0.979062
[2017-04-25T19:02:59.834Z] INFO:root:Epoch[0] Train-accuracy=0.976774
[2017-04-25T19:02:59.834Z] INFO:root:Epoch[0] Time cost=3.190
[2017-04-25T19:02:59.850Z] INFO:root:Saved checkpoint to "/mnt/model/mnist-0001.params"
[2017-04-25T19:03:00.079Z] INFO:root:Epoch[0] Validation-accuracy=0.969148
================================================================================
Job [train_imagenet - e1bccebc-76d9-4cd1-885b-667ef93eb1f5] SUCCEEDED
Scale Predictions with AWS Lambda and MXNet
Problem: delivering model updates, deploying globally, and maintaining high availability
Solution: Lambda + MXNet
Model training → prediction
Deploying and Testing the Lambda Function
#deploy
git clone https://github.com/awslabs/mxnet-lambda.git
cd mxnet-lambda
zip -9r lambda_function.zip *
aws lambda create-function --function-name mxnet-lambda --zip-file fileb://lambda_function.zip --runtime python2.7 --role arn:aws:iam::<account-id>:role/lambda_basic_execution --handler lambda_function.lambda_handler
aws lambda update-function-code --function-name mxnet-lambda --zip-file fileb://lambda_function.zip

#test
aws lambda invoke --invocation-type RequestResponse --function-name mxnet-lambda --region us-east-1 --log-type Tail --payload '{"url": "https://images-na.ssl-images-amazon.com/images/G/01/img15/pet-products/small-tiles/23695_pets_vertical_store_dogs_small_tile_8._CB312176604_.jpg"}' output_file
#result
f45c89c335f7:src zhangjs$ cat output_file
{"body": "probability=0.450155, class=n02088364 beagle\nprobability=0.251278, class=n02089867 Walker hound, Walker foxhound\nprobability=0.128194, class=n02089973 English foxhound\nprobability=0.022982, class=n02101388 Brittany spaniel\nprobability=0.017608, class=n02102177 Welsh springer spaniel\n", "headers": {"Access-Control-Allow-Origin": "*", "content-type": "application/json"}, "statusCode": 200}
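For orientation, the output above follows the standard Lambda proxy-response shape (statusCode / headers / body). Below is a hypothetical sketch of only the response-formatting step; the real handler in awslabs/mxnet-lambda downloads the image and runs MXNet inference before building a response like this:

```python
import json

# Hypothetical helper: turn (probability, class-label) pairs into a
# proxy-style response like the one in output_file above. Formatting
# sketch only; model inference is out of scope here.
def format_response(predictions):
    body = "".join("probability=%f, class=%s\n" % (p, c)
                   for p, c in predictions)
    return {
        "statusCode": 200,
        "headers": {"Access-Control-Allow-Origin": "*",
                    "content-type": "application/json"},
        "body": body,
    }

resp = format_response([(0.450155, "n02088364 beagle"),
                        (0.251278, "n02089867 Walker hound, Walker foxhound")])
print(json.dumps(resp)[:20])
```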
Thank You!