AWS China (Beijing) Region, operated by Sinnet (光环新网)
Build a Deep Learning Platform Quickly with Apache MXNet
Zhang Jiangshan, AWS Solutions Architect
May 23, 2017
Agenda
Introduction to MXNet
Using MXNet on AWS
Deep Learning Models and Applications

Building blocks of a model (layer diagrams from the original slide summarized):

- Fully connected layer, mapping an input vector to an output vector such as [0.2, -0.1, ..., 0.7]:
  mx.sym.FullyConnected(data, num_hidden=128)
- Pooling over a window, e.g. on the patch [[4, 2], [2, 0]] max pooling gives 4 and average pooling gives 2:
  mx.sym.Pooling(data, pool_type="max", kernel=(2,2), stride=(2,2))
- Unrolled LSTM for sequence data:
  lstm.lstm_unroll(num_lstm_layer, seq_len, len, num_hidden, num_embed)
- Word embedding, where analogies such as cos(w, queen) = cos(w, king) - cos(w, man) + cos(w, woman) hold:
  mx.symbol.Embedding(data, input_dim, output_dim=k)
- Activation, with act_type one of "relu", "tanh", "sigmoid", "softrelu":
  mx.sym.Activation(data, act_type="xxxx")
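For reference, the four act_type options correspond to these elementwise functions; the following is a plain-Python sketch of the math only, not the MXNet implementation:

```python
import math

# Elementwise activations matching the act_type options of mx.sym.Activation.
def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def softrelu(x):  # also known as softplus: log(1 + e^x)
    return math.log(1.0 + math.exp(x))

print(relu(-2.0), relu(3.0))    # 0.0 3.0
print(sigmoid(0.0))             # 0.5
print(round(softrelu(0.0), 4))  # 0.6931
```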
Applications: neural art, face search, image segmentation, image labels ("Bicycle, People, Road, Sport"), image captioning ("People Riding Bikes"), and machine translation ("People Riding Bikes" → "Οι άνθρωποι ιππασίας ποδήλατα"); inputs include images, video, speech, text, and events.

Anatomy of a Deep Learning Model

- Convolution layer: mx.sym.Convolution(data, kernel=(5,5), num_filter=20)
- Output layer: mx.sym.SoftmaxOutput
- Training: mx.model.FeedForward, model.fit
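The max/avg pooling example above (window [[4, 2], [2, 0]] → 4 for max, 2 for average) can be sketched in plain Python. This mirrors the semantics of mx.sym.Pooling with kernel=(2,2) and stride=(2,2), not its implementation:

```python
# 2x2 pooling with stride 2 over a 2-D grid; a pure-Python semantics sketch.
def pool2x2(data, pool_type="max"):
    out = []
    for i in range(0, len(data) - 1, 2):
        row = []
        for j in range(0, len(data[0]) - 1, 2):
            window = [data[i][j], data[i][j + 1],
                      data[i + 1][j], data[i + 1][j + 1]]
            row.append(max(window) if pool_type == "max"
                       else sum(window) / 4.0)
        out.append(row)
    return out

patch = [[4, 2],
         [2, 0]]
print(pool2x2(patch, "max"))  # [[4]]
print(pool2x2(patch, "avg"))  # [[2.0]]
```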
Introduction to MXNet
Network construction methods
Gradient computation methods
Programming interfaces
MXNet Background

"MXNet was born at Carnegie Mellon University and grew up there. It has become the most scalable deep learning framework I have seen, and a showcase of brilliant work in computer science, where multiple disciplines meet and spark off one another: imagine bringing linear algebra to large-scale distributed computing to build an entirely new kind of deep learning implementation. We are excited by Amazon's decision to invest in and support MXNet, and we look forward to seeing MXNet reach new heights."
— Andrew Moore, Dean of the School of Computer Science, Carnegie Mellon University
https://github.com/dmlc/mxnet
Organizations Supporting MXNet
Lightweight, portable, flexible distributed/mobile deep learning with a dynamic, mutation-aware dataflow dependency scheduler; for Python, R, Julia, Go, and more.
MXNet Highlights
• Flexible programming model
  • Supports both imperative and declarative (symbolic) network construction
• Broad device support
  • Runs on CPUs and GPUs, on cloud GPU instances as well as small mobile devices
• Multi-language support
  • C++, Python, R, Scala, Julia, MATLAB, JavaScript
• Distributed computing by design
  • Native support for multi-CPU/GPU and multi-node computation
• Optimized performance
  • An optimized C++ engine parallelizes both I/O and computation
• Openness
  • Apache incubator project: http://incubator.apache.org/projects/mxnet.html
Apache MXNet | The Basics
• NDArray: Manipulate multi-dimensional arrays in an imperative fashion.
• Symbol: Symbolic expression for neural networks (declarative).
• Module: Intermediate-level and high-level interface for neural network
training and inference.
• Loading Data: Feeding data into training/inference programs.
• Mixed Programming: Training algorithms developed using NDArrays in
concert with Symbols.
Imperative Programming

>>> import mxnet as mx
>>> a = mx.nd.zeros((100, 50))
>>> b = mx.nd.ones((100, 50))
>>> c = a + b
>>> c += 1
>>> print(c)

Easy to tweak in Python.

PROS:
• Straightforward and flexible.
• Takes advantage of language-native features (loops, conditionals, debuggers).
• E.g. NumPy, MATLAB, Torch, …

CONS:
• Hard to optimize.
Declarative Programming

A = Variable('A')
B = Variable('B')
C = B * A
D = C + 1
f = compile(D)
d = f(A=np.ones(10),
      B=np.ones(10)*2)

(Computation graph: A and B feed a multiply node, whose result plus 1 yields D.)

PROS:
• More chances for optimization: C can share memory with D because C is deleted later.
• Crosses different languages.
• E.g. TensorFlow, Theano, Caffe.

CONS:
• Less flexible.
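The compile-then-run flow above can be mimicked in a few lines of plain Python. Variable, mul, add_const, and compile_graph below are toy stand-ins, not MXNet APIs, but they show why the runtime sees the whole graph, and can therefore plan optimizations such as memory reuse, before anything executes:

```python
# Toy declarative graph: build nodes first, execute only after "compiling".
class Node:
    def __init__(self, op, inputs=(), name=None):
        self.op, self.inputs, self.name = op, inputs, name

def Variable(name):
    return Node("var", name=name)

def mul(a, b):
    return Node("mul", (a, b))

def add_const(a, c):
    return Node("addc", (a, c))

def compile_graph(output):
    # At this point the full dependency structure is known.
    def run(**feed):
        def ev(node):
            if node.op == "var":
                return feed[node.name]
            if node.op == "mul":
                return ev(node.inputs[0]) * ev(node.inputs[1])
            if node.op == "addc":
                return ev(node.inputs[0]) + node.inputs[1]
        return ev(output)
    return run

A, B = Variable("A"), Variable("B")
D = add_const(mul(B, A), 1)   # D = B * A + 1
f = compile_graph(D)
print(f(A=10, B=2))  # 21
```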
Mixed Programming Paradigm

IMPERATIVE NDARRAY API:
>>> import mxnet as mx
>>> a = mx.nd.zeros((100, 50))
>>> b = mx.nd.ones((100, 50))
>>> c = a + b
>>> c += 1
>>> print(c)

DECLARATIVE SYMBOLIC EXECUTOR:
>>> import mxnet as mx
>>> net = mx.symbol.Variable('data')
>>> net = mx.symbol.FullyConnected(data=net, num_hidden=128)
>>> net = mx.symbol.SoftmaxOutput(data=net)
>>> texec = mx.module.Module(net)
>>> texec.forward(data=c)
>>> texec.backward()

An NDArray can be set as input to the graph.

Embed symbolic expressions into imperative programming:

texec = mx.module.Module(net)
for batch in train_data:
    texec.forward(batch)
    texec.backward()
    for param, grad in zip(texec.get_params(), texec.get_grads()):
        param -= 0.2 * grad
Multi-Platform Deployment (Amalgamation)
• Fit the core library with all dependencies into a single C++ source file
• Easy to compile on any platform
Examples: BlindTool by Joseph Paul Cohen (demo on a Nexus 4); runs in the browser with JavaScript.
Jupyter Support in MXNet
Jupyter notebooks provide a friendly interactive interface.
Multi-Node Computation in MXNet
Native support for multi-GPU and multi-node computation.
Multi-GPU Computation in MXNet
Specify the GPUs to use directly with the --gpus argument, or via ctx.
Specify the parameter-update method with the --kv-store argument:
local
device
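Conceptually, a kv-store update step works like this: each GPU pushes its gradient for a key, the store aggregates, applies the update, and every GPU pulls back the same new value ("local" aggregates on the CPU, "device" on the GPUs). The toy sketch below averages the gradients for illustration; it is not MXNet code:

```python
# Toy single-key kv-store step: push per-GPU gradients, aggregate,
# update the weight, then every device pulls the same result.
weight = 1.0
lr = 0.1
device_grads = [0.2, 0.4, 0.6]                 # one gradient per GPU
agg = sum(device_grads) / len(device_grads)    # push + aggregate (averaged here)
weight -= lr * agg                             # apply the SGD update
pulled = [weight for _ in device_grads]        # pull: all GPUs see the same weight
print(round(weight, 4))  # 0.96
```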
Multi-Node Computation in MXNet
Submit jobs with launch.py:
hosts:
172.30.0.172
172.30.0.171
launch.py
launcher: ssh
Using MXNet on AWS
GPU Instance Types and Deep Learning Use Cases

g2.xlarge: 1 × GRID K520 (1,536 CUDA cores, 4 GB)
g2.8xlarge: 4 × GRID K520 (1,536 CUDA cores, 4 GB)
p2.xlarge: 1 × K80 (4,992 CUDA cores, 24 GB)
p2.8xlarge: 8 × K80 (4,992 CUDA cores, 24 GB)
p2.16xlarge: 16 × K80 (4,992 CUDA cores, 24 GB)

Frameworks: Caffe / TensorFlow / MXNet
Use cases: image processing, image recognition, video recognition, traditional classification, autonomous driving, speech recognition and synthesis, machine translation
AWS Deep Learning AMI; deep learning services: Polly, Lex, Rekognition
GPUs: Fast Memory Access
GPU cores ↔ GPU RAM: 240 GB/s
CPU ↔ GPU (PCIe): ~10 GB/s
Ethernet: ~2.5 GB/s
p2.16xlarge
16 GPUs attached to the CPU over PCIe (~10 GB/s); Ethernet: ~2.5 GB/s
Deep Learning AMI
Pre-installed:
• MXNet and other deep learning frameworks
• GPU drivers, CUDA, cuDNN
• Jupyter notebook and Python libraries
Installing MXNet on AWS
Prepare the environment: a G2 or P2 instance, Ubuntu 16.04, CUDA, cuDNN; disable the nouveau driver.
Clone mxnet, edit config.mk, and build.
To enable S3 support, install libssl-dev first.
Using MXNet from Jupyter
Start Jupyter:
jupyter notebook --port 9080 --ip '*' --no-browser
Defining a Neural Network
Training a Neural Network
Using a Neural Network
Build a Distributed Cluster Quickly
$DEEPLEARNING_WORKERS_PATH: the file path containing the list of workers
$DEEPLEARNING_WORKERS_COUNT: the total number of workers
$DEEPLEARNING_WORKER_GPU_COUNT: the number of GPUs on the instance
$EFS_MOUNT: the directory where Amazon EFS is mounted
https://github.com/awslabs/deeplearning-cfn
Training a Model
# Run the CIFAR-10 distributed training example
../../tools/launch.py -n $DEEPLEARNING_WORKERS_COUNT -H $DEEPLEARNING_WORKERS_PATH \
    python train_cifar10.py \
    --gpus $(seq -s , 0 1 $(($DEEPLEARNING_WORKER_GPU_COUNT - 1))) \
    --network resnet --num-layers 50 --kv-store dist_device_sync

This trains for 100 epochs in 25 minutes on two p2.8xlarge EC2 instances and achieves a training accuracy of 92%.
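A note on the --gpus value in the command above: seq expands $DEEPLEARNING_WORKER_GPU_COUNT into a comma-separated device list 0..N-1. For example, with the GPU count assumed to be 4:

```shell
# Build the comma-separated GPU list used by --gpus
# (GPU count hard-coded to 4 here for illustration).
DEEPLEARNING_WORKER_GPU_COUNT=4
GPUS=$(seq -s , 0 1 $(($DEEPLEARNING_WORKER_GPU_COUNT - 1)))
echo "$GPUS"   # 0,1,2,3
```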
Deep Learning on AWS Batch
- 作为一项完全托管的服务,AWS Batch 使开发人员、科学家和工程师能够运行任何规模的批处理计算工作负载。
- 用户关注于构建自己的任务,比如:深度学习的模型训练任务.
• Jobs
• Job definitions
• Job queue
• Compute environments
• Scheduler
https://aws.amazon.com/blogs/compute/deep-learning-on-aws-batch/
1. Create a custom AMI with the NVIDIA driver and the Amazon ECS agent.
2. Create the AWS Batch resources:
git clone https://github.com/awslabs/aws-batch-helpers
cd aws-batch-helpers/gpu-example
python create-batch-entities.py \
--subnets <subnet1,subnet2,…> \
--security-groups <sg1,sg2,…> \
--key-pair <ec2-key-pair> \
--instance-role <instance-role> \
--image-id <custom-AMI-image-id> \
--service-role <service-role-arn>
3. Submit the training job:
# cd aws-batch-helpers/gpu-example
python submit-job.py --wait
Submitted job [train_imagenet - e1bccebc-76d9-4cd1-885b-667ef93eb1f5] to the job queue [gpu_queue]
Job [train_imagenet - e1bccebc-76d9-4cd1-885b-667ef93eb1f5] is RUNNING.
Output [train_imagenet/e1bccebc-76d9-4cd1-885b-667ef93eb1f5/12030dd3-0734-42bf-a3d1-d99118b401eb]:
================================================================================
[2017-04-25T19:02:57.076Z] INFO:root:Epoch[0] Batch [100] Speed: 15554.63 samples/sec Train-accuracy=0.861077
[2017-04-25T19:02:57.428Z] INFO:root:Epoch[0] Batch [200] Speed: 18224.89 samples/sec Train-accuracy=0.954688
[2017-04-25T19:02:57.755Z] INFO:root:Epoch[0] Batch [300] Speed: 19551.42 samples/sec
[2017-04-25T19:02:59.713Z] INFO:root:Epoch[0] Batch [900] Speed: 19490.74 samples/sec Train-accuracy=0.979062
[2017-04-25T19:02:59.834Z] INFO:root:Epoch[0] Train-accuracy=0.976774
[2017-04-25T19:02:59.834Z] INFO:root:Epoch[0] Time cost=3.190
[2017-04-25T19:02:59.850Z] INFO:root:Saved checkpoint to "/mnt/model/mnist-0001.params"
[2017-04-25T19:03:00.079Z] INFO:root:Epoch[0] Validation-accuracy=0.969148
================================================================================
Job [train_imagenet - e1bccebc-76d9-4cd1-885b-667ef93eb1f5] SUCCEEDED
Scale Predictions with AWS Lambda and MXNet
Problem: delivering model updates, deploying globally, and maintaining high availability
Solution: Lambda + MXNet
Model training → prediction
Deploying and Testing the Lambda Function
#deploy
git clone https://github.com/awslabs/mxnet-lambda.git
cd mxnet-lambda
zip -9r lambda_function.zip *
aws lambda create-function --function-name mxnet-lambda --zip-file fileb://lambda_function.zip --runtime python2.7 --role arn:aws:iam::<account-id>:role/lambda_basic_execution --handler lambda_function.lambda_handler
aws lambda update-function-code --function-name mxnet-lambda --zip-file fileb://lambda_function.zip

#test
aws lambda invoke --invocation-type RequestResponse --function-name mxnet-lambda --region us-east-1 --log-type Tail --payload '{"url": "https://images-na.ssl-images-amazon.com/images/G/01/img15/pet-products/small-tiles/23695_pets_vertical_store_dogs_small_tile_8._CB312176604_.jpg"}' output_file
#result
f45c89c335f7:src zhangjs$ cat output_file
{"body": "probability=0.450155, class=n02088364 beagle\nprobability=0.251278, class=n02089867 Walker hound, Walker foxhound\nprobability=0.128194, class=n02089973 English foxhound\nprobability=0.022982, class=n02101388 Brittany spaniel\nprobability=0.017608, class=n02102177 Welsh springer spaniel\n", "headers": {"Access-Control-Allow-Origin": "*", "content-type": "application/json"}, "statusCode": 200}
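For orientation, the output above follows the standard Lambda proxy-response shape (statusCode / headers / body). Below is a hypothetical sketch of only the response-formatting step; the real handler in awslabs/mxnet-lambda downloads the image and runs MXNet inference before building a response like this:

```python
import json

# Hypothetical helper: turn (probability, class-label) pairs into a
# proxy-style response like the one in output_file above. Formatting
# sketch only; model inference is out of scope here.
def format_response(predictions):
    body = "".join("probability=%f, class=%s\n" % (p, c)
                   for p, c in predictions)
    return {
        "statusCode": 200,
        "headers": {"Access-Control-Allow-Origin": "*",
                    "content-type": "application/json"},
        "body": body,
    }

resp = format_response([(0.450155, "n02088364 beagle"),
                        (0.251278, "n02089867 Walker hound, Walker foxhound")])
print(json.dumps(resp)[:20])
```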
Thank You!