Industrial Level Deep Learning Training Infrastructure: the Practice and Experience from SenseTime. Shengen Yan, SenseTime Group Limited.


Page 1: Industrial Level Deep Learning Training Infrastructure

Industrial Level Deep Learning Training Infrastructure—the Practice and Experience from SenseTime

Shengen Yan

SenseTime Group Limited.

Page 2: Industrial Level Deep Learning Training Infrastructure

The Success of Deep Learning

[Figure: Google Search trend over time, 2006-01 to 2016-01; annotated event: AlexNet won ImageNet]

Page 3: Industrial Level Deep Learning Training Infrastructure

What Led to the Success?

Page 4: Industrial Level Deep Learning Training Infrastructure

Model Capacity: The Key to High Performance

Model               # Layers
LeNet               5
AlexNet (2012)      8
GoogLeNet (2014)    22
ResNet (2016)       169
Ours                1207

Page 5: Industrial Level Deep Learning Training Infrastructure

Computation power

Years months weeks days

Reduce training time from several years to several days!

Page 6: Industrial Level Deep Learning Training Infrastructure

01 Deep Learning Package: A deep learning framework that is efficient, scalable, and flexible.

02 DeepLink: A large-scale cluster platform designed for deep learning.

03 Applications: Delivers many application models

Page 7: Industrial Level Deep Learning Training Infrastructure

Deep Learning is Complicated

The deep learning community has developed frameworks to make life easier.

GoogLeNet (2014)

Page 8: Industrial Level Deep Learning Training Infrastructure

Deep Learning Training Frameworks

‣ SenseTime Deep Learning Training Package

• Memory efficient

• Computation efficient

• Both model parallelism and data parallelism

• Supports huge models

• Scalable

Page 9: Industrial Level Deep Learning Training Infrastructure

Memory Footprint Optimization

High-level compiler back-end optimization algorithms applied to the intermediate representation.

Optimizations: liveness analysis, computation graph
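
The slide names liveness analysis but does not show how it is applied. As a minimal sketch, assuming a computation graph given as a list of ops in execution order, the Python below finds the last reader of each tensor and frees it right after that op. The op format and the names `last_uses` and `peak_memory` are hypothetical and are not taken from the SenseTime package.

```python
from typing import Dict, List, Tuple

# Hypothetical op format: (output_name, output_size_in_bytes, input_names).
# The list is assumed to be in execution (topological) order.
Op = Tuple[str, int, List[str]]


def last_uses(ops: List[Op]) -> Dict[str, int]:
    """Liveness analysis: index of the last op that touches each tensor."""
    last: Dict[str, int] = {}
    for step, (out, _size, inputs) in enumerate(ops):
        last.setdefault(out, step)  # live at least until it is produced
        for name in inputs:
            last[name] = step  # later readers overwrite earlier ones
    return last


def peak_memory(ops: List[Op], free_dead: bool) -> int:
    """Peak bytes held while running ops, optionally freeing dead tensors."""
    last = last_uses(ops)
    sizes = {out: size for out, size, _ in ops}
    live = 0
    peak = 0
    for step, (out, size, inputs) in enumerate(ops):
        live += size  # allocate the output of this op
        peak = max(peak, live)
        if free_dead:
            # Release every tensor whose last use is the current op.
            for name in set(inputs) | {out}:
                if name in sizes and last[name] == step:
                    live -= sizes[name]
    return peak


# A four-op chain in which each tensor is read only by the next op.
chain = [("a", 100, []), ("b", 100, ["a"]), ("c", 100, ["b"]), ("d", 100, ["c"])]
print(peak_memory(chain, free_dead=False), peak_memory(chain, free_dead=True))
```

In this toy chain only two tensors are alive at any time once dead tensors are freed, so the peak stays flat as the chain grows. A training graph keeps activations alive for the backward pass, which is where recomputation becomes relevant.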

Page 10: Industrial Level Deep Learning Training Infrastructure

Memory Footprint Optimization

Seeing

Perceiving

Generated graph with mirror (re-compute) nodes

Chen T, Xu B, Zhang C, et al. Training deep nets with sublinear memory cost[J]. arXiv preprint arXiv:1604.06174, 2016.
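
The re-compute idea from the cited paper can be illustrated on a plain chain of scalar layers. This is a sketch under simplifying assumptions, not the package's implementation: each layer is a hypothetical `(f, df)` pair, one checkpoint is kept per segment of `seg` layers, and the inputs inside a segment are rebuilt during the backward pass.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical layer type: (forward function, derivative w.r.t. its input).
Layer = Tuple[Callable[[float], float], Callable[[float], float]]


def grad_full(layers: List[Layer], x: float) -> Tuple[float, int]:
    """Backprop that stores every layer input. Returns (dy/dx, stored count)."""
    inputs = []
    for f, _ in layers:
        inputs.append(x)
        x = f(x)
    g = 1.0
    for (_, df), xin in zip(reversed(layers), reversed(inputs)):
        g *= df(xin)
    return g, len(inputs)


def grad_recompute(layers: List[Layer], x: float, seg: int) -> Tuple[float, int]:
    """Backprop that keeps one checkpoint per segment of `seg` layers.

    Returns (dy/dx, peak number of activations held at once).
    """
    checkpoints = []
    for i, (f, _) in enumerate(layers):
        if i % seg == 0:
            checkpoints.append(x)  # keep only the segment's first input
        x = f(x)
    g = 1.0
    peak = 0
    for c in reversed(range(len(checkpoints))):
        segment = layers[c * seg:(c + 1) * seg]
        # Re-run this segment's forward pass to rebuild its inputs.
        xin = checkpoints[c]
        inputs = []
        for f, _ in segment:
            inputs.append(xin)
            xin = f(xin)
        peak = max(peak, len(checkpoints) + len(inputs))
        for (_, df), v in zip(reversed(segment), reversed(inputs)):
            g *= df(v)
    return g, peak


tanh_layer: Layer = (math.tanh, lambda v: 1.0 - math.tanh(v) ** 2)
net = [tanh_layer] * 16
print(grad_full(net, 0.5), grad_recompute(net, 0.5, seg=4))
```

Both functions return the same gradient. With 16 layers and segments of 4, the recompute variant holds at most 8 activations instead of 16, at the cost of one extra forward pass per segment.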

Page 11: Industrial Level Deep Learning Training Infrastructure

Model Capacity

[Figure: bar chart of memory usage efficiency (higher is better; axis from 0 to 140) for VGG, ResNet50, ResNet152, Inception V4, ResNet269, and Inception ResNet, comparing Ours, MXNet, TensorFlow, Chainer, Caffe, and Torch]

Page 12: Industrial Level Deep Learning Training Infrastructure

Single-GPU Performance

Milliseconds / iteration

Framework     Batch-32   Batch-64   Batch-128
Caffe         497.5      1045       1965
Chainer       200        290        543
TensorFlow    178.6      315.7      587.2
Parrots       122.7      225.6      471
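
To compare the timings on this slide, it can help to convert milliseconds per iteration into throughput and relative speedup. The sketch below only re-expresses the slide's numbers; the helper names `samples_per_second` and `speedup` are illustrative.

```python
# Milliseconds per iteration, copied from the figures on this slide.
MS_PER_ITER = {
    "Caffe": {32: 497.5, 64: 1045.0, 128: 1965.0},
    "Chainer": {32: 200.0, 64: 290.0, 128: 543.0},
    "TensorFlow": {32: 178.6, 64: 315.7, 128: 587.2},
    "Parrots": {32: 122.7, 64: 225.6, 128: 471.0},
}


def samples_per_second(framework: str, batch: int) -> float:
    """Throughput = batch size / iteration time in seconds."""
    return batch / (MS_PER_ITER[framework][batch] / 1000.0)


def speedup(fast: str, slow: str, batch: int) -> float:
    """Ratio of iteration times: how many times faster `fast` is than `slow`."""
    return MS_PER_ITER[slow][batch] / MS_PER_ITER[fast][batch]


for batch in (32, 64, 128):
    print(batch, round(samples_per_second("Parrots", batch), 1),
          round(speedup("Parrots", "Caffe", batch), 2))
```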

Page 13: Industrial Level Deep Learning Training Infrastructure

Communication Optimization

Supports multiple GPUs and multiple nodes

Three procedures: Copy, Allreduce, Copy

Optimizations:

• Master-slave threads to overlap communication with computation

• GPU direct communication

• Ring allreduce message passing

[Figure: GPU0, GPU1, GPU2, GPU3 and CPU memory within a node; Copy steps between GPUs and CPU memory, and Allreduce with other nodes]
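
The slide lists ring allreduce without showing it. As a rough illustration, the Python below simulates the textbook sum ring allreduce (a reduce-scatter phase followed by an all-gather phase) in a single process. It is not the customized communication library described in the talk, and it assumes one element per chunk for brevity.

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a sum ring allreduce over n workers.

    Each worker's buffer is split into n chunks (one element per chunk here,
    so len(buffer) must equal n). Returns every worker's buffer afterwards.
    """
    n = len(buffers)
    assert all(len(b) == n for b in buffers)
    data = [list(b) for b in buffers]

    # Phase 1, reduce-scatter: in step s, worker r sends chunk (r - s) % n
    # to its right neighbour, which adds it to its own copy.
    for s in range(n - 1):
        sends = [(r, (r - s) % n, data[r][(r - s) % n]) for r in range(n)]
        for r, chunk, value in sends:  # snapshot first: sends are simultaneous
            data[(r + 1) % n][chunk] += value

    # Phase 2, all-gather: worker r now holds the full sum of chunk
    # (r + 1) % n and forwards finished chunks around the ring.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n, data[r][(r + 1 - s) % n]) for r in range(n)]
        for r, chunk, value in sends:
            data[(r + 1) % n][chunk] = value
    return data


workers = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [100.0, 200.0, 300.0]]
print(ring_allreduce(workers))
```

Each worker sends and receives 2(n - 1) chunks in total, so the traffic per worker stays close to twice the buffer size regardless of the number of workers, which is the usual argument for the ring layout.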

Page 14: Industrial Level Deep Learning Training Infrastructure

Scalability

[Figure: millisec/iter (axis from 0 to 12000) and scale efficiency (axis from 0 to 1.2) versus # GPUs (1, 2, 3, 4, 8, 16, 24, 32), covering single node and multiple nodes]
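
The slide does not define scale efficiency. One common definition, assumed here, is the measured throughput on N GPUs divided by N times the single-GPU throughput. The numbers in the example are hypothetical and are not read from the chart.

```python
def throughput(global_batch: int, ms_per_iter: float) -> float:
    """Samples processed per second for one training iteration."""
    return global_batch / (ms_per_iter / 1000.0)


def scale_efficiency(n_gpus: int, throughput_n: float, throughput_1: float) -> float:
    """Assumed definition: achieved throughput over ideal linear scaling."""
    return throughput_n / (n_gpus * throughput_1)


# Hypothetical measurements: 32 samples per GPU, 1000 ms on 1 GPU,
# 1100 ms on 8 GPUs with the global batch grown to 8 * 32 samples.
t1 = throughput(32, 1000.0)
t8 = throughput(8 * 32, 1100.0)
print(round(scale_efficiency(8, t8, t1), 3))
```

Under this definition a value of 1 means perfect linear scaling, and communication overhead pushes the value below 1.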

Page 15: Industrial Level Deep Learning Training Infrastructure

01 Deep Learning Package: A deep learning framework that is efficient, scalable, and flexible.

02 DeepLink: A large-scale cluster platform designed for deep learning.

03 Applications: Delivers many application models

Page 16: Industrial Level Deep Learning Training Infrastructure

The Role of the Supercomputer

It is just like a highway in a city:

it is key infrastructure for AI

Page 17: Industrial Level Deep Learning Training Infrastructure

Supercomputing Centers for AI: The key infrastructure for AI research.

[Figure: DATA, COMPUTATION, MODEL; DeepLink]

Page 18: Industrial Level Deep Learning Training Infrastructure

Challenges

‣ Interconnects at multiple levels

• GPUs, Nodes, Sub-networks

‣ Distributed data

• Random access becomes particularly difficult

‣ Scale vs. Stability

• Failures of individual nodes/links

‣ Human resources

• Engineers who understand both Deep Learning & HPC are difficult to come by

Page 19: Industrial Level Deep Learning Training Infrastructure

DeepLink Clusters: Designed for Deep Learning

Software/Hardware Co-design

Maximize respective strengths while ensuring optimal cooperation.

High-performance Hardware

• High speed interconnects

• High performance GPU computing

• Efficient distributed storage

Customized Middleware

• Distributed storage & cache system (optimized for small files)

• Distributed deep learning framework

• Task scheduling & monitoring
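
The slides say the storage and cache system is optimized for small files but give no details. One common approach, shown here purely as an assumption and not as DeepLink's design, is to pack many small files into one large blob with an offset index, so that a random read becomes a single lookup plus a slice. The names `pack` and `read_file` are hypothetical.

```python
import io
from typing import Dict, Iterable, Tuple

Index = Dict[str, Tuple[int, int]]  # file name -> (offset, length)


def pack(files: Iterable[Tuple[str, bytes]]) -> Tuple[bytes, Index]:
    """Concatenate small files into one blob and record where each one sits."""
    blob = io.BytesIO()
    index: Index = {}
    for name, payload in files:
        index[name] = (blob.tell(), len(payload))
        blob.write(payload)
    return blob.getvalue(), index


def read_file(blob: bytes, index: Index, name: str) -> bytes:
    """Random access to one packed file via its recorded offset and length."""
    offset, length = index[name]
    return blob[offset:offset + length]


blob, index = pack([("img_0.jpg", b"AAA"), ("img_1.jpg", b"BB")])
print(read_file(blob, index, "img_1.jpg"))
```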

Page 20: Industrial Level Deep Learning Training Infrastructure

Platform overview

Heterogeneous deep learning supercomputer

High speed storage system

Operation/Maintenance/Monitoring System

Lightweight virtualization

Task scheduling system

Distributed training software

Deep Learning Training Visualization System

Customized communication library for deep learning

Computation library

Distributed cache system

[Figure layer labels: Software, Platform]

Page 21: Industrial Level Deep Learning Training Infrastructure

Training Visualization

Page 22: Industrial Level Deep Learning Training Infrastructure

DeepLink in SenseTime

>3000 GPUs

Page 23: Industrial Level Deep Learning Training Infrastructure

01 Deep Learning Package: A deep learning framework that is efficient, scalable, and flexible.

02 DeepLink: A large-scale cluster platform designed for deep learning.

03 Applications: Delivers many application models

Page 24: Industrial Level Deep Learning Training Infrastructure
Page 25: Industrial Level Deep Learning Training Infrastructure

THANK YOU