
PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications

Zhihao Bai, Zhen Zhang, Yibo Zhu, Xin Jin


Deep learning powers intelligent applications in many domains


Training and inference

• Training: high throughput
• Inference: low latency

GPU clusters for DL workloads


Separate clusters for training and inference

[Diagram: one GPU cluster dedicated to training and a separate GPU cluster dedicated to inference]

Utilization of GPU clusters is low

[Charts: GPU utilization over the day (daytime vs. midnight) for the training cluster and for the inference cluster; with today's separate clusters utilization stays low, while an ideal shared cluster would keep it high]

Context switching overhead is high

[Diagram: the GPU must be switched from the old model to a new model]

• Example: on an NVIDIA T4, switching from training BERT to serving ResNet inference takes about 6 s

Drawbacks of existing solutions


• NVIDIA MPS
  • High overhead due to contention

• Salus [MLSys'20]
  • Requires all models to be preloaded into GPU memory

Goal: fast context switching


• Enable GPU-efficient multiplexing of multiple DL apps with fine-grained time-sharing

• Achieve millisecond-scale context switching latencies and high throughput

PipeSwitch overview: architecture

[Diagram: a controller receives new tasks and coordinates an active worker, standby workers, and a memory daemon that manages the GPU]

PipeSwitch overview: execution

• Stop the current task and prepare for the next task.

• Execute the task with pipelined model transmission.

• Clean the environment for the previous task.
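A minimal, self-contained sketch of this switching sequence; the Worker and Controller classes here are toy stand-ins for illustration, not PipeSwitch's actual code:

```python
from typing import List, Optional

class Worker:
    def __init__(self, name: str):
        self.name = name
        self.task: Optional[str] = None

    def stop(self) -> None:            # 1. stop the current task, prepare for the next
        self.task = None

    def run(self, task: str) -> None:  # 2. execute the new task (transmission is pipelined)
        self.task = task

    def clean(self) -> None:           # 3. clean the environment of the previous task
        pass

class Controller:
    def __init__(self, active: Worker, standby: List[Worker]):
        self.active, self.standby = active, standby

    def switch(self, new_task: str) -> None:
        self.active.stop()                 # step 1
        new_worker = self.standby.pop()
        new_worker.run(new_task)           # step 2, starts as soon as possible
        self.active.clean()                # step 3, overlaps with the new task
        self.standby.append(self.active)   # the old active worker becomes a standby worker
        self.active = new_worker
```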


Sources of context switching overhead

• Task cleaning
• Task initialization
• Memory allocation
• Model transmission

How to reduce the overhead?

• Model transmission → pipelined model transmission

DL models have layered structures

[Diagram: Input → Layer-1 → Layer-2 → ... → Layer-N → Output; forward propagation runs through the layers in order, backward propagation in reverse]
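As a concrete illustration (using torchvision's ResNet-152, one of the models evaluated later), the layers of a model can be enumerated in forward order together with their parameter sizes, which is what makes per-layer transmission possible:

```python
import torchvision

model = torchvision.models.resnet152()             # parameters start out on the CPU
blocks = list(model.children())                    # top-level blocks, in forward order
sizes = [sum(p.numel() * p.element_size() for p in b.parameters()) for b in blocks]
print(f"{len(blocks)} blocks, {sum(sizes) / 2**20:.1f} MiB of parameters in total")
```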

Sequential model transmission and execution

[Timeline: the layers are first transmitted over PCIe one after another (T0, T1, ..., Tn-1, where Ti = transmit layer i), and execution on the GPU (E0, E1, ..., En-1, where Ei = execute layer i) starts only after the whole model has arrived]

Pipelined model transmission and execution

[Timeline: transmission and execution overlap; while layer i executes on the GPU (Ei), layer i+1 is already being transmitted over PCIe (Ti+1): transmit layer 0, then transmit layer 1 while executing layer 0, then transmit layer 2 while executing layer 1, and so on]
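For a rough comparison (notation added here, not taken from the slides), let $T_i$ and $E_i$ be the transmission and execution time of layer $i$. Sequential execution costs

$t_{\text{seq}} = \sum_{i=0}^{n-1} T_i + \sum_{i=0}^{n-1} E_i$

while ideal pipelining, ignoring per-layer overheads, overlaps the two phases:

$t_{\text{pipe}} \approx \max\left(\sum_{i=0}^{n-1} T_i,\; \sum_{i=0}^{n-1} E_i\right) + \text{(pipeline fill/drain)}$

so when transmission and execution take comparable time, pipelining can hide close to half of the sequential latency.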

Per-layer pipelining has two problems: (1) it makes many separate calls to PCIe; (2) it must synchronize transmission and execution for every layer.

Solution: transmit and execute groups of layers instead of individual layers.

[Timeline: groups Group(0, i), Group(i+1, j), ..., Group(k, n-1) are transmitted over PCIe and executed on the GPU as units, amortizing the per-call and synchronization overhead]

• Exponential time to find the optimal grouping strategy
• Two heuristics for pruning the search space
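A minimal sketch of group-wise pipelined transmission in PyTorch (an illustration under simplifying assumptions, not PipeSwitch's implementation): each group is an nn.Module whose CPU parameters are kept in pinned memory so that non_blocking copies are truly asynchronous, and the grouping itself is taken as given:

```python
import torch

def pipelined_forward(groups, x):
    """groups: list of nn.Modules (one per layer group) with pinned CPU parameters;
    x: input batch on the CPU."""
    copy_stream = torch.cuda.Stream()            # dedicated stream for H2D copies
    compute_stream = torch.cuda.current_stream()
    ready = []                                   # one "copy finished" event per group

    for g in groups:                             # enqueue all group transmissions
        with torch.cuda.stream(copy_stream):
            g.to("cuda", non_blocking=True)      # async only if parameters are pinned
            ev = torch.cuda.Event()
            ev.record(copy_stream)
            ready.append(ev)

    out = x.to("cuda", non_blocking=True)
    for g, ev in zip(groups, ready):             # execute each group once it has arrived
        compute_stream.wait_event(ev)
        out = g(out)
    return out
```

PipeSwitch's planner additionally picks the group boundaries with the two pruning heuristics; the fixed `groups` list here just stands in for whatever grouping it chooses.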

How to reduce the overhead?

• Memory allocation → unified memory management

Unified memory management

• The memory daemon manages the model parameters and allocates the GPU memory
• Workers receive a pointer and an offset into this shared GPU memory instead of allocating their own
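A toy version of this idea (a simplification for illustration, not the actual memory daemon): reserve the GPU region once and hand out tensors by offset, so switching tasks never calls cudaMalloc or cudaFree:

```python
import torch

class GpuMemoryPool:
    """Bump allocator over one pre-allocated GPU buffer (illustration only)."""

    def __init__(self, nbytes: int, device: str = "cuda"):
        self.buffer = torch.empty(nbytes, dtype=torch.uint8, device=device)
        self.offset = 0                                   # next free byte

    def allocate(self, shape, dtype=torch.float32):
        nbytes = torch.Size(shape).numel() * torch.empty((), dtype=dtype).element_size()
        chunk = self.buffer[self.offset:self.offset + nbytes]
        self.offset += nbytes                             # assumes offsets stay dtype-aligned
        return chunk.view(dtype).view(shape)              # reinterpret the raw bytes as the tensor

    def reset(self):
        self.offset = 0                                   # give the whole region to the next task

pool = GpuMemoryPool(1 << 30)                             # e.g. reserve 1 GiB once at startup
weight = pool.allocate((4096, 4096))                      # no cudaMalloc at switch time
pool.reset()                                              # context switch: reuse the same memory
```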

How to reduce the overhead?

• Task cleaning and task initialization → active-standby worker switching

Active-standby worker switching

[Timeline: without worker switching, the new task's Init, Execute, and Clean all wait for the old task's Clean to finish, so the new task starts late]

• Task initialization is split into Init.1 (launch the process, create the CUDA context) and Init.2 (allocate GPU memory)
• Standby workers complete Init.1 ahead of time; Init.2 and the old task's Clean overlap with the switch, so the new task starts much earlier
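To make Init.1 vs. Init.2 concrete, here is a simplified sketch (not the PipeSwitch worker): standby processes are launched ahead of time and create their CUDA context immediately, so a switch only has to hand a task to a process that is already warm:

```python
import torch
import torch.multiprocessing as mp

def standby_worker(task_queue, result_queue):
    torch.cuda.init()                      # Init.1: pay CUDA context creation up front
    while True:
        job = task_queue.get()             # block until the controller assigns a task
        if job is None:
            break
        model_fn, batch = job
        model = model_fn().cuda()          # Init.2 + model transmission (pipelined in PipeSwitch)
        with torch.no_grad():
            result_queue.put(model(batch.cuda()).cpu())

if __name__ == "__main__":
    mp.set_start_method("spawn")           # required when child processes use CUDA
    tasks, results = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=standby_worker, args=(tasks, results)) for _ in range(2)]
    for w in workers:
        w.start()                          # process launch + context creation happen in advance
    # a real controller would now enqueue (model_constructor, batch) tasks;
    # here we just shut the standby workers down again
    for _ in workers:
        tasks.put(None)
    for w in workers:
        w.join()
```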

Implementation

• Testbed: AWS EC2
  • p3.2xlarge: PCIe 3.0 x16, NVIDIA Tesla V100 GPU
  • g4dn.2xlarge: PCIe 3.0 x8, NVIDIA Tesla T4 GPU

• Software
  • CUDA 10.1
  • PyTorch 1.3.0

• Models
  • ResNet-152
  • Inception-v3
  • BERT-base

Evaluation

• Can PipeSwitch satisfy SLOs?

• Can PipeSwitch provide high utilization?

• How well do the design choices of PipeSwitch work?


PipeSwitch satisfies SLOs

[Charts: context switching latency on NVIDIA Tesla V100 and NVIDIA Tesla T4; annotated values across the builds: 33 ms, 39 ms, 340 ms, and 6.5 s]

PipeSwitch achieves low context switching latency.

PipeSwitch provides high utilization

[Charts: GPU utilization over consecutive scheduling cycles]

PipeSwitch achieves near 100% utilization.

Summary

• GPU clusters for DL applications suffer from low utilization
  • Limited sharing between training and inference workloads

• PipeSwitch introduces pipelined context switching
  • Enables GPU-efficient multiplexing of DL apps with fine-grained time-sharing
  • Achieves millisecond-scale context switching latencies and high throughput


Thank you!

zbai1@jhu.edu
