Using Tensor Swapping and NVLink to Overcome GPU Memory Limits with TensorFlow

Sam Matzek

Deep learning is memory constrained

• GPUs have limited memory
• Neural networks are growing deeper and wider
• The amount and size of data to process is always growing

GPU Memory Usage

GPU memory

Tensors (layer outputs)

Input data

Kernels

Model Training in GPU Memory

Tensor 1 Tensor 2 Tensor 3

GPU memory

Model Training with Tensor Swapping

GPU memory

Tensor 3

System memory

Tensor 4 Tensor 1 Tensor 2

TensorFlow Large Model Support Graph Modifications

GPU CPU

Swap out

Swap in

https://arxiv.org/pdf/1807.02037.pdf
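
The swap-out and swap-in idea can be illustrated with a toy memory model. The sketch below is not the TFLMS graph rewriting itself; it is a simplified, hypothetical simulation that assumes each layer's output tensor is needed once by the next layer in the forward pass and once again in the backward pass. The function name `peak_gpu_memory` and the tensor sizes are made up for illustration.

```python
def peak_gpu_memory(tensor_sizes, swap=False):
    """Peak size resident in 'GPU memory' for a toy forward + backward pass.

    tensor_sizes[i] is the size of the output tensor of layer i.
    With swap=True, a tensor is moved to 'system memory' once the next
    layer has consumed it, and moved back when the backward pass needs it.
    """
    resident = {}  # tensor index -> size currently in GPU memory
    host = {}      # tensor index -> size currently in system memory
    peak = 0

    # Forward pass: layer i produces tensor i while tensor i-1 is still resident.
    for i, size in enumerate(tensor_sizes):
        resident[i] = size
        peak = max(peak, sum(resident.values()))
        if swap and i >= 1:
            host[i - 1] = resident.pop(i - 1)  # swap out

    # Backward pass: visit layers in reverse; each step needs its own tensor.
    for i in reversed(range(len(tensor_sizes))):
        if i not in resident:
            resident[i] = host.pop(i)  # swap in
        peak = max(peak, sum(resident.values()))
        del resident[i]  # tensor is no longer needed after this step

    return peak


sizes = [4, 4, 4, 4]  # hypothetical tensor sizes, e.g. in GB
print("without swapping:", peak_gpu_memory(sizes))
print("with swapping:   ", peak_gpu_memory(sizes, swap=True))
```

In this toy model the peak without swapping is the sum of all tensors, while with swapping it is bounded by the largest pair of adjacent tensors. Real training graphs have more complex dependencies, so the actual savings depend on the model.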

Enabling TensorFlow Large Model Support

* Examples for TFLMS v2.0.0

Keras API

Estimator API

What’s possible with Large Model Support?

•10x image resolution - Keras ResNet50

•10x image resolution - DeepLabV3 2D image segmentation

•5x MRI resolution - 3D U-Net 3D image segmentation

Measured with TFLMS v2.0.0 on TensorFlow 1.13, CUDA 10.1, cuDNN 7.5

3D U-Net image segmentation

• 3D U-Net generally has high memory usage requirements
• International Multimodal Brain Tumor Segmentation Challenge (BraTS)

•Existing Keras model with TensorFlow backend

Effect of 2x resolution on Dice Coefficients (higher is better)
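
For readers unfamiliar with the metric, the Dice coefficient of two binary masks A and B is commonly defined as 2|A ∩ B| / (|A| + |B|). The sketch below is a minimal pure-Python illustration of that standard definition, not the evaluation code used for the results in this deck. The function name `dice_coefficient` and the convention of returning 1.0 for two empty masks are assumptions.

```python
def dice_coefficient(pred, truth):
    """Dice coefficient of two equal-length binary masks (sequences of 0/1)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# Hypothetical flattened segmentation masks.
predicted = [1, 1, 0, 0]
expected = [1, 0, 0, 0]
print(dice_coefficient(predicted, expected))
```

A value of 1.0 means the predicted and expected masks overlap exactly, and 0.0 means they share no positive voxels.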

“Swapping makes everything slow”

Typical GPU connectivity

System Memory

System memory bus: 76.8 GB/s

PCIe: 32 GB/s

NVLink

POWER9 CPU to GPU connectivity

System Memory

System memory bus: 170 GB/s

NVLink 2.0: 150 GB/s

NVLink 2.0

NVLink 2.0: 150 GB/s

Effects of NVLink 2.0 on Large Model Support

PCIe-connected GPU training one high-res 3D MRI with large model support

NVLink 2.0-connected GPU training one high-res 3D MRI with large model support

Effects of NVLink 2.0 on epoch times

Effects of NVLink 2.0 on GPU Utilization

Multi-GPU model training with NVLink 2.0

http://ibm.biz/3dunet-tflms-multigpu

2.1x faster with HALF the number of GPUs!

Patches versus whole image

3x faster

3.5x faster

https://arxiv.org/abs/1812.07816

Overhead of Large Model Support with NVLink 2.0

Using bs=16, fine_tune_batch_norm=true, measured on 32GB GPU with TensorFlow 1.13, CUDA 10.1, cuDNN 7.5

1.2 GB transferred to GPU, GPU utilization 81%

LMS enabled

148 GB transferred to GPU, GPU utilization 90%

1.4 TB transferred to GPU, GPU utilization 64%

DeepLabV3 on POWER9 with 32GB NVIDIA Volta V100
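
To put the transfer volumes above in perspective, the following back-of-the-envelope sketch divides each volume by the peak link bandwidths quoted earlier in the deck (PCIe: 32 GB/s, NVLink 2.0: 150 GB/s). It assumes peak bandwidth is sustained, that 1.4 TB means 1400 GB, and that transfers do not overlap with computation, so the results are only lower bounds on raw transfer time and not measured values. The helper name `min_transfer_seconds` is hypothetical.

```python
# Peak link bandwidths quoted on the slides, in GB/s.
PCIE_GBPS = 32.0
NVLINK2_GBPS = 150.0


def min_transfer_seconds(gigabytes, link_gbps):
    """Lower bound on transfer time, assuming peak bandwidth is sustained."""
    return gigabytes / link_gbps


# Transfer volumes from the slide; 1.4 TB is taken as 1400 GB (assumption).
volumes_gb = {"1.2 GB": 1.2, "148 GB": 148.0, "1.4 TB": 1400.0}

for label, gb in volumes_gb.items():
    pcie = min_transfer_seconds(gb, PCIE_GBPS)
    nvlink = min_transfer_seconds(gb, NVLINK2_GBPS)
    print(f"{label:>7}: PCIe >= {pcie:8.3f} s, NVLink 2.0 >= {nvlink:8.3f} s")
```

Under these assumptions the same volume takes roughly 4.7 times longer over PCIe than over NVLink 2.0, which is simply the ratio of the two peak bandwidths.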

Large Model Support with NVLink 2.0

• Tensor swapping can be used to overcome GPU memory limits

•Allows training of:

• deeper models

• higher resolution data

• larger batch sizes

• NVLink 2.0 between CPU and GPU allows tensor swapping with minimal overhead

More information

TensorFlow Large Model Support: https://github.com/IBM/tensorflow-large-model-support

TFLMS: Large Model Support in TensorFlow by Graph Rewriting: https://arxiv.org/pdf/1807.02037.pdf

TensorFlow Large Model Support Case Study: https://developer.ibm.com/linuxonpower/2018/07/27/tensorflow-large-model-support-case-study-3d-image-segmentation/

Performance of 3DUnet Multi GPU Model for Medical Image Segmentation using TensorFlow Large Model Support: http://ibm.biz/3dunet-tflms-multigpu

Fast and Accurate 3D Medical Image Segmentation with Data-swapping Method: https://arxiv.org/abs/1812.07816

Data-parallel distributed training of very large models beyond GPU capacity: https://arxiv.org/abs/1811.12174

POWER9 server with NVLink 2.0 connections between CPU and GPU (IBM AC922): https://www.ibm.com/us-en/marketplace/power-systems-ac922
