AI & GPU ACCELERATED COMPUTING IN GOVERNMENT · AI & GPU ACCELERATED COMPUTING IN...

Larry Brown Ph.D.

AI & GPU ACCELERATED COMPUTING IN GOVERNMENT

Federal Solutions Architecture Manager

> Founded in 1993

> Jensen Huang, Founder & CEO

> 11,000 employees

> $123B market cap; $6.9B revenue in FY17

“World’s Most Admired Companies”— Fortune

“50 Smartest Companies: #1”— MIT Tech Review

“#3 Top CEO in the World”— Harvard Business Review

“Most Innovative Companies”— Fast Company

NVIDIA

PERVASIVE USE OF GPUS IN GOVERNMENTGeospatial | Intelligent Data & Video Analytics | Health & Human Services | Research

AI & DL | Data Analytics | High Performance Compute | Virtual Desktop Integration

THE ERA OF AIThe PC revolution put a computer

in every home. The mobile era put

a computer in every pocket, and then

the cloud turned every mobile device

into a supercomputer. The AI era will

infuse intelligence into trillions of

computing devices and be the single

largest opportunity the industry has

ever known. AI will spur a wave of

social progress unmatched since

the industrial revolution.

MOBILE

THE BRAIN OF AI CARSAutonomous vehicles will modernize the

$10 trillion transportation industry —

making our roads safer and our cities

more efficient. NVIDIA DRIVE™ is a

scalable AI car platform that spans the

entire range of autonomous driving,

from traffic-jam pilots to robotaxis.

More than 225 companies have adopted

DRIVE, using NVIDIA AI technology in

their data centers and vehicles. They

range from car companies and suppliers,

to mapping and sensor companies, to

startups and research organizations.

AI, ML AND DL

DEEP LEARNING USE CASES IN FEDERAL

A 21ST CENTURYAI PLANNING TOOL

Understand distribution of 7B+ people for planning infra. & vital services

Process high-res imagery in minutes to map urban dynamics and information critical for emergency response

Urban Dynamics Institute

USING AI TO MONITOR EARTH’S VITALS

Record increases in global temperature, glacial retreat and rising sea levels

Using satellite imagery to measure the effects of carbon and greenhouse gas emissions

Help scientists to plan to protect ecosystems and farmers improve crop production

Ames Research Center DeepSat

MAPPING AN END TO POVERTY

CHALLENGE

1B people live on less than $2 a day - ending poverty tops the list of the UN’s sustainable development goals

Researching poverty isexpensive, manual, dangerous and does not scale

Collecting Data on Global Poverty

SOLUTION

Big data and applied deep

learning to show the

distribution of wealth across

the globe

'Earth at night’ images

determine useful features to

indicate economic activity

(such as city lights)

IMPACT

Scalable, safer and less-expensive way to collect data on poverty

Poverty maps will help world organizations locate those most in need of relief

DL FOR REMOTE SENSING

Deep Learning for Accelerating Remote Sensing Feature Extraction, GTC – Washington, DC, October, 2016

MEGA – Machine Learning for GEOINT Analytics

Automated Target Recognition

Real-time Ground Weather Intel

Activity Based Intelligence

DEEP LEARNING ON RADARDetect and recognize mobile ground objects from airborne platform

Target Recognition and Adaptation in Contested Environments

Speed & Accuracy are both critical.

Trained on TitanX - deployed on Jetson.

Image Manifold Translation in Synthetic Aperture Radar Imaging, GTC – Washington, DC, October, 2016

ANALYTICS PROTECTS & PREVENTS DELAYS

Tracks 150+ billion pieces of mail annually

Achieves near-immediate analysis of data from 213,000+ scanning devices

Reduced driving by 70m miles — 7m gallons of fuel, 70,000 tons of carbon

AI TO ACCELERATE CANCER RESEARCH

CANDLE is a common discovery platform, with the goal of achieving 10X annual increases in productivity for cancer researchers.

Unites the Department of Energy, National Cancer Institute with researchers at Oak Ridge, Lawrence Livermore, Argonne, and Los Alamos National Laboratories

DEEP LEARNING FOR CYBER SECURITYWant more proactive approach than relying on signatures

University of Maryland

Malware Detection by Eating a Whole EXE, GTC – Washington, DC, November 2017

Malconv results

GPU – THE COMPUTE PLATFORM FOR AI

NEURAL NETWORK COMPLEXITY IS EXPLODINGLarge, More Compute Intensive

“It’s time to start planning for the end of Moore’s Law, and it’s worth pondering how it will end, not just when.”

Robert Colwell Retired Director, Microsystems Technology Office, DARPA

GPU Accelerator

1980 1990 2000 2010 2020

GPU-Computing perf

1.5X per year1000X

RISE OF GPU COMPUTING

Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K.

Olukotun, L. Hammond, and C. Batten New plot and data collected for 2010-2015 by K. Rupp

Single-threaded perf

1.5X per year

1.1X per year

10x-100x application performance gains

5x energy efficiency & low footprint

Next generation in-memory processing

PLATFORM BUILT FOR AIAccelerating Every Framework And Fueling Innovation

All Major FrameworksAll Use-cases

Speech Video

Translation Personalization

Volta Tensor Core, NVSwitch, NVLink

Tensor Cores

NVLink NVSwitch

ARCHITECTING MODERN DATACENTERS

Strong Core CPU for Sequential code

Volta 5,120 CUDA Cores

125 TFLOPS Tensor Core

NVLink for Strong Scaling

ARCHITECTING MODERN DATACENTERS

Training

Device

GPU DEEP LEARNING IS A NEW COMPUTING MODEL

Training

Billions of Trillions of Operations

GPU train larger models, accelerate

time to marketInferencing

Datacenter inferencing

10s of billions of image, voice, video

queries per day

GPU inference for fast response,

maximize datacenter throughput

TESLA V100 TENSOR CORE GPUWorld’s Most Advanced Data Center GPU

5,120 CUDA cores

640 NEW Tensor cores

7.8 FP64 TFLOPS | 15.7 FP32 TFLOPS

| 125 Tensor TFLOPS

20MB SM RF | 16MB Cache

32 GB HBM2 @ 900GB/s |

300GB/s NVLink

TESLA PLATFORM ENABLES DRAMATIC REDUCTION IN TIME TO TRAIN

0 20 40 60 80 100 120 140

2x CPU

Single Node1X P100

Single Node1X V100

8x V100

At scale256x V100

Relative Time to Train Improvements(ResNet-50)

ResNet-50, 90 epochs to solution | CPU Server: dual socket Intel Xeon Gold 6140

14 Minutes

4 Hours

25 Days

30 Hours

4.8 Days

Universal Inference Acceleration

320 Turing Tensor cores

2,560 CUDA cores

65 FP16 TFLOPS | 130 INT8 TOPS | 260 INT4 TOPS

16GB | 320GB/s

ANNOUNCING TESLA T4WORLD’S MOST ADVANCED INFERENCE GPU

Up To 36X Faster Than CPUs | Accelerates All AI Workloads

WORLD’S MOST PERFORMANT INFERENCE PLATFORM

Speedup: 36x fasterGNMT

Speedup: 27x fasterResNet-50 (7ms latency limit)

Speedup: 21X fasterDeepSpeech 2

Natural Language Processing Inference

CPU Server Tesla P4 Tesla T4

Speech Inference

Video Inference

Peak Performance

Float INT8 Float INT8 INT4

ADVANCE YOUR DEEP LEARNING TRAINING AT GTCDon’t miss the world’s most important event for AI & GPU in Federal

Washington DC

October 23-24, 2018

AI & GPU ACCELERATED COMPUTING IN GOVERNMENT · AI & GPU ACCELERATED COMPUTING IN...

Documents

Introduction to GPU Computing - NERSC · ACCELERATED COMPUTING IS GROWING RAPIDLY 22x GPU Developers 45,000 1M 2012 2018 580 Applications Accelerated 580 0 100 200 300 400 500 600

MANAGING GPU ACCELERATED COMPUTING - NVIDIA · 2019-09-12 · accelerated applications - GPU support in Kubernetes using the NVIDIA device plugin • Specify GPU attributes such as

GPU-Accelerated 2D and Web Renderingon-demand.gputechconf.com/.../SS106-GPU-Accelerated... · independent 2D graphics, ... GPU-accelerated OpenVG reference Cairo Qt Stroking with

GPU Accelerated Bioinformatics Research at …on-demand.gputechconf.com/gtc/2012/presentations/S0519...GPU Accelerated Bioinformatics Research at BGI GPU Technology Conference 2012

GPU-Accelerated Applications for HPC Industries| … · Accelerated computing has revolutionized a broad range of industries with over four hundred applications optimized for GPUs

GPU - Super Micro Computer, Inc. · GPU-Accelerated for Finance, Oil & Gas, Science, and Engineering S upermicro extends its technology lead in the GPU computing space with this innovative

FPGA Accelerated Computing Using AWS F1 Instances · 2017-09-07 · FPGA Accelerated Computing Using AWS F1 Instances. Applications and development environment. F1. ... GPU and FPGA

GPU Accelerated Genomics Data Compression · 2014. 4. 10. · Bzip2Qlike’Compression’Method’Pipeline ... GPU Accelerated Genomics Data Compression GuiXin Guo

GPU-Accelerated Path Rendering - GPU Technology Conferenceon-demand.gputechconf.com/gtc/2012/presentations/S... · GPU-Accelerated Path Rendering . Mark Kilgard •Principal System

TESLA ACCELERATED COMPUTING - Welcome to NCI · Tesla Accelerated Computing Platform. 13 TESLA K80 ... Maxwell DX12 Pascal Unified Memory 3D Memory ... 3D FFT, ANSYS: 2 GPU configuration

ACCELERATED COMPUTING & DEEP LEARNING€¦ · Tesla Accelerated Computing Platform GPU Accelerators Interconnect System ... Programming Languages Infrastructure Management System

GPU-ACCELERATED APPLICATIONSstatic.highspeedbackbone.net/pdf/gpu-applications-catalog.pdf · Accelerated computing has revolutionized a broad range of industries with over six hundred

GPU Accelerated AES

GPU COMPUTING - Normandie - Sciencesconf.org...2 3 4 1 1. Review available GPU-accelerated applications 2. Check for GPU-Accelerated applications and libraries 3. Add OpenACC Directives

GPU-Accelerated Multiphysics Simulation

Toward efficient GPU-accelerated

GPU/CPU Work Sharing with Parallel Language XcalableMP-dev for Accelerated Computing

The GPU Accelerated Database

GPU-Accelerated Cloud Computing for Data-Intensive Applications › ~hebs › pub › baoxuegpu_springer... · 2016-06-08 · GPU-Accelerated Cloud Computing for Data-Intensive Applications

GPU-Accelerated Optical Coherence Tomography (OCT) Imagingdeveloper.download.nvidia.com/.../S0141-GTC2012-GPU-Optical-Tom… · GPU-Accelerated Optical Coherence Tomography (OCT)