Nvidia SC13 Podcast

SUPERCOMPUTING 2013 PRESS DECK Sumit Gupta | General Manager, Tesla Accelerated Computing

SC13 News

1. IBM Taps GPU Accelerators

2. New Product Announcements

3. New Supercomputer Announcements

Accelerated Computing Growing Fast

Hundreds of GPU Accelerated Apps
[Chart: GPU-accelerated apps by year. 2011: 113 | 2012: 182 | 2013: 242]

2x Growth in One Year
[Chart: Percent of HPC Systems With Accelerators, 2010 to 2012. Bar labels: 22%, 24%, 44%]

Intersect360 Research HPC User Site Census: Systems, July 2013

NVIDIA GPU is Accelerator of Choice

NVIDIA GPUs: 85% | INTEL PHI: 4% | OTHERS: 11%

Intersect360 Research HPC User Site Census: Systems, July 2013

IBM Using GPUs to Accelerate Enterprise & Data Analytics Applications

Predictive Analytics

Risk Analytics

Business Intelligence

Application Infrastructure

IBM Partners with NVIDIA to Build Next-Generation Supercomputers

POWER8 CPU + Tesla GPU

GPU-Accelerated POWER-Based Systems Available in 2014

GPU Computing in Data Centers

[Timeline, 2007 to 2014: GPU computing platforms, labeled x86, Power, ARM64]


Linux GCC Compiler to Support GPU Accelerators

Open Source OpenACC in GCC by Mentor Graphics & Samsung

Pervasive Impact: Free to all Linux users

Mainstream: Most Widely Used HPC Compiler

“Incorporating OpenACC into GCC is an excellent example of open source and open standards working together to make accelerated computing broadly accessible to all Linux developers.”

Oscar Hernandez, Oak Ridge National Laboratory
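The slide names OpenACC but shows no code. As a rough illustration of the directive style that an OpenACC-capable GCC would compile, here is a minimal sketch of a SAXPY loop. The function name `saxpy_acc` and the choice of data clauses are illustrative assumptions, not taken from the deck. The pragma is guarded by the standard `_OPENACC` macro, so a compiler without OpenACC enabled simply runs the loop on the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i] for i in [0, n). */
void saxpy_acc(size_t n, float a, const float *x, float *y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    /* _OPENACC is defined only when the compiler has OpenACC enabled.
     * copyin: x is only read on the device; copy: y is sent and returned. */
#ifdef _OPENACC
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
#endif
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

In GCC releases that ship OpenACC support, the directive is enabled with `-fopenacc`; without that switch the same source builds as ordinary serial C, which is the portability argument the slide is making.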

Tesla K40: World’s Fastest Accelerator for Supercomputing and Big Data Analytics

CUDA 6: Dramatically Simplifies Parallel Programming with Unified Memory

Tesla K40 World’s Fastest Accelerator

FASTER: 1.4 TF | 2880 Cores | 288 GB/s

[Chart: AMBER Benchmark, ns/day, for CPU, K20X and K40]

AMBER Benchmark: SPFP-Nucleosome. CPU: Dual E5-2687W @ 3.10GHz, 64GB System Memory, CentOS 6.2. GPU systems: Single Tesla K20X or Single Tesla K40

LARGER: 2x Memory Enables More Apps

[Graphic: 6GB vs. 12GB; Fluid Dynamics, Seismic Analysis, Rendering]

SMARTER: Unlock Extra Performance Using Power Headroom

GPU Boost

GPU Boost: Up to 25% Extra Performance on Applications

Use Power Headroom to Run at Higher Clocks

[Chart: Tesla K40 (base) vs. Tesla K40 with GPU Boost on AMBER SPFP-TRPCage, LAMMPS-EAM and NAMD 2.9-APOA1. Bar labels: 25% Faster, 20% Faster, 14% Faster, 17% Faster, 13% Faster, 11% Faster]

ANNOUNCING Unified Memory

CUDA 6

Unified Memory: Dramatically Lower Developer Effort

Developer View Today: System Memory | GPU Memory

Developer View With Unified Memory: Unified Memory

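Super Simplified Memory Management Code

CPU Code:

    /* Needs <stdio.h> and <stdlib.h>; compare() and use_data() are defined elsewhere. */
    void sortfile(FILE *fp, int N) {
        char *data;
        data = (char *)malloc(N);
        fread(data, 1, N, fp);
        qsort(data, N, 1, compare);
        use_data(data);
        free(data);
    }

CUDA 6 Code with Unified Memory:

    void sortfile(FILE *fp, int N) {
        char *data;
        cudaMallocManaged(&data, N);        /* one allocation visible to both CPU and GPU */
        fread(data, 1, N, fp);              /* CPU writes into the managed buffer directly */
        qsort<<<...>>>(data,N,1,compare);   /* GPU sort kernel; launch configuration elided on the slide */
        cudaDeviceSynchronize();            /* wait for the GPU before the CPU reads data */
        use_data(data);
        cudaFree(data);
    }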

Piz Daint

Fastest Supercomputer in Europe: 6.27 PetaFLOPS (80% Linpack Efficiency)

Greenest Petascale System: 3110 MFLOPS/W (#2: JUQUEEN, 2176 MFLOPS/W)

Production-Grade Weather Forecasts: COSMO

7 National Weather Agencies: Germany | Greece | Italy | Poland | Russia | Romania | Switzerland
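The Linpack figures on this slide imply a theoretical peak that the slide does not state. As a small worked sketch, assuming the usual definition (efficiency is sustained Linpack performance divided by theoretical peak) and treating the 80% as a rounded value, the peak can be backed out as follows. The function name `implied_peak_pflops` is illustrative.

```c
#include <assert.h>

/* Linpack efficiency = sustained Linpack performance / theoretical peak,
 * so the implied peak is the sustained figure divided by the efficiency. */
double implied_peak_pflops(double linpack_pflops, double efficiency)
{
    assert(linpack_pflops > 0.0);
    assert(efficiency > 0.0 && efficiency <= 1.0);
    return linpack_pflops / efficiency;
}
```

Calling `implied_peak_pflops(6.27, 0.80)` gives roughly 7.8 PetaFLOPS of theoretical peak. By the same kind of arithmetic, 3110 MFLOPS/W against JUQUEEN's 2176 MFLOPS/W is a ratio of about 1.43.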

Tokyo Tech KFC System

Greenest Supercomputer in the World

4000+ MFLOPS per Watt

25% Higher than #1 Green500 System

160 Tesla K20X GPUs

Oil Immersion Technology

Current Green500 #1: CINECA Eurora System, Italy, 3208 MF/W
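The "4000+" and "25% Higher" claims can be cross-checked against the Eurora figure quoted on the same slide. The sketch below does only that arithmetic; the helper names `raise_by_percent` and `percent_above` are illustrative, and no measured KFC value beyond what the slide states is assumed.

```c
#include <assert.h>

/* Value obtained by raising a reference figure by a given percentage. */
double raise_by_percent(double reference, double percent)
{
    assert(reference > 0.0);
    return reference * (1.0 + percent / 100.0);
}

/* How far, in percent, a value sits above a reference figure. */
double percent_above(double value, double reference)
{
    assert(reference > 0.0);
    return (value / reference - 1.0) * 100.0;
}
```

With the slide's numbers, `raise_by_percent(3208.0, 25.0)` is 4010 MFLOPS/W, which is consistent with the "4000+ MFLOPS per Watt" headline.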

ANSYS Fluent Doubles Performance with GPUs

2x Better Insight for Low Drag Design

2% Less Drag | 1.5B Gal. of Fuel Saved/Year

[Chart: Automobile Drag Simulation Throughput, Number of Jobs per Day, CPU vs. K40. Label: 90% Faster]

2 x E5-2680 CPUs, 8 cores used; 2 Tesla K40s. Sedan Geometry, 3.6M mixed cells. Steady, turbulent, external aerodynamics. Coupled PBNS, DP Solver

Additional Information

Tesla K40 20-40% Faster than K20X on Applications

[Chart: K20X vs. K40 @ base vs. K40 @ boost. Benchmarks: ANSYS 14 (SMP-V14sp-4), LAMMPS (EAM), NAMD 2.9 (APOA1), AMBER (SPFP-Nucleosome), LSMS (Fe32), QMCPACK (3x3x1), CUBLAS. Speedup labels: 1.4x, 1.3x, 1.2x, 1.3x, 1.3x, 1.3x, 1.3x]

First Tesla K40 Customers

Swinburne, Australia | CSC, Finland | Texas Advanced Computing Center | CEA, France

Tesla K40 OEM Partners

                          K20X            K40
Peak Single Precision     3.93 TF         4.29 TF
Peak SGEMM                2.95 TF         3.22 TF
Peak Double Precision     1.31 TF         1.43 TF
Peak DGEMM                1.22 TF         1.33 TF
Memory size               6 GB            12 GB
Memory BW (ECC off)       250 GB/s        288 GB/s
Memory Clock              2.6 GHz         3.0 GHz
PCIe Gen                  Gen 2           Gen 3
# of Cores                2688            2880
Core Clock                732 MHz         Base: 745 MHz; Boost Clocks: 810 & 875 MHz
Total Board Power         235W            235W
Form Factor               PCIe Passive    PCIe Passive, Active
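The peak rows of this table follow from the core counts and clocks in its lower rows. The sketch below reproduces them under assumptions the slide does not state: each core retires one fused multiply-add (2 FLOPs) per clock, double precision runs at one third of the single-precision rate, and the memory interface is 384 bits wide with two transfers per memory clock. Function and macro names are illustrative.

```c
#include <assert.h>

/* Assumptions, not taken from the slide. */
#define FLOPS_PER_CORE_PER_CLOCK 2.0   /* one fused multiply-add per clock */
#define DP_TO_SP_RATIO (1.0 / 3.0)     /* double precision at 1/3 the SP rate */
#define BUS_WIDTH_BITS 384.0           /* memory interface width */
#define TRANSFERS_PER_CLOCK 2.0        /* double data rate */

/* Peak single precision in TFLOPS from core count and core clock in MHz. */
double peak_sp_tflops(int cores, double core_clock_mhz)
{
    assert(cores > 0 && core_clock_mhz > 0.0);
    /* cores * MHz * FLOPs/clock is in MFLOPS; 1e6 MFLOPS = 1 TFLOPS. */
    return cores * core_clock_mhz * FLOPS_PER_CORE_PER_CLOCK / 1.0e6;
}

/* Peak double precision in TFLOPS under the 1/3-rate assumption. */
double peak_dp_tflops(int cores, double core_clock_mhz)
{
    return peak_sp_tflops(cores, core_clock_mhz) * DP_TO_SP_RATIO;
}

/* Peak memory bandwidth in GB/s from the memory clock in GHz. */
double peak_bandwidth_gbs(double memory_clock_ghz)
{
    assert(memory_clock_ghz > 0.0);
    return memory_clock_ghz * TRANSFERS_PER_CLOCK * BUS_WIDTH_BITS / 8.0;
}
```

For the K40 at its 745 MHz base clock, this gives about 4.29 TF single precision, 1.43 TF double precision and 288 GB/s. For the K20X at 732 MHz and 2.6 GHz it gives about 3.94 TF, 1.31 TF and 249.6 GB/s, in line with the table's 3.93 TF, 1.31 TF and 250 GB/s.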
