SUPERCOMPUTING 2013 PRESS DECK Sumit Gupta | General Manager, Tesla Accelerated Computing

Nvidia SC13 Podcast


DESCRIPTION

In this slidecast, Sumit Gupta from NVIDIA discusses the latest product news on GPU computing for HPC:

* IBM and NVIDIA Partner to Build Next-Generation Supercomputers
* NVIDIA Launches the Tesla K40 GPU Accelerator, its fastest accelerator ever

Learn more: http://nvidianews.nvidia.com/Releases/NVIDIA-Launches-World-s-Fastest-Accelerator-for-Supercomputing-and-Big-Data-Analytics-a66.aspx

Watch the video presentation: http://wp.me/p3RLHQ-aRY


Page 1: Nvidia SC13 Podcast

SUPERCOMPUTING 2013 PRESS DECK Sumit Gupta | General Manager, Tesla Accelerated Computing

Page 2: Nvidia SC13 Podcast

SC13 News

1. IBM Taps GPU Accelerators
2. New Product Announcements
3. New Supercomputer Announcements

Page 3: Nvidia SC13 Podcast

Accelerated Computing Growing Fast
2x Growth in One Year | Hundreds of GPU Accelerated Apps

Chart (0 to 300 scale): 113 (2011), 182 (2012), 242 (2013)

Chart: Percent of HPC Systems With Accelerators: 22% (2010), 24% (2011), 44% (2012)

NVIDIA GPU is Accelerator of Choice: NVIDIA GPUs 85% | Intel Phi 4% | Others 11%

Source for both census charts: Intersect360 Research HPC User Site Census: Systems, July 2013

Page 4: Nvidia SC13 Podcast

IBM Using GPUs to Accelerate Enterprise & Data Analytics Applications

Predictive Analytics

Risk Analytics

Business Intelligence

Application Infrastructure

Page 5: Nvidia SC13 Podcast

IBM Partners with NVIDIA to Build Next-Generation Supercomputers

POWER8 CPU + Tesla GPU

GPU-Accelerated POWER-Based Systems Available in 2014

Page 6: Nvidia SC13 Podcast

GPU Computing in Data Centers

Timeline, 2007 to 2014: GPU computing on x86 from 2007, expanding to x86, POWER, and ARM64 hosts by 2013 and 2014

Page 7: Nvidia SC13 Podcast


Linux GCC Compiler to Support GPU Accelerators

Open Source OpenACC in GCC by Mentor Graphics & Samsung

Pervasive Impact: Free to all Linux users

Mainstream: Most Widely Used HPC Compiler

Oscar Hernandez, Oak Ridge National Laboratory: "Incorporating OpenACC into GCC is an excellent example of open source and open standards working together to make accelerated computing broadly accessible to all Linux developers."
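To make "OpenACC in GCC" concrete, here is a minimal sketch of a SAXPY loop annotated with a standard OpenACC directive. The function and its data are illustrative, not taken from the deck. Later GCC releases enable OpenACC with the `-fopenacc` flag (assumption: which GCC version and which offload targets are available depends on how the compiler was built); without that flag, GCC ignores the pragma and the loop simply runs serially on the CPU.

```c
#include <stddef.h>

/* SAXPY: y[i] = a * x[i] + y[i].
 * With an OpenACC-enabled compiler, the directive offloads the loop to an
 * accelerator, copying x to the device and copying y in and back out.
 * A compiler without OpenACC support ignores the pragma, so the same
 * source still builds and runs serially on the CPU with identical results. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Because a non-OpenACC compiler just skips the directive, one source file serves both CPU-only and GPU-accelerated builds, which is the portability point the slide makes.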

Page 8: Nvidia SC13 Podcast

SC13 News

1. IBM Taps GPU Accelerators
2. New Product Announcements
3. New Supercomputer Announcements

Page 9: Nvidia SC13 Podcast

Tesla K40: World’s Fastest Accelerator for Supercomputing and Big Data Analytics

CUDA 6: Dramatically Simplifies Parallel Programming with Unified Memory

Page 10: Nvidia SC13 Podcast

Tesla K40: World’s Fastest Accelerator

FASTER: 1.4 TF (peak double precision) | 2880 Cores | 288 GB/s
Chart: AMBER benchmark, ns/day (0 to 5 scale), CPU vs. Tesla K20X vs. Tesla K40

LARGER: 2x Memory Enables More Apps
6 GB (K20X) to 12 GB (K40): Fluid Dynamics, Seismic Analysis, Rendering

SMARTER: Unlock Extra Performance Using Power Headroom (GPU Boost)

AMBER Benchmark: SPFP-Nucleosome. CPU: Dual E5-2687W @ 3.10GHz, 64GB System Memory, CentOS 6.2. GPU systems: Single Tesla K20X or Single Tesla K40.

Page 11: Nvidia SC13 Podcast

GPU Boost: Up to 25% Extra Performance on Applications
Use Power Headroom to Run at Higher Clocks

Chart: relative performance (0.00 to 1.40 scale), Tesla K40 (base) vs. Tesla K40 with GPU Boost. Application labels recovered: AMBER SPFP-TRPCage, LAMMPS-EAM, NAMD 2.9-APOA1. Gains shown: 25%, 20%, 14%, 17%, 13%, 11% faster.

Page 12: Nvidia SC13 Podcast

ANNOUNCING Unified Memory

CUDA 6

Page 13: Nvidia SC13 Podcast

Unified Memory: Dramatically Lower Developer Effort

Developer View Today: separate System Memory and GPU Memory

Developer View With Unified Memory: a single Unified Memory

Page 14: Nvidia SC13 Podcast

Super Simplified Memory Management Code

CPU Code:

    void sortfile(FILE *fp, int N) {
        char *data;
        data = (char *)malloc(N);
        fread(data, 1, N, fp);
        qsort(data, N, 1, compare);
        use_data(data);
        free(data);
    }

CUDA 6 Code with Unified Memory:

    void sortfile(FILE *fp, int N) {
        char *data;
        cudaMallocManaged(&data, N);          // one allocation, visible to CPU and GPU
        fread(data, 1, N, fp);
        qsort<<<...>>>(data, N, 1, compare);  // GPU kernel; launch config elided on slide
        cudaDeviceSynchronize();              // wait for the kernel before CPU access
        use_data(data);
        cudaFree(data);
    }
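As printed, the CPU half of the slide does not compile on its own, because `compare` and `use_data` are never defined. Below is a minimal self-contained sketch of that CPU version. The byte comparator, the two-argument `use_data`, the `g_sorted` inspection buffer, and the integer return code are all illustrative assumptions, not part of the deck. The CUDA half is left as on the slide, since its `qsort<<<...>>>` kernel is not shown.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sink for the sorted bytes: the deck never defines use_data(),
 * so this version keeps a copy of (up to 256 of) them for inspection. */
static char g_sorted[256];

static void use_data(const char *data, int N)
{
    size_t len = (size_t)N < sizeof g_sorted ? (size_t)N : sizeof g_sorted;
    memcpy(g_sorted, data, len);
}

/* Byte comparator for qsort: orders bytes by unsigned value. */
static int compare(const void *a, const void *b)
{
    unsigned char x = *(const unsigned char *)a;
    unsigned char y = *(const unsigned char *)b;
    return (x > y) - (x < y);
}

/* CPU version of the slide's sortfile(), with error checks added.
 * Returns 0 on success, -1 if allocation or the read fails. */
int sortfile(FILE *fp, int N)
{
    char *data = malloc((size_t)N);
    if (data == NULL)
        return -1;
    if (fread(data, 1, (size_t)N, fp) != (size_t)N) {
        free(data);
        return -1;
    }
    qsort(data, (size_t)N, 1, compare);
    use_data(data, N);
    free(data);
    return 0;
}
```

The Unified Memory version on the slide keeps this same shape; the differences are the allocator (`cudaMallocManaged` instead of `malloc`), the kernel launch, the synchronization before the CPU reads the results, and `cudaFree` instead of `free`.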

Page 15: Nvidia SC13 Podcast

SC13 News

1. IBM Taps GPU Accelerators
2. New Product Announcements
3. New Supercomputer Announcements

Page 16: Nvidia SC13 Podcast

Piz Daint

Fastest Supercomputer in Europe: 6.27 PetaFLOPS (80% Linpack Efficiency)

Production-Grade Weather Forecasts: COSMO
7 National Weather Agencies: Germany | Greece | Italy | Poland | Russia | Romania | Switzerland

Greenest Petascale System: 3110 MFLOPS/W
#2: JUQUEEN, 2176 MFLOPS/W
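The Piz Daint numbers are linked by two ratios: Linpack efficiency is Rmax / Rpeak, and Green500-style efficiency is sustained MFLOPS per watt. A minimal C sketch of that arithmetic follows. The function names are illustrative, the slide's 80% figure is rounded (so any derived peak is approximate), and the power estimate assumes both figures describe the same Linpack run.

```c
/* Linpack efficiency = Rmax / Rpeak, so the implied peak is
 * Rmax / efficiency. The result is in whatever units rmax uses. */
double implied_rpeak(double rmax, double efficiency)
{
    return rmax / efficiency;
}

/* Energy efficiency = sustained MFLOPS / watts, so the implied power
 * during the run is the sustained rate divided by that ratio. */
double implied_power_watts(double rmax_mflops, double mflops_per_watt)
{
    return rmax_mflops / mflops_per_watt;
}

/* How much higher a is than b, in percent. */
double percent_higher(double a, double b)
{
    return (a / b - 1.0) * 100.0;
}
```

For Piz Daint, `implied_rpeak(6.27, 0.80)` gives about 7.84 PetaFLOPS of peak. `implied_power_watts(6.27e9, 3110.0)` (6.27 PetaFLOPS is 6.27e9 MFLOPS) gives roughly 2.0 MW during the run. `percent_higher(3110.0, 2176.0)` puts Piz Daint about 43% above JUQUEEN. The same `percent_higher(4000.0, 3208.0)` gives about 24.7%, consistent with the KFC slide's "25% higher" claim.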

Page 17: Nvidia SC13 Podcast

Greenest Supercomputer in the World

4000+ MFLOPS per Watt

25% Higher than #1 Green500 System

160 Tesla K20X GPUs

Oil Immersion Technology

Tokyo Tech KFC System

Current Green500 #1: CINECA Eurora System, Italy, 3208 MFLOPS/W

Page 18: Nvidia SC13 Podcast

ANSYS Fluent Doubles Performance with GPUs

Chart: Automobile Drag Simulation Throughput, Number of Jobs per Day (0 to 30 scale), CPU vs. K40: 90% Faster

2x Better Insight for Low Drag Design
2% Less Drag | 1.5B Gal. of Fuel Saved/Year

Configuration: 2 x E5-2680 CPUs, 8 cores used; 2 Tesla K40s. Sedan geometry, 3.6M mixed cells. Steady, turbulent, external aerodynamics; coupled PBNS, DP solver.

Page 19: Nvidia SC13 Podcast

SUPERCOMPUTING 2013 PRESS DECK Sumit Gupta | General Manager, Tesla Accelerated Computing

Page 20: Nvidia SC13 Podcast

Additional Information

Page 21: Nvidia SC13 Podcast

Tesla K40 20-40% Faster than K20X on Applications

Chart: relative performance (0.0 to 1.5 scale) of K20X, K40 @ base, and K40 @ boost on ANSYS 14 (SMP-V14sp-4), LAMMPS (EAM), NAMD 2.9 (APOA1), AMBER (SPFP-Nucleosome), LSMS (Fe32), QMCPACK (3x3x1), and CUBLAS. Speedup labels, in chart order: 1.4x, 1.3x, 1.2x, 1.3x, 1.3x, 1.3x, 1.3x.

Page 22: Nvidia SC13 Podcast

First Tesla K40 Customers

Swinburne, Australia | CSC, Finland | Texas Advanced Computing Center | CEA, France

Page 23: Nvidia SC13 Podcast

Tesla K40 OEM Partners

Page 24: Nvidia SC13 Podcast

Specification | K20X | K40
Peak single precision | 3.93 TF | 4.29 TF
Peak SGEMM | 2.95 TF | 3.22 TF
Peak double precision | 1.31 TF | 1.43 TF
Peak DGEMM | 1.22 TF | 1.33 TF
Memory size | 6 GB | 12 GB
Memory bandwidth (ECC off) | 250 GB/s | 288 GB/s
Memory clock | 2.6 GHz | 3.0 GHz
PCIe generation | Gen 2 | Gen 3
Number of cores | 2688 | 2880
Core clock | 732 MHz | Base: 745 MHz; boost clocks: 810 and 875 MHz
Total board power | 235 W | 235 W
Form factor | PCIe passive | PCIe passive, active
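The peak rows on this slide can be reproduced from the core counts and clocks. Below is a minimal sketch. It assumes each CUDA core retires one fused multiply-add (2 floating-point operations) per clock, double precision runs at one third of the single-precision rate (which matches the slide's own ratios), and bandwidth comes from a 384-bit memory bus with a GDDR5 data rate of twice the listed memory clock. The bus width and the 2x factor are assumptions, not figures from the deck.

```c
/* Peak single-precision rate in TFLOPS, assuming one fused multiply-add
 * (2 floating-point operations) per core per clock. */
double peak_sp_tflops(int cores, double clock_mhz)
{
    return cores * clock_mhz * 1e6 * 2.0 / 1e12;
}

/* Peak double-precision rate, assuming DP runs at 1/3 of the SP rate. */
double peak_dp_tflops(int cores, double clock_mhz)
{
    return peak_sp_tflops(cores, clock_mhz) / 3.0;
}

/* Peak memory bandwidth in GB/s: bytes per transfer (bus width / 8)
 * times transfers per second, assumed to be 2x the listed memory clock. */
double peak_bw_gbs(int bus_bits, double mem_clock_ghz)
{
    return bus_bits / 8.0 * mem_clock_ghz * 2.0;
}
```

Under these assumptions, the K40 at its 745 MHz base clock gives 4.29 TF single and 1.43 TF double precision, matching the slide. The K20X gives 3.94 TF and 1.31 TF (the slide lists 3.93 TF, apparently truncated). `peak_bw_gbs(384, 2.6)` and `peak_bw_gbs(384, 3.0)` give about 250 and 288 GB/s. At the 875 MHz boost clock, the same formulas give 5.04 TF single and 1.68 TF double precision for the K40.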
