insidehpc
DESCRIPTION
In this slidecast, Sumit Gupta from NVIDIA discusses the latest GPU computing product news for HPC:
* IBM and NVIDIA Partner to Build Next-Generation Supercomputers
* NVIDIA Launches the Tesla K40 GPU Accelerator, its fastest accelerator ever
Learn more: http://nvidianews.nvidia.com/Releases/NVIDIA-Launches-World-s-Fastest-Accelerator-for-Supercomputing-and-Big-Data-Analytics-a66.aspx
Watch the video presentation: http://wp.me/p3RLHQ-aRY
SUPERCOMPUTING 2013 PRESS DECK
Sumit Gupta | General Manager, Tesla Accelerated Computing
SC13 News
1. IBM Taps GPU Accelerators
2. New Product Announcements
3. New Supercomputer Announcements
Accelerated Computing Growing Fast
Chart: 2x Growth in One Year. Percent of HPC systems with accelerators: 22% (2010), 24% (2011), 44% (2012). Source: Intersect360 Research HPC User Site Census: Systems, July 2013
Chart: Hundreds of GPU Accelerated Apps. GPU-accelerated applications: 113 (2011), 182 (2012), 242 (2013)
NVIDIA GPU is Accelerator of Choice
Chart: Accelerator share: NVIDIA GPUs 85%, Intel Phi 4%, Others 11%. Source: Intersect360 Research HPC User Site Census: Systems, July 2013
IBM Using GPUs to Accelerate Enterprise & Data Analytics Applications
Predictive Analytics
Risk Analytics
Business Intelligence
Application Infrastructure
IBM Partners with NVIDIA to Build Next-Generation Supercomputers
POWER8 CPU + Tesla GPU
GPU-Accelerated POWER-Based Systems Available in 2014
GPU Computing in Data Centers
Chart: 2007 to 2014 timeline of CPU platforms used with GPUs in data centers: x86, Power, ARM64
Linux GCC Compiler to Support GPU Accelerators
Open Source OpenACC in GCC by Mentor Graphics & Samsung
Pervasive Impact: Free to all Linux users
Mainstream: Most widely used HPC compiler
“Incorporating OpenACC into GCC is an excellent example of open source and open standards working together to make accelerated computing broadly accessible to all Linux developers.”
Oscar Hernandez, Oak Ridge National Laboratory
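To show what directive-based acceleration looks like in ordinary C, here is a minimal sketch, assuming a GCC release that includes the OpenACC implementation (enabled with the `-fopenacc` flag). The `#pragma acc parallel loop` directive asks the compiler to run the loop in parallel on an accelerator, with `copyin` moving `x` to the device and `copy` moving `y` there and back. A compiler without OpenACC enabled ignores the pragma and runs the same loop serially, so the result is identical. The function name `saxpy` and the choice of data clauses are illustrative, not taken from the slide.

```c
#include <assert.h>

/* y = a*x + y over n elements.
 * With an OpenACC-enabled compiler (e.g. gcc -fopenacc) the loop may be
 * offloaded: copyin sends x to the device, copy sends y there and back.
 * Without OpenACC the pragma is ignored and the loop runs on the CPU. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    assert(n >= 0);
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling `saxpy(4, 2.0f, x, y)` with `x = {1, 2, 3, 4}` and `y = {10, 20, 30, 40}` leaves `y = {12, 24, 36, 48}`. The design point is that the loop body stays plain C; only the directive carries the accelerator intent.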
SC13 News
2. New Product Announcements
Tesla K40: World’s Fastest Accelerator for Supercomputing and Big Data Analytics
CUDA 6: Dramatically Simplifies Parallel Programming with Unified Memory
Tesla K40 World’s Fastest Accelerator
FASTER: 1.4 TF peak double precision | 2880 Cores | 288 GB/s
Chart: AMBER Benchmark, ns/day for CPU, K20X and K40. AMBER Benchmark: SPFP-Nucleosome; CPU: Dual E5-2687W @ 3.10 GHz, 64 GB system memory, CentOS 6.2; GPU systems: single Tesla K20X or single Tesla K40
LARGER: 2x Memory Enables More Apps
Chart: 6 GB (K20X) vs. 12 GB (K40) for Fluid Dynamics, Seismic Analysis and Rendering
SMARTER: Unlock Extra Performance Using Power Headroom (GPU Boost)
GPU Boost: Up to 25% Extra Performance on Applications
Use Power Headroom to Run at Higher Clocks
Chart: relative performance of Tesla K40 (base) vs. Tesla K40 with GPU Boost on applications including AMBER SPFP-TRPCage, LAMMPS-EAM and NAMD 2.9-APOA1; labeled gains of 25%, 20%, 14%, 17%, 13% and 11% faster
ANNOUNCING CUDA 6: Unified Memory
Unified Memory: Dramatically Lower Developer Effort
Diagram: Developer View Today shows separate System Memory and GPU Memory; Developer View With Unified Memory shows a single Unified Memory
Super Simplified Memory Management Code
CPU Code:
    void sortfile(FILE *fp, int N) {
        char *data;
        data = (char *)malloc(N);
        fread(data, 1, N, fp);
        qsort(data, N, 1, compare);
        use_data(data);
        free(data);
    }

CUDA 6 Code with Unified Memory:
    void sortfile(FILE *fp, int N) {
        char *data;
        cudaMallocManaged(&data, N);          // one pointer for CPU and GPU
        fread(data, 1, N, fp);
        qsort<<<...>>>(data, N, 1, compare);  // GPU sort kernel
        cudaDeviceSynchronize();              // wait before CPU reads data
        use_data(data);
        cudaFree(data);
    }
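The CPU side of this comparison relies on two helpers the slide does not define, `compare` and `use_data`. A minimal self-contained C sketch of that baseline, assuming a byte-wise comparator and a hypothetical `use_data` that simply keeps a copy of the sorted bytes, makes the flow (allocate, read, sort, consume, free) compilable. Both helpers, the extra length argument to `use_data`, and the `int` status return are illustrative assumptions, not the author's code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical comparator: orders single bytes as unsigned values. */
static int compare(const void *a, const void *b)
{
    unsigned char x = *(const unsigned char *)a;
    unsigned char y = *(const unsigned char *)b;
    return (x > y) - (x < y);
}

/* Hypothetical consumer: keeps a NUL-terminated copy of the sorted bytes. */
static char last_result[64];

static void use_data(const char *data, size_t n)
{
    size_t m = n < sizeof last_result - 1 ? n : sizeof last_result - 1;
    memcpy(last_result, data, m);
    last_result[m] = '\0';
}

/* CPU baseline from the slide, with error checks added.
 * Returns 0 on success, -1 on allocation or read failure. */
int sortfile(FILE *fp, int N)
{
    assert(fp != NULL && N > 0);
    char *data = malloc((size_t)N);
    if (data == NULL)
        return -1;
    if (fread(data, 1, (size_t)N, fp) != (size_t)N) {
        free(data);
        return -1;
    }
    qsort(data, (size_t)N, 1, compare);
    use_data(data, (size_t)N);
    free(data);
    return 0;
}
```

Under these assumptions, running `sortfile` on a stream containing `tesla` leaves `aelst` in `last_result`. In the Unified Memory version, only the allocation, the sort call, the added synchronization and the free change; the `fread` and `use_data` calls operate on the same pointer unchanged.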
SC13 News
3. New Supercomputer Announcements
Piz Daint
Fastest Supercomputer in Europe: 6.27 PetaFLOPS (80% Linpack Efficiency)
Production-Grade Weather Forecasts: COSMO
7 National Weather Agencies: Germany | Greece | Italy | Poland | Russia | Romania | Switzerland
Greenest Petascale System: 3110 MFLOPS/W (#2: JUQUEEN, 2176 MFLOPS/W)
Tokyo Tech KFC System
Greenest Supercomputer in the World: 4000+ MFLOPS per Watt
25% Higher than #1 Green500 System (current Green500 #1: CINECA Eurora System, Italy, 3208 MFLOPS/W)
160 Tesla K20X GPUs
Oil Immersion Technology
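The headline ratios on the Piz Daint and KFC slides follow from simple arithmetic. A minimal C sketch, assuming "Linpack efficiency" means Rmax / Rpeak (the usual TOP500 definition): 6.27 PetaFLOPS at 80% efficiency implies a theoretical peak of roughly 7.84 PetaFLOPS; 4000 MFLOPS/W versus Eurora's 3208 MFLOPS/W is about 24.7% higher, consistent with the rounded "25% higher" claim (and "4000+" makes it a lower bound); and Piz Daint's 3110 MFLOPS/W is about 43% above JUQUEEN's 2176. The function names are hypothetical.

```c
#include <assert.h>

/* Linpack efficiency = Rmax / Rpeak, so Rpeak = Rmax / efficiency.
 * Units follow the input (here PetaFLOPS). */
double implied_rpeak(double rmax, double efficiency)
{
    assert(efficiency > 0.0 && efficiency <= 1.0);
    return rmax / efficiency;
}

/* How much higher value a is than value b, in percent
 * (e.g. two energy efficiencies in MFLOPS/W). */
double percent_higher(double a, double b)
{
    assert(b > 0.0);
    return (a / b - 1.0) * 100.0;
}
```

For example, `implied_rpeak(6.27, 0.80)` returns about 7.8375 and `percent_higher(4000.0, 3208.0)` about 24.69.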
ANSYS Fluent Doubles Performance with GPUs
Chart: Automobile Drag Simulation Throughput, number of jobs per day, CPU vs. K40: 90% Faster
2x Better Insight for Low Drag Design
2% Less Drag
1.5B Gal. of Fuel Saved/Year
Configuration: 2 x E5-2680 CPUs, 8 cores used; 2 Tesla K40s; sedan geometry, 3.6M mixed cells; steady, turbulent, external aerodynamics; coupled PBNS, DP solver
Additional Information
Tesla K40 20-40% Faster than K20X on Applications
Chart: relative performance of K20X, K40 @ base and K40 @ boost on ANSYS 14 (SMP-V14sp-4), LAMMPS (EAM), NAMD 2.9 (APOA1), AMBER (SPFP-Nucleosome), LSMS (Fe32), QMCPACK (3x3x1) and CUBLAS; labeled speedups of 1.2x to 1.4x
First Tesla K40 Customers: Swinburne, Australia | CSC, Finland | Texas Advanced Computing Center | CEA, France
Tesla K40 OEM Partners
Specification            K20X            K40
Peak Single Precision    3.93 TF         4.29 TF
Peak SGEMM               2.95 TF         3.22 TF
Peak Double Precision    1.31 TF         1.43 TF
Peak DGEMM               1.22 TF         1.33 TF
Memory Size              6 GB            12 GB
Memory BW (ECC off)      250 GB/s        288 GB/s
Memory Clock             2.6 GHz         3.0 GHz
PCIe Gen                 Gen 2           Gen 3
# of Cores               2688            2880
Core Clock               732 MHz         Base: 745 MHz; Boost Clocks: 810 & 875 MHz
Total Board Power        235W            235W
Form Factor              PCIe Passive    PCIe Passive, Active
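Several peak figures in this table can be reproduced from the core counts and clocks. A minimal C sketch, under assumptions that are not stated on the slide: each CUDA core retires one fused multiply-add (2 floating-point operations) per clock at the base clock; double precision runs on one third as many units (960 on K40, 896 on K20X); and both cards use a 384-bit GDDR5 bus with two transfers per listed memory clock cycle. The measured SGEMM/DGEMM rows cannot be derived this way. The function names are hypothetical.

```c
#include <assert.h>

/* Peak FLOP rate in TFLOPS: units * 2 FLOPs (one FMA) * clock.
 * units * 2 * clock_mhz gives MFLOPS; dividing by 1e6 gives TFLOPS. */
double peak_tflops(int units, double clock_mhz)
{
    assert(units > 0 && clock_mhz > 0.0);
    return units * 2.0 * clock_mhz / 1.0e6;
}

/* Peak memory bandwidth in GB/s: assumes two transfers per listed
 * memory clock cycle over a bus of bus_bits bits (8 bits per byte). */
double peak_bandwidth_gbs(double mem_clock_ghz, int bus_bits)
{
    assert(mem_clock_ghz > 0.0 && bus_bits > 0);
    return mem_clock_ghz * 2.0 * bus_bits / 8.0;
}
```

With these assumptions, `peak_tflops(2880, 745.0)` gives about 4.29 TF, `peak_tflops(2688, 732.0)` about 3.935 TF (listed as 3.93 TF), `peak_tflops(960, 745.0)` about 1.43 TF, and `peak_bandwidth_gbs(3.0, 384)` gives 288 GB/s versus 249.6 GB/s (listed as 250 GB/s) for `peak_bandwidth_gbs(2.6, 384)`. Applying the same formula at the 875 MHz boost clock gives about 5.04 TF single precision for the K40.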