NVIDIA Tesla Update

Supercomputing’12 Sumit Gupta

General Manager

Tesla Accelerated Computing

Announcing Tesla K20 Family

Today’s information is embargoed until November 12 – 6:00 am US Pacific Time

Accelerated Computing Meets Increased Demand for Science

http://www.teragridforum.org/mediawiki/images/f/f8/TGQR_2011Q1_Report.pdf

[Chart, 2008 to 2012, normalized to 2008: Launches, Top500 Systems, OEM Systems, Industry Apps, Universities]

March of the GPUs

2008: Tesla

2010: Fermi

2012: Kepler

2014: Maxwell

Tesla K20 Family

1 World’s Fastest, Most Efficient Accelerator

2 Powered by CUDA: World’s Most Pervasive Parallel Programming Model

3 Delivers World Record Performance for Scientific Apps

Announcing Tesla K20 Accelerator Family

                        Tesla K20X   Tesla K20
Peak Double Precision   1.31 TF      1.17 TF
Peak Single Precision   3.95 TF      3.52 TF
Memory Bandwidth        250 GB/s     208 GB/s
Memory Size             6 GB         5 GB
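The specification figures above imply two derived ratios that are often used when comparing accelerators: single-to-double precision throughput, and bytes of memory bandwidth per double-precision flop. The following is a minimal Python sketch of that arithmetic; the dictionary layout and the function name `derived_ratios` are illustrative assumptions, and only the input numbers come from the slide.

```python
# Peak figures copied from the K20X / K20 specification slide.
SPECS = {
    "Tesla K20X": {"dp_tflops": 1.31, "sp_tflops": 3.95, "bandwidth_gbs": 250},
    "Tesla K20": {"dp_tflops": 1.17, "sp_tflops": 3.52, "bandwidth_gbs": 208},
}


def derived_ratios(spec):
    """Return (SP/DP throughput ratio, bytes of bandwidth per DP flop)."""
    sp_to_dp = spec["sp_tflops"] / spec["dp_tflops"]
    # GB/s divided by GFLOP/s gives bytes per flop (1 TF = 1000 GF).
    bytes_per_dp_flop = spec["bandwidth_gbs"] / (spec["dp_tflops"] * 1000)
    return sp_to_dp, bytes_per_dp_flop


for name, spec in SPECS.items():
    sp_to_dp, bytes_per_flop = derived_ratios(spec)
    print(f"{name}: SP/DP = {sp_to_dp:.2f}, bytes per DP flop = {bytes_per_flop:.3f}")
```

Both boards come out at roughly a 3:1 single-to-double precision ratio, and at well under one byte of bandwidth per double-precision flop.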

K20X: 3x Faster Than Fermi

[Chart: TFlops for Xeon E5-2687W (8 core, 3.1 GHz), Tesla M2090 (Fermi), and Tesla K20X; surviving bar labels: 0.17, 0.43]

K20X: Most Efficient Accelerator

Linpack TFlops (server configuration: dual socket E5-2680, 2.7 GHz + 2 GPUs)

Fermi Server (2x SB CPUs + 2x M2090s): 61% Efficiency

Kepler Server (2x SB CPUs + 2x K20X): 76% Efficiency

Titan: World’s #1 Open Science Supercomputer

18,688 Tesla K20X GPUs

27 Petaflops Peak: 90% of Performance from GPUs

17.59 Petaflops Sustained Performance on Linpack
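The Titan figures can be cross-checked against the per-GPU K20X peak of 1.31 TF quoted earlier in the deck. The sketch below is only that arithmetic, under the assumption that "27 Petaflops Peak" refers to double precision and is a rounded value; the constant and function names are illustrative.

```python
GPU_COUNT = 18_688           # Tesla K20X GPUs in Titan (from the slide)
K20X_PEAK_DP_TFLOPS = 1.31   # per-GPU peak double precision (K20X spec slide)
SYSTEM_PEAK_PFLOPS = 27.0    # rounded system peak (from the slide)
LINPACK_PFLOPS = 17.59       # sustained Linpack (from the slide)


def titan_summary():
    """Return (GPU peak in PF, GPU share of system peak, Linpack efficiency)."""
    gpu_peak_pflops = GPU_COUNT * K20X_PEAK_DP_TFLOPS / 1000
    gpu_share = gpu_peak_pflops / SYSTEM_PEAK_PFLOPS
    linpack_efficiency = LINPACK_PFLOPS / SYSTEM_PEAK_PFLOPS
    return gpu_peak_pflops, gpu_share, linpack_efficiency


gpu_peak, share, efficiency = titan_summary()
print(f"GPU peak: {gpu_peak:.2f} PF")              # 24.48 PF
print(f"GPU share of peak: {share:.1%}")           # 90.7%
print(f"Linpack efficiency: {efficiency:.1%}")     # 65.1%
```

The GPU share of about 90% agrees with the slide. The Linpack efficiency of about 65% is a derived figure, not one stated in the deck.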

Current Green500 List

K20X: Most Energy Efficient Accelerator

Titan K20X System Beats #1 on Green500: BlueGene/Q

2142.77 MFLOPS/W

30 Petaflops in 30 Days

K20 / K20X Availability

Shipping this week

General Availability: November-December

CUDA: World’s Most Pervasive Parallel Programming Model

629 University Courses in 62 Countries

8,000 Institutions with CUDA Developers

1,500,000 CUDA Downloads

395,000,000 CUDA GPUs Shipped

Top Supercomputing Apps

Computational Chemistry: CHARMM, GROMACS, LAMMPS, DL_POLY

Material Science: QMCPACK, Quantum Espresso, GAMESS, Gaussian, NWChem

Climate & Weather: GEOS-5, CAM-SE

Physics: Chroma, Denovo

CAE: ANSYS Mechanical, MSC Nastran, SIMULIA Abaqus, ANSYS Fluent, OpenFOAM, LS-DYNA

CUDA Apps Grow 60%, Accelerating Key Apps

[Chart: # of apps per year, 2010 to 2012, split into Accelerated and In Development; annotated "40% Increase" and "61% Increase"]

Leading Apps Now Accelerated by GPUs

Fluid Dynamics, Structural Mechanics, Life Sciences

CHARMM

Fastest Performance on Scientific Applications

Tesla K20X Speed-Up over Sandy Bridge CPUs

[Chart: speed-up axis from 0.0x to 20.0x; applications shown include SPECFEM3D, Chroma, and MATLAB (FFT)*; category labels: Higher Ed, Science, Physics, Molecular Dynamics]

System config, CPU results: dual socket E5-2687W, 3.10 GHz

System config, GPU results: dual socket E5-2687W + 2 Tesla K20X GPUs

*MATLAB results compare one i7-2600K CPU with and without a Tesla K20 GPU

Record Breaking Simulation

WL-LSMS: Material Science

Discover better materials for magnetic storage

New Record: 10+ PFLOPS

Old Record: 3.1 PFLOPS

Effort: 2% Lines of Code

2011 Gordon Bell Winner at 3.08 Petaflops on K Computer

Applications Scale to 1000s of GPUs

[Chart: Molecular Dynamics, NAMD, 100x STMV; ns/day vs. # of compute nodes (128 to 768); Cray XK7 - K20X vs. Cray XK7 - CPU]

[Chart: Material Science, QMCPACK, 3x3x1 Graphite; compute efficiency vs. # of compute nodes (0 to 2500); Cray XK7 - Tesla K20X vs. Cray XK7 - CPU]

The Era of Accelerated Computing is Here

[Timeline, 1980 to 2020: Era of Vector Computing, Era of Distributed Computing, Era of Accelerated Computing]

SC12 News Summary

1 Introducing the Tesla K20 Accelerator Family

2 New CUDA Accelerated Apps and Growing Ecosystem

3 Record Setting Performance on Scientific Applications

Embargoed Until Nov 12 – 6:00 am US PT

Customers Seeing Impressive K20 Speedups

“Tesla K20 GPU is 2.3x faster than Tesla M2070, and no change was required in our code!”

Inanc Senocak, Associate Professor in Mechanical Engineering

“Results are amazing! It is 160x faster than our CPU code and 2.5x faster than Fermi for our solutions.”

Estaban Clua, Professor in Computer Science

“Tesla K20 is very impressive. Our application runs 20x faster compared to a Sandy Bridge CPU.”

Oreste Villa, Antonino Tumeo, Research Scientist

Teaching Parallel Programming with CUDA

Professor Chris Lupo, Cal Poly San Luis Obispo

“I have found GPU programming using CUDA to be one of the easiest ways to introduce students to parallel programming.”

Professor Eric Darve, Stanford University

“My students are amazed to find how easy the parallel programming with CUDA is and are thrilled by the performance from NVIDIA GPUs.”

Professor Miaoqing Huang, University of Arkansas

“CUDA allows me to teach students with no prior parallel programming experience to parallelize real-world apps in just a few weeks.”

OpenACC Makes GPU Acceleration Easier

S3D: Fuel Combustion

Design alternative fuels with up to 50% higher efficiency

4x Faster: 10 days vs. 42 days on Jaguar

Minimal Effort with OpenACC: Modified <1% Lines of Code
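The "4x Faster" claim for S3D follows from the two run times on the slide. A minimal Python sketch of that arithmetic is shown here; the function name `speedup` and the constant names are illustrative assumptions, and the slide text as extracted does not name the system behind the 10-day figure.

```python
def speedup(baseline_days, new_days):
    """Speed-up factor of the new run time relative to the baseline."""
    return baseline_days / new_days


JAGUAR_DAYS = 42   # S3D run time on Jaguar (from the slide)
NEW_DAYS = 10      # S3D run time after acceleration (from the slide)

print(f"S3D speed-up: {speedup(JAGUAR_DAYS, NEW_DAYS):.1f}x")  # 4.2x
```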

Kepler: GPU Acceleration Made Easier Than Ever

Hyper-Q: 32 MPI jobs per GPU. Easy speed-up for legacy MPI apps.

Dynamic Parallelism: GPU generates its own work. Less effort, higher performance.

[Chart: CP2K, Quantum Chemistry; x-axis: number of GPUs (0 to 20); K20 with Hyper-Q vs. K20 without Hyper-Q]

[Chart: Quicksort; x-axis: increasing problem size, # of elements in millions (0 to 10); with vs. without Dynamic Parallelism]

All Accelerators Programmed the Same Way

Method: Libraries
  Xeon Phi: Limited Support. Few functions in Intel MKL for offload mode.
  GPU: Broad Support. BLAS, FFT, MAGMA, CULA, …

Method: Directives
  Xeon Phi: Proprietary. Xeon Phi specific directives.
  GPU: OpenACC. Based on portable, industry standard.

Method: Language Extensions
  Xeon Phi: Proprietary. Vector intrinsics, like assembly programming.
  GPU: Simple C/C++/Fortran extensions.
