Feb. 2, 2017 Quantum Chemistry (QC) on GPUs

Quantum Chemistry (QC) on GPUs · images.nvidia.com/content/tesla/pdf/Quantum-Chemistry-Feb-2017-MB-slides.pdf · GPU-Accelerated Quantum Chemistry Apps: Abinit, ACES III, ADF, BigDFT, CP2K


Feb. 2, 2017

Quantum Chemistry (QC) on GPUs


Overview of Life & Material Accelerated Apps

MD: All key codes are GPU-accelerated

Great multi-GPU performance

Focus on dense (up to 16) GPU nodes and/or large numbers of GPU nodes

ACEMD*, AMBER (PMEMD)*, BAND, CHARMM, DESMOND, ESPResso, Folding@Home, GPUgrid.net, GROMACS, HALMD, HOOMD-Blue*, LAMMPS, Lattice Microbes*, mdcore, MELD, miniMD, NAMD, OpenMM, PolyFTS, SOP-GPU* & more

QC: All key codes are ported or optimizing

Focus on using GPU-accelerated math libraries, OpenACC directives

GPU-accelerated and available today:

ABINIT, ACES III, ADF, BigDFT, CP2K, GAMESS, GAMESS-UK, GPAW, LATTE, LSDalton, LSMS, MOLCAS, MOPAC2012, NWChem, OCTOPUS*, PEtot, QUICK, Q-Chem, QMCPack, Quantum Espresso/PWscf, TeraChem*

Active GPU acceleration projects:

CASTEP, GAMESS, Gaussian, ONETEP, Quantum Supercharger Library*, VASP & more

green* = application where >90% of the workload is on GPU


MD vs. QC on GPUs

| “Classical” Molecular Dynamics | Quantum Chemistry (MO, PW, DFT, Semi-Emp) |
|---|---|
| Simulates positions of atoms over time; chemical-biological or chemical-material behaviors | Calculates electronic properties; ground state, excited states, spectral properties, making/breaking bonds, physical properties |
| Forces calculated from simple empirical formulas (bond rearrangement generally forbidden) | Forces derived from electron wave function (bond rearrangement OK, e.g., bond energies) |
| Up to millions of atoms | Up to a few thousand atoms |
| Solvent included without difficulty | Generally in a vacuum but if needed, solvent treated classically (QM/MM) or using implicit methods |
| Single precision dominated | Double precision is important |
| Uses cuBLAS, cuFFT, CUDA | Uses cuBLAS, cuFFT, OpenACC |
| GeForce (academics), Tesla (servers) | Tesla recommended |
| ECC off | ECC on |
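
The single- versus double-precision row can be illustrated with a small numerical experiment. The sketch below is not taken from any of the codes listed in this deck; it emulates IEEE-754 single precision in pure Python (the helper names `to_f32`, `sum_f32` and `sum_f64` are hypothetical) and shows that rounding error accumulates far faster in a single-precision running sum, which is one plausible reason QC codes lean on double precision.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE-754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Naive running sum where every intermediate is rounded to single precision."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return total


def sum_f64(values):
    """Naive running sum in native double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


# Illustrative workload (an assumption): 100,000 equal contributions of 0.1.
terms = [0.1] * 100_000
exact = 10_000.0

err32 = abs(sum_f32(terms) - exact)
err64 = abs(sum_f64(terms) - exact)

print(f"single-precision error: {err32:.6g}")
print(f"double-precision error: {err64:.6g}")
```

The same pattern (many small contributions added to a large accumulator) appears when energies or integrals are summed, so the gap between the two printed errors gives a rough feel for why precision matters more in QC than in force-field MD.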


Accelerating Discoveries

Using a supercomputer powered by the Tesla Platform with over 3,000 Tesla accelerators, University of Illinois scientists performed the first all-atom simulation of the HIV virus and discovered the chemical structure of its capsid, “the perfect target for fighting the infection.”

Without GPUs, the supercomputer would need to be 5x larger for similar performance.


GPU-Accelerated Quantum Chemistry Apps

Abinit

ACES III

ADF

BigDFT

CP2K

GAMESS-US

Gaussian

GPAW

LATTE

LSDalton

MOLCAS

Mopac2012

NWChem

Octopus

ONETEP

Petot

Q-Chem

QMCPACK

Quantum Espresso

Quantum SuperCharger Library

RMG

TeraChem

UNM

VASP

WL-LSMS

Green Lettering Indicates Performance Slides Included

GPU Perf compared against dual multi-core x86 CPU socket.


ABINIT


ABINIT on GPUs

Speed in the parallel version:

For ground-state calculations, GPUs can be used. This is based on CUDA+MAGMA

For ground-state calculations, the wavelet part of ABINIT (which is BigDFT) is also very well parallelized: MPI band parallelism, combined with GPUs


BigDFT


Courtesy of BigDFT team @ CEA


Gaussian


Gaussian

ACS Fall 2011 press release

Joint collaboration between Gaussian, NVIDIA and PGI for GPU acceleration: http://www.gaussian.com/g_press/nvidia_press.htm

No such press release exists for Intel MIC or AMD GPUs

Mike Frisch quote from press release:

“Calculations using Gaussian are limited primarily by the available computing resources,” said Dr. Michael Frisch, president of Gaussian, Inc. “By coordinating the development of hardware, compiler technology and application software among the three companies, the new application will bring the speed and cost-effectiveness of GPUs to the challenging problems and applications that Gaussian’s customers need to address.”


April 4-7, 2016 | Silicon Valley

Roberto Gomperts (NVIDIA), Michael Frisch (Gaussian, Inc.), Giovanni Scalmani (Gaussian, Inc.), Brent Leback (NVIDIA/PGI)

ENABLING THE ELECTRONIC STRUCTURE PROGRAM GAUSSIAN ON GPGPUS USING OPENACC


PREVIOUSLY

Earlier Presentations

GRC Poster 2012

ACS Spring 2014

GTC Spring 2014 (recording at http://on-demand.gputechconf.com/gtc/2014/video/S4613-enabling-gaussian-09-gpgpus.mp4)

WATOC Fall 2014

GTC Spring 2016 (this full recording at http://mygtc.gputechconf.com/quicklink/4r13O5r; requires registration)


TOPICS

Gaussian: Design Guidelines, Parallelism and Memory Model

Implementation: Top-Down/Bottom-Up

OpenACC: Extensions, Hints & Tricks

Early Performance

Closing Remarks


GAUSSIAN

A Computational Chemistry Package that provides state-of-the-art capabilities for electronic structure modeling

Gaussian 09 is licensed for a wide variety of computer systems

All versions of Gaussian 09 contain virtually every scientific/modeling feature, and none imposes any artificial limitations on calculations other than computational resources and time constraints

Researchers use Gaussian to, among other things, study molecules and reactions; predict and interpret spectra; explore thermochemistry, photochemistry and other excited states; include solvent effects; and much more

2/7/2017


DESIGN GUIDELINES

General

Establish a Framework for the GPU-enabling of Gaussian

Code Maintainability (Code Unification)

Leverage Existing code/algorithms, including Parallelism and Memory Model

Simplifies Resolving Problems

Simplifies Improvement on existing code

Simplifies Adding New Code


DESIGN GUIDELINES

Accelerate Gaussian for Relevant and Appropriate Theories and Methods

Relevant: many users of Gaussian

Appropriate: time consuming and good mapping to GPUs

Resource Utilization

Ensure efficient use of all available Computational Resources

CPU cores and memory

Available GPUs and memory


CURRENT STATUS

Single Node

Implemented

Energies for Closed and Open Shell HF and DFT (less than a handful of XC-functionals missing)

First derivatives for the same as above

Second derivatives for the same as above

Using only

OpenACC

CUDA library calls (BLAS)


IMPLEMENTATION MODEL

Application Code = Compute-Intensive Functions (GPU) + Rest of Sequential CPU Code (CPU)

Compute-Intensive Functions: Small Fraction of the Code, Large Fraction of Execution Time
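
The implementation model implies that the whole-application speedup depends on how much of the run time the offloaded functions account for. A minimal sketch using Amdahl's law follows; the function name `amdahl_speedup` and the example fractions are illustrative assumptions, not figures from the Gaussian work.

```python
def amdahl_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when a fraction of the run time is accelerated.

    accelerated_fraction: share of the original run time spent in the
        compute-intensive functions moved to the GPU (0.0 to 1.0).
    kernel_speedup: how much faster those functions run on the GPU.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    remaining_cpu = 1.0 - accelerated_fraction
    return 1.0 / (remaining_cpu + accelerated_fraction / kernel_speedup)


# Hypothetical scenarios: the same 10x kernel speedup gives very different
# overall results depending on how much of the run time is offloaded.
for fraction in (0.5, 0.9, 0.99):
    overall = amdahl_speedup(fraction, 10.0)
    print(f"{fraction:.0%} of run time offloaded -> {overall:.2f}x overall")
```

This is why the model targets the small part of the code that dominates execution time: the sequential remainder sets the ceiling on what the GPU can deliver.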


GAUSSIAN PARALLELISM MODEL

CPU Cluster

OpenMP

CPU Node

GPU

OpenACC


GAUSSIAN: MEMORY MODEL

CPU Cluster

OpenMP

CPU Node

GPU

OpenACC


CLOSING REMARKS

Significant Progress has been made in enabling Gaussian on GPUs with OpenACC

OpenACC is becoming increasingly versatile

Significant work lies ahead to improve performance

Expand feature set:

PBC, Solvation, MP2, ONIOM, triples-Corrections


ACKNOWLEDGEMENTS

Development is taking place with:

Hewlett-Packard (HP) Series SL2500 Servers (Intel® Xeon® E5-2680 v2, 2.8GHz/10-core/25MB/8.0GT-s QPI/115W, DDR3-1866)

NVIDIA® Tesla® GPUs (K40 and later)

PGI Accelerator Compilers (16.x) with OpenACC (2.5 standard)


GPAW

Increase Performance with Kepler

Running GPAW 10258

The blue nodes contain 1x E5-2687W CPU (8 cores per CPU).

The green nodes contain 1x E5-2687W CPU (8 cores per CPU) and 1x or 2x NVIDIA K20X for the GPU.

[Bar chart: speedup compared to CPU only]

Silicon K=1: CPU only 1, 1x K20X 1.4, 2x K20X 2.5
Silicon K=2: CPU only 1, 1x K20X 1.5, 2x K20X 2.7
Silicon K=3: CPU only 1, 1x K20X 1.6, 2x K20X 3.0

Increase Performance with Kepler

Running GPAW 10258

The blue nodes contain 1x E5-2687W CPU (8 cores per CPU).

The green nodes contain 1x E5-2687W CPU (8 cores per CPU) and 2x NVIDIA K20 or K20X for the GPU.

[Bar chart: speedup compared to CPU only]

Silicon K=1: 1.7x
Silicon K=2: 2.2x
Silicon K=3: 2.4x

Increase Performance with Kepler

Running GPAW 10258

The blue nodes contain 2x E5-2687W CPUs (8 cores per CPU).

The green nodes contain 2x E5-2687W CPUs (8 cores per CPU) and 2x NVIDIA K20 or K20X for the GPU.

[Bar chart: speedup compared to CPU only]

Silicon K=1: 1.3x
Silicon K=2: 1.4x
Silicon K=3: 1.4x

Used with permission from Samuli Hakala


NWChem


NWChem 6.3 Release with GPU Acceleration

Addresses large, complex and challenging molecular-scale scientific problems in the areas of catalysis, materials, geochemistry and biochemistry on highly scalable, parallel computing platforms to obtain the fastest time-to-solution

Researchers can for the first time perform large-scale coupled cluster with perturbative triples calculations utilizing NVIDIA GPU technology. A highly scalable multi-reference coupled cluster capability will also be available in NWChem 6.3.

The software, released under the Educational Community License 2.0, can be downloaded from the NWChem website at www.nwchem-sw.org

NWChem - Speedup of the non-iterative calculation for various configurations/tile sizes

System: cluster consisting of dual-socket nodes constructed from:

• 8-core AMD Interlagos processors
• 64 GB of memory
• Tesla M2090 (Fermi) GPUs

The nodes are connected using a high-performance QDR InfiniBand interconnect

Courtesy of Kowalski, K., Bhaskaran-Nair, et al. @ PNNL, JCTC (submitted)

Kepler, Faster Performance (NWChem)

[Bar chart: Uracil molecule, time to solution (seconds)]

CPU Only: 165
CPU + 1x K20X: 81
CPU + 2x K20X: 54

Performance improves by 2x with one GPU and by 3.1x with 2 GPUs
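
The quoted speedups follow directly from the time-to-solution figures on this slide. As a quick check, the sketch below divides the CPU-only time by each accelerated time; the dictionary layout and the function name `speedups` are illustrative choices, while the timings are the ones shown in the chart.

```python
# Time to solution in seconds for the Uracil benchmark, as labelled on the chart.
times_s = {
    "CPU Only": 165.0,
    "CPU + 1x K20X": 81.0,
    "CPU + 2x K20X": 54.0,
}


def speedups(times, baseline):
    """Return speedup of each configuration relative to the baseline time."""
    base = times[baseline]
    return {name: base / t for name, t in times.items()}


result = speedups(times_s, "CPU Only")
for name, factor in result.items():
    print(f"{name}: {factor:.1f}x")
```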


December 2016

Quantum Espresso 5.4.0

AUSURF112 on K80s

Running Quantum Espresso version 5.4.0

The blue node contains Dual Intel Xeon E5-2690 [email protected] [3.5GHz Turbo] (Broadwell) CPUs

The green node contains Dual Intel Xeon E5-2690 [email protected] [3.5GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs

[Bar chart: AUSURF112, seconds (*lower is better)]

1 Broadwell node: 606.00
1 node + 4x K80 per node: 528.20 (1.1X)

AUSURF112 on P100s PCIe

Running Quantum Espresso version 5.4.0

The blue node contains Dual Intel Xeon E5-2690 [email protected] [3.5GHz Turbo] (Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2690 [email protected] [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe GPUs

[Bar chart: AUSURF112, seconds (*lower is better)]

1 Broadwell node: 606.00
1 node + 4x P100 PCIe per node: 515.70 (1.2X)
1 node + 8x P100 PCIe per node: 486.90 (1.2X)


TeraChem 1.5K

TERACHEM 1.5K; TRIPCAGE ON TESLA K40S

[Bar chart: TeraChem 1.5K; TripCage on Tesla K40s & IVB CPUs (total processing time in seconds). Configurations compared:]

2 x Xeon E5-2697 [email protected] + 1 x Tesla K40@875MHz (1 node)
2 x Xeon E5-2697 [email protected] + 2 x Tesla K40@875MHz (1 node)
2 x Xeon E5-2697 [email protected] + 4 x Tesla K40@875MHz (1 node)
2 x Xeon E5-2697 [email protected] + 8 x Tesla K40@875MHz (1 node)

TERACHEM 1.5K; TRIPCAGE ON TESLA K40S & HASWELL CPUS

[Bar chart: TeraChem 1.5K; TripCage on Tesla K40s & Haswell CPUs (total processing time in seconds). Configurations compared:]

2 x Xeon E5-2698 [email protected] + 1 x Tesla K40@875MHz (1 node)
2 x Xeon E5-2698 [email protected] + 2 x Tesla K40@875MHz (1 node)
2 x Xeon E5-2698 [email protected] + 4 x Tesla K40@875MHz (1 node)

TERACHEM 1.5K; TRIPCAGE ON TESLA K80S & IVB CPUS

[Bar chart: TeraChem 1.5K; TripCage on Tesla K80s & IVB CPUs (total processing time in seconds). Configurations compared:]

2 x Xeon E5-2697 [email protected] + 1 x Tesla K80 board (1 node)
2 x Xeon E5-2697 [email protected] + 2 x Tesla K80 boards (1 node)
2 x Xeon E5-2697 [email protected] + 4 x Tesla K80 boards (1 node)

TERACHEM 1.5K; TRIPCAGE ON TESLA K80S & HASWELL CPUS

[Bar chart: TeraChem 1.5K; TripCage on Tesla K80s & Haswell CPUs (total processing time in seconds). Configurations compared:]

2 x Xeon E5-2698 [email protected] + 1 x Tesla K80 board (1 node)
2 x Xeon E5-2698 [email protected] + 2 x Tesla K80 boards (1 node)
2 x Xeon E5-2698 [email protected] + 4 x Tesla K80 boards (1 node)

TERACHEM 1.5K; BPTI ON TESLA K40S & IVB CPUS

[Bar chart: TeraChem 1.5K; BPTI on Tesla K40s & IVB CPUs (total processing time in seconds). Configurations compared:]

2 x Xeon E5-2697 [email protected] + 1 x Tesla K40@875MHz (1 node)
2 x Xeon E5-2697 [email protected] + 2 x Tesla K40@875MHz (1 node)
2 x Xeon E5-2697 [email protected] + 4 x Tesla K40@875MHz (1 node)
2 x Xeon E5-2697 [email protected] + 8 x Tesla K40@875MHz (1 node)

TERACHEM 1.5K; BPTI ON TESLA K80S & IVB CPUS

[Bar chart: TeraChem 1.5K; BPTI on Tesla K80s & IVB CPUs (total processing time in seconds). Configurations compared:]

2 x Xeon E5-2697 [email protected] + 1 x Tesla K80 board (1 node)
2 x Xeon E5-2697 [email protected] + 2 x Tesla K80 boards (1 node)
2 x Xeon E5-2697 [email protected] + 4 x Tesla K80 boards (1 node)

TERACHEM 1.5K; BPTI ON TESLA K40S & HASWELL CPUS

[Bar chart: TeraChem 1.5K; BPTI on Tesla K40s & Haswell CPUs (total processing time in seconds). Configurations compared:]

2 x Xeon E5-2698 [email protected] + 1 x Tesla K40@875MHz (1 node)
2 x Xeon E5-2698 [email protected] + 2 x Tesla K40@875MHz (1 node)
2 x Xeon E5-2698 [email protected] + 4 x Tesla K40@875MHz (1 node)

TERACHEM 1.5K; BPTI ON TESLA K80S & HASWELL CPUS

[Bar chart: TeraChem 1.5K; BPTI on Tesla K80s & Haswell CPUs (total processing time in seconds). Configurations compared:]

2 x Xeon E5-2698 [email protected] + 1 x Tesla K80 board (1 node)
2 x Xeon E5-2698 [email protected] + 2 x Tesla K80 boards (1 node)
2 x Xeon E5-2698 [email protected] + 4 x Tesla K80 boards (1 node)

TeraChem

Supercomputer Speeds on GPUs

[Bar chart: Time for SCF Step, time in seconds; 4096 Quad Core CPUs ($19,000,000) vs. 8 C2050 ($31,000)]

TeraChem running on 8 C2050s on 1 node

NWChem running on 4096 Quad Core CPUs in the Chinook Supercomputer

Giant Fullerene C240 Molecule

Similar performance from just a handful of GPUs

TeraChem

Bang for the Buck

[Bar chart: Performance/Price, relative to the supercomputer]

4096 Quad Core CPUs ($19,000,000): 1
8 C2050 ($31,000): 493

Dollars spent on GPUs do 500x more science than those spent on CPUs

TeraChem running on 8 C2050s on 1 node

NWChem running on 4096 Quad Core CPUs in the Chinook Supercomputer

Giant Fullerene C240 Molecule

Note: Typical CPU and GPU node pricing used. Pricing may vary depending on node configuration. Contact your preferred HW vendor for actual pricing.
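
A performance/price comparison of this kind can be sketched as follows, assuming performance is taken as the inverse of the SCF step time. The function name `relative_perf_per_price` is hypothetical, and because the slides do not give the SCF times numerically, the timings in the usage lines are placeholders rather than measured values; only the two prices come from the slide.

```python
def relative_perf_per_price(ref_time_s, ref_cost, new_time_s, new_cost):
    """Performance per dollar of a new system relative to a reference system.

    Performance is modelled as 1 / time, so performance per dollar is
    1 / (time * cost), and the ratio new/ref simplifies to the expression below.
    """
    for value in (ref_time_s, ref_cost, new_time_s, new_cost):
        if value <= 0:
            raise ValueError("times and costs must be positive")
    return (ref_time_s * ref_cost) / (new_time_s * new_cost)


# Prices quoted on the slide.
cpu_cost = 19_000_000
gpu_cost = 31_000

# With identical run times the ratio reduces to the price ratio alone.
cost_ratio = cpu_cost / gpu_cost
equal_time = relative_perf_per_price(1.0, cpu_cost, 1.0, gpu_cost)
print(f"price ratio: {cost_ratio:.1f}")
print(f"perf/price at equal run time: {equal_time:.1f}")

# Placeholder timings (NOT from the slides): a GPU node 25% slower per SCF step.
slower_gpu = relative_perf_per_price(80.0, cpu_cost, 100.0, gpu_cost)
print(f"perf/price with hypothetical timings: {slower_gpu:.1f}")
```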

Kepler’s Even Better

Kepler performs 2x faster than Tesla

TeraChem running on C2050 and K20C

First graph is of BLYP/6-31G(d)

Second is B3LYP/6-31G(d)

[Bar chart: Olestra BLYP, 453 atoms; seconds on C2050 vs. K20C]

[Bar chart: B3LYP/6-31G(d); seconds on C2050 vs. K20C]


February 2017

VASP 5.4.1

Interface on K80s

Running VASP version 5.4.1

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs

1x K80 is paired with Single Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell)

Interface between a platinum slab Pt(111) (108 atoms) and liquid water (120 water molecules) (468 ions)

1256 bands

762048 plane waves

ALGO = Fast (Davidson + RMM-DIIS)

[Bar chart: Interface, 1/seconds]

1 Broadwell node: 0.00171
1 node + 1x K80 per node: 0.00173 (1.0X)
1 node + 2x K80 per node: 0.00238 (1.4X)
1 node + 4x K80 per node: 0.00317 (1.9X)
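
The VASP charts plot a rate (1/seconds) rather than a time, so the speedup labels are ratios of rates and the run time is the reciprocal of each value. The sketch below reproduces the labels for the Interface benchmark on K80s; the pairing of values with configurations follows the order in which they appear on the slide, which is an assumption, and the helper name `speedup_from_rates` is illustrative.

```python
# (configuration, rate in 1/seconds), in the order shown on the slide.
rates_per_s = [
    ("1 Broadwell node", 0.00171),
    ("1 node + 1x K80 per node", 0.00173),
    ("1 node + 2x K80 per node", 0.00238),
    ("1 node + 4x K80 per node", 0.00317),
]


def speedup_from_rates(rates):
    """Speedup of each entry relative to the first (baseline) entry."""
    baseline = rates[0][1]
    return [(label, rate / baseline) for label, rate in rates]


result = speedup_from_rates(rates_per_s)
for (label, factor), (_, rate) in zip(result, rates_per_s):
    seconds = 1.0 / rate  # convert the plotted rate back to a run time
    print(f"{label}: {factor:.1f}X, about {seconds:.0f} s")
```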

Interface on P100s PCIe

Running VASP version 5.4.1

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe GPUs

1x P100 PCIe is paired with Single Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell)

Interface between a platinum slab Pt(111) (108 atoms) and liquid water (120 water molecules) (468 ions)

1256 bands

762048 plane waves

ALGO = Fast (Davidson + RMM-DIIS)

[Bar chart: Interface, 1/seconds]

1 Broadwell node: 0.00171
1 node + 1x P100 PCIe per node: 0.00228 (1.3X)
1 node + 2x P100 PCIe per node: 0.00308 (1.8X)
1 node + 4x P100 PCIe per node: 0.00359 (2.1X)
1 node + 8x P100 PCIe per node: 0.00434 (2.5X)

Interface on P100s SXM2

Running VASP version 5.4.1

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2698 [email protected] [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs

1x P100 SXM2 is paired with Single Intel Xeon E5-2698 [email protected] [3.6GHz Turbo] (Broadwell)

Interface between a platinum slab Pt(111) (108 atoms) and liquid water (120 water molecules) (468 ions)

1256 bands

762048 plane waves

ALGO = Fast (Davidson + RMM-DIIS)

[Bar chart: Interface, 1/seconds]

1 Broadwell node: 0.00171
1 node + 1x P100 SXM2 per node: 0.00228 (1.3X)
1 node + 2x P100 SXM2 per node: 0.00270 (1.6X)
1 node + 4x P100 SXM2 per node: 0.00326 (1.9X)
1 node + 8x P100 SXM2 per node: 0.00462 (2.7X)

Silica IFPEN on K80s

Running VASP version 5.4.1

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs

1x K80 is paired with Single Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell)

240 ions, cristobalite (high) bulk

720 bands

? plane waves

ALGO = Very Fast (RMM-DIIS)

[Bar chart: Silica IFPEN, 1/seconds]

1 Broadwell node: 0.00273
1 node + 1x K80 per node: 0.00276 (1.0X)
1 node + 2x K80 per node: 0.00403 (1.5X)
1 node + 4x K80 per node: 0.00481 (1.8X)

Silica IFPEN on P100s PCIe

Running VASP version 5.4.1

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe GPUs

1x P100 PCIe is paired with Single Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell)

240 ions, cristobalite (high) bulk

720 bands

? plane waves

ALGO = Very Fast (RMM-DIIS)

[Bar chart: Silica IFPEN, 1/seconds]

1 Broadwell node: 0.00273
1 node + 1x P100 PCIe per node: 0.00380 (1.4X)
1 node + 2x P100 PCIe per node: 0.00474 (1.7X)
1 node + 4x P100 PCIe per node: 0.00616 (2.3X)
1 node + 8x P100 PCIe per node: 0.00674 (2.5X)

Silica IFPEN on P100s SXM2

Running VASP version 5.4.1

The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs

1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

240 ions, cristobalite (high) bulk

720 bands

? plane waves

ALGO = Very Fast (RMM-DIIS)

Silica IFPEN, 1/seconds (speedup vs. 1 Broadwell node in parentheses):

1 Broadwell node: 0.00273
1 node + 1x P100 SXM2 per node: 0.00352 (1.3X)
1 node + 2x P100 SXM2 per node: 0.00475 (1.7X)
1 node + 4x P100 SXM2 per node: 0.00616 (2.3X)
1 node + 8x P100 SXM2 per node: 0.00692 (2.5X)

Si-Huge on K80s

Running VASP version 5.4.1

The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs

1x K80 is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

512 Si atoms

1282 bands

864000 plane waves

ALGO = Normal (blocked Davidson)

Si-Huge, 1/seconds:

1 Broadwell node: 0.00019
1 node + 1x K80 per node: 0.00024
1 node + 2x K80 per node: 0.00032
1 node + 4x K80 per node: 0.00047
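The Si-Huge K80 chart carries no X labels. A rough Python sketch of how the plotted values could be turned into elapsed times and ratios, assuming that "1/seconds" is the reciprocal of the wall-clock time of one run (the deck does not state this explicitly). The plotted values are rounded to five decimals, so the derived figures are approximate.

```python
# 1/seconds values read from the "Si-Huge on K80s" chart.
si_huge_k80 = {
    "1 Broadwell node": 0.00019,
    "1 node + 1x K80": 0.00024,
    "1 node + 2x K80": 0.00032,
    "1 node + 4x K80": 0.00047,
}

# Assumption: the plotted quantity is 1 / elapsed wall-clock seconds.
times = {name: 1.0 / rate for name, rate in si_huge_k80.items()}

# Ratio of CPU-only time to each configuration's time.
cpu_time = times["1 Broadwell node"]
ratios = {name: cpu_time / t for name, t in times.items()}

for name in si_huge_k80:
    print(f"{name}: ~{times[name]:.0f} s, ~{ratios[name]:.1f}X vs. CPU only")
```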

Si-Huge on P100s PCIe

Running VASP version 5.4.1

The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe GPUs

1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

512 Si atoms

1282 bands

864000 plane waves

ALGO = Normal (blocked Davidson)

Si-Huge, 1/seconds (speedup vs. 1 Broadwell node in parentheses):

1 Broadwell node: 0.00019
1 node + 1x P100 PCIe per node: 0.00034 (1.8X)
1 node + 2x P100 PCIe per node: 0.00044 (2.3X)
1 node + 4x P100 PCIe per node: 0.00058 (3.1X)
1 node + 8x P100 PCIe per node: 0.00074 (3.9X)

Si-Huge on P100s SXM2

Running VASP version 5.4.1

The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs

1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

512 Si atoms

1282 bands

864000 plane waves

ALGO = Normal (blocked Davidson)

Si-Huge, 1/seconds (speedup vs. 1 Broadwell node in parentheses):

1 Broadwell node: 0.00019
1 node + 1x P100 SXM2 per node: 0.00033 (1.7X)
1 node + 2x P100 SXM2 per node: 0.00040 (2.1X)
1 node + 4x P100 SXM2 per node: 0.00045 (2.4X)
1 node + 8x P100 SXM2 per node: 0.00066 (3.5X)

SupportedSystems on K80s

Running VASP version 5.4.1

The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs

1x K80 is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

267 ions

788 bands

762048 plane waves

ALGO = Fast (Davidson + RMM-DIIS)

SupportedSystems, 1/seconds (speedup vs. 1 Broadwell node in parentheses):

1 Broadwell node: 0.00413
1 node + 1x K80 per node: 0.00414 (1.0X)
1 node + 2x K80 per node: 0.00519 (1.3X)
1 node + 4x K80 per node: 0.00599 (1.5X)

SupportedSystems on P100s PCIe

Running VASP version 5.4.1

The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe GPUs

1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

267 ions

788 bands

762048 plane waves

ALGO = Fast (Davidson + RMM-DIIS)

SupportedSystems, 1/seconds (speedup vs. 1 Broadwell node in parentheses):

1 Broadwell node: 0.00413
1 node + 1x P100 PCIe per node: 0.00518 (1.3X)
1 node + 2x P100 PCIe per node: 0.00651 (1.6X)
1 node + 4x P100 PCIe per node: 0.00794 (1.9X)
1 node + 8x P100 PCIe per node: 0.00796 (1.9X)

SupportedSystems on P100s SXM2

Running VASP version 5.4.1

The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs

1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

267 ions

788 bands

762048 plane waves

ALGO = Fast (Davidson + RMM-DIIS)

SupportedSystems, 1/seconds (speedup vs. 1 Broadwell node in parentheses):

1 Broadwell node: 0.00413
1 node + 1x P100 SXM2 per node: 0.00516 (1.2X)
1 node + 2x P100 SXM2 per node: 0.00570 (1.4X)
1 node + 4x P100 SXM2 per node: 0.00692 (1.7X)
1 node + 8x P100 SXM2 per node: 0.00938 (2.3X)

NiAl-MD on K80s

Running VASP version 5.4.1

The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs

1x K80 is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

500 ions

3200 bands

729000 plane waves

ALGO = Fast (Davidson + RMM-DIIS)

NiAl-MD, 1/seconds (speedup vs. 1 Broadwell node in parentheses):

1 Broadwell node: 0.00347
1 node + 1x K80 per node: 0.00359 (1.0X)
1 node + 2x K80 per node: 0.00537 (1.5X)
1 node + 4x K80 per node: 0.00614 (1.8X)

NiAl-MD on P100s PCIe

Running VASP version 5.4.1

The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe GPUs

1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

500 ions

3200 bands

729000 plane waves

ALGO = Fast (Davidson + RMM-DIIS)

NiAl-MD, 1/seconds (speedup vs. 1 Broadwell node in parentheses):

1 Broadwell node: 0.00347
1 node + 1x P100 PCIe per node: 0.00577 (1.7X)
1 node + 2x P100 PCIe per node: 0.00731 (2.1X)
1 node + 4x P100 PCIe per node: 0.00902 (2.6X)
1 node + 8x P100 PCIe per node: 0.00936 (2.7X)

NiAl-MD on P100s SXM2

Running VASP version 5.4.1

The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs

1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

500 ions

3200 bands

729000 plane waves

ALGO = Fast (Davidson + RMM-DIIS)

NiAl-MD, 1/seconds (speedup vs. 1 Broadwell node in parentheses):

1 Broadwell node: 0.0035
1 node + 1x P100 SXM2 per node: 0.0057 (1.6X)
1 node + 2x P100 SXM2 per node: 0.0074 (2.1X)
1 node + 4x P100 SXM2 per node: 0.0081 (2.3X)
1 node + 8x P100 SXM2 per node: 0.0090 (2.6X)

LiZnO on P100s PCIe

Running VASP version 5.4.1

The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe GPUs

500 ions

3200 bands

729000 plane waves

ALGO = Fast (Davidson + RMM-DIIS)

LiZnO, 1/seconds (speedup vs. 1 Broadwell node in parentheses):

1 Broadwell node: 0.00106
1 node + 2x P100 PCIe per node: 0.00137 (1.3X)
1 node + 4x P100 PCIe per node: 0.00153 (1.4X)

LiZnO on P100s SXM2

Running VASP version 5.4.1

The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs

1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

500 ions

3200 bands

729000 plane waves

ALGO = Fast (Davidson + RMM-DIIS)

LiZnO, 1/seconds (speedup vs. 1 Broadwell node in parentheses):

1 Broadwell node: 0.0011
1 node + 1x P100 SXM2 per node: 0.0011 (1.0X)
1 node + 2x P100 SXM2 per node: 0.0013 (1.2X)
1 node + 4x P100 SXM2 per node: 0.0015 (1.4X)
1 node + 8x P100 SXM2 per node: 0.0018 (1.6X)

B.hR105 on P100s PCIe

Running VASP version 5.4.1

The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe GPUs

1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

105 Boron atoms (β-rhombohedral structure)

216 bands

110592 plane waves

Hybrid Functional with blocked Davidson (ALGO=Normal)

LHFCALC=.True. (Exact Exchange)

B.hR105, 1/seconds (speedup vs. 1 Broadwell node in parentheses):

1 Broadwell node: 0.00090
1 node + 1x P100 PCIe per node: 0.00223 (2.5X)
1 node + 2x P100 PCIe per node: 0.00371 (4.1X)
1 node + 4x P100 PCIe per node: 0.00560 (6.2X)
1 node + 8x P100 PCIe per node: 0.00702 (7.8X)
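One way to read the multi-GPU bars is as scaling efficiency relative to the 1-GPU configuration. A hedged Python sketch using the B.hR105 P100 PCIe values from this slide; the efficiency metric and the names below are illustrative and do not come from the deck:

```python
# 1/seconds values from the "B.hR105 on P100s PCIe" chart, keyed by GPUs per node.
b_hr105_pcie = {1: 0.00223, 2: 0.00371, 4: 0.00560, 8: 0.00702}


def scaling_efficiency(values):
    """Speedup over the 1-GPU run divided by the GPU count (1.0 = ideal scaling)."""
    base = values[1]
    return {gpus: (rate / base) / gpus for gpus, rate in values.items()}


efficiency = scaling_efficiency(b_hr105_pcie)
for gpus, eff in efficiency.items():
    print(f"{gpus}x P100 PCIe: {eff:.0%} of ideal scaling from 1 GPU")
```

The chart values are rounded, so these percentages are only approximate.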

B.hR105 on P100s SXM2

Running VASP version 5.4.1

The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs

1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

105 Boron atoms (β-rhombohedral structure)

216 bands

110592 plane waves

Hybrid Functional with blocked Davidson (ALGO=Normal)

LHFCALC=.True. (Exact Exchange)

B.hR105, 1/seconds (speedup vs. 1 Broadwell node in parentheses):

1 Broadwell node: 0.0009
1 node + 1x P100 SXM2 per node: 0.0024 (2.7X)
1 node + 2x P100 SXM2 per node: 0.0039 (4.3X)
1 node + 4x P100 SXM2 per node: 0.0059 (6.6X)
1 node + 8x P100 SXM2 per node: 0.0078 (8.7X)

B.aP107 on P100s PCIe

Running VASP version 5.4.1

The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe GPUs

1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

107 Boron atoms (symmetry-broken 107-atom β′ variant)

216 bands

110592 plane waves

Hybrid functional calculation (exact exchange) with blocked Davidson. No k-point parallelization.

Hybrid Functional with blocked Davidson (ALGO=Normal)

LHFCALC=.True. (Exact Exchange)

B.aP107, 1/seconds (speedup vs. 1 Broadwell node in parentheses):

1 Broadwell node: 0.00003
1 node + 1x P100 PCIe per node: 0.00012 (4.0X)
1 node + 2x P100 PCIe per node: 0.00021 (7.0X)
1 node + 4x P100 PCIe per node: 0.00031 (10.3X)
1 node + 8x P100 PCIe per node: 0.00041 (13.7X)

B.aP107 on P100s SXM2

Running VASP version 5.4.1

The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs

1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

107 Boron atoms (symmetry-broken 107-atom β′ variant)

216 bands

110592 plane waves

Hybrid functional calculation (exact exchange) with blocked Davidson. No k-point parallelization.

Hybrid Functional with blocked Davidson (ALGO=Normal)

LHFCALC=.True. (Exact Exchange)

B.aP107, 1/seconds (speedup vs. 1 Broadwell node in parentheses):

1 Broadwell node: 0.00003
1 node + 1x P100 SXM2 per node: 0.00011 (3.7X)
1 node + 2x P100 SXM2 per node: 0.00020 (6.7X)
1 node + 4x P100 SXM2 per node: 0.00027 (9.0X)
1 node + 8x P100 SXM2 per node: 0.00044 (14.7X)

Dec. 19, 2016

Quantum Chemistry (QC) on GPUs