CL2QCD
OpenCL LQCD
https://github.com/CL2QCD/cl2qcd

Developers: Alessandro Sciarra, Christopher Czaban, Francesca Cuteri
Past Developers: Matthias Bach, Christopher Pinke, Frederik Depta, Christian Schäfer, Lars Zeidlewicz

Lattice group of Owe Philipsen (Frankfurt)

Alessandro Sciarra
Lattice Seminar, DESY (Zeuthen), 20.06.2016


Motivation and Philosophy

Our physics motivation (I)

QCD THERMODYNAMICS

$N_s^3 \times N_t$ lattice $\rightarrow$ $T = \dfrac{1}{a N_t}$ $\rightarrow$ $1 \ll N_t \ll N_s$
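To make the relation between lattice size and temperature concrete, here is a minimal sketch that converts a lattice spacing and a temporal extent into a temperature in MeV. It assumes natural units with the conversion constant ħc ≈ 197.327 MeV·fm; the function name `temperature_mev` and the example values are illustrative assumptions, not taken from the talk.

```python
HBAR_C_MEV_FM = 197.3269804  # conversion constant hbar*c in MeV*fm


def temperature_mev(a_fm, nt):
    """Return T = 1 / (a * Nt) in MeV for lattice spacing a (in fm) and temporal extent Nt."""
    if a_fm <= 0 or nt <= 0:
        raise ValueError("a_fm and nt must be positive")
    return HBAR_C_MEV_FM / (a_fm * nt)


# Hypothetical example values (not from the talk): a = 0.1 fm, Nt = 6.
print(temperature_mev(0.1, 6))
```

At fixed Nt the temperature is tuned through the lattice spacing, which is why a scan in T requires several separate simulations.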

[Figure: two panels labelled "Second order scenario" and "First order scenario"]

Alessandro Sciarra (Goethe Universität) CL2 QCD HIC for FAIR – 20.06.2016 2 / 21


Our physics motivation (II)

[Figure: phase diagram in the space of $m_{u,d}$, $m_s$ and $(\mu/T)^2$, with labels RW, $Z_2$, $O(4)$, "1st tr." and the points A and B]

[M. D’Elia, O. Philipsen et al., Phys.Rev. D90 (2014)]

• In the volume below the plane $\mu = 0$, simulations are safe.
• The point A in the Roberge–Weiss plane is at a value of the mass that can be simulated.
• The position of the point B determines the type of transition at the $N_f = 2$ chiral point at $\mu = 0$.

The idea is to study the line connecting A and B in the $m_s = \infty$ plane.


Why do we need a fast code?

Just to give an idea of how costly it is...

We keep fixed: Nt, µ

    for Nt in ...; do                # ~3 values
      for mu in ...; do              # ~6 values
        for mass in ...; do          # ~6 values
          for Ns in ...; do          # ~3 values >= 3*Nt
            for T in ...; do         # ~5 values
              echo "Run the (R)HMC for >50k trajectories"
              # ...
            done
          done
        done     # Consider that the typical time of a
      done       # simulation varies from weeks to months
    done

Considering 3 months as the average time of a single simulation:

• Loops run in order as above → 405 years
• Inner three loops run in parallel → 4.5 years × STRETCH FACTOR (feedback, technical; roughly 1.5)
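The two estimates can be reproduced with a few lines of arithmetic. The sketch below assumes the approximate loop sizes quoted in the comments of the shell loop (3, 6, 6, 3, 5) and 3 months per simulation; the function name `campaign_years` and its parameters are hypothetical.

```python
from math import prod


def campaign_years(loop_sizes, months_per_run=3.0, parallel_inner=0):
    """Wall-clock years when the innermost `parallel_inner` loops run concurrently."""
    sequential = loop_sizes[:len(loop_sizes) - parallel_inner]
    return prod(sequential) * months_per_run / 12.0


# Loop sizes in the order of the slide: Nt, mu, mass, Ns, T
LOOPS = [3, 6, 6, 3, 5]

print(campaign_years(LOOPS))                           # all loops sequential
print(campaign_years(LOOPS, parallel_inner=3))         # mass, Ns and T in parallel
print(1.5 * campaign_years(LOOPS, parallel_inner=3))   # with a stretch factor of about 1.5
```

With all loops sequential there are 3·6·6·3·5 = 1620 simulations; running the inner three loops in parallel leaves 3·6 = 18 sequential batches.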


Should a code just work?

NO

Any code in principle should be:

• Readable
• Maintainable
• Easy to extend
• Easy to use
• Hard to break
• Testable

CL2QCD

• Lattice QCD with OpenCL
• Clean Code 2 Limit Questions and Doubts


Outline of the talk

1 Motivation and our philosophy

2 CL2QCD features

3 Structure of the code
    Physics
    Hardware

4 Programming on GPU

5 Unit tests, maintainability and portability

6 Performance of the code
    Multiple GPUs

7 Ongoing and future work


The code from the user point of view (I)

Features implemented in CL2QCD

• Different fermion formulations
    ✓ Wilson fermions in standard formulation
    ✓ Staggered fermions in rooted standard formulation
    ✓ Twisted Mass Wilson fermions
• Improved gauge actions (for Monte Carlo simulations)
    ✓ Tree-level Symanzik
    ✓ Iwasaki
    ✓ Doubly Blocked Wilson
• Pure gauge simulations (Heatbath and Monte Carlo)
• Standard inversion and integration algorithms
• ILDG-compatible input/output (using LIME)
• RANLUX pseudo-random number generator
• Even-odd preconditioning
• Hasenbusch’s trick for Wilson fermions


The code from the user point of view (II)

Installation procedure

    # Clone the git repository
    cd <CL2QCD_INSTALL_DIR>
    git clone https://github.com/CL2QCD/cl2qcd.git
    # Make a build directory
    cd <CL2QCD_INSTALL_DIR>/cl2qcd
    mkdir build
    cd build/
    #
    # Run cmake; if not found automatically,
    # cmake variables can be set via command line as
    # "-D <CMAKE_VARIABLE>=<VALUE>"
    #
    cmake ..
    # Build all executables
    make -j

The compiler must be capable of basic C++11 features.

Required libraries

• OpenCL → http://www.khronos.org/opencl/
• LIME → http://usqcd.jlab.org/usqcd-docs/c-lime/
• libxml2 → http://xmlsoft.org/
• Boost → http://www.boost.org/
• GMP → http://gmplib.org/
• MPFR → http://www.mpfr.org/
• Nettle → http://www.lysator.liu.se/~nisse/nettle/


The code from the user point of view (III)

EXECUTABLES

• HMC: only with (Twisted Mass) Wilson fermions.
• RHMC: only with standard staggered fermions.
• SU(3) Heatbath: possibility of doing overrelaxation.
• Inverter: measurement of fermionic observables.
• Gaugeobservables: measurement of gauge observables.


Executable Options

Each executable has its helper → ./<executable> -h

• Each option has a default value, used if not specified
• In the helpers there are options not relevant for that algorithm
• Some options are used for profiling/benchmarks only

    ...
    HMC options:
      --tau arg (=0.5)
      --reversibility_check arg (=0)
      --integrationsteps0 arg (=10)
      --integrationsteps1 arg (=10)
      --integrationsteps2 arg (=10)
      --hmcsteps arg (=10)
      --num_timescales arg (=1)
      --integrator0 arg (=leapfrog)
      --integrator1 arg (=leapfrog)
      --integrator2 arg (=leapfrog)
      --lambda0 arg (=0.19318332750378359)
      --lambda1 arg (=0.19318332750378359)
      --lambda2 arg (=0.19318332750378359)
      --use_gauge_only arg (=0)
      --use_mp arg (=0)
    ...
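The default `lambda` value above coincides with the parameter commonly used for the second-order minimum-norm (2MN) integrator. The following sketch, which is an illustration on a one-dimensional harmonic oscillator and not CL2QCD's implementation, shows a leapfrog step next to a 2MN step; the toy `force`, the function names and the chosen trajectory length are all assumptions.

```python
LAMBDA_2MN = 0.19318332750378359  # same number as the --lambda0 default above


def force(q):
    """Force of the toy Hamiltonian H = p^2/2 + q^2/2 (assumed for illustration)."""
    return -q


def energy(q, p):
    return 0.5 * (p * p + q * q)


def leapfrog(q, p, tau, steps):
    """Leapfrog: half momentum kick, full position drift, half momentum kick."""
    eps = tau / steps
    for _ in range(steps):
        p += 0.5 * eps * force(q)
        q += eps * p
        p += 0.5 * eps * force(q)
    return q, p


def twomn(q, p, tau, steps, lam=LAMBDA_2MN):
    """2MN: kicks weighted by lam, (1 - 2*lam), lam with two half drifts in between."""
    eps = tau / steps
    for _ in range(steps):
        p += lam * eps * force(q)
        q += 0.5 * eps * p
        p += (1.0 - 2.0 * lam) * eps * force(q)
        q += 0.5 * eps * p
        p += lam * eps * force(q)
    return q, p


q1, p1 = twomn(1.0, 0.0, tau=0.5, steps=10)
print(abs(energy(q1, p1) - energy(1.0, 0.0)))  # energy violation along the trajectory
```

Both schemes are symmetric, so integrating forward, flipping the momentum and integrating again returns to the starting point up to rounding, which is the property a reversibility check probes.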

Possible input file:

    ...
    fermact=rooted_stagg
    num_tastes=2
    solver=cg
    cgmax=8000
    measure_correlators=0
    cg_iteration_block_size=50
    num_timescales=2
    tau=1
    integrator0=twomn
    integrator1=twomn
    integrationsteps0=3
    integrationsteps1=6
    nspace=24
    ntime=6
    mass=0.6000
    rhmcsteps=100000
    savefrequency=500
    savepointfrequency=200
    startcondition=hot
    ...
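An input file of this shape is easy to inspect with a small script, for example to check a set of run parameters before submitting a job. The sketch below is a hypothetical helper, not part of CL2QCD: it only assumes one `key=value` pair per line and skips blank lines and `...` elisions.

```python
def parse_input(text):
    """Parse 'key=value' lines into a dict of strings, skipping blanks and '...' lines."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "...":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {line!r}")
        options[key.strip()] = value.strip()
    return options


# A few of the options from the example input file shown above.
EXAMPLE = """\
fermact=rooted_stagg
num_timescales=2
integrator0=twomn
nspace=24
ntime=6
mass=0.6000
"""

opts = parse_input(EXAMPLE)
print(int(opts["nspace"]) / int(opts["ntime"]))  # spatial-to-temporal aspect ratio
```

Values are kept as strings on purpose, so that the caller decides how each option is converted.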


The output levels of the code

Einhard © 2010 Matthias Bach

Output levels: OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE

    [01:05:26] FATAL:
    [22:13:59] ERROR:
    [15:42:13] WARNING:
    [10:00:12] INFO:
    [03:35:17] DEBUG:
    [08:52:48] TRACE:

The undesired output is removed by pre-processor operations

NO OVERHEAD


The code structure

M. Bach, O. Philipsen, C. Pinke, and A. Sciarra, «CL2QCD – Lattice QCD based on OpenCL», http://arxiv.org/pdf/1411.5219.pdf

M. Bach, «Energy- and Cost-Efficient Lattice-QCD Computations using Graphics Processing Units», http://publikationen.ub.uni-frankfurt.de/frontdoor/index/index/docId/37074

META

• Parameters
• Utilities
• ILDG IO

PHYSICS

• Algorithms: implementation of algorithms using Lattices and Fermionmatrix objects
    HMC, RHMC, HEATBATH, SOLVER, INTEGRATOR, METROPOLIS
• Lattices: representations of lattice fields; operations on fields
    GAUGEFIELD, FERMIONFIELD, GAUGEMOMENTA
• Fermionmatrix: matrix operations on fermion fields
    WILSON, TWISTED MASS, STAGGERED
• Observables
    GAUGEOBSERVABLES, 〈ψ̄ψ〉, CORRELATORS
• PRNG
    RANLUX
• Noise Sources
    POINT, Z2, GAUSSIAN, Z4

HARDWARE

• Buffers: representation of OpenCL buffer; device-dependent: AOS or SOA
    SU3, SU3VEC, SPINOR, GAUGEMOMENTA, ···
• Lattices: representation of fields buffer; host-devices communications
    GAUGEFIELD, FERMIONFIELD, GAUGEMOMENTA
• Code: execution of specific OpenCL kernels; meta information about kernels
    SPINOR ALGEBRA, FERMIONS, MOLECULAR DYNAMICS, GAUGEFIELD, ···
• OpenCL Compiler: compilation of OpenCL kernels; reread functionality
• Device: representation of specific device; provides Code modules
• System: representation of current architecture; provides available OpenCL devices

OPENCL KERNELS

• Code for execution on device; compiled at runtime
    PLAQUETTE, SAXPY, DSLASH, GAMMA5, ···
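The "AOS or SOA" choice for the buffers refers to how the components of a per-site object are laid out in linear memory. The sketch below illustrates the two index mappings in a generic way; the function names and the small example sizes are assumptions for illustration and do not reflect CL2QCD's actual buffer classes.

```python
def aos_index(site, comp, n_comp):
    """Array of structures: all components of one site are stored contiguously."""
    return site * n_comp + comp


def soa_index(site, comp, n_sites):
    """Structure of arrays: the same component of consecutive sites is contiguous."""
    return comp * n_sites + site


N_SITES, N_COMP = 4, 3  # hypothetical tiny example

# Memory distance between the same component on neighbouring sites:
print(aos_index(2, 0, N_COMP) - aos_index(1, 0, N_COMP))    # stride equals N_COMP
print(soa_index(2, 0, N_SITES) - soa_index(1, 0, N_SITES))  # stride equals 1
```

With SOA, work items that process consecutive sites touch consecutive addresses, which is typically the access pattern GPUs handle best, while AOS keeps each site's data together, which often suits CPUs; which layout wins is device-dependent.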


[Figure: overview diagram of the CL2QCD packages: HARDWARE (System, Device, Code, OpenCL compiler, OpenCL buffers, Lattice buffers), OPENCL KERNELS, META (Parameters, ILDG I/O, Utilities) and PHYSICS (Lattice fields, Fermion matrices, Algorithms, PRNG, Noise sources, Observables); the legend distinguishes C++ and OpenCL code]


The Physics package

PHYSICS

• LATTICE FIELDS: ψ-fields, Uµ-fields, Hµ-fields
• FERMION MATRICES: Wilson, Twisted Mass, Staggered
• GAUGE OBS.: Plaquette, Polyakov Loop
• Chiral Condensate
• ALGS
    Integrator: Leapfrog, 2MN
    Fermion force
    Solver: CG, CG-M, BiCGstab
• PRNG
• NOISE SOURCES: Point, Slices, Volume
• TESTS


The Hardware package

HARDWARE

• LATTICE BUFFERS: ψ-fields, Uµ-fields, Hµ-fields
• OPENCL BUFFERS: Plain, 3-vector, 12-vector, 8-vector, 3 × 3 matrix, PRNG
• CODE: PRNG, Uµ-fields, Hµ-fields, Real, Heatbath, Molecular Dynamics, Correlator, ψ-fields, Complex, Buffers, Kappa
• SYSTEM
• DEVICE
• OPENCL COMPILER
• TESTS


General Purpose Graphics Processing Unit

GPU

CARD                     CHIP            MEMORY {GB}   PEAK DP {GFLOPS}   PEAK BW {GB/s}   CLOCK {MHz}   YEAR
AMD Radeon HD 5870       Cypress         1             544                154              850           2009
AMD Radeon HD 7970       Tahiti          3             947                264              925           2012
AMD FirePro S10000       Tahiti Pro GL   2×3 - 6       2×1480             2×240            825           2012
AMD FirePro S9050        Tahiti          12            806                264              900           2014
AMD FirePro S9150        Hawaii          16            2530               320              900           2014
AMD FirePro S9170        Grenada         32            2620               320              930           2015
AMD FirePro S9300        Capsaicin       2×4 (HBM)     868                2×512            850           2016
NVIDIA GeForce GTX 680   Kepler          2 - 4         129                192              1006          2012
NVIDIA Tesla K40         Kepler          12            1680               288              745           2013
NVIDIA Tesla K80         Kepler          2×12          2912               2×240            560           2014

ρ “Number of FLOPs

Number of Bytes to read and writeWilson fermions ρp {Dq „ 0.57Staggered fermions ρpDKSq „ 0.35

«FLOPS do not count.» – Clark
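The quote can be made quantitative with a simple roofline-style bound: a kernel of intensity ρ cannot sustain more than ρ times the memory bandwidth, nor more than the compute peak. The sketch below assumes exactly this simple model and takes the ρ values and card numbers from the slide; the function names are hypothetical and not part of CL2QCD.

```c
#include <assert.h>
#include <stdio.h>

/* Arithmetic intensities quoted above (FLOPs per byte moved). */
static const double RHO_WILSON = 0.57;
static const double RHO_STAGGERED = 0.35;

/* Upper bound on sustained GFLOPS for a kernel of intensity rho:
 * the smaller of the compute peak and rho * memory bandwidth.
 * (Assumed simple roofline model, not a measured figure.) */
double attainable_gflops(double rho, double peak_bw_gbs, double peak_dp_gflops)
{
    assert(rho > 0.0 && peak_bw_gbs > 0.0 && peak_dp_gflops > 0.0);
    const double bandwidth_bound = rho * peak_bw_gbs; /* (FLOP/B) * (GB/s) = GFLOPS */
    return bandwidth_bound < peak_dp_gflops ? bandwidth_bound : peak_dp_gflops;
}

/* Print both bounds for one card, given its peak bandwidth and peak DP rate. */
void print_bounds(const char *card, double peak_bw_gbs, double peak_dp_gflops)
{
    printf("%-20s Wilson <= %6.1f GFLOPS, staggered <= %6.1f GFLOPS\n", card,
           attainable_gflops(RHO_WILSON, peak_bw_gbs, peak_dp_gflops),
           attainable_gflops(RHO_STAGGERED, peak_bw_gbs, peak_dp_gflops));
}
```

For the AMD FirePro S9150 (320 GB/s, 2530 GFLOPS) this bound is about 182 GFLOPS for Wilson and 112 GFLOPS for staggered fermions, far below the double-precision peak: under this model the memory bandwidth, not the FLOP rate, limits the Dirac operator.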


CPU vs GPU code

saxpy: $\varphi = \alpha\,\psi_1 + \psi_2$, with input fields $\psi_1$, $\psi_2$ and coefficient $\alpha$


On CPU:

    void saxpy(const spinor *x,
               const spinor *y,
               const hmc_complex *alpha,
               spinor *out)
    {
        /* One sequential loop over all VTOT lattice sites. */
        for (unsigned int index = 0; index < VTOT; index += 1) {
            const spinor tmp = spinor_times_complex(x[index], alpha);
            out[index] = spinor_acc(y[index], tmp);
        }
    }


On GPU:

    __kernel void saxpy(__global const spinor *x,
                        __global const spinor *y,
                        __global const hmc_complex *alpha,
                        __global spinor *out)
    {
        int id = get_global_id(0);   /* index of this work-item */
        int gs = get_global_size(0); /* total number of work-items */

        /* Each work-item handles sites id, id + gs, id + 2*gs, ... */
        for (unsigned int idMem = id; idMem < SPINORFIELDSIZE_MEM; idMem += gs) {
            const spinor tmp = spinor_times_complex(x[idMem], alpha);
            out[idMem] = spinor_acc(y[idMem], tmp);
        }
    }
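The only structural difference from the CPU version is the loop header: every work-item starts at its own id and advances by the global size. A minimal host-side sketch, using hypothetical helper names and plain C instead of OpenCL, can check that this pattern touches every index exactly once for any number of work-items:

```c
#include <assert.h>
#include <stdlib.h>

/* Emulate on the host which indices the work-item `id` visits when the
 * global size is `gs`; visits[i] is incremented for each visited index. */
void visit_strided(unsigned int id, unsigned int gs, unsigned int n,
                   unsigned int *visits)
{
    assert(gs > 0 && id < gs);
    for (unsigned int i = id; i < n; i += gs)
        visits[i] += 1;
}

/* Return 1 if gs emulated work-items together cover 0..n-1 exactly once. */
int covers_exactly_once(unsigned int gs, unsigned int n)
{
    unsigned int *visits = calloc(n ? n : 1, sizeof *visits);
    assert(visits != NULL);

    for (unsigned int id = 0; id < gs; ++id)
        visit_strided(id, gs, n, visits);

    int ok = 1;
    for (unsigned int i = 0; i < n; ++i)
        if (visits[i] != 1)
            ok = 0;

    free(visits);
    return ok;
}
```

Because the loop bound is checked per index, the field size does not need to be a multiple of the global size, and the same kernel works unchanged when the number of work-items is tuned per device.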


Crucial concepts

Robert C. Martin (2009), «Clean Code»

Kent Beck (2002), «Test Driven Development: By Example»

Test each single part of code on its own

Unit tests implemented using the Boost and CMake unit test frameworks

Regression tests for the OpenCL parts are absolutely mandatory

LQCD functions are local → analytic results to test against can be calculated

Avoid dependence of the tests on specific environments
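As an illustration of testing against an analytic result, the sketch below checks a saxpy on constant fields, where the squared norm of the output is known in closed form. It is a simplified stand-in under stated assumptions: one complex number per site instead of a full spinor, plain `assert`-style checks instead of Boost, and hypothetical names throughout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a field entry: one complex number per site. */
typedef struct { double re, im; } cplx;

static cplx cmul(cplx a, cplx b)
{
    return (cplx){a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
}

static cplx cadd(cplx a, cplx b) { return (cplx){a.re + b.re, a.im + b.im}; }

/* out = alpha * x + y, site by site. */
void saxpy_ref(const cplx *x, const cplx *y, cplx alpha, cplx *out, size_t n)
{
    assert(x && y && out);
    for (size_t i = 0; i < n; ++i)
        out[i] = cadd(y[i], cmul(alpha, x[i]));
}

/* Squared norm of a field: a global number a test can compare against. */
double squarenorm(const cplx *f, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += f[i].re * f[i].re + f[i].im * f[i].im;
    return sum;
}

/* For constant fields x = c1, y = c2 the result is |alpha*c1 + c2|^2 * N.
 * Returns 1 if the computed norm matches the analytic value. */
int test_saxpy_constant_fields(void)
{
    enum { N = 16 };
    cplx x[N], y[N], out[N];
    const cplx c1 = {1.0, 0.0}, c2 = {0.0, 1.0}, alpha = {2.0, 3.0};

    for (size_t i = 0; i < N; ++i) {
        x[i] = c1;
        y[i] = c2;
    }
    saxpy_ref(x, y, alpha, out, N);

    /* alpha*c1 + c2 = 2 + 4i, so |.|^2 = 20 and the expected norm is 20 * N. */
    const double diff = squarenorm(out, N) - 20.0 * N;
    return diff < 1e-12 && diff > -1e-12;
}
```

The test needs no reference data file and no particular device, which is in the spirit of avoiding any dependence on a specific environment.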

[Figure: pie chart of the effort split: Testing 33%, Developing 33%, Refactoring 25%, Optimisation 8%.]

[Figure: timeline from 2011 to 2018 with the present (2016) marked, labelled «More Development» and «More Refactoring».]


The Dirac operator kernel

[Figure: two panels, «Performance of Wilson $\slashed{D}$» and «Performance of Staggered $D_{KS}$». Each shows GFLOPS and GB/s, with the bandwidth also given as <BW>/BW_max (50% to 100%), versus lattice size from 16³×8 to 48³×12, for AMD Radeon HD 5870, AMD Radeon HD 7970, AMD FirePro S10000, AMD FirePro S9150 and NVIDIA Tesla K40.]

Max bandwidth: 320 GB/s (FirePro S9150), 264 GB/s (Radeon HD 7970), 240 GB/s (FirePro S10000, per GPU), 288 GB/s (Tesla K40), 154 GB/s (Radeon HD 5870)


CL2QCD VS tmLQCD (in 2013)

[Figure: bar chart of execution time / s (axis 0 to 7000) for setups A, B and C, comparing AMD FirePro S10000 (2012), AMD Radeon HD 5870 (2009) and tmLQCD on 2 AMD Opteron 6172 (2010). Bar labels: 6459, 6469, 2178; 3458, 3410, 1021; 1602, 1592, 553.]

SETUP   $m_\pi$ / MeV
A       260
B       310
C       520

$\beta = 3.9$, $\kappa_c = 0.160856$, $N_\tau = 8$, $N_\sigma = 24$, at maximal twist

Runs done on LOEWE-CSC

tmLQCD has been run on a whole node (24 cores)

Price per flop for the GPUs much lower than for the CPUs

M. Bach et al. «Twisted-Mass Lattice QCD using OpenCL» http://pos.sissa.it/archive/conferences/187/032/LATTICE%202013_032.pdf

1 GPU ≈ 4 CPU


Multi-GPU scaling

CL2QCD can use multiple GPUs within the same node

At the moment, the lattice can be divided only in the temporal direction

Tested on SANAM (AMD FirePro S10000, in 2013), 4 GPUs per node

[Figure, first panel: solver performance / GFLOPS (axis 0 to 200) versus number of GPUs (1 to 4) for lattices 32³×12, 48³×16, 32³×64 and 24³×128.]

Hard (strong) scaling: the total lattice size is kept constant

[Figure, second panel: solver performance / GFLOPS (axis 0 to 200) versus number of GPUs (1 to 4) for lattices 32³×16 and 24³×32.]

Weak scaling: the local lattice size is kept constant
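The two scaling modes differ only in which temporal extent is held fixed when the lattice is split along the time direction. The sketch below makes that bookkeeping explicit; the function names are hypothetical, and the halo of one time slice per neighbour is an assumption for a nearest-neighbour stencil, not a statement about how CL2QCD exchanges boundaries.

```c
#include <assert.h>

/* Hard (strong) scaling: fixed total Nt, split evenly over the GPUs.
 * Returns the local temporal extent, or -1 if Nt is not divisible. */
int local_nt_hard(int nt_total, int n_gpus)
{
    assert(nt_total > 0 && n_gpus > 0);
    return (nt_total % n_gpus == 0) ? nt_total / n_gpus : -1;
}

/* Weak scaling: fixed local Nt per GPU, so the total grows with the GPU count. */
int total_nt_weak(int nt_local, int n_gpus)
{
    assert(nt_local > 0 && n_gpus > 0);
    return nt_local * n_gpus;
}

/* Number of sites in one time slice, Ns^3. */
long slice_sites(int ns)
{
    assert(ns > 0);
    return (long)ns * ns * ns;
}

/* Boundary sites a GPU receives from its two temporal neighbours,
 * assuming a halo one time slice deep; none with a single GPU. */
long halo_sites(int ns, int n_gpus)
{
    assert(n_gpus > 0);
    return n_gpus > 1 ? 2 * slice_sites(ns) : 0;
}
```

For example, a 32³×64 lattice on 4 GPUs gives a local 32³×16 block per GPU, while keeping a local 24³×32 block on 4 GPUs corresponds to a total of 24³×128. In the hard-scaling case the halo stays the same size while the local volume shrinks, so the communication share grows with the number of GPUs.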


Plausible future work

Timeline: 2016, 2017, 2018

Arbitrary parallelization direction

Pure Wilson RHMC

Staggered pion mass

Clover term

Multiple pseudofermions

Physics analytic tests

Smearing?

Disentangle all packages + work on CMake

Parallelization in >1 directions

BaHaMAS

Cleaner and easier to develop code

https://github.com/CL2QCD/cl2qcd
