Towards a high performance detector geometry library on CPU and GPU
-- present status and future directions --

Annual concurrency forum meeting, 2-4-2014

Sandro Wenzel / CERN-PH-SFT (for the GPU simulation + Geant-Vector prototypes)

building on previous talks: 5-6-13 / 9-10-13 / 29-1-14

Outline

Part I (“Status of demonstrator”)
  “Many particles” SIMD optimizations in geometry
    recap of problem statement
    performance numbers + ingredients how we got there

Part II (“Beyond the demonstrator”) (new)
  Ideas towards a universal high-performance library for detector geometry
    “SIMD everywhere”
    further requirements
    possible solutions

Part III (new) (by Johannes De Fine Licht)
  Status of an “abstracted” scalar/SIMD/GPU geometry prototype library

Recap of problem statement and status of many-particle vectorization

With contributions from: Marilena Bandieramonte (University of Catania, Italy), Georgios Bitzes (CERN Openlab), Laurent Duhem (Intel), Raman Sehgal (BARC, India), Juan Valles (CERN summer student)

The original problem statement in pictures

Typical geometry task in particle tracking: find the next boundary the particle will hit and get the distance to it.

1 particle: functionality provided by existing code (Geant4, ROOT, ...)
[Figure: one particle; labels x1, d1, s]

Vectors of particles: functionality targeted by future simulation approaches; aim for efficient utilization of current and future hardware
[Figure: several particles; labels x1, x2, x3, x4, d1, s]

➡ demonstrator started ~04/2013
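The difference between the two interfaces can be sketched in plain C++. This is a minimal illustration under stated assumptions, not code from the demonstrator: the names `Box`, `ParticleVector`, `AxisDistance` and `DistanceToOut` are hypothetical, the box is axis-aligned and centred at the origin, and all particles are assumed to start inside it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical axis-aligned box centred at the origin with half-lengths hx, hy, hz.
struct Box {
    double hx, hy, hz;
};

// Distance along one axis from a point inside the box to the face it is heading for.
inline double AxisDistance(double pos, double dir, double half) {
    if (dir > 0.0) return (half - pos) / dir;
    if (dir < 0.0) return (-half - pos) / dir;
    return std::numeric_limits<double>::infinity();  // moving parallel to this pair of faces
}

// 1-particle interface: one position, one direction, one distance.
double DistanceToOut(const Box& box, double x, double y, double z,
                     double dx, double dy, double dz) {
    return std::min({AxisDistance(x, dx, box.hx),
                     AxisDistance(y, dy, box.hy),
                     AxisDistance(z, dz, box.hz)});
}

// Many-particle container in structure-of-arrays form (assumed layout).
struct ParticleVector {
    std::vector<double> x, y, z;
    std::vector<double> dx, dy, dz;
};

// Many-particle interface: the same question asked for a whole vector of particles.
void DistanceToOut(const Box& box, const ParticleVector& p, std::vector<double>& s) {
    const std::size_t n = p.x.size();
    assert(p.y.size() == n && p.z.size() == n);
    assert(p.dx.size() == n && p.dy.size() == n && p.dz.size() == n);
    s.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        s[i] = DistanceToOut(box, p.x[i], p.y[i], p.z[i], p.dx[i], p.dy[i], p.dz[i]);
    }
}
```

The many-particle version here simply loops over the scalar one; the point of the sketch is only the shape of the interface, which gives an implementation the freedom to process several particles per instruction.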

Recap of performance status

Provided SIMD-optimized vector interfaces and algorithms for some elementary solids and geometry base functions (implemented important functions for particle navigation).

Can run a chain of algorithms in vector/SIMD mode:

[Diagram: vector flow through the chain below; four SIMD labels are attached to it]
  distFromInside mother volume
  pick next daughter volume
  transform coordinates to daughter frame
  distToOutside daughter vol
  update step + boundary

CHEP13 paper: http://arxiv.org/pdf/1312.0816.pdf

                            16 particles   1024 particles   SIMD MAX
  Intel IvyBridge (AVX)     ~2.8x          ~4.0x            4x
  Intel Haswell (AVX2)      ~3.0x          ~5.0x            4x
  Intel Xeon-Phi (AVX512)   ~4.1x          ~4.8x            8x

Xeon-Phi and Haswell benchmarks by CERN Openlab (Georgios Bitzes); gcc 4.8; -O3 -funroll-loops -mavx; no FMA

Good overall performance gains for such an algorithm (in a toy detector with 4 boxes, 3 tubes, 2 cones), compared to ROOT/5.34.17

Ingredient I: SIMD Vectorization

How to (particle) vectorize existing code (with many branches...)?

Option A (“free lunch”): put code into a loop and let the compiler do the work
  works in very few cases

Option B (“convince the compiler”): refactor the code to make it “auto-vectorizer” friendly
  might work but strongly compiler dependent

Option C (“use SIMD library”): refactor the code and perform explicit vectorization using a vectorization library
  always SIMD vectorizes, compiler independent
  excellent experience with the Vc library (http://code.compeng.uni-frankfurt.de/projects/vc)
  other libraries exist: Vector Class Library (Agner Fog), Boost::SIMD, ...

    // hello world example with Vc SIMD types
    Vc::Vector<double> a, b, c;
    c = a + b;
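What the refactoring in Options B and C typically amounts to can be sketched without any SIMD library. The example below is an assumption-laden illustration, not code from the talk: `DistToPlaneScalar` and `DistToPlaneMany` are hypothetical names, and IEEE 754 floating point is assumed so that a division by zero yields infinity or NaN instead of trapping. The scalar function uses early returns; the many-particle loop computes the candidate result for every particle and then selects, so the loop body has no control flow that depends on the data.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

static const double kInf = std::numeric_limits<double>::infinity();

// Scalar style: early returns, one particle at a time.
double DistToPlaneScalar(double z, double dz, double zPlane) {
    if (dz <= 0.0) return kInf;           // not moving towards the plane
    const double d = (zPlane - z) / dz;
    if (d < 0.0) return kInf;             // plane is behind the particle
    return d;
}

// Branch-free style: every particle does the same arithmetic, then a select.
// Assumes IEEE 754 arithmetic: for dz == 0 the division gives inf or NaN,
// and the select below discards that value.
void DistToPlaneMany(const double* z, const double* dz, double zPlane,
                     double* out, std::size_t n) {
    assert(n == 0 || (z != nullptr && dz != nullptr && out != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        const double d = (zPlane - z[i]) / dz[i];
        const bool hit = (dz[i] > 0.0) & (d >= 0.0);  // mask instead of branch
        out[i] = hit ? d : kInf;                      // select instead of early return
    }
}
```

Whether a given compiler actually vectorizes the second loop still depends on the compiler and its flags, which is the weakness of Option B noted above; with an explicit SIMD library the mask and the select become library calls and the outcome no longer depends on the auto-vectorizer.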

Ingredient II: C++ template techniques

“branches are the enemy of vectorization...”

A lot of branches in geometry code just distinguish between “static” properties of class instances: a general “tube solid” class distinguishes at runtime between “FullTube”, “Hollow Tube”, ...

[Figure: tube variants FullTube, HollowTube and a full tube with a phi section]

We employ template techniques to:
  evaluate and reduce “static” branches at compile time
  generate binary code specialized to concrete solid instances

[Class diagram: AbstractTube (declaring SafetyDistanceToIn) and SpecializedTube with a TubeType template parameter]

➡ makes vectorization more efficient

➡ allows better compiler optimizations in scalar code

See the talk of 29-1-14 in this forum for further details.
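A minimal sketch of the idea, assuming a simple tag-type scheme: the class name `SpecializedTube` and the parameter name `TubeType` are taken from the diagram, but the tag structs, the `kHasInnerRadius` constant and the `Contains` method are hypothetical and only illustrate how a “static” branch turns into a compile-time constant.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical tags describing statically known tube variants.
struct FullTube   { static constexpr bool kHasInnerRadius = false; };
struct HollowTube { static constexpr bool kHasInnerRadius = true; };

// One source for all variants; TubeType fixes the "static" properties at compile time.
template <typename TubeType>
class SpecializedTube {
public:
    SpecializedTube(double rmin, double rmax, double halfZ)
        : fRmin(rmin), fRmax(rmax), fHalfZ(halfZ) {
        assert(rmin >= 0.0 && rmax > rmin && halfZ > 0.0);
    }

    bool Contains(double x, double y, double z) const {
        if (std::fabs(z) > fHalfZ) return false;
        const double r2 = x * x + y * y;
        if (r2 > fRmax * fRmax) return false;
        // Constant condition: the compiler can drop this test entirely
        // in the FullTube instantiation.
        if (TubeType::kHasInnerRadius) {
            if (r2 < fRmin * fRmin) return false;
        }
        return true;
    }

private:
    double fRmin, fRmax, fHalfZ;
};
```

In such a scheme the runtime decision between variants is taken once, when the solid is created, and each instantiation carries only the checks it needs.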

Ingredient III: Rethink class layout (somewhat) (new)

Current geometry packages are “logical volume/solid centric”:
  distance functions are declared in logical solids
  client code (navigator) requires a two-step process to calculate the distance to a detector element (PhysicalBox): (1) transform points, (2) call distance function (virtual)
  requires temporaries

[Diagram: Navigator calling into a PhysicalBox, which holds a Transformation3D and a LogicalBox providing DistanceToIn(...)]

I propose to be more “placed volume/solid centric”:
  change of responsibility of the distance function
  direct call possible in one step (call distance function (virtual)); better encapsulation
  no temporaries (unless wanted/needed)

[Diagram: Navigator calling DistanceToIn(...) on a PlacedBox, which holds a Transformation3D and an UnplacedBox]

➡ accounts for 15% of performance gains in SIMD benchmark

thanks to discussions with Laurent Duhem ( Intel )

simplified layouts!
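The “placed volume/solid centric” layout can be sketched as follows. The class names `PlacedBox`, `UnplacedBox`, `Transformation3D` and the method name `DistanceToIn` come from the diagram; everything else is an assumption made for brevity: the transformation is reduced to a pure translation, `Vec3` and `ToLocal` are hypothetical helpers, and the distance function is a standard slab test.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <utility>

struct Vec3 { double x, y, z; };

// Simplified stand-in: translation only (a real transformation also rotates).
struct Transformation3D {
    Vec3 translation;
    Vec3 ToLocal(const Vec3& g) const {
        return Vec3{g.x - translation.x, g.y - translation.y, g.z - translation.z};
    }
};

// Shape only: half-lengths of a box centred at its local origin.
class UnplacedBox {
public:
    UnplacedBox(double hx, double hy, double hz) : fHalf{hx, hy, hz} {
        assert(hx > 0.0 && hy > 0.0 && hz > 0.0);
    }
    // Slab test in the local frame; returns infinity if the box is missed.
    double DistanceToIn(const Vec3& p, const Vec3& d) const {
        const double inf = std::numeric_limits<double>::infinity();
        const double pos[3] = {p.x, p.y, p.z};
        const double dir[3] = {d.x, d.y, d.z};
        double tNear = 0.0, tFar = inf;
        for (int i = 0; i < 3; ++i) {
            if (dir[i] == 0.0) {
                if (std::fabs(pos[i]) > fHalf[i]) return inf;  // parallel and outside
                continue;
            }
            double t1 = (-fHalf[i] - pos[i]) / dir[i];
            double t2 = (fHalf[i] - pos[i]) / dir[i];
            if (t1 > t2) std::swap(t1, t2);
            tNear = std::max(tNear, t1);
            tFar = std::min(tFar, t2);
        }
        return (tNear <= tFar) ? tNear : inf;
    }
private:
    double fHalf[3];
};

// Shape plus placement: the navigator makes a single call with global coordinates.
class PlacedBox {
public:
    PlacedBox(const UnplacedBox* box, const Transformation3D& t) : fBox(box), fTrans(t) {
        assert(box != nullptr);
    }
    double DistanceToIn(const Vec3& globalPoint, const Vec3& dir) const {
        return fBox->DistanceToIn(fTrans.ToLocal(globalPoint), dir);
    }
private:
    const UnplacedBox* fBox;
    Transformation3D fTrans;
};
```

Because the placed object owns both the transformation and the shape, the coordinate transformation becomes an internal detail of one call, and the client no longer has to hold the transformed point as a temporary.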

Beyond the demonstrator: Towards a general high performance library for detector geometry

“vectorization everywhere”, “architecture abstraction”, “reusable components”

With contributions from: Georgios Bitzes (CERN Openlab), Johannes De Fine Licht (CERN technical student), Guilherme Lima (Fermilab), Raman Sehgal (BARC, India)

The demonstrator: the main points so far

[Diagram with three levels: Goals, Approach, Implementation. Entries shown: Performance; optimized many particle treatment; SIMD; template techniques; algo + class review; Vc library; template class specialization / code generation]

Beyond the demonstrator (I)

Vectorization is not limited to the many-particle case: there is potential for SIMD optimization even in existing scalar algorithms and base operations.

Added to the Goals / Approach / Implementation diagram: optimized 1-particle functions; optimized base types / containers

Example: Review of 1-particle base classes

with Georgios Bitzes (CERN Openlab), Raman Sehgal (BARC, India)

3D linear algebra is the foundation layer of many simulation/geometry tasks: particle transport (vector + vector), coordinate transformations (Rotation3D x vector), aggregating transformations (Rotation3D x Rotation3D).

Current library implementations do not support internal vectorization of such operations for single particles (apart from BlazeLib).

We now provide specialized classes that achieve this (using the Vc SIMD abstraction):
  can choose the optimal memory layout for our use case
  know how the matrix is going to be used (no need for an efficient inverse, for instance)
  some memory padding ...

[Figure (preliminary): bar chart of CPU cycles (Intel Core i7), axis from 0 to 35, for Rotation3D x Vector and Rotation3D x Rotation3D; implementations compared: G4, BlazeLib, ROOT/S-Matrix, Eigen3, new; annotations 1.5 and 1.4 on the chart. gcc 4.8; -O3 -funroll-loops -mavx; no FMA]
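The layout considerations listed above (memory layout chosen for the use case, some padding) can be illustrated with a small sketch. This is not the implementation measured in the chart, which uses Vc; it is a plain C++ stand-in with hypothetical definitions of `Vector3D`, `Rotation3D`, `RotationZ`, `Rotate` and `Multiply`, assuming a column-major matrix whose columns are padded from 3 to 4 doubles so that each column fits one 256-bit register.

```cpp
#include <cassert>
#include <cmath>

// 3 components plus one padding element, aligned for 256-bit loads.
struct alignas(32) Vector3D {
    double v[4];
};

// Column-major rotation matrix; each column is padded to 4 doubles.
struct alignas(32) Rotation3D {
    double col[3][4];
};

// Rotation about the z axis from a precomputed cosine and sine.
inline Rotation3D RotationZ(double c, double s) {
    assert(std::fabs(c * c + s * s - 1.0) < 1e-12);
    Rotation3D r = {{{c, s, 0.0, 0.0}, {-s, c, 0.0, 0.0}, {0.0, 0.0, 1.0, 0.0}}};
    return r;
}

// R * a as a sum of scaled columns: three broadcast-multiply-add steps over
// 4 contiguous doubles, with no horizontal (within-register) sums.
inline Vector3D Rotate(const Rotation3D& r, const Vector3D& a) {
    Vector3D out;
    for (int i = 0; i < 4; ++i) {
        out.v[i] = r.col[0][i] * a.v[0] + r.col[1][i] * a.v[1] + r.col[2][i] * a.v[2];
    }
    return out;
}

// A * B: column j of the product is A applied to column j of B.
inline Rotation3D Multiply(const Rotation3D& a, const Rotation3D& b) {
    Rotation3D out;
    for (int j = 0; j < 3; ++j) {
        for (int i = 0; i < 4; ++i) {
            out.col[j][i] = a.col[0][i] * b.col[j][0] +
                            a.col[1][i] * b.col[j][1] +
                            a.col[2][i] * b.col[j][2];
        }
    }
    return out;
}
```

The padding costs one unused double per column, and the column-major choice is only sensible because the matrix is mainly applied to vectors and composed with other rotations; a layout tuned for other operations would look different.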

Beyond the demonstrator (II)

Further wishes:
  avoid code duplication for scalar, vector, GPU
  try to target GPU
  do not depend on a concrete SIMD vectorization technology/library

Where we'd like to go

[Diagram: the Goals / Approach / Implementation picture, extended. New entries: Abstraction; Code reuse; SIMD abstraction; CPU/GPU abstraction; reusable components; generic programming; same code base for CPU/GPU where appropriate; Cilk Plus, autovectorization, ...?]
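One way generic programming can give a single code base for scalar and vector execution is to template each kernel on a “backend” that supplies the number type and its basic operations. The sketch below is an assumption, not the prototype's actual design: `ScalarBackend`, `Vector4Backend`, `Real4` and `SafetyToBox` are hypothetical names, and `Real4` is a toy 4-lane type standing in for a real SIMD type (for example a Vc vector) or a GPU thread's scalar.

```cpp
#include <cassert>
#include <cmath>

// Backend 1: plain scalar arithmetic.
struct ScalarBackend {
    using Real = double;
    static Real Abs(Real a) { return std::fabs(a); }
    static Real Max(Real a, Real b) { return a > b ? a : b; }
};

// Toy 4-lane value type standing in for a real SIMD type.
struct Real4 {
    double lane[4];
};

inline Real4 operator-(const Real4& a, double b) {
    Real4 r;
    for (int i = 0; i < 4; ++i) r.lane[i] = a.lane[i] - b;
    return r;
}

// Backend 2: the same operations applied lane by lane.
struct Vector4Backend {
    using Real = Real4;
    static Real Abs(const Real& a) {
        Real r;
        for (int i = 0; i < 4; ++i) r.lane[i] = std::fabs(a.lane[i]);
        return r;
    }
    static Real Max(const Real& a, const Real& b) {
        Real r;
        for (int i = 0; i < 4; ++i) r.lane[i] = a.lane[i] > b.lane[i] ? a.lane[i] : b.lane[i];
        return r;
    }
};

// Kernel written once: signed safety estimate to an origin-centred box
// (positive outside, negative inside). Only Backend operations are used.
template <typename Backend>
typename Backend::Real SafetyToBox(typename Backend::Real x, typename Backend::Real y,
                                   typename Backend::Real z,
                                   double hx, double hy, double hz) {
    assert(hx > 0.0 && hy > 0.0 && hz > 0.0);
    using Real = typename Backend::Real;
    const Real sx = Backend::Abs(x) - hx;
    const Real sy = Backend::Abs(y) - hy;
    const Real sz = Backend::Abs(z) - hz;
    return Backend::Max(Backend::Max(sx, sy), sz);
}
```

Calling `SafetyToBox<ScalarBackend>(...)` treats one particle and `SafetyToBox<Vector4Backend>(...)` treats four at once from the same source; swapping the SIMD technology would then mean writing another backend, not touching the kernel.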

Summary

Part I: promising SIMD results in geometry demonstrator

Part II: we are ready to go beyond the demonstrator and tackle a generic high performance library for detector geometry

Extensions:
  Extension I: team up with the AIDA USolids effort
  Extension II: use similar concepts in other simulation areas (physics)
  Extension III: work together on the revision of fundamental base classes (Vector3D, Rotation3D); contribute them to core math libraries (ROOT) for common use