26
Profiling study of Profiling study of Geant4-GATE Geant4-GATE Discussion of suggested Discussion of suggested VRTs VRTs Nicolas Karakatsanis, George Loudos, Arion Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Chatziioannou Geant4-GATE technical meeting Geant4-GATE technical meeting 12 12 th th September 2007, Hebden Bridge, UK September 2007, Hebden Bridge, UK

Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Embed Size (px)

DESCRIPTION

Introduction GATE performance degrades when voxelized maps (attenuation or emission distributions) are used GATE performance degrades when voxelized maps (attenuation or emission distributions) are used Simulation times become prohibitive for clinical or pre-clinical simulation studies Simulation times become prohibitive for clinical or pre-clinical simulation studies Profiling studies indicate a performance bottleneck on the Geant4 navigator function Profiling studies indicate a performance bottleneck on the Geant4 navigator function Collaboration between G4 and GATE developers to optimize the efficiency Collaboration between G4 and GATE developers to optimize the efficiency

Citation preview

Page 1: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Profiling study of Geant4-GATEProfiling study of Geant4-GATEDiscussion of suggested VRTsDiscussion of suggested VRTs

Nicolas Karakatsanis, George Loudos, Arion ChatziioannouNicolas Karakatsanis, George Loudos, Arion Chatziioannou

Geant4-GATE technical meetingGeant4-GATE technical meeting1212thth September 2007, Hebden Bridge, UK September 2007, Hebden Bridge, UK

Page 2: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

ContentsContents

IntroductionIntroduction Simulation efficiency with voxelized phantomsSimulation efficiency with voxelized phantoms Compressed Voxel ParameterizationCompressed Voxel Parameterization Axial emission angle effect on efficiencyAxial emission angle effect on efficiency Profiling study - ConclusionsProfiling study - Conclusions Discussion of possible new VRTs appliedDiscussion of possible new VRTs applied Suggested actionsSuggested actions

Page 3: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

IntroductionIntroduction GATE performance degrades when voxelized GATE performance degrades when voxelized

maps (attenuation or emission distributions) maps (attenuation or emission distributions) are usedare used Simulation times become prohibitive for clinical or Simulation times become prohibitive for clinical or

pre-clinical simulation studiespre-clinical simulation studies Profiling studies indicate a performance Profiling studies indicate a performance

bottleneck on the Geant4 navigator functionbottleneck on the Geant4 navigator function Collaboration between G4 and GATE developers Collaboration between G4 and GATE developers

to optimize the efficiencyto optimize the efficiency

Page 4: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Simulation efficiency with voxelized Simulation efficiency with voxelized phantomsphantoms

Geant4-GATE features (our analysis)Geant4-GATE features (our analysis) Deal with each voxel as a different material regionDeal with each voxel as a different material region Only one step is forced in each material region boundaryOnly one step is forced in each material region boundary ThereforeTherefore

G4 stepping implementation can be one of most important (if not G4 stepping implementation can be one of most important (if not the most important) factors that affects simulation speed the most important) factors that affects simulation speed

when using large voxelized phantoms when using large voxelized phantoms Possible solutionPossible solution

If the material in one voxel is the same as that in its If the material in one voxel is the same as that in its neighbouring voxel, neighbouring voxel,

Treat those two voxels as one region (no region boundary between Treat those two voxels as one region (no region boundary between those two voxels)those two voxels)

Check that condition for all voxelsCheck that condition for all voxels

Page 5: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Simulation efficiency with voxelized Simulation efficiency with voxelized phantomsphantoms

Possible solution (..continue)Possible solution (..continue) Create an option in stepping function Create an option in stepping function

to test if the material in next region is the same as to test if the material in next region is the same as current one. current one.

If yesIf yes no boundary is applied here and no boundary is applied here and a normal step is takena normal step is taken

A flag can turn on/off the previous optionA flag can turn on/off the previous option

Page 6: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Compressed Voxel ParameterizationCompressed Voxel Parameterization Solution developed in GATESolution developed in GATE

Application of a “compressed” voxel parameterizationApplication of a “compressed” voxel parameterization generate a phantom where voxel size is variable generate a phantom where voxel size is variable All adjacent voxels of the same material are fused together to form the largest All adjacent voxels of the same material are fused together to form the largest

possible rectangular voxel. possible rectangular voxel. Compressed parameterization uses Compressed parameterization uses

less memory and also less memory and also less CPU cyclesless CPU cycles

It is possible to exclude regions in the phantom from being compressed It is possible to exclude regions in the phantom from being compressed through the use of an "exclude list" of materialsthrough the use of an "exclude list" of materials

DiscussionDiscussion Initial purpose of implementationInitial purpose of implementation

less memory usage – larger phantoms can be simulatedless memory usage – larger phantoms can be simulated Side-effectsSide-effects

less CPU cycles usage - reduce the simulation time but only by maybe 30%less CPU cycles usage - reduce the simulation time but only by maybe 30% Need for a better speed-up factorNeed for a better speed-up factor

Investigation of a more efficient way to track particles in a voxelized geometryInvestigation of a more efficient way to track particles in a voxelized geometry

Contact: Richard Taschereau: Contact: Richard Taschereau: [email protected]@mednet.ucla.edu

Page 7: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Axial emission angle effect on efficiencyAxial emission angle effect on efficiency

Simulation Count Rate (Detected Coincidences per sec in simulation) vs Axial Emission Angle Limit (deg) for various positions of a point source along the axis of the HR+ GATE model within

the Zubal brain phantom

0

10000

20000

30000

40000

50000

60000

70000

80000

0 20 40 60 80 100 120 140 160 180 200Axial Emission Angle Limit (deg)

Sim

ulat

ion

Cou

nt R

ate

(Det

ecte

d C

oinc

iden

ces

per s

ec in

sim

ulat

ion)

Axial position = 0cm

Axial position = 2cm

Axial position = 4cm

Axial position = 6cm

Axial position = 8cm

A certain axial emission angle (for back-to-back emission source) A certain axial emission angle (for back-to-back emission source) optimizes the simulation count rate without significant inaccuracy in optimizes the simulation count rate without significant inaccuracy in the scatter fractionthe scatter fraction

Page 8: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Axial emission angle effect on efficiencyAxial emission angle effect on efficiencyA certain axial emission angle (for back-to-back emission source) A certain axial emission angle (for back-to-back emission source) optimizes the simulation count rate without significant inaccuracy in the optimizes the simulation count rate without significant inaccuracy in the scatter fractionscatter fraction

Scatter fraction (after random subtraction) vs axial emission angle limit for various positions of a point source inside the Zubal brain phantom along the axis of the

GATE model of HR+ scanner

0

20

40

60

80

100

120

0 20 40 60 80 100 120 140 160 180 200

Axial emission angle limit (deg)

Scat

ter f

ract

ion

Axial position = 0cm

Axial position = 2cm

Axial position = 4cm

Axial position = 6cm

Axial position = 8cm

Page 9: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Axial emission angle effect on efficiencyAxial emission angle effect on efficiencyA certain axial emission angle (for back-to-back emission source) A certain axial emission angle (for back-to-back emission source) optimizes the simulation count rate without significant inaccuracy in the optimizes the simulation count rate without significant inaccuracy in the scatter fractionscatter fraction

Comparative plot of the relationship of simulation count rate and scatter fraction as a function of axial emission angle limit for a realistic activity distribution (CBF and Brain Matter) inside the Zubal brain phantom using the GATE HR+ model scanner

0

5

10

15

20

25

30

35

180 100 60 40 20 10Axial Emission Angle Limit (deg)

Sim

ulat

ion

Cou

nt R

ate

(Det

ecte

d C

oinc

iden

ces

per s

ec in

sim

ulat

ion)

0

50000

100000

150000

200000

250000

Scat

ter f

ract

ion

Scatter fraction (afterrandom subtraction)

Simulation Count Rate(DetectedCoincidences per secin simulation)

Page 10: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Profiling studiesProfiling studies Profiling studies using GNU toolsProfiling studies using GNU tools

Gprof : Compare flat profiles of followingGprof : Compare flat profiles of following PET benchmarkPET benchmark

10sec acquisition time10sec acquisition time HR+ ECAT 962 model with Zubal high-resolution brain phantomHR+ ECAT 962 model with Zubal high-resolution brain phantom

10sec acquisition time10sec acquisition time Full attenuation map (128 slices of 256x256 pixels each)Full attenuation map (128 slices of 256x256 pixels each) Simple emission map (21 central slices)Simple emission map (21 central slices)

Callgrind: Compare flat profiles and call graphs of Callgrind: Compare flat profiles and call graphs of following cases*following cases*

NEMA cylindrical analytical phantomNEMA cylindrical analytical phantom NCAT thorax voxelized phantom (using default voxel NCAT thorax voxelized phantom (using default voxel

parameterization)parameterization) NCAT thorax voxelized phantom (using compressed voxel NCAT thorax voxelized phantom (using compressed voxel

parameterization)parameterization)*using in all cases the GATE model of Biograph6 scanner and acquiring *using in all cases the GATE model of Biograph6 scanner and acquiring

for 1 sec with the same activityfor 1 sec with the same activity

Page 11: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Gprof study – analytical phantomGprof study – analytical phantomFlat profile for the simulation of PET benchmark

HepJamesRandom::flat(), 38.56

Hep3Vector::Hep3Vector[in-charge](double, double,

double), 14.4

0

5

10

15

20

25

30

35

40

45

GATE functions

Called Functions

Perc

enta

ge o

f tot

al e

xecu

tion

time

(%)

HepJamesRandom::flat()

Hep3Vector::Hep3Vector[in-charge](double,double, double)

Hep3Vector::~Hep3Vector [in-charge]()

G4String::~G4String [in-charge]()

std::_Rb_tree_base_iterator::_M_increment()

G4String::G4String [in-charge](G4String const&)

HepRotation::rotateAxes(Hep3Vector const&,Hep3Vector const&,Hep3Vector const&)

std::queue<G4String, std::deque<G4String,std::allocator<G4String> > >::~queue [in-charge] ()

operator new(unsigned, void*)

HepRandom::getTheEngine()

HepJamesRandom::setSeeds(long const*, int)

Page 12: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Gprof study – voxelized phantomGprof study – voxelized phantomFlat profile for the simulation of HR+ with zubal phantom

Hep3Vector::Hep3Vector[in-charge](double, double,

double)

HepJamesRandom::flat(), 3.8

0

10

20

30

40

50

60

70

1

Called Functions

Perc

enta

ge o

f tot

al e

xecu

tion

time

Hep3Vector::Hep3Vector[in-charge](double, double,double)

HepRotation::rotateAxes(Hep3Vector const&,Hep3Vector const&,Hep3Vector const&)

Hep3Vector::~Hep3Vector [in-charge]()

G4String::~G4String [in-charge]()

G4String::G4String [in-charge](G4String const&)

Hep3Vector::operator()(int) const

HepJamesRandom::flat()

operator new(unsigned, void*)

GateOutputMgr::GetInstance()

G4String::G4String [in-charge](char const&)

std::queue<G4String, std::deque<G4String,std::allocator<G4String> > >::~queue [in-charge] ()

Page 13: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Callgrind studyCallgrind study analytical NEMA phantom analytical NEMA phantomFlat profile for an analytical cylindrical NEMA phantom

log1

0

G4E

MD

ataS

et::F

indB

inLo

catio

n

CLH

EP::H

epJa

mes

Ran

dom

::fla

t

G4L

ogLo

gInt

erpo

latio

n::C

alcu

late

pow

_iee

e754

_pow

int_

mal

loc

G4E

MD

ataS

et::F

indV

alue

isna

n

_iee

e754

_log

10

0

1

2

3

4

5

6

7

function

GATE - Geant4 functions

Func

tion

self

cost

(%)

log10

isnan

_ieee754_log10

G4EMDataSet::FindBinLocation

CLHEP::HepJamesRandom::flat

G4LogLogInterpolation::Calculate

pow

_ieee754_pow

int_malloc

G4EMDataSet::FindValue

Page 14: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Callgrind study voxelized NCAT phantom Callgrind study voxelized NCAT phantom (default parameterization)(default parameterization)

Flat profile for a NCAT phantom using default voxel parameterization

log1

0

G4P

aram

eter

ised

Nav

igat

ion:

:Lev

elLo

cate

G4E

MD

ataS

et::

Find

Bin

Loca

tion

G4E

MD

ataS

et::

Find

Valu

e

div

CLH

EP::

Hep

Rot

atio

n::r

otat

eAxe

s

CLH

EP::

Hep

Jam

esR

ando

m::

flat

G4N

avig

atio

nLev

elR

ep::

G4N

avig

atio

nLev

elR

ep

G4L

ogLo

gInt

erpo

latio

n::C

alcu

late

Gat

eVox

elB

oxPa

ram

eter

izat

ion:

: Com

pute

Tran

sfor

mat

ion

G4P

aram

eter

ised

Nav

igat

ion:

: Ide

ntify

And

Plac

eSol

id

0

1

2

3

4

5

6

7

8

9

10

1

GATE - Geant4 functions

Func

tion

self

cost

(%)

log10

G4NavigationLevelRep::G4NavigationLevelRep

G4ParameterisedNavigation::LevelLocate

isnan

_ieee754_log10

G4EMDataSet::FindBinLocation

G4LogLogInterpolation::Calculate

pow

_ieee754_pow

G4EMDataSet::FindValue

GateVoxelBoxParameterization::ComputeTransformationdiv

G4Box::Inside

G4NavigationLevel::~G4NavigationLevel

G4NavigationLevel::operator=

G4NavigationLevel::G4NavigationLevel

G4ParameterisedNavigation::IdentifyAndPlaceSolid

CLHEP::HepRotation::rotateAxes

CLHEP::HepJamesRandom::flat

Page 15: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Callgrind study voxelized NCAT phantom Callgrind study voxelized NCAT phantom (compressed parameterization)(compressed parameterization)

Flat profile for for a NCAT phantom using compressed voxel parameterization

log1

0G

4Nav

igat

ionL

evel

Rep

:: G

4Nav

igat

ionL

evel

Rep

isna

nG

4Par

amet

eris

edN

avig

atio

n::L

evel

Loca

te

G4E

MD

ataS

et::

Find

Bin

Loca

tion

G4L

ogLo

gInt

erpo

latio

n::C

alcu

late

pow

G4E

MD

ataS

et::

Find

Valu

eG

ateC

ompr

esse

dVox

elPa

ram

eter

izat

ion:

: C

ompu

teTr

ansf

orm

atio

n

Gat

eCom

pres

sedV

oxel

Para

met

eriz

atio

n::

Com

pute

Dim

ensi

ons

G4P

aram

eter

ised

Nav

igat

ion:

: Ide

ntify

And

Plac

eSol

id

G4B

ox::

SetX

Hal

fLen

gth

G4B

ox::

SetY

Hal

fLen

gth

G4B

ox::

SetZ

Hal

fLen

gth

CLH

EP::

Hep

Jam

esR

ando

m::

flat

CLH

EP::

Hep

Rot

atio

n::r

otat

eAxe

s

0

1

2

3

4

5

6

7

8

9

1

GATE - Geant4 functions

Func

tion

self

cost

(%)

log10

G4NavigationLevelRep:: G4NavigationLevelRep

isnan

G4ParameterisedNavigation::LevelLocate

_ieee754_log10

G4EMDataSet::FindBinLocation

G4LogLogInterpolation::Calculate

pow

_ieee754_pow

G4EMDataSet::FindValue

GateCompressedVoxelParameterization::ComputeTransformationCLHEP::operator/

GateCompressedVoxelParameterization::ComputeDimensionsG4NavigationLevel::~G4NavigationLevel

G4NavigationLevel::operator=

int_malloc

G4NavigationLevel::G4NavigationLevel

G4ParameterisedNavigation::IdentifyAndPlaceSolidG4Box::Inside

G4Box::SetXHalfLength

G4Box::SetYHalfLength

G4Box::SetZHalfLength

CLHEP::HepJamesRandom::flat

CLHEP::HepRotation::rotateAxes

Page 16: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Total time performance cost for three cases

NC

AT_

para

met

eriz

ed

NCAT

_com

pres

sed

NEMA0

2E+10

4E+10

6E+10

8E+10

1E+11

1E+11

1

simulation cases

Tota

l tim

e co

st (a

bsol

ute

valu

es) NEMA

NCAT_parameterized

NCAT_compressed

Callgrind study Callgrind study Comparison of time-performance costComparison of time-performance cost

Total time cost refers to the total time required for the execution of the simulationAbsolute cost values are presented with arbitrary unitsThe total cost increases by 814% and 517% when a voxelized phantom and a compressed phantom is used respectively

Page 17: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Callgrind study Callgrind study Comparison of time-performance costComparison of time-performance cost

G4LogLogInterpolation::Calculate time performance cost (self) for three cases

NC

AT_

para

met

eriz

ed

NC

AT_

com

pres

sed

NEMA0

500000000

1000000000

1500000000

2000000000

2500000000

3000000000

3500000000

4000000000

4500000000

1

simulation cases

Tim

e co

st (a

bsol

ute

valu

es)

NEMA

NCAT_parameterized

NCAT_compressed

G4EMDataSet::FindValue time performance cost (inclusive) for three cases

NC

AT_

para

met

eriz

ed

NCAT

_com

pres

sed

NEMA0

5000000000

10000000000

15000000000

20000000000

25000000000

30000000000

35000000000

40000000000

45000000000

1

simulation cases

Tim

e co

st (a

bsol

ute

valu

es)

NEMA

NCAT_parameterized

NCAT_compressed

G4EMDataSet::FindValue exhibitsone of the highest time inclusive costs (absolute values) among the G4functions which are called by the G4SteppingManager

Inclusive time cost is the time spent on the function itself and all of its callees

G4LogLogInterpolation::Calculate exhibits one of the highest time self costs (absolute values) among the G4functions which are called by the G4EMDataSet::FindValue

Self time cost is the time spent only on the function itself

Page 18: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Callgrind study Callgrind study Comparison between default and compressed parameterizationComparison between default and compressed parameterization

G4ParameterisedNavigation::LevelLocate time performance cost (inclusive) for three cases

NC

AT_

para

met

eriz

ed

NC

AT_

com

pres

sed

0

5000000000

10000000000

15000000000

20000000000

25000000000

30000000000

1

simulation cases

Tim

e co

st (a

bsol

ute

valu

es) NCAT_parameterized

NCAT_compressed

G4ParameterisedNavigation:: IdentifyAndPlaceSolid time performance cost (inclusive) for three cases

NC

AT_

para

met

eriz

ed

NCAT

_com

pres

sed

0

1000000000

2000000000

3000000000

4000000000

5000000000

6000000000

7000000000

8000000000

9000000000

1

simulation cases

Tim

e co

st (a

bsol

ute

valu

es)

NCAT_parameterized

NCAT_compressed

G4ParameterisedNavigation:: LevelLocate exhibitsone of the highest time inclusive costs (absolute values) among the functions of the G4Navigator class

This function is located at libG4navigator.so and calls most of the functions used by the navigator

G4ParameterisedNavigation:: IdentifyAndPlaceSolid determines the parameterization type applied for each voxel

Compressed voxel implementation adds up the inclusive time cost of this function, because it adjusts the voxel dimensions depending on its material attribute

Page 19: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Callgrind study - Call graphCallgrind study - Call graphNEMA analytical phantomNEMA analytical phantom

G4EMDataSet::FindValue function spends 30.86% of the total cost by completing it’s calls to callees functions and by returning calls made to this function from other caller functions.

Page 20: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Callgrind study - Call graphCallgrind study - Call graphNCAT voxelized phantom (default parameterization)NCAT voxelized phantom (default parameterization)

G4ParameterisedNavigation::LevelLocate function spends 27.88% of the total cost by by completing it’s calls to callees functions and by returning calls made to this function from other caller functions.

Page 21: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Callgrind study - Call graphCallgrind study - Call graphNCAT voxelized phantom (compressed parameterization)NCAT voxelized phantom (compressed parameterization)

G4ParameterisedNavigation::LevelLocate function spends 31.10% of the total cost by by completing it’s calls to callees functions and by returning calls made to this function from other caller functions.

Page 22: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Profiling study conclusionsProfiling study conclusions In the case of voxelized phantomsIn the case of voxelized phantoms

Significant degradation of the G4-GATE performanceSignificant degradation of the G4-GATE performance 8 times slower with default parameterization8 times slower with default parameterization 5 times slower with compressed parameterization5 times slower with compressed parameterization

G4NavigationLevelRep::G4NavigationLevelRep exhibits the highest self time cost G4NavigationLevelRep::G4NavigationLevelRep exhibits the highest self time cost (max. 7.44%)(max. 7.44%)

G4ParameterisedNavigation::LevelLocate exhibits the highest inclusive time cost G4ParameterisedNavigation::LevelLocate exhibits the highest inclusive time cost among functions called by the G4steppingManageramong functions called by the G4steppingManager

Compressed voxel parameterization achieves speed-up of 30-40%Compressed voxel parameterization achieves speed-up of 30-40% Need for more efficient version of stepping manager and navigator optimized for G4 Need for more efficient version of stepping manager and navigator optimized for G4

medical applicationsmedical applications

In the case of analytical phantomsIn the case of analytical phantoms G4EMDataSet::FindValue (located at libG4emlowenergy.so) appears to cause the most G4EMDataSet::FindValue (located at libG4emlowenergy.so) appears to cause the most

important time costimportant time cost Primer caller of the G4 function with the highest combination of self and inclusive time cost Primer caller of the G4 function with the highest combination of self and inclusive time cost

(G4LogLogInterpolation::Calculate) in the case of analytical phantoms(G4LogLogInterpolation::Calculate) in the case of analytical phantoms self: 3.24%, inclusive: 29.4% (for analytical phantoms)self: 3.24%, inclusive: 29.4% (for analytical phantoms) self: 4.37%, inclusive: 33.18% (for analytical phantoms)self: 4.37%, inclusive: 33.18% (for analytical phantoms)

Page 23: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Current VRT Implementations (?)Current VRT Implementations (?) Variance Reduction Techniques (VRTs)Variance Reduction Techniques (VRTs)

Importance samplingImportance sampling Photon splittingPhoton splitting Russian rouletteRussian roulette

Weight window samplingWeight window sampling Weight rouletteWeight roulette ScoringScoring

A number of VRT implementations are already A number of VRT implementations are already available in Geant4, butavailable in Geant4, but Some of the Geant4 classes can be optimized more, Some of the Geant4 classes can be optimized more, Further classes implementing VRTs could be developed Further classes implementing VRTs could be developed

Page 24: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Discussion of possible new VRTs appliedDiscussion of possible new VRTs applied Alternative photon splitting technique for G4Alternative photon splitting technique for G4

If the first photon of the annihilation pair is detected If the first photon of the annihilation pair is detected => second photon splits into multiple photons with equal weights and total => second photon splits into multiple photons with equal weights and total

weight sum of 1weight sum of 1 Degree of splitting depends on the probability of detectionDegree of splitting depends on the probability of detection

Probability is dependent on axial position and emission angleProbability is dependent on axial position and emission angle Therefore geometrical importance sampling is based on axial position and Therefore geometrical importance sampling is based on axial position and

emission angleemission angle Therefore prior knowledge of the geometry of the detector systems is Therefore prior knowledge of the geometry of the detector systems is

requiredrequired High detection probability => photon splits into a small number of High detection probability => photon splits into a small number of

secondary photonssecondary photons Low detection probability => photon splits into a large number of secondary Low detection probability => photon splits into a large number of secondary

photonsphotons Estimation of speed-up factorEstimation of speed-up factor

3 – 4 times increase in efficiency of the application3 – 4 times increase in efficiency of the application Additional photon transport algorithms could be implementedAdditional photon transport algorithms could be implemented

Delta scattering including energy discriminationDelta scattering including energy discrimination Flags implementation Flags implementation

user can easily activate or not the various VRT options depending on the user can easily activate or not the various VRT options depending on the requirements of its applicationrequirements of its application

Page 25: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

Suggested actionsSuggested actions Collaboration between GATE and Geant4 VRTs Collaboration between GATE and Geant4 VRTs

workgroupworkgroup Proposed targets of collaborationProposed targets of collaboration

Determination of the existing G4 classes need to be Determination of the existing G4 classes need to be optimizedoptimized

Determination of further G4 classes possibly needed to be Determination of further G4 classes possibly needed to be implementedimplemented

Implementation of GATE-specific classes within GATEImplementation of GATE-specific classes within GATE Validation of the new classes themselves and within Validation of the new classes themselves and within

GATE-Geant4 frameGATE-Geant4 frame Publication of a GATE-specific patch for Geant4Publication of a GATE-specific patch for Geant4

Page 26: Profiling study of Geant4-GATE Discussion of suggested VRTs Nicolas Karakatsanis, George Loudos, Arion Chatziioannou Geant4-GATE technical meeting 12 th

AcknowledgementsAcknowledgements

This work is a part of the project: fastGATEThis work is a part of the project: fastGATE funded by the Greek Secretary of Research and funded by the Greek Secretary of Research and

TechnologyTechnology PartnersPartners

National Technical University of AthensNational Technical University of Athens University of California, Los AngelesUniversity of California, Los Angeles