Upload
alice-parker
View
219
Download
0
Embed Size (px)
DESCRIPTION
Introduction GATE performance degrades when voxelized maps (attenuation or emission distributions) are used GATE performance degrades when voxelized maps (attenuation or emission distributions) are used Simulation times become prohibitive for clinical or pre-clinical simulation studies Simulation times become prohibitive for clinical or pre-clinical simulation studies Profiling studies indicate a performance bottleneck on the Geant4 navigator function Profiling studies indicate a performance bottleneck on the Geant4 navigator function Collaboration between G4 and GATE developers to optimize the efficiency Collaboration between G4 and GATE developers to optimize the efficiency
Citation preview
Profiling study of Geant4-GATEProfiling study of Geant4-GATEDiscussion of suggested VRTsDiscussion of suggested VRTs
Nicolas Karakatsanis, George Loudos, Arion ChatziioannouNicolas Karakatsanis, George Loudos, Arion Chatziioannou
Geant4-GATE technical meetingGeant4-GATE technical meeting1212thth September 2007, Hebden Bridge, UK September 2007, Hebden Bridge, UK
ContentsContents
IntroductionIntroduction Simulation efficiency with voxelized phantomsSimulation efficiency with voxelized phantoms Compressed Voxel ParameterizationCompressed Voxel Parameterization Axial emission angle effect on efficiencyAxial emission angle effect on efficiency Profiling study - ConclusionsProfiling study - Conclusions Discussion of possible new VRTs appliedDiscussion of possible new VRTs applied Suggested actionsSuggested actions
IntroductionIntroduction GATE performance degrades when voxelized GATE performance degrades when voxelized
maps (attenuation or emission distributions) maps (attenuation or emission distributions) are usedare used Simulation times become prohibitive for clinical or Simulation times become prohibitive for clinical or
pre-clinical simulation studiespre-clinical simulation studies Profiling studies indicate a performance Profiling studies indicate a performance
bottleneck on the Geant4 navigator functionbottleneck on the Geant4 navigator function Collaboration between G4 and GATE developers Collaboration between G4 and GATE developers
to optimize the efficiencyto optimize the efficiency
Simulation efficiency with voxelized Simulation efficiency with voxelized phantomsphantoms
Geant4-GATE features (our analysis)Geant4-GATE features (our analysis) Deal with each voxel as a different material regionDeal with each voxel as a different material region Only one step is forced in each material region boundaryOnly one step is forced in each material region boundary ThereforeTherefore
G4 stepping implementation can be one of most important (if not G4 stepping implementation can be one of most important (if not the most important) factors that affects simulation speed the most important) factors that affects simulation speed
when using large voxelized phantoms when using large voxelized phantoms Possible solutionPossible solution
If the material in one voxel is the same as that in its If the material in one voxel is the same as that in its neighbouring voxel, neighbouring voxel,
Treat those two voxels as one region (no region boundary between Treat those two voxels as one region (no region boundary between those two voxels)those two voxels)
Check that condition for all voxelsCheck that condition for all voxels
Simulation efficiency with voxelized Simulation efficiency with voxelized phantomsphantoms
Possible solution (..continue)Possible solution (..continue) Create an option in stepping function Create an option in stepping function
to test if the material in next region is the same as to test if the material in next region is the same as current one. current one.
If yesIf yes no boundary is applied here and no boundary is applied here and a normal step is takena normal step is taken
A flag can turn on/off the previous optionA flag can turn on/off the previous option
Compressed Voxel ParameterizationCompressed Voxel Parameterization Solution developed in GATESolution developed in GATE
Application of a “compressed” voxel parameterizationApplication of a “compressed” voxel parameterization generate a phantom where voxel size is variable generate a phantom where voxel size is variable All adjacent voxels of the same material are fused together to form the largest All adjacent voxels of the same material are fused together to form the largest
possible rectangular voxel. possible rectangular voxel. Compressed parameterization uses Compressed parameterization uses
less memory and also less memory and also less CPU cyclesless CPU cycles
It is possible to exclude regions in the phantom from being compressed It is possible to exclude regions in the phantom from being compressed through the use of an "exclude list" of materialsthrough the use of an "exclude list" of materials
DiscussionDiscussion Initial purpose of implementationInitial purpose of implementation
less memory usage – larger phantoms can be simulatedless memory usage – larger phantoms can be simulated Side-effectsSide-effects
less CPU cycles usage - reduce the simulation time but only by maybe 30%less CPU cycles usage - reduce the simulation time but only by maybe 30% Need for a better speed-up factorNeed for a better speed-up factor
Investigation of a more efficient way to track particles in a voxelized geometryInvestigation of a more efficient way to track particles in a voxelized geometry
Contact: Richard Taschereau: Contact: Richard Taschereau: [email protected]@mednet.ucla.edu
Axial emission angle effect on efficiencyAxial emission angle effect on efficiency
Simulation Count Rate (Detected Coincidences per sec in simulation) vs Axial Emission Angle Limit (deg) for various positions of a point source along the axis of the HR+ GATE model within
the Zubal brain phantom
0
10000
20000
30000
40000
50000
60000
70000
80000
0 20 40 60 80 100 120 140 160 180 200Axial Emission Angle Limit (deg)
Sim
ulat
ion
Cou
nt R
ate
(Det
ecte
d C
oinc
iden
ces
per s
ec in
sim
ulat
ion)
Axial position = 0cm
Axial position = 2cm
Axial position = 4cm
Axial position = 6cm
Axial position = 8cm
A certain axial emission angle (for back-to-back emission source) A certain axial emission angle (for back-to-back emission source) optimizes the simulation count rate without significant inaccuracy in optimizes the simulation count rate without significant inaccuracy in the scatter fractionthe scatter fraction
Axial emission angle effect on efficiencyAxial emission angle effect on efficiencyA certain axial emission angle (for back-to-back emission source) A certain axial emission angle (for back-to-back emission source) optimizes the simulation count rate without significant inaccuracy in the optimizes the simulation count rate without significant inaccuracy in the scatter fractionscatter fraction
Scatter fraction (after random subtraction) vs axial emission angle limit for various positions of a point source inside the Zubal brain phantom along the axis of the
GATE model of HR+ scanner
0
20
40
60
80
100
120
0 20 40 60 80 100 120 140 160 180 200
Axial emission angle limit (deg)
Scat
ter f
ract
ion
Axial position = 0cm
Axial position = 2cm
Axial position = 4cm
Axial position = 6cm
Axial position = 8cm
Axial emission angle effect on efficiencyAxial emission angle effect on efficiencyA certain axial emission angle (for back-to-back emission source) A certain axial emission angle (for back-to-back emission source) optimizes the simulation count rate without significant inaccuracy in the optimizes the simulation count rate without significant inaccuracy in the scatter fractionscatter fraction
Comparative plot of the relationship of simulation count rate and scatter fraction as a function of axial emission angle limit for a realistic activity distribution (CBF and Brain Matter) inside the Zubal brain phantom using the GATE HR+ model scanner
0
5
10
15
20
25
30
35
180 100 60 40 20 10Axial Emission Angle Limit (deg)
Sim
ulat
ion
Cou
nt R
ate
(Det
ecte
d C
oinc
iden
ces
per s
ec in
sim
ulat
ion)
0
50000
100000
150000
200000
250000
Scat
ter f
ract
ion
Scatter fraction (afterrandom subtraction)
Simulation Count Rate(DetectedCoincidences per secin simulation)
Profiling studiesProfiling studies Profiling studies using GNU toolsProfiling studies using GNU tools
Gprof : Compare flat profiles of followingGprof : Compare flat profiles of following PET benchmarkPET benchmark
10sec acquisition time10sec acquisition time HR+ ECAT 962 model with Zubal high-resolution brain phantomHR+ ECAT 962 model with Zubal high-resolution brain phantom
10sec acquisition time10sec acquisition time Full attenuation map (128 slices of 256x256 pixels each)Full attenuation map (128 slices of 256x256 pixels each) Simple emission map (21 central slices)Simple emission map (21 central slices)
Callgrind: Compare flat profiles and call graphs of Callgrind: Compare flat profiles and call graphs of following cases*following cases*
NEMA cylindrical analytical phantomNEMA cylindrical analytical phantom NCAT thorax voxelized phantom (using default voxel NCAT thorax voxelized phantom (using default voxel
parameterization)parameterization) NCAT thorax voxelized phantom (using compressed voxel NCAT thorax voxelized phantom (using compressed voxel
parameterization)parameterization)*using in all cases the GATE model of Biograph6 scanner and acquiring *using in all cases the GATE model of Biograph6 scanner and acquiring
for 1 sec with the same activityfor 1 sec with the same activity
Gprof study – analytical phantomGprof study – analytical phantomFlat profile for the simulation of PET benchmark
HepJamesRandom::flat(), 38.56
Hep3Vector::Hep3Vector[in-charge](double, double,
double), 14.4
0
5
10
15
20
25
30
35
40
45
GATE functions
Called Functions
Perc
enta
ge o
f tot
al e
xecu
tion
time
(%)
HepJamesRandom::flat()
Hep3Vector::Hep3Vector[in-charge](double,double, double)
Hep3Vector::~Hep3Vector [in-charge]()
G4String::~G4String [in-charge]()
std::_Rb_tree_base_iterator::_M_increment()
G4String::G4String [in-charge](G4String const&)
HepRotation::rotateAxes(Hep3Vector const&,Hep3Vector const&,Hep3Vector const&)
std::queue<G4String, std::deque<G4String,std::allocator<G4String> > >::~queue [in-charge] ()
operator new(unsigned, void*)
HepRandom::getTheEngine()
HepJamesRandom::setSeeds(long const*, int)
Gprof study – voxelized phantomGprof study – voxelized phantomFlat profile for the simulation of HR+ with zubal phantom
Hep3Vector::Hep3Vector[in-charge](double, double,
double)
HepJamesRandom::flat(), 3.8
0
10
20
30
40
50
60
70
1
Called Functions
Perc
enta
ge o
f tot
al e
xecu
tion
time
Hep3Vector::Hep3Vector[in-charge](double, double,double)
HepRotation::rotateAxes(Hep3Vector const&,Hep3Vector const&,Hep3Vector const&)
Hep3Vector::~Hep3Vector [in-charge]()
G4String::~G4String [in-charge]()
G4String::G4String [in-charge](G4String const&)
Hep3Vector::operator()(int) const
HepJamesRandom::flat()
operator new(unsigned, void*)
GateOutputMgr::GetInstance()
G4String::G4String [in-charge](char const&)
std::queue<G4String, std::deque<G4String,std::allocator<G4String> > >::~queue [in-charge] ()
Callgrind studyCallgrind study analytical NEMA phantom analytical NEMA phantomFlat profile for an analytical cylindrical NEMA phantom
log1
0
G4E
MD
ataS
et::F
indB
inLo
catio
n
CLH
EP::H
epJa
mes
Ran
dom
::fla
t
G4L
ogLo
gInt
erpo
latio
n::C
alcu
late
pow
_iee
e754
_pow
int_
mal
loc
G4E
MD
ataS
et::F
indV
alue
isna
n
_iee
e754
_log
10
0
1
2
3
4
5
6
7
function
GATE - Geant4 functions
Func
tion
self
cost
(%)
log10
isnan
_ieee754_log10
G4EMDataSet::FindBinLocation
CLHEP::HepJamesRandom::flat
G4LogLogInterpolation::Calculate
pow
_ieee754_pow
int_malloc
G4EMDataSet::FindValue
Callgrind study voxelized NCAT phantom Callgrind study voxelized NCAT phantom (default parameterization)(default parameterization)
Flat profile for a NCAT phantom using default voxel parameterization
log1
0
G4P
aram
eter
ised
Nav
igat
ion:
:Lev
elLo
cate
G4E
MD
ataS
et::
Find
Bin
Loca
tion
G4E
MD
ataS
et::
Find
Valu
e
div
CLH
EP::
Hep
Rot
atio
n::r
otat
eAxe
s
CLH
EP::
Hep
Jam
esR
ando
m::
flat
G4N
avig
atio
nLev
elR
ep::
G4N
avig
atio
nLev
elR
ep
G4L
ogLo
gInt
erpo
latio
n::C
alcu
late
Gat
eVox
elB
oxPa
ram
eter
izat
ion:
: Com
pute
Tran
sfor
mat
ion
G4P
aram
eter
ised
Nav
igat
ion:
: Ide
ntify
And
Plac
eSol
id
0
1
2
3
4
5
6
7
8
9
10
1
GATE - Geant4 functions
Func
tion
self
cost
(%)
log10
G4NavigationLevelRep::G4NavigationLevelRep
G4ParameterisedNavigation::LevelLocate
isnan
_ieee754_log10
G4EMDataSet::FindBinLocation
G4LogLogInterpolation::Calculate
pow
_ieee754_pow
G4EMDataSet::FindValue
GateVoxelBoxParameterization::ComputeTransformationdiv
G4Box::Inside
G4NavigationLevel::~G4NavigationLevel
G4NavigationLevel::operator=
G4NavigationLevel::G4NavigationLevel
G4ParameterisedNavigation::IdentifyAndPlaceSolid
CLHEP::HepRotation::rotateAxes
CLHEP::HepJamesRandom::flat
Callgrind study voxelized NCAT phantom Callgrind study voxelized NCAT phantom (compressed parameterization)(compressed parameterization)
Flat profile for for a NCAT phantom using compressed voxel parameterization
log1
0G
4Nav
igat
ionL
evel
Rep
:: G
4Nav
igat
ionL
evel
Rep
isna
nG
4Par
amet
eris
edN
avig
atio
n::L
evel
Loca
te
G4E
MD
ataS
et::
Find
Bin
Loca
tion
G4L
ogLo
gInt
erpo
latio
n::C
alcu
late
pow
G4E
MD
ataS
et::
Find
Valu
eG
ateC
ompr
esse
dVox
elPa
ram
eter
izat
ion:
: C
ompu
teTr
ansf
orm
atio
n
Gat
eCom
pres
sedV
oxel
Para
met
eriz
atio
n::
Com
pute
Dim
ensi
ons
G4P
aram
eter
ised
Nav
igat
ion:
: Ide
ntify
And
Plac
eSol
id
G4B
ox::
SetX
Hal
fLen
gth
G4B
ox::
SetY
Hal
fLen
gth
G4B
ox::
SetZ
Hal
fLen
gth
CLH
EP::
Hep
Jam
esR
ando
m::
flat
CLH
EP::
Hep
Rot
atio
n::r
otat
eAxe
s
0
1
2
3
4
5
6
7
8
9
1
GATE - Geant4 functions
Func
tion
self
cost
(%)
log10
G4NavigationLevelRep:: G4NavigationLevelRep
isnan
G4ParameterisedNavigation::LevelLocate
_ieee754_log10
G4EMDataSet::FindBinLocation
G4LogLogInterpolation::Calculate
pow
_ieee754_pow
G4EMDataSet::FindValue
GateCompressedVoxelParameterization::ComputeTransformationCLHEP::operator/
GateCompressedVoxelParameterization::ComputeDimensionsG4NavigationLevel::~G4NavigationLevel
G4NavigationLevel::operator=
int_malloc
G4NavigationLevel::G4NavigationLevel
G4ParameterisedNavigation::IdentifyAndPlaceSolidG4Box::Inside
G4Box::SetXHalfLength
G4Box::SetYHalfLength
G4Box::SetZHalfLength
CLHEP::HepJamesRandom::flat
CLHEP::HepRotation::rotateAxes
Total time performance cost for three cases
NC
AT_
para
met
eriz
ed
NCAT
_com
pres
sed
NEMA0
2E+10
4E+10
6E+10
8E+10
1E+11
1E+11
1
simulation cases
Tota
l tim
e co
st (a
bsol
ute
valu
es) NEMA
NCAT_parameterized
NCAT_compressed
Callgrind study Callgrind study Comparison of time-performance costComparison of time-performance cost
Total time cost refers to the total time required for the execution of the simulationAbsolute cost values are presented with arbitrary unitsThe total cost increases by 814% and 517% when a voxelized phantom and a compressed phantom is used respectively
Callgrind study Callgrind study Comparison of time-performance costComparison of time-performance cost
G4LogLogInterpolation::Calculate time performance cost (self) for three cases
NC
AT_
para
met
eriz
ed
NC
AT_
com
pres
sed
NEMA0
500000000
1000000000
1500000000
2000000000
2500000000
3000000000
3500000000
4000000000
4500000000
1
simulation cases
Tim
e co
st (a
bsol
ute
valu
es)
NEMA
NCAT_parameterized
NCAT_compressed
G4EMDataSet::FindValue time performance cost (inclusive) for three cases
NC
AT_
para
met
eriz
ed
NCAT
_com
pres
sed
NEMA0
5000000000
10000000000
15000000000
20000000000
25000000000
30000000000
35000000000
40000000000
45000000000
1
simulation cases
Tim
e co
st (a
bsol
ute
valu
es)
NEMA
NCAT_parameterized
NCAT_compressed
G4EMDataSet::FindValue exhibitsone of the highest time inclusive costs (absolute values) among the G4functions which are called by the G4SteppingManager
Inclusive time cost is the time spent on the function itself and all of its callees
G4LogLogInterpolation::Calculate exhibits one of the highest time self costs (absolute values) among the G4functions which are called by the G4EMDataSet::FindValue
Self time cost is the time spent only on the function itself
Callgrind study Callgrind study Comparison between default and compressed parameterizationComparison between default and compressed parameterization
G4ParameterisedNavigation::LevelLocate time performance cost (inclusive) for three cases
NC
AT_
para
met
eriz
ed
NC
AT_
com
pres
sed
0
5000000000
10000000000
15000000000
20000000000
25000000000
30000000000
1
simulation cases
Tim
e co
st (a
bsol
ute
valu
es) NCAT_parameterized
NCAT_compressed
G4ParameterisedNavigation:: IdentifyAndPlaceSolid time performance cost (inclusive) for three cases
NC
AT_
para
met
eriz
ed
NCAT
_com
pres
sed
0
1000000000
2000000000
3000000000
4000000000
5000000000
6000000000
7000000000
8000000000
9000000000
1
simulation cases
Tim
e co
st (a
bsol
ute
valu
es)
NCAT_parameterized
NCAT_compressed
G4ParameterisedNavigation:: LevelLocate exhibitsone of the highest time inclusive costs (absolute values) among the functions of the G4Navigator class
This function is located at libG4navigator.so and calls most of the functions used by the navigator
G4ParameterisedNavigation:: IdentifyAndPlaceSolid determines the parameterization type applied for each voxel
Compressed voxel implementation adds up the inclusive time cost of this function, because it adjusts the voxel dimensions depending on its material attribute
Callgrind study - Call graphCallgrind study - Call graphNEMA analytical phantomNEMA analytical phantom
G4EMDataSet::FindValue function spends 30.86% of the total cost by completing it’s calls to callees functions and by returning calls made to this function from other caller functions.
Callgrind study - Call graphCallgrind study - Call graphNCAT voxelized phantom (default parameterization)NCAT voxelized phantom (default parameterization)
G4ParameterisedNavigation::LevelLocate function spends 27.88% of the total cost by by completing it’s calls to callees functions and by returning calls made to this function from other caller functions.
Callgrind study - Call graphCallgrind study - Call graphNCAT voxelized phantom (compressed parameterization)NCAT voxelized phantom (compressed parameterization)
G4ParameterisedNavigation::LevelLocate function spends 31.10% of the total cost by by completing it’s calls to callees functions and by returning calls made to this function from other caller functions.
Profiling study conclusionsProfiling study conclusions In the case of voxelized phantomsIn the case of voxelized phantoms
Significant degradation of the G4-GATE performanceSignificant degradation of the G4-GATE performance 8 times slower with default parameterization8 times slower with default parameterization 5 times slower with compressed parameterization5 times slower with compressed parameterization
G4NavigationLevelRep::G4NavigationLevelRep exhibits the highest self time cost G4NavigationLevelRep::G4NavigationLevelRep exhibits the highest self time cost (max. 7.44%)(max. 7.44%)
G4ParameterisedNavigation::LevelLocate exhibits the highest inclusive time cost G4ParameterisedNavigation::LevelLocate exhibits the highest inclusive time cost among functions called by the G4steppingManageramong functions called by the G4steppingManager
Compressed voxel parameterization achieves speed-up of 30-40%Compressed voxel parameterization achieves speed-up of 30-40% Need for more efficient version of stepping manager and navigator optimized for G4 Need for more efficient version of stepping manager and navigator optimized for G4
medical applicationsmedical applications
In the case of analytical phantomsIn the case of analytical phantoms G4EMDataSet::FindValue (located at libG4emlowenergy.so) appears to cause the most G4EMDataSet::FindValue (located at libG4emlowenergy.so) appears to cause the most
important time costimportant time cost Primer caller of the G4 function with the highest combination of self and inclusive time cost Primer caller of the G4 function with the highest combination of self and inclusive time cost
(G4LogLogInterpolation::Calculate) in the case of analytical phantoms(G4LogLogInterpolation::Calculate) in the case of analytical phantoms self: 3.24%, inclusive: 29.4% (for analytical phantoms)self: 3.24%, inclusive: 29.4% (for analytical phantoms) self: 4.37%, inclusive: 33.18% (for analytical phantoms)self: 4.37%, inclusive: 33.18% (for analytical phantoms)
Current VRT Implementations (?)Current VRT Implementations (?) Variance Reduction Techniques (VRTs)Variance Reduction Techniques (VRTs)
Importance samplingImportance sampling Photon splittingPhoton splitting Russian rouletteRussian roulette
Weight window samplingWeight window sampling Weight rouletteWeight roulette ScoringScoring
A number of VRT implementations are already A number of VRT implementations are already available in Geant4, butavailable in Geant4, but Some of the Geant4 classes can be optimized more, Some of the Geant4 classes can be optimized more, Further classes implementing VRTs could be developed Further classes implementing VRTs could be developed
Discussion of possible new VRTs appliedDiscussion of possible new VRTs applied Alternative photon splitting technique for G4Alternative photon splitting technique for G4
If the first photon of the annihilation pair is detected If the first photon of the annihilation pair is detected => second photon splits into multiple photons with equal weights and total => second photon splits into multiple photons with equal weights and total
weight sum of 1weight sum of 1 Degree of splitting depends on the probability of detectionDegree of splitting depends on the probability of detection
Probability is dependent on axial position and emission angleProbability is dependent on axial position and emission angle Therefore geometrical importance sampling is based on axial position and Therefore geometrical importance sampling is based on axial position and
emission angleemission angle Therefore prior knowledge of the geometry of the detector systems is Therefore prior knowledge of the geometry of the detector systems is
requiredrequired High detection probability => photon splits into a small number of High detection probability => photon splits into a small number of
secondary photonssecondary photons Low detection probability => photon splits into a large number of secondary Low detection probability => photon splits into a large number of secondary
photonsphotons Estimation of speed-up factorEstimation of speed-up factor
3 – 4 times increase in efficiency of the application3 – 4 times increase in efficiency of the application Additional photon transport algorithms could be implementedAdditional photon transport algorithms could be implemented
Delta scattering including energy discriminationDelta scattering including energy discrimination Flags implementation Flags implementation
user can easily activate or not the various VRT options depending on the user can easily activate or not the various VRT options depending on the requirements of its applicationrequirements of its application
Suggested actionsSuggested actions Collaboration between GATE and Geant4 VRTs Collaboration between GATE and Geant4 VRTs
workgroupworkgroup Proposed targets of collaborationProposed targets of collaboration
Determination of the existing G4 classes need to be Determination of the existing G4 classes need to be optimizedoptimized
Determination of further G4 classes possibly needed to be Determination of further G4 classes possibly needed to be implementedimplemented
Implementation of GATE-specific classes within GATEImplementation of GATE-specific classes within GATE Validation of the new classes themselves and within Validation of the new classes themselves and within
GATE-Geant4 frameGATE-Geant4 frame Publication of a GATE-specific patch for Geant4Publication of a GATE-specific patch for Geant4
AcknowledgementsAcknowledgements
This work is a part of the project: fastGATEThis work is a part of the project: fastGATE funded by the Greek Secretary of Research and funded by the Greek Secretary of Research and
TechnologyTechnology PartnersPartners
National Technical University of AthensNational Technical University of Athens University of California, Los AngelesUniversity of California, Los Angeles