Virtual Photonics, 09/10/2010
Voxel- and mesh-based Monte Carlo methods for 3D photon transport simulations
Qianqian Fang, Ph.D.
Martinos Center for Biomedical Imaging
Massachusetts General Hospital
Harvard Medical School
Software website: http://mcx.sf.net
Outline
Introduction
    Pros and cons of Monte Carlo simulations
MCX: a GPU-accelerated MC software
    Algorithm flowchart
    Boundary reflections
    Atomic vs. non-atomic operations
MMC: a mesh-based MC software
    Ray-tracing in the Plücker coordinates
    Multi-threading with OpenMP
    Validations
Website and resources (http://mcx.sf.net)
Modeling photon transport in complex media
Transport equation → Finite element, Finite volume, Monte Carlo
Diffusion → Finite element, finite difference …
Software tools:
Finite element: NIRFAST (Dartmouth), TOAST (UCL), Redbird (MGH)
Monte Carlo:
    MCML: Wang & Jacques (1995), layered media
    tMCimg: Boas (2002), 3D complex media
    CUDAMCML: Alerstam (2009), layered media + GPU
Pros of MC
Simplicity
General media (low-scattering, high absorption, g)
Discontinuities
Flexible source settings
Ease in time-domain simulations
Highly parallelizable
Cons of MC
(Very) slow in speed
Difficulties in modeling complex geometries
Low speed → massively parallel computing with GPU → Monte Carlo eXtreme (MCX)
Modeling complex shapes → Mesh-based Monte Carlo (MMC)
Multi-core processors and GPU
Multi-core is becoming the mainstream
Graphics hardware has many cores and is optimized for processing data-parallel tasks
Pixel shaders
Vertex shaders
How is a GPU different from a CPU?
Pros:
Many-core processor
High-speed internal memory
Parallel pipeline
Highly optimized
Cons:
No recursion, no function pointers, no host functions
GPU-CPU memory transfer is slow
Portability across hardware
NVIDIA CUDA Programming Guide
Project overview
[Release timeline, 2009 to 2010: v0.2b, v0.2, v0.5b; MMC v0.2]
In the past year:
~580 total downloads
~90 registered users
2840 unique visits from 65 regions
4 major releases
MCX: 200 SVN commits
MMC: 110 SVN commits
MCX-CL: 35 SVN commits
Based on visitor history at http://mcx.sf.net
Photon transport in a voxelated space
[Figure labels: incident position and vector; scattering length L; scattering angle θ; accumulation interval]
At each scattering event in voxel [i,j]:
θ = RNG(media[i,j].g)   (Henyey-Greenstein)
L = RNG(media[i,j].μs)  (exponential distribution)
φ = RNG(0~2π)
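The three random draws above can be sketched as follows. This is a minimal illustration using the textbook inverse-CDF formulas for the exponential and Henyey-Greenstein distributions, not the MCX source; the function names and the use of Python's `random.Random` are assumptions made for this example.

```python
import math
import random

def sample_scattering_length(mus, rng):
    """Free path L, exponentially distributed with mean 1/mus."""
    return -math.log(1.0 - rng.random()) / mus

def sample_hg_cos_theta(g, rng):
    """cos(theta) drawn from the Henyey-Greenstein phase function."""
    xi = rng.random()
    if abs(g) < 1e-6:
        return 2.0 * xi - 1.0          # isotropic limit
    tmp = (1.0 - g * g) / (1.0 - g + 2.0 * g * xi)
    return (1.0 + g * g - tmp * tmp) / (2.0 * g)

def sample_azimuth(rng):
    """Azimuthal angle phi, uniform in [0, 2*pi)."""
    return 2.0 * math.pi * rng.random()

rng = random.Random(2010)              # hypothetical seed
step = sample_scattering_length(1.0, rng)
cos_theta = sample_hg_cos_theta(0.9, rng)
phi = sample_azimuth(rng)
```

Averaged over many draws, cos(theta) tends to the anisotropy g and L tends to 1/μs, which is a quick sanity check for any implementation.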
Recording time-resolved solutions
Accumulate photon packets to the corresponding time window
Time window 1
Time window 2
Time window 3
......
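A minimal sketch of the time-window bookkeeping, assuming the time of flight is computed from the accumulated path length and the refractive index. The helper names, the units (mm and ns), and the choice to drop packets outside the recorded range are assumptions for illustration only.

```python
C0_MM_PER_NS = 299.792458  # speed of light in vacuum, mm/ns

def time_gate_index(path_mm, n, tstart_ns, tend_ns, tstep_ns):
    """Return the time-window index for a packet, or None if out of range."""
    t = path_mm * n / C0_MM_PER_NS      # time of flight in the medium
    if t < tstart_ns or t >= tend_ns:
        return None
    return int((t - tstart_ns) // tstep_ns)

def accumulate(gates, gate, weight):
    """Add a packet weight to its time window; ignore out-of-range packets."""
    if gate is not None:
        gates[gate] += weight

gates = [0.0] * 10                      # 10 windows of 0.5 ns covering 0 to 5 ns
accumulate(gates, time_gate_index(300.0, 1.0, 0.0, 5.0, 0.5), 0.8)
```

In a full simulation there would be one such array per voxel, so the output is a 4D volume indexed by position and time window.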
Random number generators
Mersenne Twister: MT19937 (shared memory)
Logistic map and lattices: a coupled chaotic system, floating-point-only RNG
Logistic map: x_{n+1} = r x_n (1 - x_n). When r=4, x is a random variable with PDF p(x) = 1 / (\pi \sqrt{x(1-x)})
Coupled lattice: Wagner, 1995
Eric Mills, parallel MT19937 random number generator
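As an illustration of the logistic-map idea, the sketch below iterates the map at r=4 and converts each value to a uniform variate with the standard change of variables u = acos(1-2x)/π, which is the CDF of the arcsine-shaped density of x. It is a single uncoupled map, so consecutive outputs are strongly correlated; the coupled-lattice step that addresses this is not reproduced here. The function name, seed value, and burn-in length are assumptions.

```python
import math

def logistic_uniform_stream(x0, count, burn_in=1000):
    """Uniform(0,1) variates from a logistic map with r=4 (uncoupled sketch)."""
    x = x0
    for _ in range(burn_in):            # discard the initial transient
        x = 4.0 * x * (1.0 - x)
    out = []
    for _ in range(count):
        x = 4.0 * x * (1.0 - x)
        arg = max(-1.0, min(1.0, 1.0 - 2.0 * x))   # guard against rounding
        out.append(math.acos(arg) / math.pi)
    return out

samples = logistic_uniform_stream(0.3141, 20000)
```

Only floating-point multiplications and one inverse cosine are needed per number, which is what makes this family attractive on hardware with weak integer support.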
Handling of Boundary Reflection
[Figure panels: Case 1, Case 2]
Detecting the exit point:
1. one intersection: find C1 from Pn and done
2. two intersections: find C1, if not, find C2 from Pn+1, done
3. three intersections: if not C1 and C2, orientation of C3 is uniquely decided
Atomic vs. non-atomic operations
Non-atomic operations: global val; newval=val+x; val=newval;
Race conditions: when multiple threads read/write a single address, the value may become inconsistent.
How bad is the race condition?
Atomic operations: atomicAdd(&val,x);
    read/add/write in one instruction
    blocks other threads to avoid racing
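The lost-update effect can be shown with a deterministic toy model. This is a Python simulation of the worst-case interleaving, not GPU code: in the non-atomic case every simulated thread reads the shared value before any of them writes back, while in the atomic case each read/add/write completes before the next begins. Both function names are hypothetical.

```python
def non_atomic_adds(start, increments):
    """Worst case: all threads read the same stale value, last writer wins."""
    stale_reads = [start for _ in increments]   # every thread reads first
    val = start
    for old, x in zip(stale_reads, increments):
        newval = old + x                        # computed from the stale read
        val = newval                            # overwrites earlier writes
    return val

def atomic_adds(start, increments):
    """Each read/add/write finishes before the next thread touches val."""
    val = start
    for x in increments:
        val = val + x
    return val

lost = non_atomic_adds(0.0, [1.0, 2.0, 3.0])    # only the last add survives
kept = atomic_adds(0.0, [1.0, 2.0, 3.0])        # all three adds survive
```

Real hardware falls somewhere between these two extremes, which is why the size of the error depends on how many threads hit the same address at once.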
Modeling complex geometry
Collins adult brain atlas
Segmented with FreeSurfer
Speed benchmark
Non-atomic: more threads, better speed-up
Atomic: peaks around 512~1024 threads, a lot slower
[Figure panels: semi-infinite; Collins atlas]
Recent development
Saving partial-path-length info at detector sites
Using shared-memory cache and atomics to improve accuracy near the source
Native MATLAB mex files
Automatic thread/block configuration (experimental)
Internal boundary reflection (experimental)
MCX v0.5: Vanilla MCX, Detective MCX, Cached MCX, Atomic MCX
Matlab MCX - mcxlab
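Sample code:
    cfg.nphoton=1e7;                   % number of photons to launch
    cfg.vol=uint8(ones(60,60,60));     % volume of medium labels
    cfg.srcpos=[30 30 1];              % source position
    cfg.srcdir=[0 0 1];                % source direction
    cfg.gpuid=1;
    cfg.autopilot=1;                   % automatic thread/block configuration
    cfg.prop=[0 0 1 1;0.005 1 1 0];    % one row of optical properties per medium label
    cfg.tstart=0;
    cfg.tend=5e-9;
    cfg.tstep=5e-10;
    cfg.session='testmcx';
    cfgs(1)=cfg;                       % run two sessions in one call
    cfgs(2)=cfg;
    cfgs(2).session='testmcx2';
    mcxlab(cfgs);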
MCX for heterogeneous platforms
Heterogeneous computing platform: OpenCL
MCX-CL now supports ATI and NVIDIA GPUs, as well as multi-core CPUs.
[Diagram: MCX-CL → OpenMP multi-threading → Device 1, Device 2, ..., Device N → gathering/saving]
Plücker-coordinate-based ray-tracing
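A 3D ray R can be represented by R(d,m), where p_0 and p_1 are two points on the ray:
\vec{d} = p_0 - p_1
\vec{m} = p_0 \times p_1
Benefit: simplicity in a ray-triangle intersection test. For the three edges e_1, e_2, e_3 of triangle ABC:
w_1 = \vec{d}_R \cdot \vec{m}_{e_1} + \vec{d}_{e_1} \cdot \vec{m}_R
w_2 = \vec{d}_R \cdot \vec{m}_{e_2} + \vec{d}_{e_2} \cdot \vec{m}_R
w_3 = \vec{d}_R \cdot \vec{m}_{e_3} + \vec{d}_{e_3} \cdot \vec{m}_R
R enters ABC if \forall w_i \ge 0 \wedge \exists w_i \ne 0
R exits ABC if \forall w_i \le 0 \wedge \exists w_i \ne 0
R is coplanar with ABC if \forall w_i = 0
otherwise, R misses ABC
Barycentric coordinates: u_i = w_i / \sum_i w_i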
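The test above can be sketched in a few lines using the slide's convention d = p0 - p1, m = p0 × p1. This is an illustrative sketch, not the MMC implementation: the function names are hypothetical, the edges are assumed to be taken in the order A→B, B→C, C→A, and the test applies to the infinite line through p0 and p1 rather than a finite segment. Which physical direction counts as "enter" depends on the vertex ordering.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def plucker(p0, p1):
    """Plücker coordinates (d, m) of the line through p0 and p1."""
    return sub(p0, p1), cross(p0, p1)

def side(r, e):
    """Permuted inner product d_R . m_e + d_e . m_R."""
    (d_r, m_r), (d_e, m_e) = r, e
    return dot(d_r, m_e) + dot(d_e, m_r)

def ray_triangle(p0, p1, a, b, c):
    """Classify line p0->p1 against triangle abc; return (kind, hit point)."""
    r = plucker(p0, p1)
    # Each weight belongs to the vertex opposite its edge: BC->a, CA->b, AB->c.
    w = (side(r, plucker(b, c)), side(r, plucker(c, a)), side(r, plucker(a, b)))
    if all(x == 0 for x in w):
        return "coplanar", None
    if all(x >= 0 for x in w):
        kind = "enter"
    elif all(x <= 0 for x in w):
        kind = "exit"
    else:
        return "miss", None
    total = sum(w)
    u = [x / total for x in w]                     # barycentric coordinates
    point = tuple(u[0] * a[k] + u[1] * b[k] + u[2] * c[k] for k in range(3))
    return kind, point

kind, point = ray_triangle((0.25, 0.25, 1.0), (0.25, 0.25, -1.0),
                           (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

The same three weights give both the hit/miss decision and the intersection point, so no separate plane-intersection solve is needed.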
Ray-tracing in a mesh
1. Starting from the element (e0) enclosing the current point p0, determine the scattering vector R (p0 → p1)
2. Compute the exit point px in e0
3.a If ||R|| > ||p0px||, set p0 ← px, deposit energy-loss to e0, ||R|| ← ||R|| - ||p0px||, set e0 to the face-neighbor element, repeat from 2
3.b If ||R|| <= ||p0px||, set p0 ← p1, deposit energy-loss to e0, repeat from 1
[Figure labels: e0, R, p0, p1, px]
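The control flow of steps 2, 3.a and 3.b can be illustrated with a deliberately simplified 1D toy, in which the "elements" are intervals on a line and the face-neighbor is simply the next interval. Everything here is an assumption made for illustration: the function name, the 1D mesh, and the Beer-Lambert form of the energy loss, weight × (1 − exp(−μa × length)), which the slide does not specify. A real tetrahedral mesh would find px with a ray-triangle test on the element faces.

```python
import math

def propagate(x0, direction, length, edges, mua, weight=1.0):
    """March one scattering step of given length through 1D elements.

    edges: sorted element boundaries; element k spans [edges[k], edges[k+1]).
    mua:   absorption coefficient of each element.
    direction: +1 or -1. Returns (final position, remaining weight, deposits).
    """
    deposit = [0.0] * (len(edges) - 1)
    # Step 1 (partly): locate the enclosing element e0.
    e = next(k for k in range(len(deposit)) if edges[k] <= x0 < edges[k + 1])
    x, remaining = x0, length
    while True:
        # Step 2: exit point px of the current element along the direction.
        px = edges[e + 1] if direction > 0 else edges[e]
        seg = abs(px - x)
        if remaining > seg:
            # Step 3.a: cross into the face-neighbor element.
            loss = weight * (1.0 - math.exp(-mua[e] * seg))
            deposit[e] += loss
            weight -= loss
            remaining -= seg
            x = px
            e += 1 if direction > 0 else -1
            if e < 0 or e >= len(deposit):
                break                    # packet left the mesh
        else:
            # Step 3.b: the step ends inside the current element.
            loss = weight * (1.0 - math.exp(-mua[e] * remaining))
            deposit[e] += loss
            weight -= loss
            x += direction * remaining
            break
    return x, weight, deposit

x_end, w_end, dep = propagate(0.5, +1, 2.0, [0.0, 1.0, 2.0, 3.0], [0.1, 0.2, 0.3])
```

In this example the step crosses two element boundaries, so energy is deposited in all three elements and the deposits plus the remaining weight add up to the initial weight.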
Parallelization with OpenMP
#pragma omp parallel private(ran0,ran1,threadid)
{
#ifdef _OPENMP
    threadid=omp_get_thread_num();
#endif
    /* each thread seeds its own random number generator */
    rng_init(ran0,ran1,(unsigned int *)&(cfg.seed),threadid);

    /*launch photons*/
    #pragma omp for reduction(+:Eabsorb)
    for(i=0;i<cfg.nphoton;i++){
        Eabsorb+=onephoton(i,&plucker,&mesh,&cfg,rtstep,ran0,ran1);
        /* the shared progress counter is updated atomically */
        #pragma omp atomic
        ncomplete++;

        if((cfg.debuglevel & dlProgress) && threadid==0)
            mcx_progressbar(ncomplete,cfg.nphoton,&cfg);
    }
}
Validations
Mesh: 63820 nodes, 375753 elements, run-time: 23 min for 3e7 photons with 4 E5520 CPU cores (21.5 photons/ms)
Test with a complex geometry
Human brain atlas: Collins et al, 1998
http://mcx.sf.net/cgi-bin/index.cgi?MMC/CollinsAtlasMesh
MMC Software
MMC v0.2 was just released (Linux/Windows)
Collins adult brain atlas mesh version 1
Sphere-Diffusion MATLAB toolbox for the analytical solution of a spherical inclusion
Website and resources
Both MCX and MMC are open-source software
Homepage: http://mcx.sf.net
Binaries (Windows/Linux/Mac) and source code available for download
Papers published online in Optics Express / Biomed. Opt. Express