31
Virtual Photonics, 09/10/2010 Voxel- and mesh-based Monte Carlo methods for 3D photon transport simulations Qianqian Fang, Ph.D. Martinos Center for Biomedical Imaging Massachusetts General Hospital Harvard Medical School Software website: http://mcx.sf.net

Voxel- and mesh-based Monte Carlo methods for 3D photon ...lammp.bli.uci.edu/video/vp2010/ts3.pdf · methods for 3D photon transport simulations Qianqian Fang, Ph.D. Martinos Center

Embed Size (px)

Citation preview

Virtual Photonics, 09/10/2010

Voxel- and mesh-based Monte Carlomethods for 3D photon transport

simulations

Qianqian Fang, Ph.D.Martinos Center for Biomedical Imaging

Massachusetts General HospitalHarvard Medical School

Software website: http://mcx.sf.net

Virtual Photonics, 09/10/2010

Outline

Introduction Pros and Cons of Monte Carlo simulations

MCX: a GPU-accelerated MC software Algorithm flowchart Boundary reflections Atomic vs. non-atomic operations

MMC: a mesh-based MC software Ray-tracing in the Plücker coordinates Multi-threading with OpenMP Validations

Website and resources (http://mcx.sf.net)

Virtual Photonics, 09/10/2010

Modeling photon transport in complex media

Transport equation → Finite element, Finite volume, Monte Carlo

Diffusion → Finite element, finite difference … Software tools:

Finite element: NIRFAST (Dartmouth), TOAST (UCL), Redbird (MGH)

Monte Carlo: MCML: Wang & Jacques (1995), layered media tMCimg: Boas (2002), 3D complex media CUDAMCML: Alerstam (2009), layered media+GPU

Virtual Photonics, 09/10/2010

Pros of MC

Simplicity General media (low-scattering, high absorption, g) Discontinuities Flexible source settings Ease in time-domain simulations Highly parallelizable

Virtual Photonics, 09/10/2010

Cons of MC

(Very) slow in speed Difficulties for modeling complex geometries

Massively Parallel Computing with GPU

Monte Carlo eXtremeMCXLow speed

Mesh-based Monte Carlo MMCModeling

Complex shapes

Virtual Photonics, 09/10/2010

Multi-core processors and GPU

Multi-core is becoming the mainstream Graphics hardware have many cores and optimized

for processing data-parallel tasksPixel shaders

Vertex shaders

Virtual Photonics, 09/10/2010

How is a GPU different from a CPU?

Pros:

Many-core processor High speed internal memory Parallel pipeline Highly optimized

Cons:

No recursion,no function pointers, no host functions

GPU-CPU memory transfer is slow

Portability across hardware

NVIDIA CUDA Programming Guide

Virtual Photonics, 09/10/2010

Project overview

2009 2010

v0.2b v0.2 v0.5b

In the past year

~580 total downloads~90 registered users2840 unique visits from 65 regions4 major releasesMCX: 200 SVN commitsMMC: 110 SVN commitsMCX-CL: 35 SVN commits

MMCv0.2

Based on visitor historyat http://mcx.sf.net

Virtual Photonics, 09/10/2010

Photon transport in a voxelated space

ScatteringLength: L

ScatteringAngle: θ

Incident Position and vector

Accumu. interval

θ = RNG(media[i,j].g) Henyey-GreensteinL = RNG(media[i,j].μs) Exp. distribution

φ = RNG(0~2Pi)θ = RNG(media[i,j].g) L = RNG(media[i,j].μs)

Virtual Photonics, 09/10/2010

GPU-based MC Algorithm Diagram

Virtual Photonics, 09/10/2010

Recording time-resolved solutions

Accumulate photon packets to the corresponding time window

Time window 1

Time window 2

Time window 3

......

Virtual Photonics, 09/10/2010

Random number generators

Mersenne Twister: MT19937 (shared memory) Logistic map and lattices: a coupled chaotic system,

floating-point only RNG

When r=4, x is a random-variable with PDF:

Coupled Lattice:Wagner, 1995

Eric Mills, parallel MT19937 random number generator

Virtual Photonics, 09/10/2010

Handling of Boundary Reflection

Case 1 Case 2

Detecting the exit point: 1. one intersection: find C1 from Pn and done2. two intersections: find C1, if not, find C2 from Pn+1, done3. three intersections: if not C1 and C2, orientation of C3 is uniquely decided

Virtual Photonics, 09/10/2010

Atomic vs. non-atomic operations

Non-atomic operations: global val; newval=val+x; val=newvalue;

Race conditions: when multiple threads read/write a single address, value may become non-consistent.

How bad is the race condition?

Atomic operations: atomicAdd(val,x);

read/add/write in one instruction

Block other threads to avoid racing

Virtual Photonics, 09/10/2010

Validation and comparisons

Semi-infinite medium

Virtual Photonics, 09/10/2010

Modeling complex geometry

Collins adult brain atlas Segmented with FreeSurfer

Virtual Photonics, 09/10/2010

Speed benchmark

Non-atomic: more threads, better speed-up Atomic: peaks around 512~1024 threads, a lot

slower

semi-infinite Collins atlas

Virtual Photonics, 09/10/2010

Recent development

Saving partial-path-length info at detector sites Using shared-memory cache and atomics to

improve accuracy near the source Native matlab mex files Automatic thread/block configuration (experim.) Internal boundary reflection (experimental)

Vanilla MCX Detective MCX Cached MCX Atomic MCX

MCX v0.5

Virtual Photonics, 09/10/2010

Matlab MCX - mcxlab

Sample code: cfg.nphoton=1e7; cfg.vol=uint8(ones(60,60,60));cfg.srcpos=[30 30 1];cfg.srcdir=[0 0 1];cfg.gpuid=1;cfg.autopilot=1;cfg.prop=[0 0 1 1;0.005 1 1 0];cfg.tstart=0;cfg.tend=5e-9;cfg.tstep=5e-10;

cfg.session='testmcx';

cfgs(1)=cfg;cfgs(2)=cfg;cfgs(2).session='testmcx2';

mcxlab(cfgs);

Virtual Photonics, 09/10/2010

MCX Studio – A GUI front-end

Virtual Photonics, 09/10/2010

MCX for heterogeneous platforms

Heterogeneous computing platform

OpenCL

Now supports ATI and nVidia GPUs, as well asMulti-core CPUs.

MCX-CL

OpenMPmulti-threading

Device 1

MCX-CL Device 2

Device N

Gathering/saving

Virtual Photonics, 09/10/2010

Plücker-coordinate-based ray-tracing

A 3D ray R can be represented by R(d,m) Benefit: simplicity in a ray-triangle intersection test:

p0

p1

d=p0- p1

m=p0x p1

w1=d⃗ R⋅m⃗e1+d⃗ e1⋅m⃗Rw2=d⃗ R⋅m⃗e2+d⃗ e2⋅m⃗R

w3=d⃗ R⋅m⃗e3+d⃗ e3⋅m⃗R

Renters ABC←if ∀w i≥0∧∃w i≠0Rexits ABC←if ∀ wi≤0∧∃w i≠0R iscoplanar with ABC←if ∀ wi=0

otherwise , Rmisses ABC

u i=wi /∑ wi

Barycentric coordinates

Virtual Photonics, 09/10/2010

Ray-tracing in a mesh

1. Starting from the element (e0) enclosing the current

point p0, determine scattering vector R (p

0 → p

1)

2. compute the exit point px in e

0

3.a if ||R||>||p0p

x||, set p

0 ← p

x, deposit energy-loss to e

0,

||R||←||R||-||p0p

x||, set e

0 to the

face-neighbor element, repeat from 2

3.b if ||R||<=||p0p

x||, set p

0 ← p

1, deposit

energy-loss to e0, repeat from 1

e0

R

p1px

p0

Virtual Photonics, 09/10/2010

Parallelization with OpenMP

#pragma omp parallel private(ran0,ran1,threadid){#ifdef _OPENMP

threadid=omp_get_thread_num();#endif

rng_init(ran0,ran1,(unsigned int *)&(cfg.seed),threadid);

/*launch photons*/#pragma omp for reduction(+:Eabsorb)

for(i=0;i<cfg.nphoton;i++){Eabsorb+=onephoton(i,&plucker,&mesh,&cfg,rtstep,ran0,ran1);#pragma omp atomic ncomplete++;

if((cfg.debuglevel & dlProgress) && threadid==0)mcx_progressbar(ncomplete,cfg.nphoton,&cfg);

}}

Virtual Photonics, 09/10/2010

Validations

Mesh: 63820 nodes, 375753 elements, run-time: 23 min for 3e7 photons with 4 E5520 CPU cores (21.5 photons/ms)

Virtual Photonics, 09/10/2010

Performance assessment

Meshed by iso2mesh toolbox

Virtual Photonics, 09/10/2010

Test with a complex geometry

Human brain atlas: Collins et al, 1998

http://mcx.sf.net/cgi-bin/index.cgi?MMC/CollinsAtlasMesh

Virtual Photonics, 09/10/2010

Simulation results

Virtual Photonics, 09/10/2010

MMC Software

MMC V0.2 was just released (Linux/Windows) Collins adult brain atlas mesh version 1 Sphere-Diffusion matlab toolbox for the analytical

solution of a spherical inclusion

Virtual Photonics, 09/10/2010

Website and resources Both MCX and MMC are open-source software

Homepage: http://mcx.sf.net

Binary (windows/linux/mac) and source code for download

Paper published online from Optics Express/Biomed. Opt. Express

Virtual Photonics, 09/10/2010

Future plans

High order element support in MMC GPU-accelerated MMC Validations on internal reflections More selections in source forms

Questions ?

→ MMCX