GPUs Open New Avenues in Medical MRI. Chris A. Cocosco, D. Gallichan, F. Testud, M. Zaitsev, and J. Hennig. Dept. of Radiology, Medical Physics, University Medical Center Freiburg

GPUs Open New Avenues in Medical MRI - NVIDIA · 2013. 8. 23.


  • 1

    Chris A. Cocosco

    D. Gallichan, F. Testud, M. Zaitsev, and J. Hennig

    Dept. of Radiology, Medical Physics,

    GPUs Open New Avenues in

    Medical MRI

    UNIVERSITY MEDICAL CENTER FREIBURG

  • 2 C.A. Cocosco, GTC-2012

    Our research group:

    Biomedical Magnetic

    Resonance Imaging (MRI)

    @ University Medical

    Center Freiburg,

    Germany:

    > 50 scientists & PhD

    students

  • 3 C.A. Cocosco, GTC-2012

    B0 gradients (SEMs) for spatial encoding

    SEMs: spatial encoding magnetic fields

    [Figure: field amplitude scale from -G through 0 to +G; “k-space” panel]

  • 4 C.A. Cocosco, GTC-2012

    B0 gradients (SEMs) for spatial encoding

    Traditional (linear) + quadratic (non-linear)

    SEMs: spatial encoding magnetic fields

    [Figure: field amplitude scale from -G through 0 to +G]

  • 5 C.A. Cocosco, GTC-2012

    PatLoc:

    • PatLoc = Parallel Acquisition Technique using Localized Gradients

    [ Hennig J. et al., MAGMA 21(1-2):5-14 (2008) ].

    • has the potential to allow:

    (1) higher gradient switching rates while not exceeding the

    Peripheral Nerve Stimulation (PNS) limits;

    (2) novel encoding strategies (e.g. better suited to the anatomy).

  • 6 C.A. Cocosco, GTC-2012

    First ever human PatLoc images:

    [ Schultz G. et al., “Reconstruction of MRI

    Data Encoded with Arbitrarily Shaped,

    Curvilinear, Non-bijective Magnetic Fields”,

    MRM 64(5):1390-1403 (2010) ]

  • 7 C.A. Cocosco, GTC-2012

    Why PatLoc:

    TSE 256x256, TR 5000 ms, slice thickness

    2mm, acquisition time ~3min for 5 slices.

  • 8 C.A. Cocosco, GTC-2012

    Imaging forward model:

    m = E * p

    p : image [NP] NP : number of image pixels

    m : measured data [NT,NC] NT : number of measured (“k-space”) samples

    NC : number of RF receive coils

    E [ NT*NC, NP ]

    Typical magnitudes:

    NT, NP = 256 x 256

    NC = 8
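
    Written out per element, the forward model above reads as follows. The index names t, c and j are introduced here only for illustration; the symbols m, E, p, NT, NC and NP are those defined on this slide.

```latex
m_{(t,c)} \;=\; \sum_{j=1}^{NP} E_{(t,c),\,j}\; p_j ,
\qquad t = 1,\dots,NT, \quad c = 1,\dots,NC
```

    Each measured sample on each receive coil is thus a weighted sum over all image pixels, which is why E has NT*NC rows and NP columns.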

  • 9 C.A. Cocosco, GTC-2012

    Conjugate Gradient Algorithm:

    Numerically estimate an image

    consistent with the measured data

    [ Pruessmann et al., MRM 2001;46:638-651 ].

    But: no gridding, no FFT !

    Repeat 15…25 times:

    • q = E’ * (E * p), computed in two steps:

    1. Ep = E * p

    2. q = E’ * Ep

    • update p
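
    The iteration above can be sketched in plain Python as conjugate gradient on the normal equations E’E p = E’m. This is a minimal illustration, not the authors' Matlab/CUDA code: the dense 4x3 matrix and the helper names (matvec, adjoint_matvec, vdot, cg_normal_equations) are made up for the example, and in the textbook form used here the product E’ * (E * ·) is applied to the search direction d.

```python
# Conjugate gradient on the normal equations E'E p = E'm (a sketch).
# The tiny dense matrix below is made-up test data, not MRI data.

def matvec(E, p):
    """E * p for a dense matrix stored as a list of rows."""
    return [sum(e * x for e, x in zip(row, p)) for row in E]

def adjoint_matvec(E, m):
    """E' * m, where E' is the conjugate transpose of E."""
    return [sum(E[r][c].conjugate() * m[r] for r in range(len(E)))
            for c in range(len(E[0]))]

def vdot(a, b):
    """Inner product <a, b> = sum(conj(a_i) * b_i)."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def cg_normal_equations(E, m, n_iter=20, tol=1e-12):
    """Estimate p from m = E * p; n_iter=20 lies in the 15..25 range."""
    p = [0j] * len(E[0])
    r = adjoint_matvec(E, m)      # residual of the normal equations
    d = list(r)                   # search direction
    rr = vdot(r, r).real
    for _ in range(n_iter):
        if rr < tol:              # converged early on this toy problem
            break
        q = adjoint_matvec(E, matvec(E, d))   # q = E' * (E * d)
        alpha = rr / vdot(d, q).real
        p = [pi + alpha * di for pi, di in zip(p, d)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rr_new = vdot(r, r).real
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
    return p

E = [[1 + 1j, 2 + 0j, 0j],
     [0j, 1j, 1 + 0j],
     [1 + 0j, 0j, 1 - 1j],
     [2 + 0j, 1 + 0j, 1 + 0j]]
p_true = [1 + 0j, 2 - 1j, 0.5j]
m = matvec(E, p_true)             # simulated measurement
p_est = cg_normal_equations(E, m)
```

    Only the two products E * d and E’ * (E * d) touch E, so the same loop works when E is never stored and both products are computed on demand.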

  • 10 C.A. Cocosco, GTC-2012

    Compute-on-demand Implementation:

    E is very large, but: E = E ( Traj, SEM, B1map, B0map )

    Traj [NT, NS]

    SEM [NP, NS]

    B1map [NP, NC]

    B0map [NP]

    where NS = number of SEMs (B0 gradients)

    Foreach( NP )

    Foreach( NT )

    Foreach( NC )

    CUDA implementation:

    • blocks + threads

    • accumulator in shared memory + block reduce
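
    The loop nest above can be sketched on the CPU in plain Python. The slide does not state the formula for the entries of E, so the signal model below is an assumption: each entry is taken to be B1map[p][c] * exp(-i * (sum over s of Traj[t][s] * SEM[p][s] + B0map[p] * times[t])). The per-sample acquisition times `times`, the function names and all numbers are hypothetical.

```python
import cmath

def _phase(t, p, traj, sem, b0map, times):
    """Assumed encoding phase for sample t and pixel p."""
    ph = sum(traj[t][s] * sem[p][s] for s in range(len(traj[0])))
    return ph + b0map[p] * times[t]       # assumed off-resonance term

def forward(image, traj, sem, b1map, b0map, times):
    """m = E * p without storing E; returns m[t][c]."""
    n_p, n_t, n_c = len(image), len(traj), len(b1map[0])
    m = [[0j] * n_c for _ in range(n_t)]
    for p in range(n_p):                  # Foreach( NP )
        for t in range(n_t):              # Foreach( NT )
            ph = _phase(t, p, traj, sem, b0map, times)
            factor = cmath.exp(-1j * ph) * image[p]
            for c in range(n_c):          # Foreach( NC )
                m[t][c] += b1map[p][c] * factor
    return m

def adjoint(m, traj, sem, b1map, b0map, times):
    """q = E' * m without storing E; returns q[p]."""
    n_p, n_t, n_c = len(sem), len(traj), len(b1map[0])
    q = [0j] * n_p
    for p in range(n_p):
        for t in range(n_t):
            ph = _phase(t, p, traj, sem, b0map, times)
            conj_factor = cmath.exp(1j * ph)
            for c in range(n_c):
                q[p] += b1map[p][c].conjugate() * conj_factor * m[t][c]
    return q

# Made-up toy data: NT = 3, NS = 2, NP = 2, NC = 2.
traj = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]]
sem = [[0.1, 0.2], [-0.3, 0.4]]
b1map = [[1 + 0j, 0.5j], [0.8 + 0j, 1 - 0.2j]]
b0map = [0.0, 0.05]
times = [0.0, 1.0, 2.0]                   # hypothetical sample times
image = [1 + 0j, 2j]
m = forward(image, traj, sem, b1map, b0map, times)
```

    The inputs hold NT*NS + NP*NS + NP*NC + NP values instead of the NT*NC*NP entries of E. In a CUDA version the inner sums would be the per-block accumulations mentioned above; this sketch does not model blocks, threads or shared memory.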

  • 11 C.A. Cocosco, GTC-2012

    Matlab implementation:

    • key to performance: vectorize your code!

    • vector / matrix operations are automatically multi-threaded

    • Parallel Computing Toolbox

    • matlabpool + parfor : loop-level

    • run CUDA ptx kernels

    • both: spmd

  • 12 C.A. Cocosco, GTC-2012

    PatLoc hardware setup:

    • Siemens MAGNETOM Trio Tim.

    • PatLoc gradient insert coil

    [ Cocosco C.A. et al., ISMRM 2010

    #3946 ].

    • Additional set of 3 gradient

    amplifiers; can synchronously

    drive all the available gradients

    simultaneously and

    independently.

  • 13 C.A. Cocosco, GTC-2012

    First human PatLoc gradient coil:

  • 14 C.A. Cocosco, GTC-2012

    Application 1: Higher-dim gradient encoding

    • 4DRIO [ Gallichan D. et al., “Simultaneously driven linear and

    nonlinear spatial encoding fields in

    MRI”, MRM 65(3), 2011 ]

    • NS= 4

    • NP= 320^2

    • NT= 256^2

    • NC= 8

    E ~ 450 GB
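
    As a rough check of the quoted size, the number of entries of E is NT*NC*NP. The storage format is not given on the slide; the sketch below assumes 8 bytes per entry (single-precision complex), which is an assumption of this example.

```python
def encoding_matrix_bytes(n_t, n_c, n_p, bytes_per_entry=8):
    """Bytes needed to store a dense E of shape [NT*NC, NP].

    bytes_per_entry=8 assumes single-precision complex values.
    """
    return n_t * n_c * n_p * bytes_per_entry

# Values from this slide: NT = 256^2, NC = 8, NP = 320^2.
size_bytes = encoding_matrix_bytes(256**2, 8, 320**2)
size_gb = size_bytes / 1e9
print(f"E needs about {size_gb:.1f} GB")   # about 429.5 GB
```

    Under that assumption the result is of the same order as the ~450 GB quoted above, and well beyond the memory of a single GPU of that generation, which motivates computing E on demand.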

  • 15 C.A. Cocosco, GTC-2012

    Throughput CPU vs GPU:

    • quad-socket Intel Xeon Nehalem-EX X7560 with 1024 GB RAM :

    16 threads : 615s to compute E, 29s / iter

    32 threads : 565s to compute E, 27s / iter

    • dual-socket Intel Xeon Westmere-EP X5690 :

    12 threads : 252s / iter

    • 1x Nvidia Tesla C2075 GPU

    8.1s / iter

    7s / iter with hardcoded NS

    • 4x Nvidia Tesla C2075 GPUs

    2.3s / iter (3.5x vs. one GPU)

    ( Matlab R2012a ; CUDA 4.1 )
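
    The ratios between these per-iteration timings can be worked out as below. The timings are the ones listed on this slide; the variable names and the notion of "multi-GPU efficiency" (speedup divided by GPU count) are introduced here for illustration.

```python
def speedup(t_reference, t_new):
    """How many times faster t_new is than t_reference."""
    return t_reference / t_new

# Seconds per iteration, taken from the slide.
cpu_quad_socket_32_threads = 27.0
cpu_dual_socket_12_threads = 252.0
gpu_x1 = 8.1
gpu_x4 = 2.3

four_vs_one_gpu = speedup(gpu_x1, gpu_x4)
multi_gpu_efficiency = four_vs_one_gpu / 4
one_gpu_vs_dual_socket = speedup(cpu_dual_socket_12_threads, gpu_x1)
four_gpus_vs_quad_socket = speedup(cpu_quad_socket_32_threads, gpu_x4)

print(f"4 GPUs vs 1 GPU: {four_vs_one_gpu:.2f}x "
      f"(efficiency {multi_gpu_efficiency:.0%})")
print(f"1 GPU vs dual-socket CPU: {one_gpu_vs_dual_socket:.1f}x")
print(f"4 GPUs vs quad-socket CPU: {four_gpus_vs_quad_socket:.1f}x")
```

    The first ratio reproduces the 3.5x figure quoted for four GPUs. Note that the quad-socket CPU timings additionally require roughly 10 minutes to compute E before the first iteration.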

  • 16 C.A. Cocosco, GTC-2012

    Application 2: Ultra-fast imaging

    • “single-shot” imaging

    • Layton et al: “Region-specific

    trajectory design for single-shot

    imaging using linear and nonlinear

    magnetic encoding fields”, ISMRM

    2012.

    • NS= 16 “gradients” (harmonics)

    • NP= 128^2

    • NT= 131^2

    • NC= 8

    E ~ 18 GB

  • 17 C.A. Cocosco, GTC-2012

    Application 2: Ultra-fast imaging

    Use a “Field Camera” : C. Barmet, K. Pruessmann, Inst. for Biomedical

    Eng., University and ETH Zuerich [ Wilm et al, MRM 2011 ]

  • 18 C.A. Cocosco, GTC-2012

    Throughput CPU vs GPU:

    • dual-socket Intel Xeon Westmere-EP X5690, 96GB RAM :

    12 threads : 37s to compute E, 3.7s/iter

    • 1x Nvidia Tesla C2075 GPU

    0.56s / iter

    • 4x Nvidia Tesla C2075 GPUs

    0.26s / iter

    ( Matlab R2012a ; CUDA 4.1 )
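
    To put the per-iteration numbers in terms of a whole reconstruction, the sketch below adds the one-off time to compute E to the iteration time. It assumes the 15 to 25 conjugate gradient iterations mentioned earlier in the talk also apply here, and that the GPU version has no comparable setup cost; neither is stated on this slide.

```python
def total_time(setup_s, per_iter_s, n_iter):
    """Total reconstruction time in seconds: setup plus iterations."""
    return setup_s + per_iter_s * n_iter

for n_iter in (15, 20, 25):
    # CPU: 37 s to compute E, then 3.7 s per iteration (from the slide).
    cpu = total_time(37.0, 3.7, n_iter)
    # 4 GPUs: 0.26 s per iteration; zero setup time is an assumption.
    gpu4 = total_time(0.0, 0.26, n_iter)
    print(f"{n_iter} iterations: CPU {cpu:.1f} s, 4 GPUs {gpu4:.1f} s")
```

    Under these assumptions the four-GPU reconstruction finishes in a few seconds, compared with one and a half to two minutes on the CPU.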

  • 19 C.A. Cocosco, GTC-2012

    What if the subject is ... moving?

    E = E ( Traj, SEM, B1map, B0map )

    • Apply a 3D rigid-body transformation to SEM, B1map, B0map

    for each “segment” of measured data (e.g. 256 segments for a

    256^2 image)

    • Size explosion for pre-computing E, but approachable with the

    compute-on-demand GPU solution.

    • Work in Progress…

    No FFT, like in

    [ Bammer et al. “Augmented generalized SENSE…” MRM2007;57(1):90-102]
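
    One way to picture the per-segment transformation is to move the pixel coordinates with one rigid-body pose per segment and then evaluate SEM, B1map and B0map at the moved positions. The sketch below is only an illustration: it restricts the rotation to the z axis for brevity (a general rigid-body pose has three rotation angles), and the pose values, names and coordinates are hypothetical.

```python
import math

def rigid_transform(points, angle_z_rad, translation):
    """Rotate 3D points about the z axis, then translate them."""
    c, s = math.cos(angle_z_rad), math.sin(angle_z_rad)
    tx, ty, tz = translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz)
            for x, y, z in points]

# Hypothetical pixel positions and one pose per data segment.
pixel_coords = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
segment_poses = [
    (0.0, (0.0, 0.0, 0.0)),             # segment 0: no motion
    (math.pi / 2, (0.0, 0.0, 1.0)),     # segment 1: 90 degrees, shift in z
]

# Coordinates at which the maps would be evaluated for each segment.
coords_per_segment = [rigid_transform(pixel_coords, angle, shift)
                      for angle, shift in segment_poses]
```

    With one pose per segment, the maps entering E differ from segment to segment, which fits naturally with computing entries of E on demand instead of storing them.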

  • 20 C.A. Cocosco, GTC-2012

    Conclusions and outlook

    • GPUs open new avenues in medical MRI

    • Faster imaging: shorter sessions, more information

    • Address limitations imposed by: physics, MRI hardware

    technology, human subject

    • Practical R&D process

    • Feasible clinical implementation

    • Wish list: more memory bandwidth, more registers &

    shared memory, or both ;-)

  • 21 C.A. Cocosco, GTC-2012

    Special Thanks to:

    • Research funding: German Federal Ministry of Education and

    Research, grant #13N9208; European Research Council

    Advanced Grant 'OVOC', grant agreement 232908.

    • Travel funding: Wissenschaftliche Gesellschaft in Freiburg im

    Breisgau.

    • C. Barmet, K. Pruessmann, (Institute for Biomedical Engineering,

    University and ETH Zuerich, Switzerland).

    • K. Layton (The University of Melbourne, Australia).

    • J. Maclaren, and our colleagues in Medical Physics, Dept. of

    Radiology, University Medical Center Freiburg.

    • Bruker Biospin, Siemens Healthcare.

  • 22 C.A. Cocosco, GTC-2012

    Chris A. Cocosco

    [email protected]

    GPUs Open New Avenues in

    Medical MRI