
Flexible X-Ray Image Processing on GPUs
Matthias Vogelgesang, Suren Chilingaryan and Andreas Kopmann

X-Ray Tomography at Synchrotrons

X-ray tomography is a non-destructive technique to analyze and visualize the inner structure of organisms and materials in three and four dimensions.

High pixel resolutions, bit depths and frame rates of state-of-the-art cameras increase the size of a single data set to several gigabytes. In order to process sequences of reconstructed volumes during acquisition, GPU processing is inevitable.

Image Processing Toolkit

Our software toolkit is used for pre-processing, image reconstruction and post-processing in online and offline environments. Most algorithms can be expressed as a composition of smaller sub-operations such as convolutions, filters and complex arithmetic. We therefore map each processing stage to a node of a computation graph representing the full algorithm. Each node is connected to its successor by a queue through which processed data is pushed, and nodes are scheduled according to their workload.

Some of its key features are:

• Core written in C
• Uses hardware-dependent OpenCL kernels
• Python and JavaScript bindings
• Implicit buffer management
• Multi-threading and implicit synchronisation using asynchronous queues
• Automatic node-to-GPU mapping using heuristics

For example, the Python bindings can be used to set up a filtered back-projection pipeline:

from gi.repository import Ufo

# Build the graph and fetch the required filters
g = Ufo.Graph()
for f in ['reader', 'writer', 'fft', 'ifft', 'ramp']:
    globals()[f] = g.get_filter(f)

bp = g.get_filter('backproject')
fft.set_properties(dimensions=1)

# reader -> FFT -> ramp filter -> IFFT -> back-projection -> writer
reader.connect_to(fft)
fft.connect_to(ramp)
ramp.connect_to(ifft)
ifft.connect_to(bp)
bp.connect_to(writer)

g.run()
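The queue-based, multi-threaded execution model described above can be pictured roughly as follows. This is a minimal, self-contained sketch of the idea only; the Node class and both example stages are hypothetical and not part of the UFO API.

import threading
import queue

class Node(threading.Thread):
    """Hypothetical graph node: applies `work` to every item from its input
    queue and pushes the result to its output queue (None terminates)."""
    def __init__(self, work, inputs, outputs):
        super().__init__()
        self.work, self.inputs, self.outputs = work, inputs, outputs

    def run(self):
        while True:
            item = self.inputs.get()
            if item is None:              # poison pill: propagate and stop
                self.outputs.put(None)
                break
            self.outputs.put(self.work(item))

# Two stages connected by asynchronous queues: square, then add one
q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
stages = [Node(lambda x: x * x, q_in, q_mid),
          Node(lambda x: x + 1, q_mid, q_out)]
for s in stages:
    s.start()

for i in range(5):
    q_in.put(i)
q_in.put(None)

results = []
while True:
    r = q_out.get()
    if r is None:
        break
    results.append(r)
print(results)                            # [1, 2, 5, 10, 17]

In the toolkit itself, the queues and the buffers that travel through them are managed implicitly when filters are connected, as listed in the features above.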

Reference

Chilingaryan, Mirone, Hammersley, Ferrero, Helfen, Kopmann, dos Santos Rolo and Vagovic, “A GPU-Based Architecture for Real-Time Data Assessment at Synchrotron Experiments,” IEEE Transactions on Nuclear Science, vol. 58, pp. 1447–1455, Aug. 2011.

Results

[Bar chart: time for reconstructing an 11 GB 3D volume from 2000 projections (~24 GB). Reconstruction time in seconds on 2x Xeon E5472, 1x GTX 280, 2x GTX 295, Tesla S1070 and 4x GTX 580; the plotted times are 1099, 269, 75, 64 and 38 seconds.]

[Plot: scalability of back-projection on a Tesla S1070; GFLOPS versus number of GPUs (1–4).]
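As a rough plausibility check of these figures, the size of the projection data can be estimated from the scan parameters. The choice of the 5.5-megapixel camera and the 16-bit pixel depth below are assumptions made for the estimate, not values stated alongside the result.

# Rough size of 2000 projections, assuming the 5.5-megapixel camera
# and 16-bit raw pixels (both assumed for this estimate)
projections = 2000
pixels_per_frame = 5.5e6
bytes_per_pixel = 2

raw_gb = projections * pixels_per_frame * bytes_per_pixel / 1e9
print(f"projection data: ~{raw_gb:.0f} GB")   # ~22 GB, same order as the ~24 GB quoted

# End-to-end throughput of the fastest configuration (~24 GB in 38 s)
print(f"throughput: ~{24 / 38:.2f} GB/s")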

Example: Reconstruction Graph

[Diagram: Acquisition of Projections → Sinogram Generation → FFT → Filter → IFFT → Back-Projection on one or more GPUs → Storage and Segmentation/Meshing.]

Except for acquisition and storage, each node is executed on one of the available GPUs according to a heuristic.
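To make the FFT, Filter, IFFT and Back-Projection stages concrete, the following is a minimal NumPy sketch of parallel-beam filtered back-projection for a single slice. It is an illustration of the algorithm only, not code from the toolkit; it assumes angles covering [0, π), uses a plain ramp filter and nearest-neighbour interpolation, and the sinogram in the usage comment stands for any suitable input array.

import numpy as np

def filtered_back_projection(sinogram, angles):
    """Reconstruct one slice from a parallel-beam sinogram.

    sinogram -- array of shape (num_angles, num_detector_pixels)
    angles   -- projection angles in radians, one per sinogram row
    """
    n_angles, n_det = sinogram.shape

    # FFT -> ramp filter -> IFFT, applied row-wise along the detector axis
    freqs = np.fft.fftfreq(n_det)
    filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=1) * np.abs(freqs), axis=1))

    # Back-project every filtered projection onto the reconstruction grid
    recon = np.zeros((n_det, n_det))
    center = n_det // 2
    y, x = np.mgrid[:n_det, :n_det] - center
    for theta, projection in zip(angles, filtered):
        # Detector bin hit by each pixel at this angle (nearest neighbour)
        t = np.rint(x * np.cos(theta) + y * np.sin(theta)).astype(int) + center
        inside = (t >= 0) & (t < n_det)
        recon[inside] += projection[t[inside]]

    return recon * np.pi / n_angles

# Usage, given a sinogram whose rows cover half a rotation:
# slice_ = filtered_back_projection(sinogram,
#                                   np.linspace(0, np.pi, sinogram.shape[0], endpoint=False))

In the graph above, this chain corresponds to the fft, ramp, ifft and backproject filters of the Python example, each executed as an OpenCL kernel on one of the GPUs.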

Hardware Setup

• Cameras: 100 fps at 5.5 Megapixel and 330 fps at 2.2 Megapixel
• Synchrotron radiation: 10–65 keV energy range
• Data processing with up to 6 GPUs
• Hadoop storage cluster: 2 PB disk + 2.5 PB tape

Online evaluation allows us to adjust motor and acquisition parameters.

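For a sense of scale, the sustained data rates implied by the two cameras can be estimated as follows; the 16-bit pixel depth is again an assumption rather than a figure from the poster.

# Approximate streaming rates of the two cameras, assuming 16-bit raw pixels
cameras = {
    "5.5 Mpixel @ 100 fps": (5.5e6, 100),
    "2.2 Mpixel @ 330 fps": (2.2e6, 330),
}

for name, (pixels, fps) in cameras.items():
    gb_per_s = pixels * fps * 2 / 1e9
    print(f"{name}: ~{gb_per_s:.2f} GB/s")   # ~1.10 and ~1.45 GB/s

Rates on the order of a gigabyte per second are what motivate the multi-GPU processing nodes and the petabyte-scale storage cluster listed above.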