Department of Science and Technology (Institutionen för teknik och naturvetenskap)
Linköping University (Linköpings universitet)
SE-601 74 Norrköping, Sweden

LITH-ITN-MT-EX--07/056--SE

Multi-Resolution Volume Rendering of Large Medical Data Sets on the GPU

Ajden Towfeek

2007-12-20

Source: liu.diva-portal.org/smash/get/diva2:17428/FULLTEXT01.pdf



LITH-ITN-MT-EX--07/056--SE

Multi-Resolution Volume Rendering of Large Medical Data Sets on the GPU

Thesis work in media technology carried out at the Institute of Technology, Linköping University

Ajden Towfeek

Supervisor: Fredrik Häll
Examiner: Anders Ynnerman

Norrköping, 2007-12-20



Copyright

The publishers will keep this document online on the Internet - or its possiblereplacement - for a considerable time from the date of publication barringexceptional circumstances.

The online availability of the document implies a permanent permission foranyone to read, to download, to print out single copies for your own use and touse it unchanged for any non-commercial research and educational purpose.Subsequent transfers of copyright cannot revoke this permission. All other usesof the document are conditional on the consent of the copyright owner. Thepublisher has taken technical and administrative measures to assure authenticity,security and accessibility.

According to intellectual property law the author has the right to bementioned when his/her work is accessed as described above and to be protectedagainst infringement.

For additional information about the Linköping University Electronic Pressand its procedures for publication and for assurance of document integrity,please refer to its WWW home page: http://www.ep.liu.se/

© Ajden Towfeek


Abstract

Volume rendering techniques can be powerful tools when visualizing medical data sets. The ability to capture 3-D internal structures makes the technique attractive. Scanning equipment produces medical images with rapidly increasing resolution, resulting in heavily increased data set sizes. Despite the great amount of processing power CPUs deliver, the required precision in image quality can be hard to obtain in real-time rendering. It is therefore highly desirable to optimize the rendering process.

Modern GPUs possess much more computational power and are available for general-purpose programming through high-level shading languages. Efficient representations of the data are crucial due to the limited memory provided by the GPU. This thesis describes the theoretical background and the implementation of an approach presented by Patric Ljung, Claes Lundström and Anders Ynnerman at Linköping University. The main objective is to implement a fully working multi-resolution framework with two separate pipelines, for pre-processing and real-time rendering, that uses the GPU to visualize large medical data sets.


Acknowledgements

I would like to give special thanks to Fredrik Häll, my supervisor at Sectra-Imtec AB, and his colleague Aron Ernvik, for great support and many profitable discussions that made this thesis what it is. Thanks to Patric Ljung, whose Ph.D. thesis was a big inspiration, for his helpful hints and discussions. Thanks to my academic supervisor Anders Ynnerman and my opponent Anders Hagvall for giving valuable feedback on the report. Many thanks also to my family and all my friends for their support, just by being there. Finally, a special thanks to my fiancée for all her love and support.


Abbreviations

CPU Central Processing Unit

GPU Graphics Processing Unit

CT Computed Tomography

MRI Magnetic Resonance Imaging

FPS Frames Per Second

TF Transfer Function

LOD Level of Detail

II Interblock Interpolation

NB Nearest Block

SW Software

PACS Picture Archiving and Communications System

IDS Image Display System, a Sectra Workstation


Contents

List of Figures

List of Tables

1 Introduction
  1.1 Problem Description
  1.2 Thesis Objectives
  1.3 Outline of Report
  1.4 Reader Prerequisites

2 Background
  2.1 Medical Imaging
      2.1.1 Modalities
  2.2 Medical Volume Visualization
      2.2.1 Volume Data
      2.2.2 Transfer Functions
      2.2.3 Direct Volume Rendering
            Volume Rendering Integral
            Ray Casting
            Alpha Blending
  2.3 Graphics Processing Unit
      2.3.1 The Graphics Pipeline
      2.3.2 The Programmable Graphics Pipeline
            Vertex Shaders
            Fragment Shaders

3 State-of-the-art Direct Volume Rendering
  3.1 Multi-resolution Volumes
      3.1.1 Flat and Hierarchical Blocking
  3.2 Pre-processing Pipeline
      3.2.1 Volume Subdivision and Blocking
  3.3 Multi-resolution Volume Rendering
      3.3.1 Sampling of Multi-resolution Volumes
      3.3.2 Intrablock Volume Sampling
      3.3.3 Interblock Interpolation

4 Implementation
  4.1 Application Environment
  4.2 Extending The Class Hierarchy
      4.2.1 Packing The Volume Texture
      4.2.2 Meta-data
      4.2.3 Level-of-Detail Selection
  4.3 Rendering Pipeline
      4.3.1 GPU-based Ray Casting
      4.3.2 Sampling Algorithm
      4.3.3 Interpolation Algorithm

5 Results
  5.1 Data Sets and Test Environment
  5.2 Volume Pre-Processing
  5.3 Nearest Block Sampling
  5.4 Interblock Interpolation

6 Discussion
  6.1 Conclusions
      6.1.1 Related Work
      6.1.2 Implemented Methods
  6.2 Future Work

Bibliography


List of Figures

2.1 Illustration of volume data as voxels.
2.2 Direct volume rendering with different TF settings: (a) bone, (b) skin, (c) dense bone, (d) air.
2.3 Ray casting from the viewer through a pixel on the screen.
2.4 The graphics pipeline.

3.1 Illustration of a flat multi-resolution blocking grid.
3.2 Comparison of hierarchical and flat blocking.
3.3 Illustration of block neighborhood and boundaries: (a) sample boundary, (b) eight-block neighborhood.

4.1 The class hierarchy extension.
4.2 Texture packing and lookup coordinates: (a) original texture, (b) packed texture, (c) scale factors and new coordinates.

5.1 Comparison of 512x512x512 data sets with NB sampling: (a), (c) NB sampling; (b), (d) SW-based.
5.2 Comparison of 512x512x512 data sets with II sampling: (a), (c) II sampling; (b), (d) SW-based.
5.3 2% texture usage for II and NB, and 100% for SW: (a) NB sampling, (b) II sampling, (c) SW full resolution.
5.4 Comparison of single- and multi-resolution rendering: (a) single-resolution, (b) multi-resolution, (c) full resolution.


List of Tables

5.1 Time performance measured in seconds and reduction ratio.
5.2 FPS for II and NB on GeForce 8800 GTX and block count.
5.3 FPS for II and NB on ATI X1950 Pro and block count.


Chapter 1

Introduction

This chapter introduces the reader to the thesis. The first section gives a short overview of the problems that are addressed, the following section summarizes the main objectives, followed by an outline of the report; finally, recommended prerequisites are given.

1.1 Problem Description

Supporting volume rendering techniques to visualize medical image stacks has become increasingly important for Sectra PACS. Capturing internal structures of blood vessels and the skeleton provides the user with valuable information during diagnosis. Furthermore, the ability to rotate, zoom and change the point of view in 3-D helps end-users other than radiologists to understand the shapes of the organs and their locations.

Scanning equipment produces medical images with rapidly increasing resolution, resulting in heavily increased data set sizes. Despite the great amount of processing power CPUs deliver, the required precision in image quality can be hard to obtain in real-time rendering. It is therefore highly desirable to optimize the rendering process.

Modern GPUs possess much more computational power and are available for general-purpose programming through high-level shading languages. Efficient representations of the data are crucial due to the limited memory provided by the GPU. This thesis implements an approach presented by Patric Ljung, Claes Lundström and Anders Ynnerman at Linköping University [1]. The main objective is to implement a fully working multi-resolution framework with two separate pipelines, for pre-processing and real-time rendering, that uses the GPU to visualize large medical data sets.

Problems such as how to interpolate between arbitrary resolutions in multi-resolution volumes are active areas of research; this thesis implements an interblock interpolation technique developed by Ljung et al. [1]. Several approaches have been proposed for selecting the LOD; this thesis presents a new and fairly simple method that bases the selection on the TF. However, the main focus of this thesis is to integrate a multi-resolution rendering framework in the system IDS7, developed by Sectra-Imtec AB. This thesis extends the existing volume rendering framework to support multi-resolution representations. To make this possible, a pre-processing pipeline must be implemented and modifications need to be made in the real-time rendering loop. Overall, this thesis describes the theoretical background and the implementation of an approach presented by Ljung et al. [1] and its capability to integrate with IDS7.

1.2 Thesis Objectives

Given the problem description stated in the previous section, the objectives of this thesis are the following:

• Evaluate the existing technique for volume rendering in IDS7.

• Implement a multi-resolution rendering framework [1].

• Integrate and test the methods in IDS7.

1.3 Outline of Report

Chapter 2 gives an introduction to the area of medical imaging in terms of how image data is produced and visualized. Furthermore, a description of volume rendering techniques is given. Chapter 3 presents methods for improving direct volume rendering proposed by Ljung et al. [1], e.g. efficient volume texture packing and multi-resolution volume rendering. The implementation of the methods in chapter 3 is described in chapter 4. Chapter 5 presents the results of the methods proposed by Ljung et al. and their implementations. Finally, chapter 6 states the conclusions drawn from previous work in the area of research. The fulfillment of the presented state-of-the-art methods is also discussed with respect to their objectives and how they can be further developed.


1.4 Reader Prerequisites

To fully understand and appreciate the contents of this thesis, good knowledge of computer graphics, image processing and analysis is recommended. Furthermore, familiarity with the concept of volume rendering and its use in medical imaging will also help in understanding the contents of this report.


Chapter 2

Background

This chapter introduces the reader to the area of medical imaging and gives a brief overview of techniques available for visualizing volume data sets. The first section describes how medical image data is produced and the characteristics of the data. Furthermore, tools for medical image volume visualization are presented, i.e. the volume rendering techniques that are the foundation for the objectives of this thesis. Finally, a more thorough presentation of the GPU is given. The last section presents the programmability of modern GPUs and how it can be used for efficient volume rendering and optimization.

2.1 Medical Imaging

Medical imaging is a powerful tool for diagnostics in medicine. The traditional way of producing medical images is to use a flat-panel x-ray detector to visualize the bone structure [2]. The films produced by the equipment are then put in front of a light screen for analysis. Nowadays, this technique is being digitized and many radiology departments produce digital images and use PACS instead. Full-body virtual autopsy is, for instance, one field of use [3]. The cost and time efficiency gains are convincing.

2.1.1 Modalities

The apparatus that produces medical images is called a modality. The primary modality types are Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Ultrasound, PET, Mammography, and Computed and Digital Radiography (CR/DR). The modalities of interest for this thesis are CT and MRI scanners, since they have the ability to produce 3-D medical image data sets [2].

Computed Tomography, also known as CAT (Computed Axial Tomography) scanning, is one of the most commonly used modality types and can produce 3-D data sets. A CT modality uses x-rays and has the characteristic of capturing materials that absorb radiation, such as bone.

The images produced by the CT scanner are slices with a normal in the direction of the main axis going through the body. They are produced by letting a tube rotate in a spiral around the patient, emitting x-ray radiation and measuring the amount of attenuation on the opposite side. A slice is produced each time the tube has rotated 360 degrees.

Magnetic Resonance Imaging can also produce 3-D image data sets. The produced images are somewhat noisy; on the other hand, this technique has the ability to separate different kinds of soft tissue.

The patient being scanned is surrounded by a strong magnetic field. Radio pulses are emitted with a frequency that puts the hydrogen nuclei in the body into a high-energy state. Extra energy is then emitted when the pulse is turned off and the nuclei return to their normal state. When the modality has measured the time it takes for this emission to decay, the final image is produced.

2.2 Medical Volume Visualization

The images that the scanners produce need to be visualized. The most straightforward way of doing so is to view the image slices one after another; stacked, they form a volume. Taking it one step further, we can produce a slice with an arbitrary normal direction by interpolating its values from the volume data. This plane can then be moved along its normal, giving the opportunity to interactively explore the content of the data set. The technique is called Multi Planar Reconstruction (MPR); an image can be produced from any plane in the volume data.
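The MPR idea above can be sketched in a few lines. This is an illustrative example, not the thesis code: the plane is spanned by two given in-plane axes `u` and `v` (assumed names), and each output pixel is looked up with nearest-neighbor sampling for brevity; trilinear interpolation would normally be used.

```python
# Sketch (illustrative): Multi Planar Reconstruction resamples the
# volume on a plane with an arbitrary orientation. The volume is a
# nested list indexed as vol[z][y][x].

def mpr_slice(vol, origin, u, v, width, height):
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    img = []
    for j in range(height):
        row = []
        for i in range(width):
            # Point on the plane: origin + i*u + j*v.
            x = origin[0] + i * u[0] + j * v[0]
            y = origin[1] + i * u[1] + j * v[1]
            z = origin[2] + i * u[2] + j * v[2]
            xi, yi, zi = round(x), round(y), round(z)
            if 0 <= xi < nx and 0 <= yi < ny and 0 <= zi < nz:
                row.append(vol[zi][yi][xi])     # nearest-neighbor lookup
            else:
                row.append(0)                   # outside the volume
        img.append(row)
    return img

# 2x2x2 volume where each voxel stores its x index; an axial slice
# (u along x, v along y) at z = 0 reproduces that pattern.
vol = [[[0, 1], [0, 1]], [[0, 1], [0, 1]]]
print(mpr_slice(vol, (0, 0, 0), (1, 0, 0), (0, 1, 0), 2, 2))
```

Choosing `u` and `v` orthonormal and scaled to the desired pixel spacing yields an undistorted reconstructed image.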

2.2.1 Volume Data

A pixel in an image is, in a sense, the 2-D equivalent of a voxel in volume data. Discrete volume data sets can be thought of as a three-dimensional array of cubic elements, as shown in figure 2.1. Each small cube represents a voxel and together they form the volume data.

Figure 2.1: Illustration of volume data as voxels.

It is not correct to assume that the entire voxel has the same value. Instead, we should think of the voxel as a small volume in which we know the density value at an infinitesimal point in its center [2]. The distance between voxel centers then becomes the sampling distance. The Nyquist theorem states that when sampling a continuous signal we need to sample at a rate of at least twice the highest frequency in the signal [2].

Ideal reconstruction according to theory requires the convolution of the sample points with a sinc function in the spatial domain, but this is computationally too expensive. Instead, when reconstructing a continuous signal from an array of voxels in practice, the sinc filter is usually replaced by either a box filter or a tent filter.

Convolution with the box filter gives nearest-neighbor interpolation, which results in a rather blocky appearance due to the sharp discontinuities between neighboring voxels. Trilinear interpolation is achieved by convolution with a 3-D tent filter, giving a good trade-off between cost and smoothness.
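The two reconstruction filters can be sketched as follows. This is a minimal illustration, not the thesis implementation: the volume is a nested list indexed `vol[z][y][x]` with voxel centers at integer coordinates, and the helper names are hypothetical.

```python
# Sketch: nearest-neighbor (box filter) vs. trilinear (tent filter)
# reconstruction of a density value at an arbitrary point.
import math

def nearest(vol, x, y, z):
    # Box filter: snap to the closest voxel center.
    return vol[round(z)][round(y)][round(x)]

def trilinear(vol, x, y, z):
    # Tent filter: blend the 8 surrounding voxel centers by their
    # distance-based weights.
    x0, y0, z0 = math.floor(x), math.floor(y), math.floor(z)
    fx, fy, fz = x - x0, y - y0, z - z0
    acc = 0.0
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                w = ((fx if dx else 1 - fx) *
                     (fy if dy else 1 - fy) *
                     (fz if dz else 1 - fz))
                acc += w * vol[z0 + dz][y0 + dy][x0 + dx]
    return acc

# 2x2x2 volume whose value equals the x coordinate of each voxel:
# trilinear reconstruction is exactly linear along x.
vol = [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]
print(trilinear(vol, 0.25, 0.5, 0.5))  # 0.25
print(nearest(vol, 0.25, 0.5, 0.5))
```

The blocky appearance of the box filter shows up as the constant value returned by `nearest` over an entire voxel, where `trilinear` varies continuously.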

2.2.2 Transfer Functions

Density values from the scans must be mapped to some form of optical property in order to be visualized. Most often it is not desirable to let all voxels in the volume contribute to the final image. Instead, a TF maps the value of a voxel to optical properties, such as color and opacity [4]. Interaction is allowed in this step to let the user decide what to visualize. It is unlikely that one would like to see the air surrounding the body, but if the density values for air are mapped to opaque colors they will be visualized. The mapping between data and optical properties is often referred to as classification. A lot of research has been done in the area of designing TFs, and it is a quite complex task. Therefore, presets are often used, where the user can simply choose what to visualize, e.g. bone, skin or dense bone, instead of mapping the density values by hand; see figure 2.2.

Figure 2.2: Direct volume rendering with different TF settings: (a) bone, (b) skin, (c) dense bone, (d) air.
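A TF preset of the kind described above can be represented as a simple lookup table. The representation, the threshold and the ramp below are assumptions for illustration, not taken from the thesis: a hypothetical "bone" preset makes low densities (air, soft tissue) fully transparent and ramps high densities up to an opaque bone-like color.

```python
# Sketch (assumed representation): a transfer function stored as a
# lookup table mapping an integer density value to an RGBA tuple.

def make_bone_tf(n=4096, threshold=1200):
    # Hypothetical preset: densities below `threshold` are invisible;
    # above it, opacity ramps linearly up to fully opaque.
    tf = []
    for v in range(n):
        if v < threshold:
            tf.append((0.0, 0.0, 0.0, 0.0))        # transparent
        else:
            a = min(1.0, (v - threshold) / 500.0)  # opacity ramp
            tf.append((0.9, 0.9, 0.8, a))          # bone-like color
    return tf

def classify(tf, density):
    # Classification: density -> optical properties (color, opacity).
    return tf[density]

tf = make_bone_tf()
print(classify(tf, 100))    # air: fully transparent
print(classify(tf, 2000))   # dense bone: fully opaque
```

On the GPU such a table is typically stored as a 1-D texture, so classification becomes a single texture lookup per sample.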


2.2.3 Direct Volume Rendering

Direct volume rendering is the process of generating images directly from a volumetric data set, operating directly on the data values of each voxel. Each voxel's density value can be represented with a color, as described in the previous section, yielding a resulting color image.

Many methods exist for performing volume rendering. Ray casting is a direct volume rendering method, meaning that rendering is performed by evaluating an optical model. In comparison, surface rendering uses geometrical primitives to describe features in the data set. Evaluating an optical model implies assigning optical characteristics to the density values of each voxel in the volume data. This mapping of density values to colors is usually done with TFs and is often referred to as classification, as described in the previous section.

Volume Rendering Integral

When performing the actual rendering, the volume rendering integral needs to be evaluated [2]. The integral models rays from a light source to the eye of a viewer. The evaluation considers both emission and absorption along the ray, and the integral is continuous in its definition. In computer graphics, visual correctness is more important than physical correctness; therefore most volume rendering integrals only account for absorption and reflection. Equation (2.1) is known as the low-albedo volume rendering integral [5].

$$ I_\lambda(x, r) = \int_0^L C_\lambda(s)\,\mu(s)\, e^{-\int_0^s \mu(t)\,dt}\, ds \qquad (2.1) $$

The equation calculates the intensity, $I$, of a certain wavelength, $\lambda$, that is received at position $x$ on the image plane from direction $r$. $C$ represents the material properties that determine how the light is reflected, $\mu$ is the density of the particles and $L$ is the length of the ray. The discrete version of the integral follows as:

$$ I_\lambda(x, r) = \sum_{i=0}^{L/\Delta s} C_\lambda(s_i)\,\alpha(s_i) \prod_{j=0}^{i-1} \bigl(1 - \alpha(s_j)\bigr) \qquad (2.2) $$

The material properties, $C_\lambda$, together with the opacity $\alpha$ produce a mapping between the characteristics of the volume data and the visual properties.


Figure 2.3: Ray casting from the viewer through a pixel on the screen.

Ray Casting

There are generally two approaches to evaluating the volume rendering integral. The object-order approach determines how each voxel in the volume will contribute to the rendered image. In contrast, image-order rendering determines how much the volume contributes to each pixel in the final image [1].

It is natural to assume that ray tracing starts at the light source, since light sources emit rays of light. However, even though this is theoretically possible, most of the rays would never reach the observer, wasting computational power. Instead, rays are cast from the observer into the scene, as shown in figure 2.3.

Ray casting is a simplification of ray tracing in the sense that it does not allow for any interactions, such as reflections. For each pixel in the image a ray is shot into the volume, and samples are taken along this ray. A simple optimization is to implement early ray termination, integrating in front-to-back order. The idea of this algorithm is to terminate the integration if it at some point reaches full opacity, since samples lying behind will not contribute to the final pixel [2].
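The per-pixel loop with early ray termination can be sketched as below. This is an illustration, not the thesis implementation: `sample_rgba` is a hypothetical stand-in for a volume lookup followed by TF classification, and here it simply returns a fixed semi-transparent color inside the unit cube.

```python
# Sketch: marching one ray through a volume in front-to-back order,
# stopping early once the accumulated opacity is nearly full.

def sample_rgba(p):
    # Placeholder for "sample volume, then classify with the TF".
    inside = all(0.0 <= c <= 1.0 for c in p)
    return (1.0, 0.5, 0.2, 0.1) if inside else (0.0, 0.0, 0.0, 0.0)

def cast_ray(origin, direction, step=0.01, max_t=3.0, eps=0.01):
    color = [0.0, 0.0, 0.0]
    alpha = 0.0
    t = 0.0
    steps = 0
    while t < max_t:
        p = tuple(o + t * d for o, d in zip(origin, direction))
        r, g, b, a = sample_rgba(p)
        # Front-to-back compositing: each new sample is attenuated by
        # what has already accumulated in front of it.
        for i, c in enumerate((r, g, b)):
            color[i] += (1.0 - alpha) * a * c
        alpha += (1.0 - alpha) * a
        steps += 1
        if alpha >= 1.0 - eps:      # early ray termination
            break
        t += step
    return color, alpha, steps

color, alpha, steps = cast_ray((0.5, 0.5, -1.0), (0.0, 0.0, 1.0))
print(alpha, steps)  # terminates well before the max_t/step iterations
```

Because every inside sample contributes opacity 0.1, the accumulated opacity passes the termination threshold after a few dozen samples inside the cube, so the loop never needs to march the full ray length.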

Alpha Blending

Alpha blending is a technique for blending two semi-transparent colors. For example, looking through a pair of sunglasses causes the colors to look different, because they are blended with the color of the glass [5]. Blending is usually done with the following equation:


$$ C_{out} = C_{src}\,\alpha_{src} + (1 - \alpha_{src})\, C_{dst} \qquad (2.3) $$

$C_{out}$ is the color that is displayed, $C_{src}$ and $\alpha_{src}$ are the RGB and alpha values of the color being blended in, and $C_{dst}$ is the color present before the blending [2]. Alpha blending is useful when solving the volume rendering integral numerically; introducing indices in equation (2.3) gives:

$$ C'_i = C_i\,\alpha_i + (1 - \alpha_i)\, C'_{i+1} \qquad (2.4) $$

Stepping the equation from $i = n - 1$ down to $0$, $n$ being the number of samples, solves the discrete version of the integral numerically. Since this iteration runs in back-to-front order it can be very inefficient; hence the following equations solve the alpha blending in front-to-back order [2].

$$ C'_i = C'_{i-1} + (1 - \alpha'_{i-1})\,\alpha_i C_i \qquad (2.5) $$

$$ \alpha'_i = \alpha'_{i-1} + (1 - \alpha'_{i-1})\,\alpha_i \qquad (2.6) $$

Equations (2.5) and (2.6) are then evaluated by stepping $i$ from $1$ to the number of samples $n$. This allows checking whether early ray termination is possible, by examining whether the current accumulated opacity is high enough to hide anything that lies behind.
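A small numerical check makes the equivalence of the two orderings concrete: for one ray with per-sample colors $C_i$ and opacities $\alpha_i$, the back-to-front recurrence (2.4) and the front-to-back recurrences (2.5)-(2.6) accumulate the same final color. The sample values below are arbitrary illustration data.

```python
# Numerical check: back-to-front vs. front-to-back compositing give
# the same result for one ray of samples.

C = [0.8, 0.4, 0.6, 0.2]   # per-sample (scalar) colors, front to back
A = [0.3, 0.5, 0.1, 0.9]   # per-sample opacities

# Back-to-front, eq (2.4): step i from n-1 down to 0.
btf = 0.0
for Ci, ai in zip(reversed(C), reversed(A)):
    btf = Ci * ai + (1.0 - ai) * btf

# Front-to-back, eqs (2.5)-(2.6): step i from 0 to n-1, tracking the
# accumulated color and the accumulated opacity separately.
ftb_c, ftb_a = 0.0, 0.0
for Ci, ai in zip(C, A):
    ftb_c = ftb_c + (1.0 - ftb_a) * ai * Ci
    ftb_a = ftb_a + (1.0 - ftb_a) * ai

print(btf, ftb_c)  # identical up to floating-point rounding
```

The front-to-back form additionally exposes the running opacity `ftb_a`, which is exactly the quantity tested against a threshold for early ray termination.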

2.3 Graphics Processing Unit

The modern GPU is a highly optimized data-parallel streaming processor. The major innovation in recent years is the programmable pipeline, which has replaced the traditional fixed-function pipeline. It allows the programmer to upload user-written programs to be executed very efficiently on the GPU. Programs written for the GPU are well suited for implementing object-order and image-order algorithms for direct volume rendering.

2.3.1 The Graphics Pipeline

Rendering is the process of producing an image from a higher-level description of its components. The GPU handles as many computations as possible efficiently by pipelining. For convenience, the graphics pipeline can be divided into three steps to provide a general overview [2], depicted in figure 2.4.


Figure 2.4: The graphics pipeline.

• Geometry Processing computes linear transformations of the input vertices, such as rotation, translation and scaling. Basically, it is the stage at which the higher-level description is processed into geometric primitives such as points, lines, triangles and polygons.

• Rasterization decomposes the primitives into fragments, such that each fragment corresponds to a single pixel on the screen.

• Fragment Operations are performed subsequently; the per-fragment operations modify the fragments' attributes, such as color and transparency.

The resulting raster image contained in the frame buffer can be displayed on the screen or written to a file. The basic graphics pipeline is often referred to as the fixed-function pipeline due to the fact that the programmer has very little control over the process [6]. On modern graphics cards it is possible to control the details of this process by using the so-called programmable graphics pipeline, which is reviewed in the next section.

2.3.2 The Programmable Graphics Pipeline

True programmability is the major innovation provided by today's GPUs. Programs can be uploaded to the GPU's memory and executed at the geometry stage (vertex shader) and the rasterization unit (fragment shader). A vertex shader is a program that runs on the GPU; it can change properties such as the position of each vertex. For example, it can make a plain grid look bumpy by randomly translating the vertices in a direction. A fragment shader works on a per-pixel level; it can change properties such as the color of each pixel. The terms vertex shader and vertex program, and likewise fragment shader and fragment program, have the same meaning [5].

Shader programs are usually written in C-like programming languages, such as Cg or HLSL. Standards are important since different hardware supports different levels of programmability. A widely known standard developed by Microsoft is the Shader Model [5]. The limitations have so far been programmability restrictions. Shader Model 1.1 does not support loops, only straight-line code. Shader Model 2.x introduces the ability to loop. Conditional branching for dynamic if-else blocks first came with Shader Model 3.0. Shader Model 4.0 introduces geometry shaders for creating vertices on the GPU and unifies the shaders, i.e. there are no longer any differences between pixel and vertex shaders.

Vertex Shaders

Objects in a 3-D scene are typically described using triangles, which in turn are defined by their vertices. A vertex shader is a graphics processing function used to add special effects to objects in a 3-D environment by performing mathematical operations on the object's vertex data. Each vertex can be defined by many different variables. For instance, a vertex is always defined by its location in a 3-D environment using the x-, y- and z-coordinates [5]. Vertices may also be defined by colors, texture mapping coordinates (e.g. each pixel in an RGB texture can represent a position by defining the domain coordinates) and lighting characteristics. Vertex shaders do not change the type of data; instead they change its values, so that a vertex emerges with a different color, different textures or a different position in space.

Fragment Shaders

Fragment shaders create ambiance with materials and surfaces that mimic reality. Material effects replace artificial-looking organic surfaces. By altering the lighting and surface effects, artists are able to manipulate colors, textures or shapes and to generate complex, realistic scenes. A fragment shader is a graphics function that calculates effects on a per-pixel basis. Depending on the resolution, over 2 million pixels may need to be rendered, lit, shaded and colored for each frame, at 30 frames per second [5], which creates a tremendous computational load. Moreover, applying multiple textures in one pass almost always yields better performance than performing multiple passes. Multiple passes translate into multiple geometry transformations and multiple Z-buffer calculations, slowing the overall rendering process [2].


Chapter 3

State-of-the-art Direct Volume Rendering

This chapter presents methods proposed by Ljung et al. [1] for efficient direct volume rendering of large data sets on the GPU. Due to the limited memory capacity of GPUs, techniques for efficient rendering of volume data sets are required. A theoretical background for several methods that optimize the volume for rendering is given here, and a more detailed step-by-step algorithm overview is presented in the implementation chapter. The first section clarifies why multi-resolution volumes are desirable and how storage space can be saved. Furthermore, the fundamentals of flat and hierarchical blocking are presented. The following section focuses on the actual rendering rather than the structure of the data. A general introduction to volume rendering of multi-resolution volumes is given, moving on to issues that can occur when sampling. Two different sampling methods, NB and II sampling, are presented in the subsections. These methods are essential for the reader to understand, since the fundamentals of this thesis rely upon them. The content of this chapter is based on the work done by Ljung et al. [1] at NVIS (Norrköping Visualization and Interaction Studio), Linköping University.

3.1 Multi-resolution Volumes

The traditional way of storing medical data sets is to align each slice along the z-direction and store it directly in a volume texture. This approach stores data that might be invisible given a specific TF. For instance, it is not necessary to store density values that map to air. By dividing the volume data into several smaller axis-aligned blocks and classifying which ones are visible given the TF, we can avoid wasting storage space on invisible data.


It would also be highly desirable to store regions at different resolutions depending on their importance. Homogeneous regions could then be stored at a lower resolution without reducing image quality, while areas with great variation and high importance would be stored at full resolution. Simply skipping invisible blocks might not reduce the volume size sufficiently; the footprint of the remaining visible blocks may still exceed the available texture memory.

The memory capacity of the GPU is strictly limited, which is one of the main causes of the need for efficient usage of the available texture memory. Several methods have been proposed for how to efficiently pack the volume texture, and this thesis implements an approach similar to Kraus et al. [7]. Fundamental aspects of flat and hierarchical multi-resolution blocking are presented in the following sections, and the differences between the schemes are described at some length.

3.1.1 Flat and Hierarchical Blocking

The concepts presented in this thesis are based on flat multi-resolution blocking, therefore an overview of the blocking scheme is given here. In flat blocking, samples and block data are located at the centers of the uniform grid cells instead of at the cell vertices. Figure 3.1 shows the placement of samples in red and the sample grid in black, while the blue grid indicates the block grid.

Furthermore, each block is individually prepared for multi-resolution representation, either through a wavelet transform or through average downsampling; the latter approach is used in this thesis. Since no global hierarchy is created, this scheme is referred to as flat multi-resolution blocking, or flat blocking. The spatial extent of a block is constant and does not grow with reduced resolution level.

The key advantages of this approach are that a uniform addressing scheme is supported and that arbitrary resolution differences between neighboring blocks can be supported, since a block is independent of its neighbors and the resolution is not restricted to powers of two. Flat blocking also provides higher memory efficiency than a corresponding hierarchical scheme, but a disadvantage is that the number of blocks is constant, while hierarchical schemes scale with a reduced memory budget. On the other hand, it is trivial to exploit parallelism in many processing tasks for flat blocking, since there are no hierarchical dependencies.

Figure 3.2 compares hierarchical and flat multi-resolution blocking; the blue squares are blocks with constant spatial position and size. The resolution of each block is arbitrary and independent of neighbouring blocks. The LOD is selected so that a block in the interior should have the second lowest resolution, level 1, while blocks that intersect the boundary of the embedded object must have full resolution. Blocks in the exterior are ignored, level 0. The LOD selection is indicated with level-specific coloring.

Figure 3.1: Illustration of a flat multi-resolution blocking grid.

The most common scheme is, however, the hierarchical scheme. It is created by recursive downsampling of the original volume, so that each lower resolution level is $2^{-3}$ the size of the previous level. The block size is usually kept equal at each resolution level. Blocking is suitable for hierarchical representations as well, yielding the ability to select different resolution levels in different parts of the volume.

3.2 Pre-processing Pipeline

The purpose of the pre-processing pipeline is to reconstruct the data into a representation that is more efficient for rendering. The first section describes how reorganizing data into blocks can improve the performance of memory accesses. The algorithm for creating the packed volume texture is outlined in the implementation chapter. Blocks of varying resolution allocate varying amounts of memory, and these need to be packed tightly in order to make efficient use of the memory. Furthermore, the creation of meta-data is presented. Meta-data may hold information about the location and resolution of blocks in the volume texture as well as minimum and maximum block values, thus serving as accelerating structures in the rendering pipeline.

3.2.1 Volume Subdivision and Blocking

The block size is usually derived from the size of the CPU's level 1 and 2 caches. Numerous publications indicate that the optimal size for many block-related processing tasks is a block size of 16 or 32 voxels [8, 9]. The addressing of blocks and sampling within blocks are straightforward in the 2-D case and slightly more complicated in 3-D; this problem is addressed in the implementation chapter. Blocking introduces an additional level of complexity for the handling of block boundaries, especially when a sample request into a neighboring block is ignored.

Figure 3.2: Comparison of hierarchical and flat blocking.

When volume data is passed to the texture manager for the first time, the volume texture is created. The next step is to create downsampled versions of the data set that correspond to the LODs. In this implementation the volume data is subdivided into uniform blocks of 16³ voxels, and each block is classified as either visible or invisible given a preset TF. The spatial dimensions of each block are constant and independent of the resolution level at which the block is selected. Data sets are stored block-wise at all resolution levels, in powers of two, resulting in a 14% data increase.
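The 14% figure follows from the geometric series formed by the power-of-two resolution levels: each extra level adds 1/8 the data of the previous one. A quick check (not thesis code):

```python
# Storing a volume at every power-of-two resolution level adds
# 1/8 + 1/64 + ... of the original size, converging to 1/7 (about 14%).
def total_storage_factor(levels):
    """Relative storage for a volume kept at `levels` resolution levels."""
    return sum((1.0 / 8.0) ** i for i in range(levels))

print(round(total_storage_factor(6), 3))  # 1.143, i.e. a ~14% increase
```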

The creation of meta-data is parallelized with the LOD creation. If a block is classified as visible, a LOD is set based on similarities between the histograms of the block and the TF. Finally, the block is added to the volume texture buffer. The meta-data, i.e. the positions and resolutions of the bricks, is held in a buffer, which is used in a later step when writing this to a texture. The complete algorithm is presented in the implementation chapter. The downsampling is done by averaging each eight-voxel (2×2×2) neighborhood to a single voxel, so that each downsampled level is $2^{-3}$ the size of the previous level.

The concept is to use the memory efficiently by packing the volume texture tightly, successfully storing a 512³ data set in a 256³ volume texture. Skipping voxels that only contain air might not reduce the memory footprint sufficiently, which induces a requirement to store different regions at separate resolutions, independent of neighboring blocks.

3.3 Multi-resolution Volume Rendering

Special care needs to be taken when rendering volumes that have blocks of varying resolution. Approaches like octree-based hierarchies have been presented, where each brick is rendered separately using texture slicing. Performance can be improved by rendering lower resolution blocks with a lower sampling density. The opacity, α, must be modified when the sampling density along a ray is changed, so that the final result is invariant to the sampling density in a homogeneous material. This can generally be expressed as:

$$\alpha_{\mathrm{adj}} = 1 - (1 - \alpha_{\mathrm{orig}})^{\Delta_{\mathrm{adj}}/\Delta_{\mathrm{orig}}} \quad (3.1)$$

where $\Delta_{\mathrm{adj}}$ and $\Delta_{\mathrm{orig}}$ specify the adjusted and original sampling distances respectively [1].
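Equation (3.1) can be checked with a small sketch (illustrative only): in a homogeneous material, two samples at the original distance must composite to the same opacity as one adjusted sample at the doubled distance.

```python
def adjust_opacity(alpha_orig, d_orig, d_adj):
    """Opacity correction for a changed sampling distance, eq. (3.1)."""
    return 1.0 - (1.0 - alpha_orig) ** (d_adj / d_orig)

# Doubling the sampling distance in a homogeneous material:
a = 0.3
two_steps = 1.0 - (1.0 - a) * (1.0 - a)  # two samples, front-to-back
one_step = adjust_opacity(a, 1.0, 2.0)   # one sample, adjusted opacity
print(two_steps, one_step)               # both approximately 0.51
```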

Artifacts may occur at block boundaries when rendering each block separately. Primarily, these artifacts occur due to discontinuities when neighboring blocks are of different resolution levels. A general block-based rendering scheme, Adaptive Texture Maps, is presented by Kraus and Ertl [7]. The fundamental idea of their approach is that an index texture redirects the sampling to a texture holding the packed blocks. This technique supports rendering of the whole volume instead of block-wise rendering, and this approach is also implemented in this thesis.

3.3.1 Sampling of Multi-resolution Volumes

A uniform addressing scheme is provided by the flat multi-resolution blocking structure. The volume range is defined by the number of blocks along each dimension. The block index can then be retrieved as the integer part of the position p within the volume. The intrablock local coordinate is defined by the remainder of the position, $p' = \mathrm{frac}(p)$. The block index map holds the size of each block and the location q of the block in the packed volume. These are then used for computing the coordinate of the sample to take.
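The addressing just described can be sketched as follows (hypothetical helper, one dimension for brevity; positions are expressed in block units):

```python
import math

def block_address(p):
    """Split a position (in block units) into a block index and an
    intrablock local coordinate: index = integer part, local = frac(p)."""
    index = math.floor(p)
    local = p - index  # frac(p)
    return index, local

# A sample at position 3.25 block units lies in block 3, at local 0.25.
print(block_address(3.25))  # (3, 0.25)
```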


(a) Sample boundary (b) Eight block neighborhood

Figure 3.3: Illustration of block neighborhood and boundaries.

Special care has to be taken when sampling in a block, since blocks that are adjacent in the packed volume are rarely neighbors in the spatial domain.

3.3.2 Intrablock Volume Sampling

This approach restricts sampling to access only data completely within the current block. The inset from the block's spatial boundaries is denoted $\delta_i$ for block i. The restricted sample location, $p'_C$, is defined as:

$$p'_C = C_{\delta}^{1-\delta}(p') \quad (3.2)$$

where $C_{\alpha}^{\beta}(x)$ clamps the value of x to the interval [α, β]. Figure 3.3a shows the valid coordinate domain for intrablock samples as squares of red, dashed lines. The sample boundary is defined as the smallest box spanning all samples: for a given block of resolution level l, the box is inset δ(l) from all edges of the block boundary, equation (3.3).

$$\delta(l) = \frac{1}{2^{1+l}} \quad (3.3)$$

This sampling method is referred to as Nearest Block (NB) sampling by Ljung et al. [10], meaning that no interpolation between blocks is performed, which results in artifacts at block boundaries.
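Equations (3.2)–(3.3) can be sketched per dimension as follows, assuming a local coordinate in [0, 1] and that resolution level l corresponds to $2^l$ samples per block side, so the inset is half a voxel (illustrative helpers, not thesis code):

```python
def clamp(x, lo, hi):
    """C_lo^hi(x): clamp x to the interval [lo, hi]."""
    return max(lo, min(hi, x))

def inset(level):
    """Sample-boundary inset delta(l) = 1 / 2^(1+l), eq. (3.3)."""
    return 1.0 / (2 ** (1 + level))

def nb_restrict(p_local, level):
    """Restrict a local coordinate to the block's sample boundary, eq. (3.2)."""
    d = inset(level)
    return clamp(p_local, d, 1.0 - d)

# A block with two samples per side (level 1) has inset 0.25; coordinates
# outside the sample boundary are clamped to it.
print(inset(1), nb_restrict(0.9, 1))  # 0.25 0.75
```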

3.3.3 Interblock Interpolation

Block artifacts can be overcome using the interblock interpolation technique developed by Ljung et al. [11]. Other approaches suggest sample replication and padding between blocks. Sample replication counteracts the data reduction and may also distort the sample when a block has a higher resolution than its neighbors.

Interblock interpolation is a scheme for direct interpolation between blocks of arbitrary resolutions, and it removes the need for sample replication. The idea is to take samples from each of the closest neighboring blocks using NB sampling and compute a sample value as a normalized weighted sum. The domain for interblock interpolation is indicated by the shaded area between block centers in figure 3.3a.

A block neighborhood is illustrated in figure 3.3b. The local coordinates (r, s, t) of the centers of blocks 1 and 8 are (-0.5, -0.5, -0.5) and (0.5, 0.5, 0.5) respectively. A sample, $\varphi_b$, is taken from each of the blocks using (r, s, t) as the intrablock coordinates, adjusted with unit offsets specific to each block's location in the local eight-block neighborhood. Using the labeling in figure 3.3b, three edge sets $E_r$, $E_s$, $E_t$ are introduced for edges of equal orientation (3.4).

$$E_r = \{(1,2),\,(3,4),\,(5,6),\,(7,8)\}$$
$$E_s = \{(1,3),\,(2,4),\,(5,7),\,(6,8)\}$$
$$E_t = \{(1,5),\,(2,6),\,(3,7),\,(4,8)\} \quad (3.4)$$

For each edge in the neighborhood, the edge weights $e_{i,j} \in [0, 1]$ are computed, and they determine the block weights, $\omega_b$. The sample value is computed as a normalized sum of all block samples according to equation (3.5).

$$\varphi = \frac{\sum_{b=1}^{8} \omega_b \varphi_b}{\sum_{b=1}^{8} \omega_b} \quad (3.5)$$

where $\varphi_b$ is an NB sample from block b and the block weights, $\omega_b$, are defined as:

$$
\begin{aligned}
\omega_1 &= (1-e_{1,2})\,(1-e_{1,3})\,(1-e_{1,5}) \\
\omega_2 &= e_{1,2}\,(1-e_{2,4})\,(1-e_{2,6}) \\
\omega_3 &= (1-e_{3,4})\,e_{1,3}\,(1-e_{3,7}) \\
\omega_4 &= e_{3,4}\,e_{2,4}\,(1-e_{4,8}) \\
\omega_5 &= (1-e_{5,6})\,(1-e_{5,7})\,e_{1,5} \\
\omega_6 &= e_{5,6}\,(1-e_{6,8})\,e_{2,6} \\
\omega_7 &= (1-e_{7,8})\,e_{5,7}\,e_{3,7} \\
\omega_8 &= e_{7,8}\,e_{6,8}\,e_{4,8}
\end{aligned} \quad (3.6)
$$
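Equations (3.5)–(3.6) can be sketched as below. This is an illustrative simplification, not thesis code: it assumes all edges of equal orientation share one weight (er, es, et), whereas equation (3.6) allows a distinct weight per edge; the block labels follow figure 3.3b.

```python
def block_weights(er, es, et):
    """Block weights omega_1..omega_8 per eq. (3.6), for the special case
    where edges of equal orientation share the same weight."""
    w = []
    for b in range(8):  # 0-based label; bits select the r, s, t factors
        wr = er if b & 1 else 1.0 - er
        ws = es if (b >> 1) & 1 else 1.0 - es
        wt = et if (b >> 2) & 1 else 1.0 - et
        w.append(wr * ws * wt)
    return w

def interblock_sample(samples, weights):
    """Normalized weighted sum of the eight NB samples, eq. (3.5)."""
    return sum(w * p for w, p in zip(weights, samples)) / sum(weights)

# At the midpoint (all edge weights 0.5) the scheme reduces to a plain
# average of the eight samples, as trilinear interpolation would.
w = block_weights(0.5, 0.5, 0.5)
print(sum(w))                                          # 1.0
print(interblock_sample([1, 2, 3, 4, 5, 6, 7, 8], w))  # 4.5
```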


The contributions of the II sampling technique can be summarized as:

• Provision of high-quality rendering without discontinuities caused by blocking.

• Permitting high LOD adaptivity through smooth interpolation between arbitrary resolutions.

• Maintaining the data reduction rate by avoiding data replication.

• Supporting highly parallel LOD pre-processing, since the method does not impose any interblock dependencies.


Chapter 4

Implementation

This chapter presents the practical methods that were implemented for improving direct volume rendering. For the reader to fully understand the context of this chapter, the theoretical description presented in the previous chapter needs to be understood. The first section presents the working environment and the extent of the existing class hierarchy in the application IDS7. Next, the pre-processing pipeline is outlined, covering how the volume texture structure is built by subdividing the volume data. Furthermore, it is described how meta-data is used to retrieve the sample position. Finally, the interblock interpolation technique, which helps to avoid sampling artifacts at block boundaries, is described.

4.1 Application Environment

This thesis is implemented in the PACS system IDS7, developed by Sectra-Imtec AB. The language of choice was naturally C#, since the rest of the application is implemented in it, with DirectX [12] for graphics programming. The High Level Shading Language (HLSL) was used for programs written for execution on the GPU. The development environment was Microsoft Visual Studio 2005, with .NET 3.0.

4.2 Extending The Class Hierarchy

The current volume rendering technique in IDS7 uses ray casting for direct volume rendering; this thesis extends the existing framework to support multi-resolution representations. The essential changes lie in how we pre-process the volume and in the texturing. The extension of the class hierarchy is outlined in figure 4.1. The volume render object is the base for all volume rendering techniques; we extend this object to a multi-resolution object that handles pre-processing of the volume and implements the texture manager. The ray casting class performs the actual rendering.

Figure 4.1: The class hierarchy extension.

4.2.1 Packing The Volume Texture

Introducing a block map structure allows for arbitrary placement of blocks and packing in memory. Blocks that are invisible given the applied TF are ignored, which saves memory. The generation of the packed volume texture is explained in two dimensions, since this is likely to be more comprehensible than a discussion of the general case. Note that the example presented here is for illustration purposes only.

The generation of the volume texture from its initial state is shown in figure 4.2a, and the algorithm can be described in the following steps:

1. Build a hierarchy of downsampled versions of the original grid, letting the i-th level be of size $2^{-i}N_s \times 2^{-i}N_t$ vertices, where $N_s$, $N_t$ denote the dimensions of the index data.

2. Decompose the original grid, i.e. the 0-th level of the hierarchy, into $n_s \times n_t$ cells of size $b_s \times b_t$, letting $b_s$ and $b_t$ be the maximum block size.

3. For each cell in step 2, determine whether the data values of the cell are empty. In this case, mark the cell as empty; otherwise determine a scale factor $m = 2^{-i}$ and copy a corresponding data block of size $(m(b_s - 1) + 1) \times (m(b_t - 1) + 1)$ from the data of the i-th level of the grid hierarchy.

4. Build a buffer of the data blocks created in step 3 and append an empty data block, referenced by the empty cells.


5. Pack all data blocks, i.e. the data buffer, into a grid of size $n'_s \times n'_t$, which represents the packed data of the volume texture.
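The steps above can be sketched compactly in 2-D (hypothetical data layout and helpers; the real implementation packs 3-D blocks into a volume texture, and the scale encoding here is illustrative only):

```python
# Illustrative 2-D sketch of the packing algorithm: classify each cell,
# downsample non-empty cells that tolerate lower resolution, and append
# the resulting data blocks to a packed buffer with an index map.

def downsample(block, factor):
    """Average factor x factor neighborhoods of a square block."""
    n = len(block) // factor
    return [[sum(block[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for x in range(n)] for y in range(n)]

def pack(cells, full_res_flags):
    """Build the packed buffer and an index map of (offset, scale) per
    cell; empty cells all reference one shared empty block at the end."""
    buffer, index = [], []
    for block, full_res in zip(cells, full_res_flags):
        if all(v == 0 for row in block for v in row):
            index.append(None)  # resolved to the shared empty block below
            continue
        data = block if full_res else downsample(block, 2)
        index.append((len(buffer), 1 if full_res else 2))
        buffer.append(data)
    empty_ref = (len(buffer), 1)
    buffer.append([[0]])  # the shared empty data block
    index = [empty_ref if i is None else i for i in index]
    return buffer, index

cells = [[[0, 0], [0, 0]],   # empty cell
         [[1, 3], [5, 7]]]   # visible cell, stored at half resolution
buffer, index = pack(cells, [False, False])
print(index)      # [(1, 1), (0, 2)]
print(buffer[0])  # [[4.0]] -- the average of 1, 3, 5, 7
```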

The result of the given algorithm can be compared to figure 4.2b. So far we have established the transformation from a regular texture with a lot of empty space, allocating unnecessary memory, to a tightly packed texture. Although the cells of the coarse grid are of uniform size, the packed data blocks are of different sizes, based on their resolution. Blocks of full resolution correspond to large blocks of the packed data. The corresponding index data in figure 4.2c is further discussed in the following section.

4.2.2 Meta-data

Based on the cells' references to data blocks established in the previous section, the scale factor and the positions of the packed data blocks are represented in the index map in figure 4.2c. The first, upper level is a coarse uniform grid covering the domain of the texture map. For each cell, the data consist of a reference to the texture data of the cell and a scaling factor specifying its resolution relative to the maximum resolution. This data is essential for retrieving the sample position in the rendering pipeline, which is presented in the following section. Gradients for shading are also pre-computed as meta-data; on-the-fly gradient computation with several texture lookups can be very expensive, even for fragment programs executed on the GPU.

4.2.3 Level-of-Detail Selection

The developed LOD selection technique does not implement any error measurement to minimize visual incorrectness. A simple selection is implemented that distributes blocks of different resolutions over the volume only to avoid sharp boundaries in the image. More complex classifications based on histograms and TF design are proposed by Ljung et al. [13, 14, 15]; however, the purpose of this thesis is not to address the LOD selection problem, but to create an efficient representation of the volume data. Therefore, no emphasis is put on this in the result and discussion chapters.

The classification of block relevance is based on a histogram comparison with the TF. Binary quantized histograms with 1024 bins are stored for each block and for the TF separately. This simplification only tells us whether data is represented in an interval or not; thus the information is binary.

The final LOD selection is based on how many bins a block has in common with the TF. A block with a large bin overlap is represented at a high resolution, and vice versa. The assumption of basing the relevance of a block on the number of common bins with the TF is in many cases incorrect. Since we only have binary data, we do not have information about the magnitude in a bin; hence we will under-represent blocks that contain thin peaks that perhaps only span a few bins. On the other hand, this classification is very fast and the meta-data is kept at a minimum.

(a) Original texture. (b) Packed texture.

(c) Scale factors and new coordinates.

Figure 4.2: Texture packing and lookup coordinates.
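The binary histogram comparison and the LOD it drives can be sketched as follows (hypothetical helpers; the bin count is reduced from 1024 for brevity, and the mapping from overlap to LOD is an assumed example, not the thesis's exact rule):

```python
def binary_histogram(values, bins, lo, hi):
    """Binary quantized histogram: bit b is set iff any value falls in bin b."""
    hist = [False] * bins
    for v in values:
        b = min(bins - 1, int((v - lo) / (hi - lo) * bins))
        hist[b] = True
    return hist

def lod_from_overlap(block_hist, tf_hist, levels):
    """Pick a LOD from the number of bins a block shares with the TF:
    more common bins -> higher resolution; no common bins -> invisible."""
    common = sum(1 for b, t in zip(block_hist, tf_hist) if b and t)
    if common == 0:
        return 0  # invisible block, skipped entirely
    visible = sum(tf_hist) or 1
    return max(1, round(levels * common / visible))

tf = binary_histogram([0.5, 0.6, 0.7], bins=8, lo=0.0, hi=1.0)
block = binary_histogram([0.55, 0.65], bins=8, lo=0.0, hi=1.0)
air = binary_histogram([0.05], bins=8, lo=0.0, hi=1.0)
print(lod_from_overlap(block, tf, levels=4))  # 4: full overlap, full LOD
print(lod_from_overlap(air, tf, levels=4))    # 0: invisible
```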

4.3 Rendering Pipeline

This is the final stage, where volume data is rendered to a projection for presentation on the user's screen. The principle has already been presented: rays are generated from the viewpoint through a pixel in the image and through the volume, defined by the integral in equation (2.1). Several approaches exist, with varying trade-offs between performance and render quality. The following sections present the implementation of the NB sampling and II techniques proposed by Ljung et al., which have been integrated in IDS7.

4.3.1 GPU-based Ray Casting

Modern GPUs can be programmed using shader programs. This programmability allows for the use of more complex sampling and shading operations in the volume rendering, and a more direct implementation of the ray integral can be made [16].

The pixel shader that performs the volume rendering uses conditional breaks, requiring Shader Model 3.0, which also simplifies early ray termination. Using the linear interpolation between samples that comes practically for free on the GPU, and without regard to II, the ray casting algorithm can be expressed as follows:

bool done = false;
for (int i = 0; i < 255 && !done; i++) {
    for (int j = 0; j < 255 && !done; j++) {
        // Compute the sample position along the ray
        // Look up the new sample position in the index texture
        // Classify the sample as visible/invisible
        // Compute gradients, on-the-fly or by lookup
        // Front-to-back compositing
        // Advance the position along the ray
        // Conditional break (done = true) if the opacity is high enough,
        // or if the position is beyond the exit point
    }
}


The main loop consists of two nested loops, due to the 255-iteration loop limit in Shader Model 3, but also to be able to conditionally break out of the looping when the break conditions are fulfilled. A lookup is needed to find out where in the packed volume texture the corresponding sample is stored; how this position is defined is presented in the following section. Also within the loop, the contribution of the current sample is calculated and added to the earlier samples using front-to-back compositing.
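The front-to-back compositing and early ray termination used in the loop can be sketched as follows (illustrative, scalar color for brevity; the 0.99 threshold is an assumed value):

```python
def composite_front_to_back(samples):
    """Accumulate (color, alpha) samples front to back, terminating the
    ray early once the accumulated opacity is nearly opaque."""
    acc_c, acc_a = 0.0, 0.0
    for c, a in samples:
        acc_c += (1.0 - acc_a) * a * c  # alpha-weighted color contribution
        acc_a += (1.0 - acc_a) * a      # accumulated opacity
        if acc_a >= 0.99:               # conditional break: opaque enough
            break
    return acc_c, acc_a

# A fully opaque first sample hides everything behind it.
print(composite_front_to_back([(1.0, 1.0), (0.5, 1.0)]))  # (1.0, 1.0)
```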

4.3.2 Sampling Algorithm

A texture lookup in a packed texture is performed in several steps (see figure 4.2):

1. Determine which cell in the index data includes the sample point (s, t).

2. Compute the coordinates $(s_0, t_0)$ corresponding to the cell origin.

3. Look up the index data for the cell, i.e. the scale factor m and the origin $(s'_0, t'_0)$ of the data block in the packed data.

4. Compute the coordinates $(s', t')$ in the packed data corresponding to (s, t) in the index data.

5. Finally, look up the actual texture data at $(s', t')$ in the packed data.

The origin in texture coordinates $(s_0, t_0)$ of the cell including the point (s, t) may be computed by:

$$s_0 = \frac{\lfloor s\,n_s \rfloor}{n_s} \quad \text{and} \quad t_0 = \frac{\lfloor t\,n_t \rfloor}{n_t} \quad (4.1)$$

where the floor function $\lfloor x \rfloor$ gives the largest integer less than or equal to x. The scale factor m and the origin $(s'_0, t'_0)$ of the corresponding packed data block are given as functions of $(s_0, t_0)$; thus the texture coordinates $(s', t')$ in the packed data may be computed by:

$$s' = s'_0 + (s - s_0)\,m \quad (4.2)$$

$$t' = t'_0 + (t - t_0)\,m \quad (4.3)$$

Finally, the computation of the local block texture coordinates (u, v, w) is shown for u by equation (4.4), ensuring that samples are not taken outside the boundary.

$$u = C_{\delta}^{1-\delta}(u') \quad (4.4)$$

where $C_{\alpha}^{\beta}(\gamma)$ clamps the value γ to the interval [α, β].
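Equations (4.1)–(4.2) combine into the following lookup sketch (2-D, one axis shown; the index map is a hypothetical dictionary keyed by cell origin, standing in for the index texture):

```python
import math

def packed_coordinate(s, n_s, index_map):
    """Map a texture coordinate s to the packed data, eqs. (4.1)-(4.2).
    index_map[s0] returns (s0_packed, m) for a cell origin s0."""
    s0 = math.floor(s * n_s) / n_s  # cell origin, eq. (4.1)
    s0_packed, m = index_map[s0]    # index-texture lookup, step 3
    return s0_packed + (s - s0) * m  # eq. (4.2)

# Two cells along s; the second cell's block is stored at offset 0.5
# in the packed data, at half scale (m = 0.5).
index_map = {0.0: (0.0, 1.0), 0.5: (0.5, 0.5)}
print(packed_coordinate(0.75, 2, index_map))  # 0.5 + 0.25 * 0.5 = 0.625
```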

4.3.3 Interpolation Algorithm

The task of interblock interpolation is to retrieve a sample value for a position between the sample boundaries of neighboring blocks. The overall structure for doing so can be summarized in the following steps:

1. Determine the eight-block neighborhood and set up a local coordinate system.

2. Take samples from each block using the intrablock sample method de-scribed in the previous section.

3. Compute edge weights between side-facing neighbors.

4. Compute the block weights from the three edge weights.

5. Compute a normalized weighted sum, yielding the final sample value.

This thesis implements the maximum distance approach of the described interblock interpolation technique presented by Ljung et al. [11]. Other variations are minimum distance and boundary split, where the differences lie in how the edge weights are computed. The maximum distance method avoids discontinuities in the derivative of the interpolated weight within the interval. The interpolated value can generally be expressed as:

$$e_{i,j}(\rho) = C_0^1\big((\rho + \delta_i)/(\delta_i + \delta_j)\big) \quad (4.5)$$

where the value is interpolated over the whole distance between neighboring sample boundaries. When all blocks have the same resolution, this interpolation is equivalent to trilinear interpolation; i.e. let $\delta_i = \delta_j = \delta$, then equation (4.5) results in:

$$e_{i,j}(\rho) = C_0^1(0.5 + \rho/2\delta) \quad (4.6)$$

which is a linear interpolation kernel, so that artifacts will not occur when sampling within the block boundaries. Finally, the sample value is computed as a normalized sum of all block samples according to equation (3.5).
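Equation (4.5) and its equal-resolution special case (4.6) can be checked with a small sketch (illustrative helper names):

```python
def clamp01(x):
    """C_0^1(x): clamp x to the interval [0, 1]."""
    return max(0.0, min(1.0, x))

def edge_weight(rho, delta_i, delta_j):
    """Maximum-distance edge weight, eq. (4.5): interpolate over the
    whole distance between neighboring sample boundaries."""
    return clamp01((rho + delta_i) / (delta_i + delta_j))

# Equal resolutions (delta_i = delta_j) reduce to linear interpolation,
# eq. (4.6): 0 at block i's sample boundary, 1 at block j's.
d = 0.25
print(edge_weight(0.0, d, d))  # 0.5: midway between the two blocks
print(edge_weight(-d, d, d))   # 0.0: at block i's sample boundary
print(edge_weight(d, d, d))    # 1.0: at block j's sample boundary
```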


Chapter 5

Results

This chapter presents the achieved results of the project, mainly in terms of frame rates, but also from a time perspective regarding the time it takes to pre-process the volume. The first section describes the circumstances under which the thesis was implemented. Furthermore, the results from each step of the system pipeline are presented in separate sections.

5.1 Data Sets and Test Environment

This thesis was implemented and tested in a 3-D module that partially constitutes the PACS software IDS7. By working in this stand-alone module the development process became more efficient, since compiling the entire solution could be avoided. At the same time, it could be asserted that the implemented methods worked in the entire solution as well, since the module is used as-is in IDS7.

The test data consists of medical volumes captured with modalities in real-world examination cases, presented by courtesy of Sectra-Imtec AB and CMIV (Centre for Medical Imaging and Visualization).

Two major test cases have been used. The first examination is a CT-scanned data set of the upper body from a trauma case. The primary objective of this data set is to examine how much storage space can be saved by avoiding storing blocks that are invisible given the applied TF. This data set is well suited for this purpose, since it contains a lot of air. The other data set is a head scan from an accident, also produced by a CT scanner. The objective of this examination is to study how the block resolutions need to be distributed to fit the data set into the GPU's texture memory.


Dataset    Size   Read Volume   Meta-data   Pack   Total   Ratio
Trauma     512³   6.7           14.9        4.9    26.5    31:1
Accident   512³   8.6           18.9        5.8    33.3    29:1
Trauma     256³   0.8           1.9         0.6    3.3     9:1
Accident   256³   1.0           2.1         0.8    3.9     4:1

Table 5.1: Time performance measured in seconds and reduction ratio.

5.2 Volume Pre-Processing

The major disadvantage of multi-resolution representations is the processing time needed to create the meta-data and the memory it allocates. The preparation for rendering basically consists of three steps: reading the volume, generating meta-data and packing the volume texture. Fortunately, the most time-consuming step, generating meta-data, only needs to be performed once for each volume; subsequent times the meta-data can be read from file. The stored meta-data includes central difference gradients for shading, which need to be computed for all LODs, as well as downsampled versions of the raw volume data.

To avoid recreating meta-data that can be read from file, a fingerprint is calculated for each volume. When reading a volume we first calculate the fingerprint and match it against stored fingerprints; if the volume has been processed before, the meta-data generation step can be skipped and the data loaded directly from file. Table 5.1 shows the time performance. Doubling each dimension, i.e. going from 256³ to 512³, gives 2³ = 8 times the previous amount of data, meaning that there are 8 times more blocks to process. This calls for efficient algorithms; suggestions are discussed in the next chapter.
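The fingerprint-matching step described above can be sketched as follows. This is a simplified illustration: the function names and the file-per-fingerprint cache layout are assumptions, not the IDS7 implementation.

```python
import hashlib
import os

def fingerprint(volume_bytes):
    """Cache key for a volume: a digest of its raw voxel data."""
    return hashlib.sha1(volume_bytes).hexdigest()

def load_or_generate_metadata(volume_bytes, cache_dir, generate):
    """Reuse cached meta-data when the fingerprint matches a stored one,
    otherwise run the expensive generation step once and cache the result."""
    path = os.path.join(cache_dir, fingerprint(volume_bytes) + ".meta")
    if os.path.exists(path):        # processed before: skip generation
        with open(path, "rb") as f:
            return f.read()
    meta = generate(volume_bytes)   # first encounter: generate the meta-data
    with open(path, "wb") as f:     # ... and store it for the next session
        f.write(meta)
    return meta
```

On the second and later reads of the same volume, `generate` is never invoked; only the cheap hash and a file read remain.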


Dataset    Size   B0     B1     B2    B3     NB     II
Trauma     512³   3945   961    1307  1      40.1   7.1
Accident   512³   3772   1971   273   1805   37.5   6.8
Trauma     256³   823    556    205   643    41.2   8.9
Accident   256³   860    268    31    233    38.2   7.3

Table 5.2: FPS for II and NB on GeForce 8800 GTX and block count

Dataset    Size   B0     B1     B2    B3     NB     II
Trauma     512³   3945   961    1307  1      24.2   8.1
Accident   512³   3772   1971   273   1805   18.1   6.7
Trauma     256³   823    556    205   643    24.7   8.8
Accident   256³   860    268    31    233    18.4   7.2

Table 5.3: FPS for II and NB on ATI X1950 Pro and block count

5.3 Nearest Block Sampling

The quality of the NB sampling technique is shown in figure 5.1. Performance varies somewhat between NVIDIA and ATI GPUs, as shown in tables 5.2 and 5.3 respectively. The performance is measured in a viewport of 512×512 pixels at full zoom, meaning that the volume projection populates the entire viewport. A factor worth considering is that the X1950 Pro model from ATI uses the AGP slot, while NVIDIA's GeForce 8800 GTX uses the PCI-Express slot. Furthermore, level 0 corresponds to the highest resolution, with blocks of 16³ voxels, and level 3 is the lowest resolution, with blocks of 2³ voxels. This means that it is possible to store 16 blocks in each of the x-, y- and z-directions at the highest resolution, and correspondingly 128 blocks at the lowest resolution, assuming that a volume texture of 256³ voxels is at our disposal.
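The block-grid arithmetic above can be made concrete with a small sketch, assuming (as in the text) a 256³ packed volume texture and 16³ blocks at level 0, with each level halving the block side:

```python
def block_side(level, base=16):
    """Voxels per block side at a given LOD level: 16, 8, 4, 2 for levels 0-3."""
    return base >> level

def blocks_per_axis(level, texture_side=256, base=16):
    """How many blocks fit along each axis of the packed volume texture."""
    return texture_side // block_side(level, base)
```

For example, `blocks_per_axis(0)` gives 16 and `blocks_per_axis(3)` gives 128, matching the figures quoted above.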

The image quality is highly dependent on how the LOD is selected for each block. The artifacts that occur when clamping at block boundaries become more visible if the LOD is selected poorly. Therefore, it is desirable that as many blocks as possible are selected at the highest resolution.


Figure 5.1: Comparison of 512×512×512 data sets, rendered with NB sampling (panels a and c) and the software-based renderer (panels b and d).


5.4 Interblock Interpolation

The quality of the II sampling technique is shown in figure 5.2. Comparing the results from II with the software-based version, the wood-grain artifacts appear reduced, almost non-existent. It is worth considering whether the software-based ray casting and the NB technique can be fully eliminated, since it seems possible to produce full quality in real time on the GPU. However, when navigating close up to the volume the computations increase rapidly and the rendering time suffers. Due to the many texture lookups and longer fragment programs, the frame rate drops when using II. Although empty space skipping saves a lot of unnecessary computations, the algorithm is still too inefficient to fully replace NB sampling during interaction; the software-based version, however, is fully replaceable.

It is important to consider the difference between the error introduced by varying LOD and the error introduced by NB sampling; this thesis focuses on the quality of the II scheme. A comparison between NB, II and SW is presented in figure 5.3. To highlight the gain in image quality from II, all block resolutions are set to level 2, creating clearly visible ringing artifacts. This extreme case is for illustration purposes, using only 2% of the 256³ volume texture for NB and II, compared to full resolution SW rendering of a 512³ data set. Comparing the reduction ratio, 341:1 in this case, to the ratios in table 5.1 further clarifies the gain in image quality from II: artifacts are reduced to a minimum despite the extreme reduction ratio.

Finally, figure 5.4 shows the image quality gained for a 512³ data set by the implemented multi-resolution framework compared to the traditional volume rendering technique in IDS7. Figure 5.4a shows the quality of the traditional rendering pipeline and figure 5.4b shows the quality of the implemented methods, compared to figure 5.4c, which is the full resolution SW rendering. It becomes obvious that multi-resolution volume rendering on the GPU is the superior technique, since there is almost no visual difference between figures 5.4b and 5.4c.


Figure 5.2: Comparison of 512×512×512 data sets, rendered with II sampling (panels a and c) and the software-based renderer (panels b and d).


Figure 5.3: 2% texture usage for II and NB, and 100% for SW. Panels: (a) NB sampling, (b) II sampling, (c) SW full resolution.

Figure 5.4: Comparison of single- and multi-resolution rendering. Panels: (a) single-resolution, (b) multi-resolution, (c) full resolution.


Chapter 6

Discussion

This chapter concludes the thesis by discussing the achieved results in terms of the image quality trade-off in relation to processing time, and how well the presented methods integrate into IDS7. Moreover, ideas for future work, i.e. how to continue developing the proposed approach, are discussed. Personal remarks regarding the overall project are given as well.

6.1 Conclusions

Both I and my supervisor were very pleased with the implementation of the GPU-based multi-resolution direct volume rendering technique. Shader programming is still a relatively new field, which often gave rise to new challenges during the implementation. As an example, a bug was reported to Microsoft concerning an error in the HLSL compiler that can occur in the operation reorganizer. The bug will be fixed in the March 2008 release, since the November release was too imminent at the time the bug was reported. First-hand knowledge about shader programming is still very limited and the documentation is poor. With that said, the solution itself as presented in this thesis may seem more straightforward than it in fact is. The lack of effective debugging tools clearly slows down the development process.

A great challenge during the project was to find solutions that fulfilled the specific requirements of the IDS7 environment. No papers at my disposal included any managed code implementations. Finding solutions that fit the specific demands of the Sectra environment required a lot of tedious experimentation and workarounds for various problems.

In the following sections the implementation of the methods proposed by Ljung et al. [1] is further evaluated and compared to the previous volume rendering technique in IDS7. Finally, future improvements are presented.


6.1.1 Related Work

Most of the research in this area focuses on acceleration structures for volume rendering, and many papers do not address the image quality trade-off that needs to be made. In medical visualization, visual correctness is crucial for making a correct diagnosis. On the other hand, there are limitations in GPU memory capacity that counteract physical correctness. Real-time rendering is important for instant feedback from the system; the user's tolerance level in this matter is very low. To fulfill both requirements, a more efficient representation is needed, which is the groundwork for this thesis.

Ljung et al. [1] present several approaches for efficient volume rendering. Both pre-processing and real-time pipelines are outlined, and several widely known issues are addressed. The presented technique for interpolating between blocks of arbitrary resolutions is unique and the result is convincing. Data management problems are dealt with at some length. Storage space is still an issue; writing and reading data from disk requires high-performance disks. For instance, a 256³ data set allocates 2 bytes per voxel if stored as ushort, plus 4 more bytes per voxel for RGBA gradients, and this needs to be stored for all LODs, resulting in 77 MB of meta-data. This can easily be held in primary memory, but for a 512³ data set there is already 630 MB of meta-data, so an out-of-memory exception will soon occur if the data set size increases further. Therefore, I would like to emphasize the importance of good data management.
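The meta-data estimate above can be reproduced with a short calculation. This is a sketch under the assumption that the 4-byte RGBA gradients dominate the footprint and are stored for all four LOD levels, each level holding one eighth of the voxels of the previous one:

```python
def gradient_metadata_bytes(side, bytes_per_voxel=4, levels=4):
    """Approximate gradient meta-data footprint for a side^3 volume:
    the full-resolution level plus successively halved LOD levels."""
    return sum((side >> level) ** 3 * bytes_per_voxel for level in range(levels))

print(gradient_metadata_bytes(256) / 1e6)  # ~77 MB, the figure quoted in the text
print(gradient_metadata_bytes(512) / 1e6)  # ~613 MB, roughly the 630 MB figure
```

The exact numbers depend on which LODs actually store gradients, but the eightfold growth per dimension doubling is what makes the 512³ case problematic for 32-bit address spaces.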

6.1.2 Implemented Methods

The implemented methods work very well integrated in IDS7. The trade-off between pre-processing time and data reduction ratio is convincing, keeping in mind that the pre-processing step only needs to be done once for each volume. An official representation format is still not fully developed for IDS7, since this thesis only investigates the possibility of using multi-resolution representations.

Comparing performance between different solutions is not an easy task. The performance of different implementations is often quantified in papers as frame rates for different data sets. This is hard to interpret, since the hardware may differ and no standard test setup exists. For instance, performance from a frame-rate point of view varies a lot depending on the viewport and zoom. Relative performance measures do not say much about the general case.

Fortunately, in this project, the earlier version of the GPU-based ray casting algorithm could be used as a guideline for performance. The multi-resolution ray casting algorithm reached close to the same frame rate, although simplifications need to be made during interaction. I truly believe that multi-resolution representation on the next generation of GPUs is the best choice so far because of its superior image quality.

Maintaining image quality while making the best possible selection for data reduction is a challenge. The reduction ratios obtained for the test cases by the LOD selection algorithm presented in this thesis are shown in table 5.1. Notably, the algorithm reduced the 256³ data sets even though it would be possible to store all their blocks at the highest resolution. It can be concluded that data sets containing a lot of air can be pushed harder by the LOD selection algorithm to reduce the data, while on the other hand important information can be reduced away by the simple block importance measurement criteria.

6.2 Future Work

There is plenty of room for improvement, even if the proposed methods work well. For instance, the generation of the meta-data still consumes a lot of memory. Really large data sets will not fit into internal memory; swapping data to disk storage is then required to avoid out-of-memory exceptions. A workaround is to use a 64-bit operating system, on hardware that supports the architecture, instead of a traditional 32-bit system; swapping to disk then comes naturally. However, this option was not available, since the 64-bit version of IDS7 is not yet ready for deployment.

Moreover, from an architectural design aspect, there are several interesting problems. For instance, once the volume texture is packed for a given TF, the texture needs to be re-packed if the user chooses to alter the TF. This is very likely to happen, and when it does it needs to be done in a fast and efficient way. One approach is to keep flags that indicate whether the visibility of a block has changed, and then only handle those blocks. If the whole data set cannot fit in internal memory, an efficient structure is needed for fast access to quickly update the texture. One idea is to use octree-like structures with groups of blocks to avoid looping through all the blocks. Multi-core CPUs allow for more time-efficient packing of data and generation of meta-data, since tasks can be handled on several CPU cores separately.
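The visibility-flag idea can be sketched as follows. This is a hypothetical illustration: the block structure, its min/max fields and the TF representation are assumptions, not the IDS7 implementation.

```python
def block_visible(block, tf):
    """A block is visible if any value in its [min, max] range maps to
    non-zero opacity under the transfer function tf."""
    return any(tf(v) > 0.0 for v in range(block["min"], block["max"] + 1))

def dirty_blocks(blocks, old_tf, new_tf):
    """Return only the blocks whose visibility flipped when the TF changed;
    only these need to be re-packed into the volume texture."""
    return [b for b in blocks
            if block_visible(b, old_tf) != block_visible(b, new_tf)]
```

On a TF change, only the returned blocks are touched, instead of re-packing the full volume texture from scratch.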

How LOD is selected for different regions of interest is very important; this thesis has presented a naïve knowledge-based approach. Automating this step and making more qualified classifications is highly desirable but a very complex task. Future implementations should focus on the methods proposed by Ljung et al. in [13, 14, 15]. The visual error needs to be measured in a representative domain and minimized to avoid artifacts in the final rendered image.

However, the proposed improvements, i.e. data management and LOD selection, are both research areas large enough to easily be addressed in two separate thesis projects. There is neither room nor time for me to address these problems in this thesis. Therefore, these interesting problems are left for future development.


Bibliography

[1] Patric Ljung. Efficient Methods for Direct Volume Rendering of Large Data Sets. PhD thesis, Linköping Studies in Science and Technology, Dissertations No. 1043, ISBN 91-85523-05-4, ISSN 0345-7524, 2006.

[2] Markus Hadwiger and Christof Rezk Salama. Real-time volume graphics: Introduction. Course Notes 28, SIGGRAPH '04, 2004.

[3] Patric Ljung, Calle Winskog, Anders Persson, Claes Lundström, and Anders Ynnerman. Full body virtual autopsies using a state-of-the-art volume rendering pipeline. IEEE Transactions on Visualization and Computer Graphics, 2006.

[4] Marc Levoy. Display of surfaces from volume data. IEEE Computer Graphics and Applications, 1988.

[5] Klaus Engel, Markus Hadwiger, Joe M. Kniss, Christof Rezk-Salama, and Daniel Weiskopf. Real-Time Volume Graphics. A K Peters, Ltd., 2006.

[6] Klaus Engel, Markus Hadwiger, Joe M. Kniss, Aaron E. Lefohn, Christof Rezk Salama, and Daniel Weiskopf. Real-time volume graphics. Course Notes 28, SIGGRAPH '04, 2004.

[7] Martin Kraus and Thomas Ertl. Adaptive texture maps. Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware (HWWS '02), 2002.

[8] Stefan Grimm, Stefan Bruckner, Armin Kanitsar, and Eduard Gröller. Memory efficient acceleration structures and techniques for CPU-based volume raycasting of large data. Proceedings of the IEEE Symposium on Volume Visualization and Graphics, 2004.

[9] Stefan Grimm, Stefan Bruckner, Armin Kanitsar, and Eduard Gröller. A refined data addressing and processing scheme to accelerate volume raycasting. Computers and Graphics 28, 2004.

[10] Patric Ljung. Adaptive sampling in single pass, GPU-based raycasting of multiresolution volumes. Proceedings of the Eurographics/IEEE International Workshop on Volume Graphics, 2006.

[11] Patric Ljung, Claes Lundström, and Anders Ynnerman. Multiresolution interblock interpolation in direct volume rendering. Proceedings of the Eurographics/IEEE Symposium on Visualization, 2006.

[12] Tom Miller. Managed DirectX 9: Graphics and Game Programming. ISBN 0-672-32596-9, 2003.

[13] Patric Ljung, Claes Lundström, Anders Ynnerman, and Ken Museth. Transfer function based adaptive decompression for volume rendering of large medical data sets. Proceedings of the IEEE/ACM Symposium on Volume Visualization, 2004.

[14] Claes Lundström, Patric Ljung, and Anders Ynnerman. Extending and simplifying transfer function design in medical volume rendering using local histograms. Proceedings of the EuroGraphics/IEEE Symposium on Volume Visualization, 2005.

[15] Claes Lundström, Patric Ljung, Anders Ynnerman, and Hans Knutsson. The α-histogram: Using spatial coherence to enhance histograms and transfer function design. Proceedings of the EuroGraphics/IEEE Symposium on Volume Visualization, 2006.

[16] Jens Krüger and Rüdiger Westermann. Acceleration techniques for GPU-based volume rendering. IEEE Visualization, 2003.