
Next Generation CAT System

Jaewon Kim∗

MIT Media Lab

Advisor: Ramesh Raskar

Associate Professor of Media Arts & SciencesMIT Media Lab

Reader: V. Michael Bove

Principal Research Scientist of Media Arts & SciencesMIT Media Lab

Reader: Fredo Durand

Associate Professor of EECSMIT CSAIL

Reader: Yasuhiro Mukaigawa

Associate Professor of Osaka University


Table Of Contents

1 Introduction 31.1 Contributions . . . . . . . . . . . . . . . . . . . . 31.2 Related Work . . . . . . . . . . . . . . . . . . . . 3

2 Proposed Work 42.1 4D light field . . . . . . . . . . . . . . . . . . . . . 4

2.1.1 A mask-based capturing method of 4Dlight field . . . . . . . . . . . . . . . . . . 4

2.1.2 Decoding process of 4D light field image . . 42.2 Volumetric Reconstruction using ART . . . . . . . 52.3 Getting a clear image through a scattering media . . 5

2.3.1 Image Acquisition using a Pinhole Array . . 52.3.2 Direct-Global Separation via Angular Fil-

tering . . . . . . . . . . . . . . . . . . . . 6

3 Evaluation 6

4 Required resources 7

5 Timeline 7

∗e-mail: [email protected]


Abstract

Since the first CAT system was introduced in 1971, its basic form, built around a scanning X-ray source, has not changed. This scanning mechanism requires a large system and a long exposure time. This thesis proposes a concept for future CAT systems based on novel techniques that enable a wearable, real-time CAT machine. As a first step, a high-speed, scan-free CAT system will be implemented using a 4D light field capturing technique. Next, a tomographic system using visible light instead of X-rays will be explored in order to develop a harmless and wearable CAT machine. In the experimental setup, translucent objects and visible light sources will be used to imitate the effect of X-ray sources.

1 Introduction

CAT systems are generally used to obtain 3D inner views of the human body for medical purposes. For forty years they have relied on X-ray scanning, and they remain large, slow, and harmful. We often need to monitor the inside of our bodies for health reasons, or to exploit inner body shape for purposes such as biometrics. This thesis addresses techniques that could bring such capabilities into daily life.

The first goal is to implement a scan-free tomographic system, which is essential for realizing a compact and very fast CAT system. The second goal is to develop a wearable CAT system that allows easy access in everyday settings. To reach these goals, 4D light field capturing will be applied to CAT imaging. Current CAT systems take multiple images at different positions of the X-ray source by scanning it rotationally; this scanning and repeated image acquisition are the main factors that make the system large and slow. To eliminate them, we propose capturing the 4D light field in a single image. The 4D light field is defined by the 2D spatial and 2D angular information of light, and earlier researchers proved that a lenslet or pinhole array can capture it in a single shot. Applying this technique to CAT, images corresponding to different X-ray source positions can all be recorded in one exposure by placing multiple X-ray sources at different positions. Thus a scan-free, instantaneous CAT system becomes possible.

Another goal is acquiring a clear view of the inside of the human body with harmless light sources. Many methods have been presented for this purpose in the field of DOT (Diffuse Optical Tomography). Most use NIR (near-infrared) LED sources to view inside certain body parts, but it remains difficult to obtain clear images with such harmless sources. I will propose a new method to view clearly inside the human body, or scattering media in general, with harmless LED sources. The method separates light transmitted through a scattering medium, such as human skin, into direct and scattered components, and then reconstructs an image using only the direct component, which is much sharper than a normal image containing the scattered component. This thesis is expected to contribute to fields such as medical imaging, handheld body viewers, and new biometrics.

1.1 Contributions

This thesis will address novel techniques to realize a next-generation CAT system that is wearable, fast, and harmless. The primary technical contributions are as follows.

• This thesis will approach a single-shot CAT technique that generates a real-time 3D inner view of the human body in a scan-free system. With this technique, it would be possible to monitor the movement of organs such as the heart. A wearable, portable system will also be realized with an improved diffuse optical technique, which is another main topic of this thesis.

• This thesis will develop a method for single-exposure separation of the direct and global components of scattered, transmitted light using a pinhole or lenslet array placed close to the image sensor. In the direct-only image, high-frequency details are restored and provide strong edge cues for scattering objects. Due to its single-shot nature, this method can also be applied to dynamic scenes.

• This thesis will demonstrate enhanced volumetric reconstruction of scattering objects using direct-component images. These separation methods are well-suited for applications in medical imaging, providing an internal view of scattering objects such as human skin using visible-wavelength light sources (rather than X-rays).

1.2 Related Work

Light Field Capturing: The concept of capturing the 4D light field was presented by Levoy [Levoy and Hanrahan 1996] and Gortler [Gortler et al. 1996], and Isaksen [Isaksen et al. 2000] described a practical method to compute the 4D light field. Capture methods were developed further by Levoy [Levoy et al. 2004] and Vaish [Vaish et al. 2004], who presented methods using microlens arrays and camera arrays. The camera-array approach requires a huge system, and the lens-array approach suffers from aberrations introduced by the many lenses. Recently, Veeraraghavan [Veeraraghavan et al. 2007] presented a simple way to capture the 4D light field with a thin 2D mask.

Shield Field Imaging: Lanman [Lanman et al. 2008] showed a way to reconstruct the 3D shape of an occluder from a single shot. In his research, many LEDs cast silhouettes from different directions onto a screen. The silhouettes are coded by a mask inside the screen and captured in a single exposure. From the captured image, the silhouette cast by each LED is decoded at low resolution, and by combining these images the 3D outer shape of the occluder is reconstructed.

Direct-Global Separation: Direct-global separation of light is widely studied in diverse fields spanning computer vision, graphics, optics, and physics. Due to the complexities of scattering, reflection, and refraction, analytical methods do not achieve satisfactory results in practical situations. In computer vision and graphics, Nayar [Nayar et al. 2006] presented an effective method to separate the direct and global components of a scene by projecting a sequence of high-frequency patterns; their work is one of the first to handle arbitrary natural scenes. However, their solution requires temporally-multiplexed illumination, limiting its utility for dynamic scenes. Nasu [Nasu et al. 2007] presented an accelerated method using a sequence of three patterns. In addition, Rosen and Abookasis [Rosen and Abookasis 2004] presented a descattering method using a microlens array.

Tomographic Reconstruction: Trifonov [Trifonov et al. 2006] considered volumetric reconstruction of transparent objects using tomography, an approach that avoids occlusion problems by imaging in a circular arc about the object using a full 360-degree turntable sequence. In contrast, using a limited baseline, we reconstructed the 3-D


Figure 1: Light field parameterization in a 1D schematic diagram of the imaging setup.

Figure 2: Small section of the 150×100 pinhole array mask used in our prototype, with a pinhole diameter of 428 microns.

shape of scattering objects using only eight direct-only images and the well-established algebraic reconstruction technique (ART).

2 Proposed Work

2.1 4D light field

Figure 1 defines the 2D light field coordinates. A ray from a point on the lighting plane is projected to a point on the sensor, parameterized by the spatial coordinate x and the angular coordinate θ. In the real system there are four parameters, x, y, θ, and φ, which define the 4D light field.

Figure 3: An inset image focused on the diffuser.

2.1.1 A mask-based capturing method of 4D light field

Figure 1 illustrates the process of capturing a 2D light field. The rays from the LEDs pass through the translucent medium, are modulated by a mask, and are projected onto a diffuser. In this figure, the mask transforms the 2D light field, x and θ, into a 1D spatial coordinate, x'. The transformed 1D signals are sensed by a camera. In the real system, a mask transforms the 4D light field, the 2D spatial and 2D angular information of light, into 2D spatial information. A mask is a 2D pattern array printed on a thin, transparent film. Various kinds of masks can capture the 4D light field; the pinhole array mask in Figure 2 is currently being used. Figure 3 shows a small part of an image captured by the actual system with a 6×6 LED array. The red box contains the 2D angular information at one spatial point, and each white tile inside it gives a ray intensity value for a specific angular and spatial position. In this figure, the angular resolution is 6×6, which equals the number of LEDs on the lighting plane. Figure 4 shows the current experimental setup, which matches the

Figure 4: Current experimental setup.

Figure 5: Coded image of the 4D light field penetrating a wine glass (image resolution: 3872×2592).

schematic diagram in Figure 1. The 6×6 LEDs on the lighting plane generate different 2D projection images of an object on the screen. Figure 5 shows a captured 4D light field image of a wine glass in this scheme.
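The way the mask flattens angular information into spatial positions can be sketched with the 1-D geometry of Figure 1. The distances and LED positions below are illustrative assumptions, not the prototype's actual dimensions; the point is only that each LED direction lands at a distinct spatial offset behind a pinhole:

```python
import numpy as np

# 1-D geometry from Figure 1 (illustrative numbers, not the prototype's):
d1 = 100.0                        # lighting plane -> pinhole mask distance
d2 = 5.0                          # pinhole mask -> diffuser distance
x_pinhole = 0.0                   # a single pinhole on the optical axis
leds = np.linspace(-30, 30, 6)    # 6 LED positions on the lighting plane

# A ray from LED position u through the pinhole lands on the diffuser at
#   x' = x_pinhole + (x_pinhole - u) * d2 / d1,
# so each LED direction maps to a distinct diffuser position x'.
spots = x_pinhole + (x_pinhole - leds) * d2 / d1
print(spots)   # six distinct spot positions, symmetric about the pinhole
```

Because each LED produces its own spot, the 2D (x, θ) light field is recorded as a purely spatial pattern on the diffuser, which is what the camera captures in one exposure.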

2.1.2 Decoding process of 4D light field image

In the decoding process, a 2D spatial image is generated for each angular division. In our scheme, the geometric positions of the lighting, diffuser, and mask planes are carefully chosen so that the angular resolution equals the number of light sources. Thus the number of images generated by decoding equals the number of LEDs on the lighting plane, and each one is the image projected by a single LED. Figure 6 illustrates the decoding process: pixels from the same angular region are collected to form one 2D spatial image. Repeating this process yields N images, where N is the number of LEDs. When a pinhole array mask is used, the resolution of each decoded image equals the number of pinholes in the mask.

Figure 6: Decoding process of a coded 4D light field image

Figure 7 shows a 4×4 set of images decoded in this way. The resolutions of the original image in Figure 5 and of each decoded image in Figure 7 are 3872×2592 and 150×100, respectively. Each decoded image gives a different angular view of the object, which demonstrates that multiple views of an object can be obtained instantaneously from a single-shot image using this method.
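The decoding step described above amounts to a pixel rearrangement: all pixels occupying the same position within their pinhole tiles are gathered into one sub-image. Below is a minimal NumPy sketch under the assumption of an ideal coded image whose tiles are exactly aligned to an integer pixel grid (a real capture would need calibration and resampling first); the function name and layout are illustrative, not from the thesis:

```python
import numpy as np

def decode_light_field(coded, n_pinholes, n_angles):
    """Decode a pinhole-mask coded image into one sub-image per LED.

    coded      : 2D array of shape (n_pinholes[0]*n_angles[0],
                                    n_pinholes[1]*n_angles[1]),
                 with each pinhole's angular tile stored contiguously
    n_pinholes : (rows, cols) of pinholes -> spatial resolution
    n_angles   : (rows, cols) of angular samples per tile
                 (equals the LED grid, e.g. (6, 6))
    Returns an array of shape (n_angles[0], n_angles[1],
    n_pinholes[0], n_pinholes[1]): one pinhole-resolution image per LED.
    """
    py, px = n_pinholes
    ay, ax = n_angles
    # Split the image into per-pinhole tiles of angular samples; pixels
    # at the same position within every tile belong to the same LED.
    tiles = coded.reshape(py, ay, px, ax)
    return tiles.transpose(1, 3, 0, 2)   # (ay, ax, py, px)

# Synthetic example matching the prototype's counts: 150x100 pinholes, 6x6 LEDs
coded = np.zeros((150 * 6, 100 * 6))
views = decode_light_field(coded, (150, 100), (6, 6))
print(views.shape)   # (6, 6, 150, 100)
```

Each of the 36 slices `views[j, l]` is then the 150×100 projection image cast by one LED, consistent with the decoded resolution quoted for Figure 7.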


Figure 7: 16 decoded images from a coded light field image of a wine glass and a straw (the resolution of each image is 150×100).

2.2 Volumetric Reconstruction using ART

I = I_0 \exp\left( -\sum_{i=1}^{N} a_i f_i \right) \qquad (1)

We will use the algebraic reconstruction technique (ART) presented by Roh [Roh et al. 2004] to reconstruct the 3-D shape of scattering objects, following traditional short-baseline tomography approaches. Generally, when a ray passes through an object, the change in intensity can be modeled by Equation (1), where I_0 is the original intensity of the ray and I is the resulting intensity after penetrating N layers inside the object. In this equation, a_i is the distance traveled through the i-th material, whose absorption coefficient is f_i, as depicted in Figure 8. Equation (2) is the logarithmic form of Equation (1), and it can be rewritten for the j-th ray as Equation (3).

h = \log(I_0 / I) = \sum_{i=1}^{N} a_i f_i \qquad (2)

h_j(t) = \sum_{i=1}^{N} a_i^j f_i(t) \qquad (3)

Figure 8: Projection model of a ray.

Now, our problem is finding the f_i values, which correspond to the density information within the reconstruction region. Equation (3) can be written in matrix form as Equation (4). The matrix A represents the projective geometry of the rays, computed from the emitting and receiving positions of each ray for a predetermined reconstruction region in which the object is assumed to lie. The vector h holds the sensed intensity values. In practice, our implementation of ART takes approximately five minutes for our data sets.

h = AF \qquad (4)

where

A = \begin{pmatrix} a_1^1 & a_2^1 & \cdots & a_N^1 \\ a_1^2 & a_2^2 & \cdots & a_N^2 \\ \vdots & \vdots & & \vdots \\ a_1^M & a_2^M & \cdots & a_N^M \end{pmatrix} = \begin{pmatrix} a^1 \\ a^2 \\ \vdots \\ a^M \end{pmatrix}, \qquad F \in \mathbb{R}^N, \; h \in \mathbb{R}^M, \; A \in \mathbb{R}^{M \times N}

Equation (5) gives the next-step value, f_i(t+1), from the parameters at the current step. Here t is the iteration index and λ is a relaxation coefficient related to convergence. The value g_j is measured by the sensor, and h_j(t) is calculated from Equation (4) using the current estimate f(t). As the iteration index t increases, the error term g_j − h_j(t) decreases and f_i(t) approaches the exact value, yielding an approximate reconstruction f. Figures 9 and 10 show 3D reconstruction results for two objects obtained with this ART method using 8 images taken at different viewing angles. By combining the 4D light field capturing technique of the previous section with this ART method, we expect that the whole shape of a translucent object can be reconstructed instantaneously with multiple light sources. This will also enable a scan-free, fast CAT system when multiple X-ray sources are applied.

f_i(t+1) = f_i(t) + \lambda \, \frac{g_j - h_j(t)}{\sum_{i=1}^{N} \left( a_i^j \right)^2} \, a_i^j \qquad (5)
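The update in Equation (5) is a relaxed Kaczmarz-style row iteration. The sketch below is a minimal NumPy implementation under assumed inputs: a small synthetic system stands in for the real ray geometry, and the function name, iteration count, and relaxation value are illustrative choices, not the thesis's actual settings:

```python
import numpy as np

def art_reconstruct(A, g, n_iter=200, lam=1.0):
    """ART reconstruction via the update of Equation (5).

    A : (M, N) ray/voxel intersection-length matrix (entries a_i^j)
    g : (M,)   measured projections, g_j = log(I0 / I_j)
    Returns f : (N,) estimated absorption coefficients.
    """
    M, N = A.shape
    f = np.zeros(N)
    row_norms = (A ** 2).sum(axis=1)          # sum_i (a_i^j)^2 for each ray j
    for _ in range(n_iter):
        for j in range(M):                    # sweep over rays
            if row_norms[j] == 0:
                continue
            h_j = A[j] @ f                    # forward projection, Eq. (3)
            f += lam * (g[j] - h_j) / row_norms[j] * A[j]   # Eq. (5)
    return f

# Tiny synthetic check: 3 voxels observed by 4 rays
rng = np.random.default_rng(0)
A = rng.random((4, 3))                        # stand-in ray geometry
f_true = np.array([0.2, 0.5, 0.1])            # true absorption coefficients
g = A @ f_true                                # noiseless measurements, Eq. (4)
f_est = art_reconstruct(A, g)
print(np.round(f_est, 3))
```

For a consistent system like this one, the iteration converges to the true coefficients; with real, noisy projections a smaller λ and an explicit stopping criterion would be needed.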

(a) Captured images at 8 different viewing angles

(b) 3D reconstruction results by the ART method

Figure 9: Tomographic reconstruction of a dog-shaped object.

2.3 Getting a clear image through a scattering media

2.3.1 Image Acquisition using a Pinhole Array

In an imaging setup like Figure 11, rays emitted from a light source are scattered by a scattering medium, and both the original and scattered rays are projected onto a screen. The original rays emitted from the light source are called direct-component rays, and the additional rays produced by the scattering medium are called global-component rays. When


(a) Captured images at 8 different viewing angles

(b) 3D reconstruction results by the ART method

Figure 10: Tomographic reconstruction of a wine glass.

there is an object inside the scattering medium, it looks blurry because the global-component rays, which have very low frequency, overlap with the direct-component rays at the sensor. It is therefore easy to infer that a clear image of the inside object can be obtained by separating the global components out of the sensed image. A pinhole or lenslet array mask can serve this purpose when placed in front of a diffuser, as shown in Figure 11. Figure 12 shows how the direct and global rays are formed through the pinhole or lenslet array. In the formed image there are two distinct regions: a mixed region containing both components, and a region of pure global rays. We will separate the scattered components from the sensed values in the mixed region by fitting the global-component values in the pure-global region under each pinhole. As shown in Figure 13, the diffuser-plane image consists of a set of sharp peaks under each pinhole when no scattering medium lies between the light source and the diffuser. As shown on the right, the pinhole images form extended, blurred patterns when a scattering object is placed between the light source and the camera; ultimately, the globally-scattered light causes samples to appear in pixels neighboring the central pixel under each pinhole. This blurring of the received image would be impossible to separate without the angular samples contributed by the pinhole array mask. As shown in Figure 13, the angular sample directly under each pinhole can be used to estimate the direct-plus-global transmission along the ray between a given pixel and the light source. Similarly, any non-zero neighboring pixels can be fully attributed to global illumination due to volumetric scattering. From Figure 13, we infer that the image under each pinhole has two regions: the first consists of a mixed signal due to cross-talk between the direct and global components, and the second represents a pure global component.
In the following section, we show a simple method for analyzing such imagery to estimate separate direct and global components for multiple-scattering media.

2.3.2 Direct-Global Separation via Angular Filtering

In this section we consider direct-global separation for a 1-D sensor and a 2-D scene; the results extend trivially to 2-D sensors and 3-D volumes. As shown in the second panel of Fig-

Figure 11: Diagram of capture setup. A diffuser is used to forman image through a pinhole array mask. A high-resolution cameracaptures the array of pinhole images in a single exposure.

Figure 12: Image formation model for a multiple-scattering sceneusing a single pinhole. Note that the directly-transmitted ray im-pinges on a compact region below the pinhole, yet mixes with scat-tered global rays. The received signal located away from the direct-only peak is due to scattered rays.

ure 14, a single pinhole image is divided into two separate regions: a pure global-component region and a region of mixed direct and global components. We denote the received intensities at the diffuser-plane pixels by {L_0, L_1, . . . , L_n} when a scattering object is placed between the light source and the diffuser. The individual sensor values are modeled as

L_i = G_i + D_i, \quad i = 0, 1, \ldots, n, \qquad (6)

where {G_i} and {D_i} represent the underlying global and direct intensities measured at the sensor plane, respectively. As shown in Figure 14, a straightforward algorithm can estimate the direct and global components received at each pinhole. First, we fit a quadratic polynomial to the values outside the region that is non-zero when no scattering object is present (in our system, the central region of 7×7 pixels is excluded); note that in this region, L_i ≈ G_i. Afterwards, the polynomial model is used to approximate the global components {G_i} in the region directly below each pinhole; since this region is subject to mixing, the global component must be extrapolated from the global-only region. Finally, a direct-only image is estimated by subtracting the estimated global component at the central pixel under a pinhole from the measured value, such that D_0 ≈ L_0 − G_0.
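For a single 1-D pinhole profile, the procedure reduces to three steps: fit a polynomial to the global-only samples, extrapolate it under the pinhole, and subtract. The sketch below uses synthetic data in which the global term is made exactly quadratic so the fit is exact; real profiles, as in Figure 14, are only approximately polynomial, and the function name and widths are illustrative assumptions:

```python
import numpy as np

def separate_direct(profile, mixed_halfwidth=3, deg=2):
    """Estimate the direct component at a pinhole's central pixel.

    profile         : 1-D intensity samples under one pinhole (the L_i)
    mixed_halfwidth : half-width of the central mixed region
                      (3 -> a 7-pixel exclusion, as in the text)
    Fits a degree-`deg` polynomial to the pure-global region,
    extrapolates the global component G_0 under the pinhole, and
    returns the direct-only estimate D_0 ~= L_0 - G_0.
    """
    n = len(profile)
    x = np.arange(n)
    center = n // 2
    mixed = np.abs(x - center) <= mixed_halfwidth
    # Fit only the global-only samples, where L_i ~= G_i
    coeffs = np.polyfit(x[~mixed], profile[~mixed], deg)
    g0 = np.polyval(coeffs, center)    # extrapolated global at the center
    return profile[center] - g0        # D_0 ~= L_0 - G_0

# Synthetic profile: smooth quadratic global term plus a direct spike
x = np.arange(41)
profile = 100.0 - 0.05 * (x - 20) ** 2     # global component (quadratic)
profile[20] += 80.0                        # direct ray at the central pixel
d0 = separate_direct(profile)
print(round(d0, 1))                        # 80.0 (exact for quadratic data)
```

Repeating this per pinhole, and per pixel of the mixed region, yields the direct-only image used for the tomographic reconstruction in Section 2.2.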

3 Evaluation

General CAT systems use X-ray sources, but I will experiment with visible light sources and translucent objects to reproduce the imaging conditions of X-rays. X-ray images show the absorbed


Figure 13: Pinhole images. (Left) Received images for each pinhole camera when no scattering object is present. (Right) Pinhole images when a scattering object is present.

amount of X-ray radiation as it is transmitted through a medium, in the 2D spatial domain. This imaging condition can be imitated with a translucent medium through which visible rays are transmitted. The single-shot CAT technique will be evaluated by the quality of the 3D reconstruction and by the imaging time, which is a key factor for real-time 3D view generation. Successful implementation of a wearable CAT system, the other main goal of this thesis, will also be evaluated.

4 Required resources

To imitate X-ray imaging, translucent objects and LED sources will be used. I already have translucent objects such as wine glasses and toys, and such objects are easy to find. Multiple LED sources have been prepared as well and are aligned in a metal frame. To implement the 4D light field capturing technique over a large angular range, we need a large pinhole array film or lenslet array. We have some now, but may need to make new ones to achieve higher spatial resolution. A normal DSLR camera will be used to capture images. For the wearable application, the components above must be reassembled to fit a part of the human body; this setup will require a small imaging sensor and a small lenslet array. Small Dragonfly cameras will be sufficient for this purpose, and we have a small lenslet array of 1 in × 1 in. If new lenslet arrays are needed, a suitable one can be purchased from AOA Company in Cambridge.

5 Timeline

The proposed timeline is shown in Figure 15.

References

ATCHESON, B., IHRKE, I., HEIDRICH, W., TEVS, A., BRADLEY, D., MAGNOR, M., AND SEIDEL, H. P. 2008. Time-resolved 3D capture of non-stationary gas flows. ACM Transactions on Graphics.

GORTLER, S. J., GRZESZCZUK, R., SZELISKI, R., AND COHEN, M. F. 1996. The lumigraph. SIGGRAPH.

GU, J., NAYAR, S., GRINSPUN, E., BELHUMEUR, P., AND RAMAMOORTHI, R. 2008. Compressive structured light for recovering inhomogeneous participating media. In European Conference on Computer Vision (ECCV).

ISAKSEN, A., MCMILLAN, L., AND GORTLER, S. 2000. Dynamically reparameterized light fields. Proc. SIGGRAPH.

Figure 14: (From top to bottom) First, a 1-D sensor image for a single LED illuminating a diffuser with no object present. Second, an image with a scattering object present. Third, measured (black) and estimated polynomial fit (red) for the global-only component. Fourth, the direct-only image formed by subtracting the third from the second.

JENSEN, H., MARSCHNER, S., LEVOY, M., AND HANRAHAN, P. 2001. A practical model for subsurface light transport. SIGGRAPH, 511–518.

LANMAN, D., RASKAR, R., AGRAWAL, A., AND TAUBIN, G. 2008. Shield fields: Modeling and capturing 3D occluders. SIGGRAPH Asia 2008.

LEVOY, M., AND HANRAHAN, P. 1996. Light field rendering. SIGGRAPH '96, 31–42.

LEVOY, M., CHEN, B., VAISH, V., HOROWITZ, M., MCDOWALL, I., AND BOLAS, M. 2004. Synthetic aperture confocal imaging. ACM Transactions on Graphics 23, 825–834.

NARASIMHAN, S. G., NAYAR, S. K., SUN, B., AND KOPPAL, S. J. 2005. Structured light in scattering media. In Proc. IEEE ICCV 1, 420–427.

NASU, O., HIURA, S., AND SATO, K. 2007. Analysis of light transport based on the separation of direct and indirect components. IEEE Intl. Workshop on Projector-Camera Systems (ProCams 2007).

NAYAR, S., KRISHNAN, G., GROSSBERG, M., AND RASKAR, R. 2006. Fast separation of direct and global components of a scene using high frequency illumination. ACM Transactions on Graphics 25, 3, 935–944.


NG, R., LEVOY, M., BREDIF, M., DUVAL, G., HOROWITZ, M., AND HANRAHAN, P. 2005. Light field photography with a hand-held plenoptic camera. Tech. rep., Stanford University.

ROH, Y. J., PARK, W. S., CHO, H. S., AND JEON, H. J. 2004. Implementation of uniform and simultaneous ART for 3-D reconstruction in an X-ray imaging system. IEE Proceedings: Vision, Image and Signal Processing 151.

ROSEN, J., AND ABOOKASIS, D. 2004. Noninvasive optical imaging by speckle ensemble. Optics Letters 29, 3.

SUN, B., RAMAMOORTHI, R., NARASIMHAN, S. G., AND NAYAR, S. K. 2005. A practical analytic single scattering model for real time rendering. ACM Transactions on Graphics, 1040–1049.

TRIFONOV, B., BRADLEY, D., AND HEIDRICH, W. 2006. Tomographic reconstruction of transparent objects. Eurographics Symposium on Rendering.

TUCHIN, V. 2000. Tissue optics. SPIE.

VAISH, V., WILBURN, B., JOSHI, N., AND LEVOY, M. 2004. Using plane + parallax for calibrating dense camera arrays. Proc. Conf. Computer Vision and Pattern Recognition.

VEERARAGHAVAN, A., RASKAR, R., AGRAWAL, A., MOHAN, A., AND TUMBLIN, J. 2007. Mask enhanced cameras for heterodyned light fields and coded aperture refocusing. ACM SIGGRAPH 2007.


Figure 15: Timeline