1
FPA-CS: Focal Plane Array-based Compressive Imaging in Short-wave Infrared Huaijin Chen, 1 M. Salman Asif, 1 Aswin C. Sankaranarayanan, 2 and Ashok Veeraraghavan 1 1 Department of Electrical and Computer Engineering, Rice University. 2 Department of Electrical and Computer Engineering, Carnegie Mellon University. Total variation-based reconstruction DMD 64 x 64 SWIR sensor array scene (as seen in a visible camera) compressive low-res frames from 64x64 sensor reconstructed SWIR mega-pixel image Figure 1: Focal plane array-based compressive sensing (FPA-CS) camera architecture: A 64 × 64 SWIR sensor array is equivalent to 4096 single pixel cameras (SPCs) operating in parallel. This results in vastly superior spatio-temporal resolutions against what is achievable using the SPC or a traditional camera. Cameras for imaging in short and mid-wave infrared spectra are signif- icantly more expensive than their counterparts for visible imaging. For ex- ample, a cellphone camera with a several megapixel sensor costs a few dol- lars, but a megapixel sensor for short-wave infrared (SWIR) imaging costs tens of thousands dollars. As a result, high-resolution imaging beyond the visible spectrum remains out of reach for many consumers. Over the last decade, compressive sensing (CS) [1] has emerged as a useuful technology for designing high-resolution imaging systems us- ing low-resolution sensors. For instance, a single-pixel camera (SPC) uses a single-pixel detector and a digital micromirror device (DMD) to record coded measurements of a high-resolution image [3]. A computational re- construction algorithm is then used to recover the high-resolution image from the coded measurements. Unfortunately, the measurement rate of an SPC is insufficient for imaging at high spatial and temporal resolutions [5]. In this paper, we present a focal plane array-based compressive sensing (FPA-CS) architecture that achieves high spatial and temporal resolutions using inexpensive, low-resolution sensors. Our proposed architecture can be viewed as an array of SPCs working in parallel, thereby increasing the measurement rate, and consequently, the achievable spatio-temporal resolu- tion of CS-based cameras. We develop a proof-of-concept prototype SWIR video camera using a low-resolution sensor with 64 × 64 pixels; the proto- type provides a 4096× increase in measurement rate compared to the SPC, and for the first time, achieves megapixel resolution at video rate using CS techniques. Our prototype FPA-CS camera is constructed using a low-resolution sensor array of 64 × 64 pixels, each observing a 16 × 16 patch of micromir- rors. The DMD patterns and sensor readout timings are synchronized to record modulated, low-resolution images at a frame rate F s = 480 fps. The sensor image at time t can be described as y t = A t x t , where y t is a vec- tor with 4096 measurements, x t represents the high-resolution image at the DMD plane, and the matrix A t encodes modulation of x t with the DMD pattern and mapping onto the SWIR sensor pixels. To reconstruct video at a desired frame-rate, say F r fps, we divide low-resolution sensor im- ages into sets of T = F s /F r measurements, all of which correspond to the same high-resolution image. Suppose the kth set correspond to y t = A t x t for t =(k - 1)T + 1,..., kT ; we assume that x t = x k and stack all the y t and A t in the kth set in y k and A k , respectively. Our goal is to reconstruct the x k from the noisy and possibly under-determined sets of linear equations y k = A k x k . This is an extended abstract. The full paper is available at the Computer Vision Foundation webpage. Figure 2: Selected frames from reconstructed SWIR videos. Each frame in the moving car videos is reconstructed using 16 captured images; compres- sion factor α = 16, and a consequently 32-fps frame rate. Each frame in the moving hand videos is reconstructed using 22 captured images; compres- sion factor α = 11.6, and a consequently 21.8-fps frame rate. Both videos are reconstructed using 3D-TV prior. XT and YT slices for both videos are shown to the right of the images. Natural images have been shown to have sparse gradients. We can view a video signal as a 3D object that consists of a sequence of 2D images, and we expect pixels in each image to be similar to their neighbors along horizontal, vertical, and temporal directions. To exploit the spatio-temporal similarity in a video signal, we can use priors for sparse spatio-temporal gradients, and solve an optimization problem of the following form for reconstruction[4]: (TV) b x = arg min x TV 3D (x) subject to ky - Axk 2 ε , where the term TV 3D (x) refers to the 3D total-variation of x. TV 3D can be defined as TV 3D (x)= i q (D u x(i)) 2 +(D v x(i)) 2 +(D t x(i)) 2 , where D u x and D v x are the spatial gradients along horizontal and vertical dimensions of x, respectively, and D t x represents gradient along the tempo- ral dimension of x. We present some of our experimental results in Figure 2, where we used MFISTA [2] for the reconstruction of videos. FPA-CS provides three advantages over conventional imaging. First, our CS-inspired FPA-CS system provides an inexpensive alternative to achieve SWIR imaging in high spatiotemporal resolution . Second, compared to tra- ditional single-pixel-based compressive cameras, FPA-CS simultaneously records data from 4096 parallel, compressive systems, thereby significantly improves the measurement rate. As a consequence, the achieved spatio- temporal resolution of our device is an order of magnitude better than the SPC. [1] Richard Baraniuk. Compressive sensing. IEEE signal processing magazine, 24 (4), 2007. [2] A. Beck and M. Teboulle. Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Transactions on Image Processing, 18(11):2419–2434, 2009. [3] M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk. Single-pixel imaging via compressive sampling. IEEE Signal Processing Magazine, 25(2):83–91, Mar. 2008. [4] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin. An iterative regularization method for total variation-based image restoration. Multiscale Modeling and Simulation, 4(2):460–489, 2005. [5] A. C. Sankaranarayanan, C. Studer, and R. G. Baraniuk. CS-MUVI: Video com- pressive sensing for spatial-multiplexing cameras. In IEEE International Con- ference on Computational Photography, 2012.

FPA-CS: Focal Plane Array-based ... - cv-foundation.orgFPA-CS: Focal Plane Array-based Compressive Imaging in Short-wave Infrared Huaijin Chen,1 M. Salman Asif,1 Aswin C. Sankaranarayanan,2

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: FPA-CS: Focal Plane Array-based ... - cv-foundation.orgFPA-CS: Focal Plane Array-based Compressive Imaging in Short-wave Infrared Huaijin Chen,1 M. Salman Asif,1 Aswin C. Sankaranarayanan,2

FPA-CS: Focal Plane Array-based Compressive Imaging in Short-wave Infrared

Huaijin Chen,1 M. Salman Asif,1 Aswin C. Sankaranarayanan,2 and Ashok Veeraraghavan1

1Department of Electrical and Computer Engineering, Rice University.2Department of Electrical and Computer Engineering, Carnegie Mellon University.

Total variation-based reconstruction

DMD

64 x 64 SWIR sensor array

scene(as seen in a visible camera)

compressive low-res frames from 64x64

sensor

reconstructed SWIR mega-pixel image

Figure 1: Focal plane array-based compressive sensing (FPA-CS) cameraarchitecture: A 64× 64 SWIR sensor array is equivalent to 4096 singlepixel cameras (SPCs) operating in parallel. This results in vastly superiorspatio-temporal resolutions against what is achievable using the SPC or atraditional camera.

Cameras for imaging in short and mid-wave infrared spectra are signif-icantly more expensive than their counterparts for visible imaging. For ex-ample, a cellphone camera with a several megapixel sensor costs a few dol-lars, but a megapixel sensor for short-wave infrared (SWIR) imaging coststens of thousands dollars. As a result, high-resolution imaging beyond thevisible spectrum remains out of reach for many consumers.

Over the last decade, compressive sensing (CS) [1] has emerged asa useuful technology for designing high-resolution imaging systems us-ing low-resolution sensors. For instance, a single-pixel camera (SPC) usesa single-pixel detector and a digital micromirror device (DMD) to recordcoded measurements of a high-resolution image [3]. A computational re-construction algorithm is then used to recover the high-resolution imagefrom the coded measurements. Unfortunately, the measurement rate of anSPC is insufficient for imaging at high spatial and temporal resolutions [5].

In this paper, we present a focal plane array-based compressive sensing(FPA-CS) architecture that achieves high spatial and temporal resolutionsusing inexpensive, low-resolution sensors. Our proposed architecture canbe viewed as an array of SPCs working in parallel, thereby increasing themeasurement rate, and consequently, the achievable spatio-temporal resolu-tion of CS-based cameras. We develop a proof-of-concept prototype SWIRvideo camera using a low-resolution sensor with 64× 64 pixels; the proto-type provides a 4096× increase in measurement rate compared to the SPC,and for the first time, achieves megapixel resolution at video rate using CStechniques.

Our prototype FPA-CS camera is constructed using a low-resolutionsensor array of 64×64 pixels, each observing a 16×16 patch of micromir-rors. The DMD patterns and sensor readout timings are synchronized torecord modulated, low-resolution images at a frame rate Fs = 480 fps. Thesensor image at time t can be described as yt = Atxt , where yt is a vec-tor with 4096 measurements, xt represents the high-resolution image at theDMD plane, and the matrix At encodes modulation of xt with the DMDpattern and mapping onto the SWIR sensor pixels. To reconstruct videoat a desired frame-rate, say Fr fps, we divide low-resolution sensor im-ages into sets of T = Fs/Fr measurements, all of which correspond to thesame high-resolution image. Suppose the kth set correspond to yt = Atxt fort = (k−1)T +1, . . . ,kT ; we assume that xt = xk and stack all the yt and At inthe kth set in yk and Ak, respectively. Our goal is to reconstruct the xk fromthe noisy and possibly under-determined sets of linear equations yk = Akxk.

This is an extended abstract. The full paper is available at the Computer Vision Foundationwebpage.

Figure 2: Selected frames from reconstructed SWIR videos. Each frame inthe moving car videos is reconstructed using 16 captured images; compres-sion factor α = 16, and a consequently 32-fps frame rate. Each frame in themoving hand videos is reconstructed using 22 captured images; compres-sion factor α = 11.6, and a consequently 21.8-fps frame rate. Both videosare reconstructed using 3D-TV prior. XT and YT slices for both videos areshown to the right of the images.

Natural images have been shown to have sparse gradients. We can viewa video signal as a 3D object that consists of a sequence of 2D images,and we expect pixels in each image to be similar to their neighbors alonghorizontal, vertical, and temporal directions. To exploit the spatio-temporalsimilarity in a video signal, we can use priors for sparse spatio-temporalgradients, and solve an optimization problem of the following form forreconstruction[4]:

(TV) x̂ = argminx

TV3D(x) subject to ‖y−Ax‖2 ≤ ε,

where the term TV3D(x) refers to the 3D total-variation of x. TV3D can bedefined as

TV3D(x) = ∑i

√(Dux(i))2 +(Dvx(i))2 +(Dtx(i))2,

where Dux and Dvx are the spatial gradients along horizontal and verticaldimensions of x, respectively, and Dtx represents gradient along the tempo-ral dimension of x. We present some of our experimental results in Figure 2,where we used MFISTA [2] for the reconstruction of videos.

FPA-CS provides three advantages over conventional imaging. First,our CS-inspired FPA-CS system provides an inexpensive alternative to achieveSWIR imaging in high spatiotemporal resolution . Second, compared to tra-ditional single-pixel-based compressive cameras, FPA-CS simultaneouslyrecords data from 4096 parallel, compressive systems, thereby significantlyimproves the measurement rate. As a consequence, the achieved spatio-temporal resolution of our device is an order of magnitude better than theSPC.

[1] Richard Baraniuk. Compressive sensing. IEEE signal processing magazine, 24(4), 2007.

[2] A. Beck and M. Teboulle. Fast gradient-based algorithms for constrained totalvariation image denoising and deblurring problems. IEEE Transactions on ImageProcessing, 18(11):2419–2434, 2009.

[3] M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, andR. G. Baraniuk. Single-pixel imaging via compressive sampling. IEEE SignalProcessing Magazine, 25(2):83–91, Mar. 2008.

[4] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin. An iterative regularizationmethod for total variation-based image restoration. Multiscale Modeling andSimulation, 4(2):460–489, 2005.

[5] A. C. Sankaranarayanan, C. Studer, and R. G. Baraniuk. CS-MUVI: Video com-pressive sensing for spatial-multiplexing cameras. In IEEE International Con-ference on Computational Photography, 2012.