
Decoding, Calibration and Rectification for Lenselet-Based Plenoptic Cameras

Donald G. Dansereau, Oscar Pizarro and Stefan B. Williams
Australian Centre for Field Robotics; School of Aerospace, Mechanical and Mechatronic Engineering
University of Sydney, NSW, Australia
{d.dansereau, o.pizarro, s.williams}@acfr.usyd.edu.au

Abstract

Plenoptic cameras are gaining attention for their unique light gathering and post-capture processing capabilities. We describe a decoding, calibration and rectification procedure for lenselet-based plenoptic cameras appropriate for a range of computer vision applications. We derive a novel physically based 4D intrinsic matrix relating each recorded pixel to its corresponding ray in 3D space. We further propose a radial distortion model and a practical objective function based on ray reprojection. Our 15-parameter camera model is of much lower dimensionality than camera array models, and more closely represents the physics of lenselet-based cameras. Results include calibration of a commercially available camera using three calibration grid sizes over five datasets. Typical RMS ray reprojection errors are 0.0628, 0.105 and 0.363 mm for 3.61, 7.22 and 35.1 mm calibration grids, respectively. Rectification examples include calibration targets and real-world imagery.

1. Introduction

Plenoptic cameras [17] measure both colour and geometric information, and can operate under conditions prohibitive to other RGB-D cameras, e.g. in bright sunlight or underwater. With increased depth of field and light gathering relative to conventional cameras, and post-capture capabilities ranging from refocus to occlusion removal and closed-form visual odometry [1, 16, 4, 9, 19, 6], plenoptic cameras are poised to play a significant role in computer vision applications. As such, accurate plenoptic calibration and rectification will become increasingly important.

Prior work in this area has largely dealt with camera arrays [20, 18], with very little work going toward the calibration of lenselet-based cameras. By exploiting the physical characteristics of a lenselet-based plenoptic camera, we impose significant constraints beyond those present in a multiple-camera scenario. In so doing, we increase the robustness and accuracy of the calibration process, while simultaneously decreasing the complexity of the model.

In this work we present a novel 15-parameter plenoptic camera model relating pixels to rays in 3D space, including a 4D intrinsic matrix based on a projective pinhole and thin-lens model, and a radial direction-dependent distortion model. We present a practical method for decoding a camera’s 2D lenselet images into 4D light fields without prior knowledge of its physical parameters, and describe an efficient projected-ray objective function and calibration scheme. We use these to accurately calibrate and rectify images from a commercially available Lytro plenoptic camera.

The remainder of this paper is organized as follows: Section 2 reviews relevant work; Section 3 provides a practical method for decoding images; Section 4 derives the 4D intrinsic and distortion models; Section 5 describes the calibration and rectification procedures; Section 6 provides validation results; and finally, Section 7 draws conclusions and indicates directions for future work.

2. Prior Work

Plenoptic cameras come in several varieties, including mask-based cameras, planar arrays, freeform collections of cameras [13, 22, 21, 18], and of course lenticular array-based cameras. The latter include the “original” plenoptic camera as described by Ng et al. [17], with which the present work is concerned, and the “focused” plenoptic camera described by Lumsdaine and Georgiev [14]. Each camera has unique characteristics, and so the optimal model and calibration approach for each will differ.

Previous work has addressed calibration of grids or freeform collections of multiple cameras [20, 18]. Similar to this is the case of a moving camera in a static scene, for which structure-from-motion can be extended for plenoptic modelling [12]. These approaches introduce more degrees of freedom in their models than are necessary to describe the lenselet-based plenoptic camera. Our work introduces a more constrained intrinsic model based on the physical properties of the camera, yielding a more robust, physically-grounded, and general calibration.


Figure 1. Crop of a raw lenselet image after demosaicing and without vignetting correction; pictured is a rainbow lorikeet

In other relevant work, Georgiev et al. [7] derive a plenoptic camera model using ray transfer matrix analysis. Our model is more detailed, accurately describing a real-world camera by including the effects of lens distortion and projection through the lenticular array. Unlike previous models, ours also allows for continuous variation in the positions of rays, rather than unrealistically constraining them to pass through a set of pinholes.

Finally, our ray model draws inspiration from the work of Grossberg and Nayar [8], who introduce a generalized imaging model built from virtual sensing elements. However, their piecewise-continuous pixel-ray mapping does not apply to the plenoptic camera, and so our camera model and calibration procedure differ significantly from theirs.

3. Decoding to an Unrectified Light Field

Light fields are conventionally represented and processed in 4D, and so we begin by presenting a practical scheme for decoding raw 2D lenselet images to a 4D light field representation. Note that we do not address the question of demosaicing Bayer-pattern plenoptic images – we instead refer the reader to [23] and related work. For the purposes of this work, we employ conventional linear demosaicing applied directly to the raw 2D lenselet image. This yields undesired effects in pixels near lenselet edges, and we therefore ignore edge pixels during calibration.

In general the exact placement of the lenselet array is unknown, with lenselet spacing being a non-integer multiple of pixel pitch, and unknown translational and rotational offsets further complicating the decode process. A crop of a typical raw lenselet image is shown in Fig. 1 – note that the lenselet grid is hexagonally packed, further complicating the decoding process. To locate lenselet image centers we employ an image taken through a white diffuser, or of a white scene. Because of vignetting, the brightest spot in each white lenselet image approximates its center.

Figure 2. Decoding the raw 2D sensor image to a 4D light field (panels: align 2D input; slice into 4D; 1D interp. in k, de-hex and square k, l; 1D interp. in i, square i, j)

A crop of a typical white image taken from the Lytro is shown in Fig. 7a. A low-pass filter is applied to reduce sensor noise prior to finding the local maximum within each lenselet image. Though this result is only accurate to the nearest pixel, gathering statistics over the entire image mitigates the impact of quantization. Grid parameters are estimated by traversing lenselet image centers, finding the mean horizontal and vertical spacing and offset, and performing line fits to estimate rotation. An optimization of the estimated grid parameters is possible by maximizing the brightness under estimated grid centers, but in practice we have found this to yield a negligible refinement.
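To make the grid-estimation step above concrete, the following sketch (Python with NumPy/SciPy, not the authors' implementation) finds per-lenselet peaks in a low-pass-filtered white image and derives a rough spacing and rotation from them; the function name, the row-binning heuristic and the thresholds are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def estimate_lenselet_grid(white_img, approx_pitch):
    """Rough lenselet-grid estimate from a white image (hypothetical helper)."""
    # Low-pass filter to suppress sensor noise before peak detection.
    smoothed = ndimage.gaussian_filter(white_img.astype(float), sigma=approx_pitch / 4.0)

    # The brightest spot in each lenselet image approximates its center;
    # local maxima are only accurate to the nearest pixel.
    peaks = ndimage.maximum_filter(smoothed, size=int(approx_pitch)) == smoothed
    peaks &= smoothed > 0.5 * smoothed.max()        # reject dark spurious maxima
    rows, cols = np.nonzero(peaks)

    # Gather statistics over the whole image to mitigate quantization:
    # bin centers into rows, fit a line per row to estimate rotation, and take
    # the median horizontal spacing between neighbouring centers.
    row_bins = np.round(rows / approx_pitch).astype(int)
    slopes, spacings = [], []
    for b in np.unique(row_bins):
        sel = row_bins == b
        if np.count_nonzero(sel) > 2:
            order = np.argsort(cols[sel])
            c, r = cols[sel][order], rows[sel][order]
            slopes.append(np.polyfit(c, r, 1)[0])        # rise per pixel along the row
            d = np.diff(c)
            spacings.extend(d[d > approx_pitch / 2.0])   # skip duplicate detections
    rotation = float(np.arctan(np.mean(slopes)))         # radians
    spacing = float(np.median(spacings))                 # horizontal pitch, pixels
    return (rows, cols), spacing, rotation
```

On a hexagonally packed array the vertical row spacing differs from the horizontal pitch by a factor of √3/2, which the decoding steps below account for.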

From the estimated grid parameters there are many potential methods for decoding the lenselet image to a 4D light field. The method we present was chosen for its ease of implementation. The process begins by demosaicing the raw lenselet image, then correcting vignetting by dividing by the white image. At this point the lenselet images, depicted in blue in Fig. 2, are on a non-integer spaced, rotated grid relative to the image’s pixels (green). We therefore resample the image, rotating and scaling so all lenselet centers fall on pixel centers, as depicted in the second frame of the figure. The required scaling for this step will not generally be square, and so the resulting pixels are rectangular.

Aligning the lenselet images to an integer pixel grid allows a very simple slicing scheme: the light field is broken into identically sized, overlapping rectangles centered on the lenselet images, as depicted in the top-right and bottom-left frames of Fig. 2. The spacing in the bottom-left frame represents the hexagonal sampling in the lenselet indices k, l, as well as non-square pixels in the pixel indices i, j.

Converting hexagonally sampled data to an orthogonal grid is a well-explored topic; see [2] for a reversible conversion based on 1D filters. We implemented both a 2D interpolation scheme operating in k, l, and a 1D scheme interpolating only along k, and have found the latter approach, depicted in the bottom middle frame of Fig. 2, to be a good approximation. For rectangular lenselet arrays, this interpolation step is omitted. As we interpolate in k to compensate for the hexagonal grid’s offsets, we simultaneously compensate for the unequal vertical and horizontal sample rates. The final stage of the decoding process corrects for the rectangular pixels in i, j through a 1D interpolation along i. In every interpolation step we increase the effective sample rate in order to avoid loss of information. The final step, not shown, is to mask off pixels that fall outside the hexagonal lenselet image. We denote the result of the decode process the “aligned” light field L^A(i, j, k, l).

Figure 3. The main lens is modelled as a thin lens and the lenselets as an array of pinholes; gray lines depict lenselet image centers
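The slicing stage of Fig. 2 amounts to cutting the aligned image into identically sized tiles centered on the lenselet images. Below is a minimal sketch, assuming the image has already been rotated and scaled so that lenselet centers fall on an integer grid with pitch N, and omitting the hex-to-orthogonal and rectangular-pixel interpolations; the function and argument names are illustrative, not the authors' code.

```python
import numpy as np

def slice_to_4d(aligned_img, N, first_center):
    """aligned_img: 2D (optionally with a trailing colour axis) image whose
    lenselet centers lie on an integer grid with pitch N pixels;
    first_center: (row, col) of the top-left lenselet center, assumed to be
    at least N//2 pixels from the image border."""
    r0, c0 = first_center
    rows, cols = aligned_img.shape[:2]
    # Conservative lenselet counts so every N x N tile stays inside the image.
    n_l = (rows - r0 - N // 2 - 1) // N + 1   # vertical lenselet index l
    n_k = (cols - c0 - N // 2 - 1) // N + 1   # horizontal lenselet index k
    LA = np.zeros((N, N, n_k, n_l) + aligned_img.shape[2:], aligned_img.dtype)
    for l in range(n_l):
        for k in range(n_k):
            rc = r0 + l * N                   # center of lenselet (k, l)
            cc = c0 + k * N
            LA[:, :, k, l] = aligned_img[rc - N // 2: rc - N // 2 + N,
                                         cc - N // 2: cc - N // 2 + N]
    return LA                                  # indexed as L^A(i, j, k, l)
```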

4. Pinhole and Thin Lens Model

In this section we derive the relationship between the indices of each pixel and its corresponding spatial ray. Though each pixel of a plenoptic camera integrates light from a volume, we approximate each as integrating along a single ray [8]. We model the main lens as a thin lens, and the lenselets as an array of pinholes, as depicted in Fig. 3. Our starting point is an index resulting from the decoding scheme described above, expressed in homogeneous coordinates n = [i, j, k, l, 1], where k, l are the zero-based absolute indices of the lenselet through which a ray passes, and i, j are the zero-based relative pixel indices within each lenselet image. For lenselet images of N × N pixels, i and j each range from 0 to N − 1.

We derive a homogeneous intrinsic matrix H ∈ ℝ^{5×5} by applying a series of transformations, first converting the index n to a ray representation suitable for ray transfer matrix analysis, then propagating it through the optical system, and finally converting to a light field ray representation. The full sequence of transformations is given by

\[
\phi_A \;=\; H^{\phi}_{\Phi}\, H_M\, H_T\, H^{\Phi}_{\phi}\, H^{\phi}_{\mathrm{abs}}\, H^{\mathrm{abs}}_{\mathrm{rel}}\, n \;=\; H n.
\tag{1}
\]

We will derive each component of this process in the 2D plane, starting with the homogeneous relative index n_2D = [i, k, 1], and later generalize the result to 4D.

The conversion from relative to absolute indices, H^abs_rel, is straightforwardly found from the number of pixels per lenselet N and a translational pixel offset c_pix (below). We next convert from absolute coordinates to a light field ray, with the imaging and lenselet planes as the reference planes. We accomplish this using H^φ_abs,

\[
H^{\mathrm{abs}}_{\mathrm{rel}} =
\begin{bmatrix} 1 & N & -c_{\mathrm{pix}} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
\qquad
H^{\phi}_{\mathrm{abs}} =
\begin{bmatrix} 1/F_s & 0 & -c_M/F_s \\ 0 & 1/F_u & -c_\mu/F_u \\ 0 & 0 & 1 \end{bmatrix},
\tag{2}
\]

where F_* and c_* are the spatial frequencies in samples/m, and offsets in samples, of the pixels and lenselets.

Next we express the ray as position and direction via H^Φ_φ (below), and propagate to the main lens using H_T:

\[
H^{\Phi}_{\phi} =
\begin{bmatrix} 1 & 0 & 0 \\ -1/d_\mu & 1/d_\mu & 0 \\ 0 & 0 & 1 \end{bmatrix},
\qquad
H_T =
\begin{bmatrix} 1 & d_\mu + d_M & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
\tag{3}
\]

where d_* are the lens separations as depicted in Fig. 3. Note that in the conventional plenoptic camera, d_µ = f_µ, the lenselet focal length.

Next we apply the main lens using a thin lens and small angle approximation (below), and convert back to a light field ray representation, with the main lens as the s, t plane, and the u, v plane at an arbitrary plane separation D:

\[
H_M =
\begin{bmatrix} 1 & 0 & 0 \\ -1/f_M & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
\qquad
H^{\phi}_{\Phi} =
\begin{bmatrix} 1 & D & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
\tag{4}
\]

where f_M is the focal length of the main lens. Because horizontal and vertical components are independent, extension to 4D is straightforward. Multiplying through Eq. 1 yields an expression for H with twelve non-zero terms:

\[
\begin{bmatrix} s \\ t \\ u \\ v \\ 1 \end{bmatrix}
=
\begin{bmatrix}
H_{1,1} & 0 & H_{1,3} & 0 & H_{1,5} \\
0 & H_{2,2} & 0 & H_{2,4} & H_{2,5} \\
H_{3,1} & 0 & H_{3,3} & 0 & H_{3,5} \\
0 & H_{4,2} & 0 & H_{4,4} & H_{4,5} \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} i \\ j \\ k \\ l \\ 1 \end{bmatrix}.
\tag{5}
\]

In a model with pixel or lenselet skew we would expect more non-zero terms. In Section 5 we show that two of these parameters are redundant with camera pose, leaving only 10 free intrinsic parameters.
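To illustrate how Eqs. 1–5 fit together, the sketch below composes the 2D matrices and interleaves the horizontal and vertical results into the 5×5 matrix of Eq. 5. This is not the authors' code; parameter names mirror the text, and values would come from the virtual "aligned" camera of Section 4.1 (e.g. Table 1).

```python
import numpy as np

def h_2d(N, c_pix, F_s, F_u, c_M, c_mu, d_mu, d_M, f_M, D):
    """Compose the 2D intrinsic chain of Eq. 1 for one plane pair."""
    H_abs_rel = np.array([[1, N, -c_pix], [0, 1, 0], [0, 0, 1]], float)
    H_phi_abs = np.array([[1 / F_s, 0, -c_M / F_s],
                          [0, 1 / F_u, -c_mu / F_u],
                          [0, 0, 1]])
    H_Phi_phi = np.array([[1, 0, 0], [-1 / d_mu, 1 / d_mu, 0], [0, 0, 1]])
    H_T = np.array([[1, d_mu + d_M, 0], [0, 1, 0], [0, 0, 1]])
    H_M = np.array([[1, 0, 0], [-1 / f_M, 1, 0], [0, 0, 1]])
    H_phi_Phi = np.array([[1, D, 0], [0, 1, 0], [0, 0, 1]])
    # Eq. 1: rightmost matrix applied first.
    return H_phi_Phi @ H_M @ H_T @ H_Phi_phi @ H_phi_abs @ H_abs_rel

def h_4d(Hh, Hv):
    """Interleave the horizontal (i,k -> s,u) and vertical (j,l -> t,v) 2D
    intrinsics into the 5x5 matrix of Eq. 5."""
    H = np.eye(5)
    hi = [0, 2, 4]   # rows/cols s, u, homogeneous  <-  i, k, homogeneous
    vi = [1, 3, 4]   # rows/cols t, v, homogeneous  <-  j, l, homogeneous
    for r in range(3):
        for c in range(3):
            H[hi[r], hi[c]] = Hh[r, c]
            H[vi[r], vi[c]] = Hv[r, c]
    H[4, :] = [0, 0, 0, 0, 1]   # keep the homogeneous row exact
    return H
```

The horizontal chain uses F_s, F_u, c_M, c_µ for the i, k dimensions; a second call with the corresponding vertical quantities gives the j, l chain, reproducing the sparsity pattern of Eq. 5.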

4.1. Projection Through the Lenselets

We have hidden some complexity in deriving the 4D intrinsic matrix by assuming prior knowledge of the lenselet associated with each pixel. As depicted by the gray lines in Fig. 3, the projected image centers will deviate from the lenselet centers, and as a result a pixel will not necessarily associate with its nearest lenselet. Furthermore, the decoding process presented in Section 3 includes several manipulations which will change the effective camera parameters. By resizing, rotating, interpolating, and centering on the projected lenselet images, we have created a virtual light field camera with its own parameters. In this section we compensate for these effects through the application of correction coefficients to the physical camera parameters.

Lenselet-based plenoptic cameras are constructed with careful attention to the coplanarity of the lenselet array and image plane [17]. As a consequence, projection through the lenselets is well-approximated by a single scaling factor, M_proj. Scaling and adjusting for hexagonal sampling can similarly be modelled as scaling factors. We therefore correct the pixel sample rates using

\[
M_{\mathrm{proj}} = \left(1 + d_\mu/d_M\right)^{-1}, \quad
M_s = N^A/N^S, \quad
M_{\mathrm{hex}} = 2/\sqrt{3},
\qquad
F^A_s = M_s M_{\mathrm{proj}} F^S_s, \quad
F^A_u = M_{\mathrm{hex}} F^S_u,
\tag{6}
\]

where superscripts indicate that a measure applies to the physical sensor (S), or to the virtual “aligned” camera (A); M_proj is derived from similar triangles formed by each gray projection line in Fig. 3; M_s is due to rescaling; and M_hex is due to hexagonal/Cartesian conversion. Extension to the vertical dimensions is trivial, omitting M_hex.
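A minimal sketch of these corrections, mapping the physical sensor sample rates (superscript S) to those of the virtual "aligned" camera (superscript A); the function and argument names are illustrative.

```python
import math

def aligned_sample_rates(F_s_S, F_u_S, N_S, N_A, d_mu, d_M, hex_grid=True):
    M_proj = 1.0 / (1.0 + d_mu / d_M)                    # projection through the lenselets
    M_s = N_A / N_S                                      # rescaling during decode
    M_hex = 2.0 / math.sqrt(3.0) if hex_grid else 1.0    # hex -> Cartesian; omit vertically
    F_s_A = M_s * M_proj * F_s_S
    F_u_A = M_hex * F_u_S
    return F_s_A, F_u_A
```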

4.2. Lens Distortion Model

The physical alignment and characteristics of the lenselet array as well as all the elements of the main lens potentially contribute to lens distortion. In the results section we show that the consumer plenoptic camera we employ suffers primarily from directionally dependent radial distortion,

\[
\theta_d = \left(1 + k_1 r^2 + k_2 r^4 + \cdots\right)\left(\theta_u - b\right) + b,
\qquad
r = \sqrt{\theta_s^2 + \theta_t^2},
\tag{7}
\]

where b captures decentering, k are the radial distortion coefficients, and θ_u and θ_d are the undistorted and distorted 2D ray directions, respectively. Note that we apply the small angle assumption, such that θ ≈ [dx/dz, dy/dz]. We define the complete distortion vector as d = [b, k]. Extension to more complex distortion models is left as future work.
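As a hedged sketch of the forward model (7) — not the authors' code — the radial factor is applied about the decentering point b under the small-angle assumption; r is taken from the undistorted direction.

```python
import numpy as np

def distort_direction(theta_u, b, k):
    """theta_u: undistorted 2D direction [dx/dz, dy/dz]; b: 2D decentering;
    k: sequence of radial coefficients (k1, k2, ...). Returns theta_d."""
    theta_u = np.asarray(theta_u, float)
    b = np.asarray(b, float)
    r2 = np.sum(theta_u ** 2)                                     # r^2 = theta_s^2 + theta_t^2
    radial = 1.0 + sum(ki * r2 ** (n + 1) for n, ki in enumerate(k))
    return radial * (theta_u - b) + b
```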

5. Calibration and Rectification

The plenoptic camera gathers enough information to perform calibration from unstructured and unknown environments. However, as a first pass we take a more conventional approach familiar from projective camera calibration [10, 24], in which the locations of a set of 3D features are known – we employ the corners of a checkerboard pattern of known dimensions, with feature locations expressed in the frame of reference of the checkerboard. As depicted in Fig. 4a, projective calibration builds an objective function from the 2D distance between observed and expected projected feature locations, n and n̂, forming the basis for optimization over the camera’s poses and intrinsics.

Figure 4. In conventional projective calibration (a) a 3D feature P has one projected image, and a convenient error metric is the 2D distance between the expected and observed image locations |n̂ − n|. In the plenoptic camera (b) each feature has multiple expected and observed images n̂_j, n_i ∈ ℝ⁴, which generally do not appear beneath the same lenselets; we propose the per-observation ray reprojection metric |E_i|, taken as the 3D distance between the reprojected ray φ_i and the feature location P.

Plenoptic calibration is complicated by the fact that a single feature will appear in the imaging plane multiple times, as depicted in Fig. 4b. A tempting line of reasoning is to again formulate an error metric based on the 2D distance between observed and expected feature locations. The problem arises that the observed and expected features do not generally appear in the same lenselet images – indeed the number of expected and observed features is not generally equal. As such, a meaningful way of finding the “closest” distance between each observation and the set of expected features is required. We propose two practical methods. In the first, each known 3D feature location P is transformed to its corresponding 4D light field plane λ using the point-plane correspondence [5]. The objective function is then taken as the point-to-plane distance between each observation n and the plane λ. The second approach generates a projected ray φ from each observation n. The error metric, which we denote the “ray reprojection error”, is taken as the point-to-ray distance between φ and P, as depicted in Fig. 4b. The two methods are closely related, and we pursue the second, as it is computationally simpler.
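A sketch of the per-observation ray reprojection error follows. The ray construction here — two points on the s, t and u, v planes separated by D as in Eq. 4, posed into the calibration-target frame — is one plausible realisation, and all names are illustrative.

```python
import numpy as np

def ray_reprojection_error(n, H, T_cam_to_target, P, D):
    """n: homogeneous pixel index [i, j, k, l, 1]; H: 5x5 intrinsic matrix;
    T_cam_to_target: 4x4 pose mapping camera to checkerboard frame;
    P: known 3D feature location in the checkerboard frame;
    D: s,t to u,v plane separation used when building H (Eq. 4)."""
    s, t, u, v, _ = H @ np.asarray(n, float)
    # Two points on the projected ray, in the camera frame (z along the axis).
    p0 = np.array([s, t, 0.0, 1.0])          # on the s,t (main lens) plane
    p1 = np.array([u, v, D, 1.0])            # on the u,v plane
    p0w = (T_cam_to_target @ p0)[:3]
    p1w = (T_cam_to_target @ p1)[:3]
    d = p1w - p0w
    d /= np.linalg.norm(d)                    # unit ray direction
    # Point-to-line distance between the feature and the reprojected ray.
    return float(np.linalg.norm(np.cross(np.asarray(P, float) - p0w, d)))
```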

The observed feature locations are extracted by treating the decoded light field from Section 3 as an array of N_i × N_j 2D images in k and l, applying a conventional feature detection scheme [11] to each. If the plenoptic camera takes on M poses in the calibration dataset and there are n_c features on the calibration target, the total feature set over which we optimize is of size n_c M N_i N_j. Our goal is to find the intrinsic matrix H, camera poses T, and distortion parameters d which minimize the error across all features,

\[
\underset{H,\, T,\, d}{\operatorname{arg\,min}}\;
\sum_{c=1}^{n_c} \sum_{m=1}^{M} \sum_{s=1}^{N_i} \sum_{t=1}^{N_j}
\left\| \phi^{s,t}_c(H, T_m, d),\, P_c \right\|_{\text{pt-ray}},
\tag{8}
\]

where ‖·‖_pt-ray is the ray reprojection error described above. Each of the M camera poses has 6 degrees of freedom, and from Eq. 5 the intrinsic model H has 12 free parameters. However, there is a redundancy between H1,5, H2,5, which effect horizontal translation within the intrinsic model, and the translational components of the poses T. Were this redundancy left in place, the intrinsic model could experience unbounded translational drift and fail to converge. We therefore force the intrinsic parameters H1,5 and H2,5 such that pixels at the center of i, j map to rays at s, t = 0. Because of this forcing, the physical location of s, t = 0 on the camera will remain unknown, and if it is required must be measured by alternative means.

The number of parameters over which we optimize is now reduced to 10 for intrinsics, 5 for lens distortion, and 6 for each of the M camera poses, for a total of 6M + 15. Note the significant simplification relative to multiple-camera approaches, which grow with sample count in i and j – this is discussed further in Results.


As in monocular camera calibration, a Levenberg-Marquardt or similar optimization algorithm can be employed which exploits knowledge of the Jacobian. Rather than deriving the Jacobian here we describe its sparsity pattern and show results based on the trust region reflective algorithm implemented in MATLAB’s lsqnonlin function [3]. In practice we have found this to run quickly on modern hardware, finishing in tens of iterations and taking in the order of minutes to complete.

The Jacobian sparsity pattern is easy to derive: each of the M pose estimates will only influence that pose’s n_c N_i N_j error terms, while all of the 15 intrinsic and distortion parameters will affect every error term. As a practical example, for a checkerboard with 256 corners, viewed from 16 poses by a camera with N_i = N_j = 8 spatial samples, there will be N_e = n_c M N_i N_j = (16)(8)(8)(256) = 262,144 error terms and N_v = 6M + 15 = 123 optimization variables. Of the N_e N_v = 32,243,712 possible interactions, (15 + 6)N_e = 5,505,024, or about 17% will be non-zero.
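A sketch of that sparsity pattern in code, suitable for handing to a sparse least-squares solver (e.g. the jac_sparsity argument of scipy.optimize.least_squares, standing in for lsqnonlin's JacobPattern option). The column ordering — shared parameters first, then 6 per pose — is an assumption of this sketch, not prescribed by the paper.

```python
import numpy as np
from scipy.sparse import lil_matrix

def jacobian_sparsity(M, n_c, N_i, N_j, n_shared=15):
    """Boolean sparsity pattern: rows are error terms, columns are variables."""
    per_pose = n_c * N_i * N_j
    N_e = M * per_pose                  # error terms
    N_v = 6 * M + n_shared              # optimization variables
    S = lil_matrix((N_e, N_v), dtype=bool)
    S[:, :n_shared] = True               # intrinsics + distortion affect every term
    for m in range(M):
        rows = slice(m * per_pose, (m + 1) * per_pose)
        cols = slice(n_shared + 6 * m, n_shared + 6 * (m + 1))
        S[rows, cols] = True             # pose m affects only its own block
    return S

# e.g. jacobian_sparsity(M=16, n_c=256, N_i=8, N_j=8) reproduces the block
# structure described in the example above.
```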

5.1. Initialization

The calibration process proceeds in stages: first initial pose and intrinsic estimates are formed, then an optimization is carried out with no distortion parameters, and finally a full optimization is carried out with distortion parameters. To form initial pose estimates, we again treat the decoded light fields across M poses each as an array of N_i × N_j 2D images. By passing all the images through a conventional camera calibration process, for example that proposed by Heikkila [10], we obtain a per-image pose estimate. Taking the mean or median within each light field’s N_i × N_j per-image pose estimates yields M physical pose estimates. Note that distortion parameters are excluded from this process, and the camera intrinsics that it yields are ignored.

In Section 4 we derived a closed-form expression for the intrinsic matrix H based on the plenoptic camera’s physical parameters and the parameters of the decoding process (1), (6). We use these expressions to form the initial estimate of the camera’s intrinsics. We have found the optimization process to be insensitive to errors in these initial estimates, and in cases where the physical parameters of the camera are unknown, rough estimates may suffice. Automatic estimation of the initial parameters is left as future work.
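For illustration, the pose-averaging step might look like the following sketch, which takes a chordal mean of the per-sub-image rotations via SVD and the arithmetic mean of the translations; this particular averaging choice is an assumption, not taken from the paper.

```python
import numpy as np

def average_pose(rotations, translations):
    """rotations: list of 3x3 rotation matrices; translations: list of 3-vectors.
    Returns a single representative (R, t) pose estimate."""
    t_mean = np.mean(np.asarray(translations, float), axis=0)
    # Chordal mean: project the summed rotations back onto SO(3) via SVD.
    U, _, Vt = np.linalg.svd(np.sum(np.asarray(rotations, float), axis=0))
    R_mean = U @ Vt
    if np.linalg.det(R_mean) < 0:        # keep a proper rotation (det = +1)
        U[:, -1] *= -1
        R_mean = U @ Vt
    return R_mean, t_mean
```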

5.2. Rectification

We wish to rectify the light field imagery, reversing the effects of lens distortion and yielding square pixels in i, j and k, l. Our approach is to interpolate from the decoded light field L^A at a set of continuous-domain indices n_A such that the interpolated light field approximates a distortion-free rectified light field L^R. In doing so, we must select an ideal intrinsic matrix H^R, bearing in mind that deviating too far from the physical camera parameters will yield black pixels near the edges of the captured light field, where no information is available. At the same time, we wish to force horizontal and vertical sample rates to be equal – i.e. we wish to force H1,1 = H2,2, H1,3 = H2,4, H3,1 = H4,2 and H3,3 = H4,4. As a starting point, we replace each of these four pairs with the mean of its members, simultaneously readjusting H1,5 and H2,5 so as to maintain the centering described earlier.

Figure 5. Reversing lens distortion: tracing from the desired pixel location n_R through the ideal optical system, reversing lens distortion, then returning through the physical optical system to the measured pixel n_A

The rectification process is depicted in Fig. 5, with the optical system treated as a black box. To find n_A we begin with the indices of the rectified light field n_R, and project through the ideal optical system by applying H^R, yielding the ideal ray φ_R. Referring to the distortion model (7), the desired ray φ_R is arrived at by applying the forward model to some unknown undistorted ray φ_A. Assuming we can find φ_A, shown below, the desired index n_A is arrived at by applying the inverse of the estimated intrinsic matrix, Ĥ⁻¹.

There is no closed-form solution to the problem of reversing the distortion model (7), and so we propose an iterative approach similar to that of Melen [15]. Starting with an estimate of r taken from the desired ray φ_R, we solve for the first-pass estimate φ_A^1 using (7), then update r from the new estimate and iterate. In practice we have found as few as two iterations to produce acceptable results.
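A minimal sketch of this iteration, assuming the forward model of (7) as in the earlier distort_direction() sketch; per the text, two iterations typically suffice.

```python
import numpy as np

def undistort_direction(theta_d, b, k, iterations=2):
    """Iteratively invert the distortion model (7) for a 2D direction."""
    theta_d = np.asarray(theta_d, float)
    b = np.asarray(b, float)
    theta_u = theta_d.copy()                     # initial estimate from the desired ray
    for _ in range(iterations):
        r2 = np.sum(theta_u ** 2)                # r estimated from the current guess
        radial = 1.0 + sum(ki * r2 ** (n + 1) for n, ki in enumerate(k))
        theta_u = (theta_d - b) / radial + b     # invert (7) for this fixed r
    return theta_u
```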

6. Results

We carried out calibration on five datasets collected with the commercially available Lytro plenoptic camera. The same camera was used for all datasets, but the optical configuration was changed between datasets by adjusting the camera’s focal settings – care was taken not to change settings within a dataset.

Three calibration grids of differing sizes were used: a 19 × 19 grid of 3.61 mm cells, a 19 × 19 grid of 7.22 mm cells, and an 8 × 6 grid of 35.1 × 35.0 mm cells. Images within each dataset were taken over a range of depths and orientations. In Datasets A and B, range did not exceed 20 cm, in C and D it did not exceed 50 cm, and in E it did not exceed 2 m. Close ranges were favoured in all datasets so as to maximize accuracy in light of limited effective baseline in the s, t plane. This did not limit the applicability of each calibration to longer-range imagery.

Table 1. Virtual “Aligned” Camera Parameters
  Parameter          Value
  N                  10 pix
  Fs, Fu             716,790, 71,950 samp/m
  cM, cµ, cpix       1,645.3, 164.7, 6 samp
  dM, dµ, fM         6.6506, 0.025, 6.45 mm

Table 2. Estimated Parameters, Dataset B
  Parameter   Initial        Intrinsics     Distortion
  H1,1         3.6974e-04     3.7642e-04     4.0003e-04
  H1,3        -8.6736e-19    -5.6301e-05    -9.3810e-05
  H1,5        -1.5862e-03     8.8433e-03     1.5871e-02
  H2,2         3.6974e-04     3.7416e-04     3.9680e-04
  H2,4        -8.6736e-19    -5.3831e-05    -9.3704e-05
  H2,5        -1.5862e-03     8.3841e-03     1.5867e-02
  H3,1        -1.5194e-03    -1.1888e-03    -1.1833e-03
  H3,3         1.8167e-03     1.7951e-03     1.8105e-03
  H3,5        -3.3897e-01    -3.3681e-01    -3.3175e-01
  H4,2        -1.5194e-03    -1.1657e-03    -1.1583e-03
  H4,4         1.8167e-03     1.7830e-03     1.8077e-03
  H4,5        -3.3897e-01    -3.2501e-01    -3.2230e-01
  b1           .              .              1.5258e-01
  b2           .              .             -1.1840e-01
  k1           .              .              2.9771e+00
  k2           .              .             -3.4308e-03
  k3           .              .             -5.5949e-03

Table 3. RMS Ray Reprojection Error (mm)
  Dataset/grid   Initial   Intrin.   Dist.     Multi295   Multi631
  A/3.61         3.20      0.146     0.0835    0.198      0.109
  B/3.61         5.06      0.148     0.0628    0.178      0.0682
  C/7.22         8.63      0.255     0.106     0.220      0.107
  D/7.22         5.92      0.247     0.105     0.382      0.108
  E/35.1         13.8      0.471     0.363     2.22       0.336

The datasets each contained between 10 and 18 poses, and are available online¹. Investigating the minimum number of poses required to obtain good calibration results is left as future work, but from the results obtained it is clear that 10 is sufficient for appropriately diverse poses.

The decoding process requires a white image for locating lenselet image centers and correcting for vignetting. For this purpose, we used white images provided with the camera. Fig. 7a shows a crop of a typical white image, with the grid model overlaid. A closeup of one of the checkerboard images after demosaicing and correcting for vignetting is shown in Fig. 7b. We decoded to a 10-pixel aligned intermediary image yielding, after interpolations, 11 × 11 × 380 × 380 pixels. We ignored a border of two pixels in i, j due to demosaicing and edge artefacts.

An initial estimate of the camera’s intrinsics was formed from its physical parameters, adjusted to reflect the parameters of the decode process using Eq. 6. The adjusted parameters for Dataset B are shown in Table 1, and the resulting intrinsics appear in the “Initial” column of Table 2.

¹ http://marine.acfr.usyd.edu.au/permlinks/Plenoptic

For feature detection we used the Robust Automatic Detection Of Calibration Chessboards [11] toolbox². All features appear in all images, simplifying the task of associating them. Each calibration stage converged within 15 iterations in all cases, with the longer-range datasets generally taking longer to converge. Table 2 shows the estimated parameters for Dataset B at the three stages of the calibration process: initial estimate, intrinsics without distortion, and intrinsics with distortion. Table 3 summarizes the root mean square (RMS) ray reprojection error, as described in Section 5, at the three calibration stages and across the five datasets. Results are also shown for two conventional multiple-camera calibration models, Multi295 and Multi631. The first represents the plenoptic camera as an array of projective sub-cameras with independent relative poses and identical intrinsics and distortion parameters, while the second also includes per-sub-camera intrinsic and distortion parameters. Both camera array models grow in complexity with sample count in i and j, and for 7 × 7 samples require 295 and 631 parameters, respectively.

From Table 3, the Multi295 model performs poorly, while Multi631 approaches the performance of our proposed 15-parameter model. Referring to Table 2, we observe that the calibrated H1,3 and H2,4 terms converged to nonzero values. These represent the dependence of a ray’s position on the lenselet through which it passes, and a consequence of these nonzero values is that rays take on a wide variety of rational-valued positions in the s, t plane. This raises an important problem with the multiple-camera models, which unrealistically constrain rays to pass through a small set of sub-camera apertures, rather than allowing them to vary smoothly in position. We take this to explain the poor performance of the Multi295 model. The Multi631 model performed well despite this limitation, which we attribute to its very high dimensionality. Aside from the obvious tradeoff in complexity – compare with our proposed 15-parameter model – this model presents a risk of overfitting and correspondingly reduced generality.

Fig. 6 depicts typical ray reprojection error in our proposed model as a function of direction and position. The top row depicts error with no distortion model, and clearly shows a radial pattern as a function of both direction (left) and position (right). The bottom row shows error with the proposed distortion model in place – note the order of magnitude reduction in the error scale, and the absence of any evident radial pattern. This shows the proposed distortion model to account for most lens distortion for this camera.

We have carried out decoding and rectification on a wide range of images – more than 700 at the time of writing.

² http://www-personal.acfr.usyd.edu.au/akas9185/AutoCalib/AutoCamDoc/index.html

Figure 6. Ray reprojection error for Dataset B. Left: error vs. ray direction; right: error vs. ray position; top: no distortion model; bottom: the proposed five-parameter distortion model – note the order of magnitude difference in the error scale. The proposed model has accounted for most lens distortion for this camera.

Examples of decoded and rectified light fields are shown in Figs. 7c–h, as 2D slices in k, l – i.e. with i and j fixed – and further examples are available online. Rectification used a four-iteration inverse distortion model. The straight red rulings aid visual confirmation that rectification has significantly reduced the effects of lens distortion. The two last images are also shown in Fig. 8 as slices in the horizontal i, k plane passing through the center of the lorikeet’s eye. The straight lines display minimal distortion, and that they maintain their slopes confirms that rectification has not destroyed the 3D information captured by the light field.

7. Conclusions and Future Work

We have presented a 15-parameter camera model and method for calibrating a lenselet-based plenoptic camera. This included derivation of a novel physically based 4D intrinsic matrix and distortion model which relate the indices of a pixel to its corresponding spatial ray. We proposed a practical objective function based on ray reprojection, and presented an optimization framework for carrying out calibration. We also presented a method for decoding hexagonal lenselet-based plenoptic images without prior knowledge of the camera’s parameters, and related the resulting images to the camera model. Finally, we showed a method for rectifying the decoded images, reversing the effects of lens distortion and yielding square pixels in i, j and k, l. In the rectified images, the ray corresponding to each pixel is easily found through a single matrix multiplication (5).

Figure 7. a) Crop of a white image overlaid with the estimated grid, and b) the demosaiced and vignetting-corrected raw checkerboard image; c–h) examples of (left) unrectified and (right) rectified light fields; red rulings aid confirmation that rectification has significantly reduced the effect of lens distortion.

Validation included five datasets captured with a commercially available plenoptic camera, over three calibration grid sizes. Typical RMS ray reprojection errors were 0.0628, 0.105 and 0.363 mm for 3.61, 7.22 and 35.1 mm calibration grids, respectively. Real-world rectified imagery demonstrated a significant reduction in lens distortion. Future work includes automating initial estimation of the camera’s physical parameters, more complex distortion models, and autocalibration from arbitrary scenes.

Figure 8. Slices in the horizontal plane i, k of the a) unrectified and b) rectified lorikeet images from Figs. 7g and h; i is on the vertical axis, and k on the horizontal

Acknowledgments. This work is supported in part by the Australian Research Council (ARC), the New South Wales State Government, the Australian Centre for Field Robotics, The University of Sydney, and the Australian Government’s International Postgraduate Research Scholarship (IPRS).

References

[1] T. Bishop and P. Favaro. The light field camera: Extended depth of field, aliasing, and superresolution. Pattern Analysis and Machine Intelligence, IEEE Trans. on, 34(5):972–986, May 2012.
[2] L. Condat, B. Forster-Heinlein, and D. Van De Ville. H2O: reversible hexagonal-orthogonal grid conversion by 1-D filtering. In Image Processing, 2007. ICIP 2007. IEEE Intl. Conference on, volume 2, pages II–73. IEEE, 2007.
[3] A. Conn, N. Gould, and P. Toint. Trust region methods, volume 1. Society for Industrial Mathematics, 1987.
[4] D. G. Dansereau, D. L. Bongiorno, O. Pizarro, and S. B. Williams. Light field image denoising using a linear 4D frequency-hyperfan all-in-focus filter. In Proceedings SPIE Computational Imaging XI, page 86570P, Feb 2013.
[5] D. G. Dansereau and L. T. Bruton. A 4-D dual-fan filter bank for depth filtering in light fields. IEEE Trans. on Signal Processing, 55(2):542–549, 2007.
[6] D. G. Dansereau, I. Mahon, O. Pizarro, and S. B. Williams. Plenoptic flow: Closed-form visual odometry for light field cameras. In Intelligent Robots and Systems (IROS), IEEE/RSJ Intl. Conf. on, pages 4455–4462. IEEE, Sept 2011.
[7] T. Georgiev, A. Lumsdaine, and S. Goma. Plenoptic principal planes. In Computational Optical Sensing and Imaging. Optical Society of America, 2011.
[8] M. Grossberg and S. Nayar. The raxel imaging model and ray-based calibration. International Journal of Computer Vision, 61(2):119–137, 2005.
[9] M. Harris. Focusing on everything – light field cameras promise an imaging revolution. IEEE Spectrum, 5:44–50, 2012.
[10] J. Heikkila and O. Silven. A four-step camera calibration procedure with implicit image correction. In Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, pages 1106–1112. IEEE, 1997.
[11] A. Kassir and T. Peynot. Reliable automatic camera-laser calibration. In Australasian Conference on Robotics and Automation, 2010.
[12] R. Koch, M. Pollefeys, L. Van Gool, B. Heigl, and H. Niemann. Calibration of hand-held camera sequences for plenoptic modeling. In ICCV, volume 1, pages 585–591. IEEE, 1999.
[13] D. Lanman. Mask-based Light Field Capture and Display. PhD thesis, Brown University, 2012.
[14] A. Lumsdaine and T. Georgiev. The focused plenoptic camera. In Computational Photography (ICCP), IEEE Intl. Conference on, pages 1–8. IEEE, 2009.
[15] T. Melen. Geometrical modelling and calibration of video cameras for underwater navigation. Institutt for Teknisk Kybernetikk, Universitetet i Trondheim, Norges Tekniske Høgskole, 1994.
[16] R. Ng. Fourier slice photography. In ACM Trans. on Graphics (TOG), volume 24, pages 735–744. ACM, Jul 2005.
[17] R. Ng, M. Levoy, M. Bredif, G. Duval, M. Horowitz, and P. Hanrahan. Light field photography with a hand-held plenoptic camera. Computer Science Technical Report CSTR, 2, 2005.
[18] T. Svoboda, D. Martinec, and T. Pajdla. A convenient multicamera self-calibration for virtual environments. Presence: Teleoperators & Virtual Environments, 14(4):407–422, 2005.
[19] V. Vaish, M. Levoy, R. Szeliski, C. Zitnick, and S. Kang. Reconstructing occluded surfaces using synthetic apertures: Stereo, focus and robust measures. In Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, volume 2, pages 2331–2338. IEEE, 2006.
[20] V. Vaish, B. Wilburn, N. Joshi, and M. Levoy. Using plane + parallax for calibrating dense camera arrays. In Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, volume 1, pages I–2. IEEE, 2004.
[21] B. Wilburn, N. Joshi, V. Vaish, E. Talvala, E. Antunez, A. Barth, A. Adams, M. Horowitz, and M. Levoy. High performance imaging using large camera arrays. ACM Trans. on Graphics (TOG), 24(3):765–776, 2005.
[22] Z. Xu, J. Ke, and E. Lam. High-resolution lightfield photography using two masks. Optics Express, 20(10):10971–10983, 2012.
[23] Z. Yu, J. Yu, A. Lumsdaine, and T. Georgiev. An analysis of color demosaicing in plenoptic cameras. In Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, pages 901–908. IEEE, 2012.
[24] Z. Zhang. A flexible new technique for camera calibration. Pattern Analysis and Machine Intelligence, IEEE Trans. on, 22(11):1330–1334, 2000.