
High Frame Rate for 3D Time-of-Flight Cameras by Dynamic Sensor Calibration

Mirko Schmidt 1, Klaus Zimmermann 2, Bernd Jähne 1

1 {Mirko.Schmidt, Bernd.Jaehne}@iwr.uni-heidelberg.de
Heidelberg Collaboratory for Image Processing

University of Heidelberg, Speyerer Str. 6, 69115 Heidelberg, Germany

2 [email protected]

Sony Deutschland GmbH
Stuttgart Technology Center, Hedelfinger Str. 61, 70327 Stuttgart, Germany

Abstract

3D Time-of-Flight cameras are able to deliver robust depth maps of dynamic scenes. The frame rate, however, is limited because today's systems utilizing two-tap sensors need to acquire the required raw images in multiple instances in order to compute one depth map. These multiple raw images allow canceling out systematic errors introduced by asymmetries in the two taps, which otherwise would distort the reconstructed depth map.

This work presents a method to implicitly calibrate these asymmetries of multi-tap 3D Time-of-Flight sensors. The calibration data are gathered from arbitrary live acquisitions, possibly in real time. The proposed correction of raw data supersedes the commonly used averaging technique. Thus it is possible to compute multiple depth maps from a single set of raw images. This increases the frame rate by at least a factor of two. The method is verified using real camera data and is evaluated quantitatively.

1. Introduction

3D Time-of-Flight (ToF) cameras have the potential to efficiently generate depth maps for various applications, for instance in the areas of gaming, robotics, automotive, and home security. These cameras are capable of acquiring dense depth maps by measuring the time it takes for electromagnetic waves to travel the distance from their source in the camera to an object in the scene and back. Besides the depth map, these systems are able to derive additional information about the scene which is generally not known (e.g. an intensity map). We will call these quantities scene unknowns.

Continuous-wave as well as pulse-based ToF systems have been realized. All ToF systems are based on a convolution of the incident optical signal with a temporal window. This can be realized using a standard imaging sensor in conjunction with a physical shutter (e.g. as used by 3DV Systems [14]).

Another possibility is to employ correlating sensors (lock-in pixels). This approach is used, for instance, by various companies such as Canesta [3], Mesa Imaging [7] and PMDTechnologies [10]. These manufacturers employ correlating sensors which are able to acquire two measurements simultaneously (two-tap sensors). A typical scheme is to perform 8 measurements, comprising 4 sequential acquisitions (of two simultaneous measurements each). These 8 measurements are then used to generate a single depth image.

This work proposes a method to perform an implicit scene-based calibration of multi-tap correlating ToF sensors. The resulting calibration routine allows the computation of additional independent depth images, so the effective frame rate can be increased (from currently 30 Hz on average to 60 Hz or, using an extension, even 120 Hz).

The goal of this work is to provide a novel technique to increase the frame rate of ToF systems based on today's hardware. Please note that this method is not intended to replace the initial depth calibration routines, which are inherent to every ToF system and needed to achieve absolute accuracy.

Using a specific camera system and a simple implementation of the proposed technique, it will be shown that doubling the frame rate of an available ToF system is possible. Thus the feasibility of this approach is shown for the entire class of ToF cameras using correlating multi-tap sensors.


Since this work is intended to stimulate the design of new ToF systems, limitations given by our proof-of-concept implementation (for instance the necessity of a deactivated SBI) do not restrict the applicability of the concept itself.

Starting with an explanation of the background in Sec. 2, we outline our approach of an implicit calibration in Sec. 3. In Sec. 4 we explain the collection of suitable data from arbitrary raw data streams and the construction of correction operators. The approach of rectifying the raw data for increasing the frame rate of the ToF system is explained in Sec. 5. In Sec. 6 we present experimental results. Sec. 7 provides conclusions and an outlook.

1.1. Related Work

Much work performed in the field of calibration of Time-of-Flight cameras relates to the compensation of deviations of distance or intensity measurements. For instance, Kahlmann, Remondino, and Ingensand [4], Lindner and Kolb [6], and Rapp [9] presented methods to decrease systematic deviations of the estimated scene unknowns.

Work dealing with the raw data of ToF systems, aiming to understand errors of the estimated scene unknowns, was published by Erz and Jähne [1] and Schmidt and Jähne [11].

To the authors' knowledge, no calibration- or software-based approaches aiming to increase the frame rate of ToF systems have been published yet.

2. Background

2.1. Depth Estimation Using Correlating Sensors

Continuous-wave ToF systems use a periodically modulated light source to illuminate an observed scene. Due to the traveling time of the light, the phase of the backscattered light signal is delayed. This phase shift is measured by the ToF system to estimate the depth. The measurement is done by using a correlating sensor, which is able to sample the correlation function of an incident optical signal with an electronic reference signal. As derived by Lange [5], assuming a sinusoidal modulation of the light source and a rectangular window function, the correlation function is

I(Θ) = a0 + a1 · cos(Θ + ϕ) , (1)

where a0 and a1 are the offset and amplitude of the electro-optical signal, and ϕ is the phase shift between both signals.

The depth z of the imaged object may be estimated from ϕ as

$$z = \frac{\varphi \cdot c}{4 \pi \nu} \,, \qquad (2)$$

with c being the speed of light and ν being the modulation frequency of the light source.

To determine the three unknowns a0, a1 and ϕ, at least three samples of the correlation function are necessary.

Typically, N equidistant sampling points located at the phase angles Θn = n · 2π/N are used to reconstruct the scene unknowns. As shown by Frank et al. [2] and Plaue [8], the optimal solution in a least-squares sense is in this case given by:

$$a_0 = \frac{2}{N} \sum_{n=0}^{N-1} I_n \,, \qquad (3)$$

$$a_1 = \frac{2\pi}{N} \left| \sum_{n=0}^{N-1} I_n \, e^{-i 2\pi (n/N)} \right| , \qquad (4)$$

$$\varphi = \arg\left( \sum_{n=0}^{N-1} I_n \, e^{-i 2\pi (n/N)} \right) , \qquad (5)$$

$$\text{with} \quad I_n = \frac{I(\Theta_n)}{q\,T} \,,$$

with T being the oscillation period of the light signal and q being the number of integrated oscillation periods (and thus qT being the exposure time).
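To make the reconstruction concrete, here is a minimal numerical sketch of Eqs. (2)-(5), assuming the N samples In are already normalized by qT and using the paper's normalization of Eq. (4); the names (reconstruct_scene_unknowns, samples, nu) are illustrative, not from the paper.

```python
import numpy as np

C = 299792458.0  # speed of light [m/s]

def reconstruct_scene_unknowns(samples, nu):
    """samples: N equidistant, normalized correlation samples I_n at
    phase angles Theta_n = n * 2*pi/N; nu: modulation frequency [Hz]."""
    N = len(samples)
    n = np.arange(N)
    # Complex sum appearing in Eqs. (4) and (5)
    f = np.sum(samples * np.exp(-2j * np.pi * n / N))
    a0 = 2.0 / N * np.sum(samples)        # offset, Eq. (3)
    a1 = 2.0 * np.pi / N * np.abs(f)      # amplitude, normalization as in Eq. (4)
    phi = np.angle(f)                     # phase shift, Eq. (5)
    z = phi * C / (4.0 * np.pi * nu)      # depth, Eq. (2)
    return a0, a1, phi, z

# Usage: N = 4 samples of the sinusoidal correlation of Eq. (1)
phi_true = 0.8
I = 1000.0 + 300.0 * np.cos(2.0 * np.pi * np.arange(4) / 4 + phi_true)
print(reconstruct_scene_unknowns(I, nu=20e6))  # recovers phi = 0.8
```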

2.2. Multi-Tap ToF Sensors

The majority of the available ToF sensors are capable of acquiring multiple samples of the correlation function simultaneously; typically they make use of two taps. The first multi-tap systems were proposed by Schwarte et al. [12] and Spirig, Seitz, and Heitger [13].

In a general formulation, each pixel of the sensor has M detection units (taps) which acquire measurement values in parallel. Each of these detection units may be driven in N different measurement modes, and each of these modes aims to measure a specific sample of the correlation function. The theoretical value to be measured by a particular detection unit (m, with m ∈ {1, . . . , M}) in a specific measurement mode (n, with n ∈ {1, . . . , N}) will be denoted as cn,m. The result of this measurement is a digital value which will be denoted as dn,m. Typically N > M holds, thus to acquire the necessary N samples of the correlation function, multiple measurements are necessary. We denote the bundle of raw values acquired by a single pixel and used for computation of a single depth value as a raw data package. A typical raw data package is depicted in Fig. 1.

Figure 1. A typical raw data package for M = 2 taps and N = 4 measurements of the correlation function.

The values cn,m to be measured correspond to samples of the correlation function (1), I(Θn). The acquired values dn,m would ideally be the samples In used in (3)-(5) for reconstructing the scene unknowns. Unfortunately it is not possible to use these values directly, because the measurement process introduces errors which have to be compensated by adequate processing.

2.3. Erroneous Measurement Process

Each tap has an individual characteristic curve [1]. This characteristic curve will be modeled as a transformation Γ (6). Ideally, Γ is a linear function and identical for all taps m and sampling modes n. However, due to imperfect fabrication processes, Γn,m differs for each sampling mode n (n ∈ {1, . . . , N}) and tap m (m ∈ {1, . . . , M}).

dn,m = Γn,m(cn,m) , n ∈ {1, . . . , N}, m ∈ {1, . . . , M} (6)

Please note that the characteristic curves Γn,m are also different for each pixel. For simplicity, the following argumentation will focus on a single pixel. In an implementation, the method derived here is applied to all pixels of the sensor in the same way.

2.4. State-of-the-Art: Averaging

A possible strategy to compensate errors introduced by the different characteristic curves is to perform an averaging over all taps, i.e. each sample of the correlation function is measured by each of the M detection units individually, and all these values are averaged arithmetically.¹

For instance, the CamCube 2.0 ToF system by PMDTechnologies uses a sensor with M = 2 taps and acquires N = 4 samples of the correlation function. A raw data package consists of 8 values, of which half are acquired with tap 1 and the other half with tap 2 (cf. Fig. 1).

The acquired values dn,m are used to compute the samples In by (7)-(10), which are utilized in (3)-(5) for reconstructing the scene unknowns.

I0 = (d1,1 + d1,2)/2 (7)
I1 = (d2,1 + d2,2)/2 (8)
I2 = (d3,1 + d3,2)/2 (9)
I3 = (d4,1 + d4,2)/2 (10)

This strategy has the effect that differences of the various characteristic curves Γ cancel out. However, this is only valid for differences of linear order. Furthermore, any implementation of this strategy will be slow, since each sample of the correlation function n is acquired multiple times (namely by each of the M taps) to generate a single set of scene unknowns. Moreover, higher-order deviations of the different characteristic curves Γ are not compensated using this approach.

¹This particular averaging strategy is used by PMDTechnologies to our knowledge. The methods of other ToF manufacturers are not known to us.
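As a minimal sketch of this averaging (Eqs. (7)-(10)), assuming a raw data package stored as an (N = 4) × (M = 2) array with package[n, m] = d_{n+1,m+1}; the array names are illustrative:

```python
import numpy as np

def average_taps(package):
    """State-of-the-art averaging, Eqs. (7)-(10): each sample I_n is the
    arithmetic mean of the values measured by both taps."""
    return package.mean(axis=1)  # shape (4,): I_0 ... I_3

# A purely linear (gain/offset) tap asymmetry distorts all averaged samples
# by the same affine map, which then cancels in the phase of Eq. (27):
# offsets drop out of the differences, the gain drops out of the ratio.
c = np.array([1200.0, 900.0, 800.0, 1100.0])      # theoretical samples c_n
package = np.stack([c, 1.05 * c + 30.0], axis=1)  # tap 1 ideal, tap 2 affine-distorted
print(average_taps(package))
```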

2.5. Explicit vs. Implicit Calibration

A possibility to supersede this averaging technique is to determine the Γn,m of the ToF system by performing a photometric calibration (see e.g. Erz and Jähne [1]). Such an approach explicitly determines each Γn,m by illuminating the sensor with a well-defined input and analyzing the (raw data) output of the ToF system. However, such an explicit calibration requires a special homogeneous light source, e.g. an integrating sphere. Furthermore, this explicit calibration is slow and thus expensive in a production line. The most critical issue is, however, that the Γn,m are typically not stable over time, because they depend on a variety of factors, especially on the temperature.

Instead of such an explicit calibration, the method proposed here aims at performing an implicit calibration, where the differences between two read-out paths Γ are estimated and compensated.

3. Implicit Calibration

Our approach performs an implicit calibration of the sensor inhomogeneities from arbitrary raw data acquired from a scene. It uses a rectification operator rn,m (specified in the following section), which is applied to correct the sensor raw data {dn,m} (11).

d′n,m = rn,m(dn,m) = rn,m(Γn,m(cn,m)) (11)

Please note that Γn,m and cn,m are unknowns, which are not determined by the calibration process.

The goal of the rectification process is to generate a set of corrected raw data {d′n,m} such that each corrected output value d′n,m only depends on the theoretical input value cn,m, and no longer depends on the detection unit m or sampling mode n used for the measurement. Thus the requirement for rn,m is:

cn1,m1 = cn2,m2 ⇒ d′n1,m1 = d′n2,m2 , (12)
for all n1, n2 ∈ {1, . . . , N} and m1, m2 ∈ {1, . . . , M}.

Since a relative calibration is desired, the data of only M − 1 taps have to be rectified. W.l.o.g. we choose m = 1 as the tap whose data are trivially corrected, i.e. remain uncorrected. The raw data of all other taps are corrected for each possible sampling mode n, see (13).

This means that there are (M − 1) · N independent nontrivial and N trivial rectification operators rn,m for each pixel. The rectification operators are used to compensate

$$r_{n,m}(d_{n,m}) = \begin{cases} d_{n,m} \,, & \text{if } m = 1 \\ r_{n,m}(\Gamma_{n,m}(c_{n,m})) = r_{n,1}(\Gamma_{n,1}(c_{n,1})) = d'_{n,1} \,, & \text{if } m \neq 1 \end{cases} \qquad (13)$$

for each possible cn,1, with cn,m = cn,1 and n ∈ {1, . . . , N}.

deviations caused by the different detection units m individually for each sampling mode n. Please note that this is only an implicit definition of rn,m. It will be shown in the next section how rn,m is constructed.

4. Scene-based Sensor Calibration

The rectification operators rn,m can be constructed by analyzing raw data delivered by a ToF system. Under the assumption that the observed scene is (temporarily) not changing, each tap (of a pixel) measures the same theoretical input, thus:

cn,m = cn,1, n ∈ {1, . . . , N} (14)

Due to the aforementioned different characteristic curves, the sensor output measured by different taps is usually not identical: dn,m ≠ dn,1. The rectification operator rn,m is generated in such a way that (15) is valid for each pair (dn,m, dn,1).

rn,m(dn,m) = dn,1 (15)

The rectification operator rn,m expresses the correlation of actually measured data (dn,m) and the data which would have been measured with tap m = 1 (dn,1). For an ideal sensor, rn,m would be the identity function.

The generation of rn,m can be done by collecting multiple pairs {(dn,m, dn,1)i} and fitting a polynomial function to this data set. The rectification operator rn,m is then this polynomial function. It has to be computed individually for all taps m ≠ 1, all sampling modes n, and all pixels. Please see Fig. 2 for a visualization of the proposed calibration technique.

The assumption of a static scene does not need to be fulfilled for all pixels simultaneously. Instead, static subsequences of the raw data signal can be found for every pixel individually and may be used for the generation of rn,m. Such static subsequences are typically present in all kinds of natural sequences. They can be identified, for instance, by comparing the absolute temporal gradient of the raw data signal with a predefined threshold: if the absolute gradient of the raw data signal of a particular pixel is below this threshold, the pixel corresponds to a static object, so pairs {(dn,m, dn,1)i} can be extracted from the acquired raw data package.
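A minimal per-pixel sketch of this procedure, assuming time series of raw values for one sampling mode n and two taps; the threshold name xi, the choice to require both channels to be static, and the polynomial degree are illustrative assumptions:

```python
import numpy as np

def build_rectifier(d_m, d_1, xi=4000.0, degree=1):
    """Construct r_{n,m} for one pixel and sampling mode n from time series
    d_m (tap m) and d_1 (reference tap 1) of raw values."""
    # Static-subsequence test in the spirit of Eq. (16): accept samples whose
    # squared temporal gradient is below xi (here required for both channels)
    static = (np.diff(d_1) ** 2 < xi) & (np.diff(d_m) ** 2 < xi)
    pairs_m, pairs_1 = d_m[1:][static], d_1[1:][static]
    # Least-squares polynomial fit of d_1 over d_m, cf. Eq. (15) and Sec. 4
    coeffs = np.polyfit(pairs_m, pairs_1, degree)
    return np.poly1d(coeffs)  # callable rectification operator r_{n,m}

# Usage: tap 2 sees the same (piecewise static) input, affinely distorted
rng = np.random.default_rng(0)
c = np.repeat([12000.0, 9000.0, 15000.0], 50) + rng.normal(0, 10, 150)
d1, d2 = c, 1.08 * c + 250.0
r = build_rectifier(d2, d1)
print(r(1.08 * 10000.0 + 250.0))  # approx. 10000, i.e. mapped to the tap-1 scale
```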

Figure 2. Overview of the calibration: The rectification operator rn,m is a polynomial fit of {dn,1} over {dn,m}, depicted here for m = 2, n = 1. For every pixel, N(M − 1) nontrivial rectification operators have to be computed (no correction of tap 1).

5. Raw Data Rectification

The rectification operators rn,m may be used to compensate the effect of the different characteristic curves of the different taps. Hence, the averaging technique described in Sec. 2.4 is no longer needed, and each raw data package may be split into separate packages, which may be used to compute individual sets of scene unknowns. For instance, each raw data package of a two-tap camera using N = 4 samples may be split into two subpackages, each carrying the full information necessary for reconstruction of the scene unknowns (see Fig. 3).

By pursuing this strategy, multiple sets of scene unknowns can be computed from a single raw data package, and hence the frame rate is increased. Please note that since each subpackage carries the full information to compute the scene unknowns, these computed quantities are independent. For the given example, the frame rate of the depth maps and of all other computed scene unknowns is doubled.
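A minimal sketch of this splitting for a two-tap, N = 4 package, assuming per-channel affine rectifiers (p, q) have already been fitted as in Sec. 4; the array layout is illustrative, and arctan2 is used instead of a plain arctan to obtain the full phase range:

```python
import numpy as np

def split_and_compute_phases(d, p, q):
    """d, p, q: arrays of shape (4, 2), 0-based indexed as [n-1, m-1]."""
    dc = d.astype(float).copy()
    dc[:, 1] = p[:, 1] + q[:, 1] * d[:, 1]  # rectify tap 2; tap 1 stays, Eq. (13)
    # Subpackage 1: {d'_{1,1}, d'_{2,1}, d'_{3,2}, d'_{4,2}}, cf. Fig. 3 and
    # Eq. (23); subpackage 2 uses the complementary half, Eq. (24)
    phi1 = np.arctan2(dc[3, 1] - dc[1, 0], dc[2, 1] - dc[0, 0])
    phi2 = np.arctan2(dc[3, 0] - dc[1, 1], dc[2, 0] - dc[0, 1])
    return phi1, phi2  # two independent phase values per raw data package
```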

5.1. Extension: Using Interleaved Datasets

A further increase of the frame rate is possible by using interleaved subpackages. This requires data from the raw data package acquired prior to the considered one. Let us denote these data with an additional index p. The raw data package may then be split into four interleaved subpackages (see Fig. 4).

Figure 3. Splitting a raw data package into two independent subpackages.

This enables the computation of four sets of scene unknowns for each raw data package, corresponding to an increase of the frame rate by a factor of four.

Please note that using interleaved subpackages does not produce the same values as interpolating the scene unknowns generated from independent (i.e. not interleaved) subpackages. In other words, using interleaved subpackages does not correspond to applying a simple interpolation. The reason is that the reconstruction of scene unknowns from raw data is typically performed by nonlinear operations (see (4), (5)).

Please note that here the computed quantities are not independent, since each subpackage has 50% overlap with its adjacent subpackage.

Figure 4. Splitting a raw data package into four interleaved subpackages.
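The interleaving can be sketched as a sliding window over the stream of two-tap acquisitions: every window of N/M = 2 consecutive acquisitions contains all four samples once, so a new subpackage is available after every acquisition. The exact assignment of sampling modes to acquisitions is an assumption inferred from Figs. 1 and 4:

```python
def interleaved_subpackages(acquisitions, window=2):
    """Yield overlapping windows of consecutive acquisitions (oldest first).
    Each acquisition is assumed to hold the two rectified samples measured
    simultaneously by the two taps; two consecutive acquisitions together
    cover all N = 4 sampling modes and form one subpackage (cf. Fig. 4)."""
    for t in range(window - 1, len(acquisitions)):
        yield acquisitions[t - window + 1 : t + 1]

# Usage: acquisitions p3, p4 of the previous package plus 1..4 of the current
# one yield four subpackages per package: (p4, 1), (1, 2), (2, 3), (3, 4)
stream = ["p3", "p4", "acq1", "acq2", "acq3", "acq4"]
print(list(interleaved_subpackages(stream))[1:])
```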

5.2. Frame Rate Increase

Using the proposed raw data rectification enables splitting the raw data packages into subpackages, which enables a significant frame rate increase. The averaging technique described in Sec. 2.4 is capable of computing one set of scene unknowns for each set of N measurements (the length of a raw data package). By applying the proposed raw data rectification, each raw data package can be split into subpackages of length N/M, each of which can be used to compute an independent set of scene unknowns. Thus the frame rate increase is N/(N/M) = M, which corresponds to the number of taps used.

By using interleaved subpackages, a set of scene unknowns can be computed every time a new measurement is done (length of new data: 1). Thus from N subpackages of the sequence, N sets of scene unknowns may be computed, giving a frame rate increase of N/1 = N, which is the number of samples. It has to be noted that the same speedup would be feasible using an adapted averaging technique with a "sliding window". However, using data rectified by the proposed method significantly decreases the overlap of the used subpackages and thus decreases the dependency of the generated sets of scene unknowns. Subpackages constructed from rectified data have an overlap of (N/M)/N = 1/M, compared to an overlap of (N − 1)/N when using the averaging technique. For example, rectified interleaved subpackages of a two-tap sensor using N = 4 samples would have 50% overlap, compared to 75% overlap when using interleaved data in combination with an averaging technique.

6. Experimental Results

For the experimental verification using real data we employed a PMD CamCube 2.0 camera (PMDTechnologies, Siegen, Germany). This ToF camera uses a correlating sensor with two taps and thus represents a considerable class of commercially available 3D ToF systems.²

We acquired a sequence of 250 raw data packages (per pixel), which included four static subsequences. To gather calibration data covering a large fraction of the available raw data range, we presented nearly homogeneous "targets" at various distances and of various reflectivities to the camera. The objects serving as targets were casually chosen and positioned, since the quality of the targets was not important, but rather the fact that the input was (temporarily) static and of various intensities. We used a piece of cardboard (distance z = 1 m), the lab's carpet (z = 2 m), the door (z = 4 m) and a piece of paper (z = 0.5 m).

The PMD camera has a built-in system for active compensation of background light (called SBI), which introduces a highly nonlinear feature to the characteristic curve Γ (please see [11] for details). For the proof of concept, correctly dealing with this highly specific system property does not provide any benefit. We therefore decided to keep the algorithms simple and to acquire data without activation of the SBI. Since the SBI is activated automatically at high intensities, the absence of strong light sources ensured that the SBI was deactivated.

The sequence was processed offline using MATLAB scripts. Static subsequences were searched for individually for each pixel. This was done by accepting all samples whose squared temporal gradient was below a threshold ξ:

accept dn,m,t1 , if (dn,m,t1 − dn,m,t0)² < ξ , (16)

with dn,m,t0 and dn,m,t1 being two consecutive values acquired at time steps t0 and t1 (t0 < t1) of a certain raw channel and pixel. For our experiments, we chose ξ = 4000 DN².³ From these static subsequences, on average 191.3 pairs (dn,1, dn,m)i per pixel and raw channel were collected.

²Among others, this class also includes cameras from Canesta and Mesa Imaging.

³[DN] = Digital Number (the physical unit of the sensor raw data)

From previous investigations [1, 11] we knew that a typical characteristic curve of the camera system at hand is well approximated by a linear function. Therefore a linear function (a polynomial of degree 1) is also well suited to model the difference of two characteristic curves.

Thus, for each pixel, each sampling mode n, and m = 2, a linear function (17) was fit to {(dn,1)i} over {(dn,m)i} using a least-squares fit, giving rn,m (18).

dn,1 = pn,m + qn,m · dn,m , m = 2, n ∈ {1, . . . , N} (17)

rn,m(dn,m) = pn,m + qn,m · dn,m (18)

Here, pn,m is the offset and qn,m the slope of the rectifica-tion operator rn,m.

The process of generating rn,m is visualized in Fig. 5 for m = 2, n = 1 and a single representative pixel with coordinates x = 100, y = 80. The blue crosses represent all pairs {(d1,1, d1,2)i} present in the input sequence. All pairs belonging to static subsequences (identified by applying (16)) were used to compute r1,2 and are labeled with red circles in the figure. The computed correction operator r1,2 is visualized as a green solid line.

Fig. 5 suggests that these samples are clustered and not evenly distributed over the input range, which might result in a bad numerical fit. However, these clusters correspond to the static subsequences, each representing a single target of the test sequence. Therefore the clustered characteristic of the data is a result of the limited extent of the acquired sequence and does not allow any conclusions about the presented method.

Figure 5. Generation of r1,2 for a single pixel with spatial coordinates x = 100, y = 80.

The correction was applied to a single frame showing a rotating depth target: for each pixel, the raw data package was split into two subpackages (cf. Fig. 3). All raw data measured with tap m = 2 were corrected, while all data acquired with the first tap were trivially corrected:

d′1,1 = r1,1(d1,1) = d1,1 (19)
⋮
d′4,1 = r4,1(d4,1) = d4,1 (20)
d′1,2 = r1,2(d1,2) = p1,2 + q1,2 · d1,2 (21)
⋮
d′4,2 = r4,2(d4,2) = p4,2 + q4,2 · d4,2 (22)

From these corrected data, two single phase maps ϕ1 and ϕ2 were computed by applying (5) to the data of each subpackage:

ϕ1 = arctan[(d′4,2 − d′2,1)/(d′3,2 − d′1,1)] (23)
ϕ2 = arctan[(d′4,1 − d′2,2)/(d′3,1 − d′1,2)] (24)

For comparison, (uncorrected single) phase maps using uncorrected data of the subframes were also computed as (25) and (26). Furthermore, an (averaged) phase map using the averaging technique described in Sec. 2.4 was generated by applying (7)-(10) and (27).

ϕ1,uc = arctan[(d4,2 − d2,1)/(d3,2 − d1,1)] (25)
ϕ2,uc = arctan[(d4,1 − d2,2)/(d3,1 − d1,2)] (26)
ϕavg = arctan[(I3 − I1)/(I2 − I0)] (27)

From these phase maps, depth maps were computed using (2); they are shown in Fig. 6.

6.1. Evaluation

The objective of this work is to show that a dynamic sensor calibration can be used to compensate for the inhomogeneities of the different taps in multi-tap ToF sensors, enabling an increased frame rate.

From Fig. 6 it can be seen that the depth maps generated from uncorrected subpackages are heavily distorted (b, c). In contrast, the two depth maps generated from rectified subpackages (d, e) look very similar. By comparing the two separate depth maps, the motion of the rotating target may be recognized (counterclockwise). The comparison of (b, c) and (d, e) indicates that computing two independent depth maps from split raw data packages as proposed gives much better results if the presented raw data rectification is used.

A quantitative analysis of these results is challenging. Please note that the presented method works with camera raw data delivered by an uncalibrated ToF camera. It is not meaningful to evaluate the absolute accuracy of the computed single depth maps, because the absolute accuracy of the averaged depth map is also unknown. Fig. 6 shows slight deviations in the averaged depth map (a) compared to the single depth maps (d, e). However, without the ground truth of the dynamic scene and without an absolute raw data calibration of the ToF system (including temporal sensor effects), an evaluation of the absolute accuracy is not possible.

6.1.1 Consistency

We are interested in increasing the frame rate, i.e. producing multiple consistent depth maps per frame. To measure this consistency, we analyzed a distance σd between corresponding regions of two computed single depth maps, which we defined as follows:

∆z(x, y) = z1(x, y) − z2(x, y) , (28)

$$\sigma_d = \sqrt{\frac{1}{K} \sum_{(x,y) \in A} \left(\Delta z(x, y) - \mu_{\Delta z}\right)^2} \,, \qquad (29)$$

where z1(x, y) and z2(x, y) are the depth values of the two analyzed depth maps at position (x, y), µ∆z is the arithmetic mean of ∆z over the regarded area A, and K is the number of pixels inside this area.
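A minimal sketch of this consistency measure, assuming two depth maps given as 2-D arrays and the region A of Sec. 6.1.1 (the ranges x, y = 40 . . . 60 follow the paper; the function name is illustrative):

```python
import numpy as np

def consistency_sigma_d(z1, z2, region=(slice(40, 61), slice(40, 61))):
    """Eqs. (28)-(29): standard deviation of the per-pixel difference of two
    depth maps over a static region A (the mean mu is removed by std)."""
    dz = (z1 - z2)[region]  # Delta z over A, Eq. (28)
    return dz.std()         # sigma_d, Eq. (29); numpy std uses 1/K by default
```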

As region A for the analysis we chose a small static area of the scene (x = 40 . . . 60, y = 40 . . . 60). We computed the distance of the single depth maps generated from the corrected raw data as σd = 0.1053 m. The distance of the uncorrected single depth maps was computed as σd,uc = 2.8068 m. Thus the single depth maps computed from split raw data packages show a significantly higher consistency if the proposed raw data rectification is applied.

Please note that motion artifacts (visible at the borders of the rotating target) are significantly reduced in the single depth maps compared to the averaged depth map, since the data for computing each depth map were gathered in less time.

6.1.2 Temporal Noise

In a second evaluation the temporal noise was analyzed. For this, the standard deviation of the depth value of all pixels of the same area A was computed over 25 consecutive frames. The mean error of the depth map computed using the averaging technique is σt,avg = (0.0487 ± 0.0080) m.
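A sketch of this temporal-noise evaluation, assuming a stack of T = 25 consecutive depth maps; the names are illustrative:

```python
import numpy as np

def temporal_noise(depth_stack, region=(slice(40, 61), slice(40, 61))):
    """depth_stack: array of shape (T, H, W) of consecutive depth maps.
    Returns mean and spread of the per-pixel temporal standard deviation
    over region A, as reported in Sec. 6.1.2."""
    patch = depth_stack[(slice(None),) + region]  # shape (T, 21, 21)
    per_pixel_std = patch.std(axis=0)
    return per_pixel_std.mean(), per_pixel_std.std()
```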

The average error for a depth value computed from corrected data is σt = (0.0699 ± 0.0122) m. This coincides nicely with the expected increase by a factor of √2, resulting from the fact that the depth maps are constructed using roughly half of the light.

Figure 6. Depth maps of the rotating target. a: Depth map using averaged raw data, computed from ϕavg (state of the art). b, c: Two depth maps generated from subpackages without correction (from ϕ1,uc, ϕ2,uc). d, e: Two depth maps generated from subpackages corrected using the proposed method (from ϕ1, ϕ2).

6.2. Computational Performance

Our experiments were performed using a MATLAB implementation which was not computationally optimized. The program runtime for loading the complete sequence, generating the rectification operators rn,m and rectifying the data of one frame was about 50 seconds on a standard notebook PC. Since the generation of the rectification operators may be implemented recursively and applying them is very simple, the complete algorithm may be implemented very efficiently. A real-time implementation of the proposed method is hence feasible, even on systems with limited hardware resources.

7. Conclusion and Outlook

This work provides a proof of concept for performing an implicit dynamic calibration of the characteristic curves of the different taps of a ToF sensor. It has been shown that the derived raw data rectification can be used to boost the frame rate of ToF systems. Our experimental results show that doubling the frame rate of a commercial two-tap ToF system is feasible. The generated single depth maps are consistent, and their statistical uncertainty increases as expected. By utilizing interleaved subpackages, a frame rate increase by a factor of four is possible for the same system.

The state-of-the-art averaging technique described in Sec. 2.4 makes the differences of the various characteristic curves Γ cancel out. However, this is only valid if these differences are described by a linear function. Our approach is able to handle higher-order deviations by employing a higher-order polynomial as rectification operator. Thus it is suitable to deliver data of higher accuracy compared to state-of-the-art solutions.

In contrast to the averaging technique, our approach uses raw data acquired in less time. As a result, motion artifacts are significantly reduced.

The demonstrated method makes use of the static subsequences of a given raw data sequence. For applications in which such static subsequences do not occur (e.g. automotive), the generation of the rectification operators could be handled by temporally interpolating the sensor raw data.

As mentioned before, the characteristic curve Γn,m of each detection unit may vary over time. It is therefore beneficial to implement routines allowing the temporal adaptation of the generated rectification operators.

Current ToF systems acquire raw data packages in a burst-mode fashion. To optimally exploit our proposed technique for enhancing the frame rate, it is advisable to adjust the temporal sampling of these acquisitions such that the generated subpackages correspond to an equitemporal sampling of the scene.

8. Acknowledgements

This research was supported by Sony Deutschland GmbH.

References

[1] M. Erz and B. Jähne. Radiometric and Spectrometric Calibrations, and Distance Noise Measurement of TOF Cameras. In R. Koch and A. Kolb, editors, 3rd Workshop on Dynamic 3-D Imaging, volume 5742 of Lecture Notes in Computer Science, pages 28–41. Springer, 2009.

[2] M. Frank, M. Plaue, H. Rapp, U. Köthe, B. Jähne, and F. A. Hamprecht. Theoretical and experimental error analysis of continuous-wave time-of-flight range cameras. Optical Engineering, 48:013602, 2009.

[3] S. B. Gokturk, H. Yalcin, and C. Bamji. A Time-Of-Flight depth sensor - system description, issues and solutions. http://www.canesta.com/assets/pdf/technicalpapers/CVPR Submission TOF.pdf.

[4] T. Kahlmann, F. Remondino, and H. Ingensand. Calibration for increased accuracy of the range imaging camera SwissRanger. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, XXXVI(5):136–141, 2006.

[5] R. Lange. 3D Time-of-Flight Distance Measurement with Custom Solid-State Image Sensors in CMOS/CCD-Technology. PhD thesis, Department of Electrical Engineering and Computer Science, University of Siegen, 2000.

[6] M. Lindner and A. Kolb. Lateral and depth calibration of PMD-distance sensors. In International Symposium on Visual Computing (ISVC06), volume 2, pages 524–533. Springer, 2006.

[7] T. Oggier, M. Lehmann, R. Kaufmann, M. Schweizer, M. Richter, P. Metzler, G. Lang, F. Lustenberger, and N. Blanc. An all-solid-state optical range camera for 3D real-time imaging with sub-centimeter depth resolution. 2004.

[8] M. Plaue. Analysis of the PMD imaging system. Technical report, Interdisciplinary Center for Scientific Computing, University of Heidelberg, 2006.

[9] H. Rapp. Experimental and Theoretical Investigation of Correlating TOF-Camera Systems. Diplomarbeit, IWR, Fakultät für Physik und Astronomie, Universität Heidelberg, 2007.

[10] T. Ringbeck and B. Hagebeuker. A 3D Time of Flight camera for object detection. http://www.ifm.com/obj/O1D Paper-PMD.pdf, 2007.

[11] M. Schmidt and B. Jähne. A Physical Model of Time-of-Flight 3D Imaging Systems, Including Suppression of Ambient Light. In R. Koch and A. Kolb, editors, 3rd Workshop on Dynamic 3-D Imaging, volume 5742 of Lecture Notes in Computer Science, pages 1–15. Springer, 2009.

[12] R. Schwarte, H. G. Heinol, Z. Xu, and K. Hartmann. New active 3D vision system based on RF-modulation interferometry of incoherent light. In D. P. Casasent, editor, Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, volume 2588, pages 126–134, Oct. 1995.

[13] T. Spirig, P. Seitz, and F. Heitger. The lock-in CCD: two-dimensional synchronous detection of light. IEEE Journal of Quantum Electronics, 31:1705–1708, 1995.

[14] G. Yahav, G. J. Iddan, and D. Mandelboum. 3D Imaging Camera for Gaming Application. http://www.3dvsystems.com/technology/3D%20Camera%20for%20Gaming-1.pdf, 2006.