22nd Computer Vision Winter Workshop
Nicole M. Artner, Ines Janusch, Walter G. Kropatsch (eds.)
Retz, Austria, February 6–8, 2017

Towards Dense SLAM with High Dynamic Range Colors

Sergey V. Alexandrov, Johann Prankl, Michael Zillich and Markus Vincze
TU Wien

Vision4Robotics Group, ACIN
{alexandrov, prankl, zillich, vincze}@acin.tuwien.ac.at

Abstract. Modern dense visual SLAM systems produce high-fidelity geometric models, yet the quality of their textures lags behind. In part, the problem pertains to naive handling of colors. The RGB triplets from images are averaged straight into the model, ignoring nonlinearity of the image color space, vignetting effects, and variations of exposure time between different frames.

In this paper we propose extensions to a surfel-based dense SLAM framework that enable more faithful capture of scene appearance. We adjust the representation to increase the dynamic range of colors, radiometrically rectify images to work in a linear color space, and explicitly handle saturated pixels. In contrast to prior work on HDR-aware SLAM, we advocate turning off the automatic exposure function of the camera and incorporate a custom controller in the SLAM loop. We demonstrate improvements in texture quality compared to LDR systems, and show that self-directed exposure time control yields more complete and consistent color models.

1. Introduction

Visual SLAM (Simultaneous Localization and Mapping) has been a major research topic in robotics for decades [2]. The advent of cheap RGB-D cameras fueled interest in the family of methods that perform dense reconstruction of the underlying geometry. Newcombe et al. [14] introduced the idea of tracking a camera against the growing surface model using direct geometric alignment of all input data. In their system the map was represented by a flat voxel grid of limited spatial resolution and size. Follow-up works removed this restriction [3, 15], added loop closure detection [19], and online global optimization of the pose graph [4]. Other map representations, such as surfel-based [8, 20] and keyframe-based [13], were explored as well.

Figure 1. High Dynamic Range Imaging. The scene has light and dark objects; it is not possible to properly expose both simultaneously. Top row: camera images taken with small and large shutter times. Bottom: composite high dynamic range image (tonemapped).

Traditionally, the focus has been on obtaining globally consistent, high-fidelity geometric reconstructions. The color appearance is often neglected; dense SLAM systems either ignore input color data, or fix the sensor exposure time and adopt a simplistic method of averaging image pixel intensities.

There are several problems associated with the latter approach. Firstly, it implies that image pixel intensities directly reflect the apparent color of the scene points. This assumption does not hold due to nonlinear transformations in the process of image


formation. In our recent work we demonstrated how appropriate camera calibration makes it possible to rectify input images and resolve this issue [1]. The second major problem has to do with the limited dynamic range of the camera sensors. Only a small range of the illumination intensities found in the real world can be captured and represented using conventional 24-bit RGB colors. Intensities outside this range result in under- and overexposed pixels that convey no information about the apparent color of the scene point. Thus, the details in shadows and highlights are lost.

The problem of the low dynamic range (LDR) of camera sensors is central to the photography community. Typically, it is addressed by selecting a shutter time that properly exposes the areas of interest in the scene. However, this fails when a scene has a lot of inherent contrast (see Figure 1). To deal with such situations the high dynamic range (HDR) imaging method was developed [18]. It involves taking multiple LDR images at different exposure times, making sure that all areas of the scene are properly exposed in at least one image. Then the images are converted into a linear color space and combined into a radiance map of extended dynamic range.

Recently, two works were presented that implement a dense SLAM pipeline and recover scene colors in HDR [11, 12]. Both rely on the automatic exposure (AE) function of the camera to obtain differently exposed images of the scene. This has two implications: exposure times have to be estimated on a per-frame basis, incurring computational effort and drift in the long run. Further, AE is shortsighted in that it has no global awareness and is designed to optimize exposure time for the current frame. Consequently, parts of the scene may always remain saturated and thus without valid color.

In this paper we address the problem of capturing and representing the colors in a scene in high dynamic range. Equipped with a consumer-grade RGB-D camera, we strive to obtain a 3D model with accurate colors, without losing detail in highlights and shadows. In contrast to the prior work, we disable the built-in AE function and design our own controller. The benefit is twofold: firstly, the exposure time of every image is known and does not need to be estimated. Secondly, we leverage the reconstructed model to make educated decisions regarding which exposure time should be used next to gain new color information. To the best of our knowledge, this is the first implementation of a custom exposure time

controller and the first work to extend a surfel-based dense SLAM framework to handle colors in high dynamic range.

2. Preliminaries and related work

2.1. Radiometric image formation

Image formation is a complex process that involves nonlinear transformations. Scene points emit rays in the direction of the camera. The camera shutter is opened for a certain period of time to allow the light to pass through the lens system. The energy received on the image plane is then converted into an electrical signal and quantized into pixel intensities.

For a scene point x ∈ R^3 the intensity of the corresponding pixel u in the image space domain Ω ⊂ N^2 is given by:

I(u) = f(t V(u) L(x)),  (1)

where f : R → {0, . . . , 255} is the radiometric response function of the camera, t is the exposure time, V : Ω → R is the optical response of the lens system (vignetting), and L : R^3 → R defines the mapping between scene points and their radiances.

The function f(·) maps the energy received at a pixel well into a quantized 8-bit intensity value. Due to limitations in the sensor technology, energies outside a certain valid range are mapped to either the minimum (0) or the maximum (255) intensity value. Such pixels are said to be under- or overexposed and provide only an upper or a lower bound on the received energy.

The vignetting response V(·) maps image locations to attenuation factors. It is often assumed to be radially symmetric and is modeled with a polynomial function [9]. However, it was recently demonstrated that the vignetting response in consumer RGB-D cameras is better modeled with a nonparametric per-pixel map of attenuation factors [1].

Both radiometric and vignetting responses can be precalibrated [1, 6]. Figure 2 shows the recovered responses for the blue channel of an Asus Xtion Live Pro camera. The responses in the other color channels are similar, but not identical.
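For illustration, the forward model (1) can be sketched as follows; the lookup table standing in for f and the normalization of the irradiance range are illustrative assumptions, with the actual calibration obtained as in [1]:

def simulate_pixel(radiance, t, V_u, response_lut):
    # Forward image formation model of Eq. (1), as an illustrative sketch.
    #   radiance     : scene radiance L(x) seen by this pixel (linear units)
    #   t            : exposure time
    #   V_u          : vignetting attenuation factor V(u) at this pixel location
    #   response_lut : lookup table approximating the response f, mapping
    #                  normalized irradiance in [0, 1] to {0, ..., 255}
    energy = t * V_u * radiance                  # energy collected in the pixel well
    energy = min(max(energy, 0.0), 1.0)          # values outside the valid range saturate
    index = int(round(energy * (len(response_lut) - 1)))
    return int(response_lut[index])              # quantized 8-bit intensity I(u)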

2.2. HDR capture

The goal of HDR capture is to recover a radiance image of a scene in its full dynamic range. Based on (1), the pixels in I can be converted into a radiance image L:

L(u) = f^{-1}(I(u)) / (t V(u)).  (2)
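A minimal sketch of this conversion is given below; the inverse response is represented by a hypothetical lookup table, and pixels at the extreme intensities 0 and 255 are flagged as saturated rather than converted:

import numpy as np

def rectify(I, t, V, inv_response):
    # Radiometric rectification of an LDR image into a radiance map, Eq. (2).
    # All names are illustrative; the calibration data is assumed to come from [1].
    #   I            : HxW uint8 image (one color channel)
    #   t            : exposure time of this frame (known, since AE is disabled)
    #   V            : HxW map of vignetting attenuation factors
    #   inv_response : 256-element array approximating the inverse response f^{-1}
    valid = (I > 0) & (I < 255)            # saturated pixels convey no radiance information
    L = inv_response[I] / (t * V)          # L(u) = f^{-1}(I(u)) / (t V(u))
    L[~valid] = 0.0                        # only an under-/overexposure flag remains for them
    return L, valid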


Figure 2. Radiometric calibration of an Asus Xtion Live Pro camera (blue color channel). Left: camera response function (pixel intensity, 0–255, versus normalized irradiance), recovered up to an unknown scale factor. Right: vignetting response as a map of pixel attenuation factors (approximately 0.45–0.90).

On its own, this conversion does not increase the dynamic range; the radiances of saturated pixels remain unknown. However, it brings the values into a linear space at an absolute scale. Thus, this conversion allows the image to be combined with others taken at different exposure times and having a different effective range.

The basic procedure of HDR capture involves taking a set of LDR images I_i at n different exposure times. These images are converted into radiance images L_i and are combined using:

L(u) = Σ_{i=1}^{n} w(I_i(u)) L_i(u) / Σ_{i=1}^{n} w(I_i(u)),  (3)

where w is a confidence weight that depends on the pixel measurement. In the early work of Debevec and Malik [5] a hat function was used. Later, Kirk and Andersen [10] characterized the noise properties of several other weighting schemes. They concluded that variance-based weighting gives the best lower bound on the signal-to-noise ratio. Hasinoff et al. [7] investigated the problem of selecting exposure times and gains for noise-optimal HDR capture.
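As an illustration, the weighted combination in (3) can be implemented as follows; the hat-shaped weight of [5] is used here purely as an example of w:

import numpy as np

def hat_weight(I):
    # Hat-shaped confidence weight on the 0..255 intensity range, as in [5].
    I = I.astype(np.float64)
    return np.minimum(I, 255.0 - I) / 127.5

def fuse_hdr(ldr_images, radiance_images):
    # Combine per-exposure radiance maps into one HDR radiance map, Eq. (3).
    num = np.zeros_like(radiance_images[0])
    den = np.zeros_like(radiance_images[0])
    for I, L in zip(ldr_images, radiance_images):
        w = hat_weight(I)
        num += w * L
        den += w
    return num / np.maximum(den, 1e-12)    # avoid division by zero where all inputs saturate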

Conventional HDR methods assume that each pixel u represents the same scene point x across the different images. Effectively, this means that both the camera and the scene should be static. More recent works attempt to relax this requirement and allow capturing without a tripod [22], or tolerate moving objects in the scene [16].

2.3. HDR mapping

The ideas from the HDR imaging area were applied in the context of 3D reconstruction with RGB-D cameras. Zhang et al. [21] presented an offline method to obtain globally optimal HDR textures for a reconstructed 3D model. They formulated a nonlinear optimization problem, where per-image exposures and point radiances are the unknowns.

Motivated by augmented reality applications, particularly the insertion of reflective objects with shadows into a video stream, Meilland et al. [12] proposed a dense SLAM system that recovers HDR colors. They represent the 3D scene model as a graph of super-resolved HDR keyframes, each of which is the result of fusing a set of LDR images. Camera tracking is performed through direct alignment with geometric and photometric error terms. The latter includes a relative exposure time, estimated jointly with the camera transform. They model the camera response with a gamma function and ignore vignetting effects.

Recently, Li et al. [11] described a different approach to mapping with HDR colors, where they extend a volumetric SLAM framework. In their formulation exposure compensation is decoupled from tracking. The alignment problem is cast in a normalized radiance space that is independent of exposure time. Once the new camera pose is estimated, the exposure time change is determined as a weighted average of radiance ratios between corresponding pixels. Finally, the radiance map is scaled using the estimated exposure time and is fused into the global volumetric representation.

Our system is similar to the latter two in that it performs online 3D reconstruction with HDR colors. The difference is that we use a surfel-based representation and control the exposure time of the camera based on the current state of the map.

3. System overview

We employ an architecture typical for real-time dense SLAM systems, where camera tracking is alternated with mapping. The flow diagram is presented in Figure 3. Input color images I from an RGB-D camera are radiometrically rectified to obtain radiance maps L. Together with the depth maps they are used to estimate the current camera pose within the map. This is done through direct alignment with the virtual radiance and depth maps predicted at the previous pose of the camera. The rectified input data is then fused into the existing model M using the estimated camera pose. Next we perform view prediction of the updated model from the estimated camera pose. Finally, the predicted saturation map S is used by the exposure time controller to select the shutter time for the next frame.

This pipeline is based on the work of Keller et al. [8]. The main difference and contribution of this paper consists in (a) the extension of various pipeline stages to deal with HDR colors and (b) the introduction of an exposure time controller in the loop. The next two sections cover these topics.

Figure 3. Flow diagram of the proposed SLAM system. The camera delivers I and D to preprocessing; the rectified L, C, D feed camera pose estimation; the estimated pose R, t drives data fusion into the model M; view prediction renders S, L, D, which the exposure time controller uses to select the next shutter time. Filled boxes represent pipeline processing stages, empty boxes represent data structures, and arrows represent data and control flow.

4. Mapping with HDR colors

4.1. Preprocessing

The camera delivers noisy LDR images of the scene. Before fusing them into the map, we perform radiometric rectification using (2) to bring them into a linear color space at the same scale as the map. The camera response function and vignetting effects are precalibrated [1], and the exposure time t is known for every frame.

4.2. Map representation

The map is represented by a set of surfels M. Each surfel has the following fields: position p ∈ R^3, normal n ∈ R^3, weight w ∈ R, radius r ∈ R, timestamp t, and HDR color c ∈ N^4.

The first element c1 of an HDR color stores a confidence value. A positive confidence indicates that the color is valid, i.e. the surfel was observed through a non-saturated pixel at least once. In this case the remaining three elements contain per-channel radiances. Zero confidence indicates that the color is invalid, i.e. up to now the surfel has always been observed through under- or overexposed pixels. In this case c2 and c3 store the shortest and the longest exposure times at which the surfel was observed so far, and c4 is a binary flag indicating whether the observed pixels were under- or overexposed.

Each of the fields of an HDR color is stored in a 16-bit integer, requiring 64 bits in total. Note that, e.g., the ElasticFusion implementation1 allocates 64 bits for surfel color (although only 24 bits are used). Therefore, the conversion to HDR can be achieved without changing the memory footprint.
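One way to picture this layout (an illustrative sketch, not the actual data structure of our implementation; exposure times are assumed to be stored in fixed point so that they fit a 16-bit field) is:

import numpy as np

# Hypothetical 64-bit surfel color: four 16-bit unsigned integers.
# c1 > 0  -> valid color:   c2, c3, c4 hold per-channel radiances (fixed point).
# c1 == 0 -> invalid color: c2 = shortest exposure tried so far,
#                           c3 = longest exposure tried so far,
#                           c4 = 0 if overexposed, 1 if underexposed.
SurfelColor = np.dtype([("c1", np.uint16), ("c2", np.uint16),
                        ("c3", np.uint16), ("c4", np.uint16)])

def invalid_color(t, underexposed):
    # Color of a surfel that has so far only been seen through saturated pixels.
    return np.array((0, t, t, int(underexposed)), dtype=SurfelColor)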

4.3. Camera pose estimation

The camera pose update is estimated through direct registration of the current input frame with a rendered view of the model as seen from the previous camera pose. We minimize a joint cost function consisting of geometric and photometric residuals, exactly as described by Whelan et al. [20]. Note that in our case the input images are radiometrically rectified and are at scale with the map. Therefore, even though the formulation is the same, our registration minimizes the error in HDR space.
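For reference, the joint cost has the general form used in ElasticFusion-style tracking (a sketch in our notation; the exact residuals, weighting, and minimization follow [20]):

E = E_geom + λ E_rgb,    E_rgb = Σ_u (L(W(u; ξ)) − L̂(u))^2,

where ξ is the pose update being estimated, W(·; ξ) warps a pixel of the predicted view into the live frame using the associated depth, L is the rectified live radiance map, L̂ is the radiance map predicted from the model, and λ balances the two terms. The only change with respect to [20] is that the photometric term compares radiances rather than raw intensities.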

4.4. Data fusion

We perform projective association of the input data with the existing surfels. The computed correspondences are used to update positions, normals, weights, radii, and timestamps of the surfels following the rules described by Keller et al. [8]. We introduce an additional rule to handle the fusion of colors.

Consider a pixel u which is associated with a surfel M_s having color c. We distinguish between four cases according to the validity of the colors being fused. When both the surfel color and the input color are invalid, we update the minimum and maximum exposure times:

c2 ← min(c2, t),   c3 ← max(c3, t).  (4)

When the surfel color is invalid and the input color is valid, the former is overwritten. When the surfel color is valid and the input color is invalid, nothing is done. Finally, if both colors are valid, a per-channel weighted average is computed.
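These cases can be written down compactly as in the sketch below; the field layout follows Section 4.2, and the choice of averaging weights is an illustrative assumption:

def fuse_color(surfel_color, surfel_weight, pixel_radiance, pixel_valid, t, w_new=1.0):
    # Color fusion rule of Section 4.4 (a sketch; the averaging weights are illustrative).
    #   surfel_color   : [c1, c2, c3, c4] as described in Section 4.2
    #   surfel_weight  : accumulated surfel weight w
    #   pixel_radiance : per-channel radiances of the associated, rectified pixel
    #   pixel_valid    : False if the pixel is under- or overexposed
    #   t              : exposure time of the current frame
    c = list(surfel_color)
    if c[0] == 0 and not pixel_valid:
        c[1] = min(c[1], t)                      # Eq. (4): shortest exposure tried so far
        c[2] = max(c[2], t)                      #          longest exposure tried so far
    elif c[0] == 0 and pixel_valid:
        c = [1] + list(pixel_radiance)           # first valid observation overwrites the color
    elif c[0] > 0 and not pixel_valid:
        pass                                     # valid color, saturated pixel: keep as is
    else:
        c[1:] = [(surfel_weight * s + w_new * p) / (surfel_weight + w_new)
                 for s, p in zip(c[1:], pixel_radiance)]   # per-channel weighted average
    return c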

4.5. View prediction

After every map update the view of the model M, as seen from the current camera pose, is predicted. We implemented an OpenGL pipeline with a simple surface splatting technique, where each surfel is rendered as an opaque hexagon [17]. The fragment shader is configured to output to multiple textures: depth map D, radiance map L, and saturation map S.

1 https://github.com/mp3guy/ElasticFusion


Figure 4. Left: predicted saturation map. Red pixels correspond to surfels with invalid overexposed colors, blue pixels correspond to surfels with invalid underexposed colors. Right: color image of the scene.

The latter two are induced by the surfel colors according to the following rules.

For a valid color c the values stored in c2, c3, and c4 are copied into the radiance map, and the saturation is zero. Conversely, for an invalid color the radiance is set to zero, whereas the saturation is given by:

s(c) = c2 if c4 = 0, and s(c) = −c3 otherwise.  (5)

For surfels that have always been observed through overexposed pixels, the saturation value will be positive and equal to the shortest exposure time. For surfels that were always underexposed, the value will be equal to the longest exposure time, with a negative sign. An example of a predicted saturation map is shown in Figure 4 (left). The saturation map is used to control the camera exposure time as detailed in Section 5.
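Per surfel, the color-dependent outputs of the fragment shader thus reduce to the following rule (a CPU-side sketch of (5), using the same illustrative field order c1..c4 as above):

def predict_color_outputs(c):
    # Per-surfel outputs written during view prediction (sketch of the rules above).
    #   c : [c1, c2, c3, c4] as described in Section 4.2
    # Returns (radiance, saturation) for the corresponding pixel of the rendered maps.
    if c[0] > 0:
        return tuple(c[1:]), 0.0                 # valid color: copy radiances, saturation 0
    if c[3] == 0:
        return (0, 0, 0), float(c[1])            # always overexposed: +shortest exposure, Eq. (5)
    return (0, 0, 0), -float(c[2])               # always underexposed: -longest exposure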

5. Exposure time control

Consumer RGB-D cameras often have a built-in AE function. The particular algorithm is vendor-dependent, but generally it tries to adjust the exposure time to strike a balance between under- and overexposing the scene. Such control is adequate to achieve as good an exposure as possible in a single image; however, for the purposes of spatially extended 3D reconstruction it is suboptimal.

Our system analyses the current state of the reconstructed model and decides which exposure time will yield the most information gain in the next frame. For this purpose we utilize the saturation map rendered in the view prediction step.

The non-zero pixels of the saturation map correspond to the model surfels that are visible from the current camera pose, but do not have a valid color. By properly adjusting exposure time we may obtain valid color information for these surfels in subsequent frames. We use the following intuition to formulate the control rules:

1. Exposure time should be varied smoothly to avoid sudden massive changes in the image appearance;

2. Saturated pixels close to the center of the image should have more influence because they are more likely to stay in the field of view in the subsequent frames;

3. The exposure controller should have hard limits on the allowed exposure times, because too large exposure times lead to a high degree of motion blur, which significantly deteriorates the quality of the color models.

The consequence of the first rule is that the control decision boils down to choosing whether to increase, decrease, or keep the same exposure time. We apply a function ω(·) to each pixel in the saturation map and sum up the results. The sign of the sum indicates whether to increase or decrease the exposure time.

The function ω(·) is defined as follows:

ω(s) = exp(1 / d(s, t)) · exp(−γ^2 / (2σ^2)),  (6)

where d(·, ·) is the difference between the saturation value and the current exposure time, γ is the normalized radial distance of the pixel, and σ is a weighting factor. The first part gives more influence to pixels that require less change in exposure time. The second part gives more influence to pixels close to the center, as stipulated by the second rule.
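Putting the rules together, the control decision can be sketched as below. The step factor, the hard limits, the clamping of d, and the way the per-pixel sign enters the sum are illustrative assumptions; the paper does not fix these details:

import numpy as np

def omega(s, t, gamma, sigma=0.5):
    # Per-pixel influence weight, Eq. (6); d(s, t) is taken here as | |s| - t |.
    d = np.abs(np.abs(s) - t)
    d = np.maximum(d, 1e-2)                      # clamp to keep exp(1/d) finite (illustrative)
    return np.exp(1.0 / d) * np.exp(-gamma**2 / (2.0 * sigma**2))

def decide_exposure(S, t, sigma=0.5, step=1.1, t_min=1.0, t_max=33.0):
    # Choose the next exposure time from the predicted saturation map S (a sketch).
    #   S : HxW saturation map; 0 = valid color, +t_short for overexposed surfels,
    #       -t_long for underexposed surfels (see Section 4.5)
    #   t : current exposure time (e.g. in milliseconds)
    h, w = S.shape
    ys, xs = np.nonzero(S)
    if ys.size == 0:
        return t                                             # everything in view has valid color
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    gamma = np.hypot((ys - cy) / cy, (xs - cx) / cx)         # normalized radial distance
    s = S[ys, xs]
    # Underexposed pixels (s < 0) vote for a longer exposure, overexposed (s > 0) for a shorter one.
    vote = np.sum(np.sign(-s) * omega(s, t, gamma, sigma))
    if vote > 0:
        t = t * step                                         # smooth increase (rule 1)
    elif vote < 0:
        t = t / step                                         # smooth decrease
    return float(np.clip(t, t_min, t_max))                   # hard limits (rule 3)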

6. Experimental evaluation

We performed a number of small-scale reconstructions with our system and the state-of-the-art LDR system of Whelan et al. [20] to demonstrate the benefits of using high dynamic range colors and the custom exposure controller.

In the first experiment we recorded an RGB-D sequence with a fixed exposure time. Figure 5 shows a bird's-eye view of the LDR and HDR reconstructions. Our texture is smoother and more consistent, especially in bright (white paper) and dark (dark parts of the table) texture regions.

In the second experiment we performed two scans of an office table, one with a fixed exposure time, and


Figure 5. Top: LDR reconstruction obtained using ElasticFusion [20] with disabled camera AE and fixed exposure time. Bottom: HDR reconstruction obtained using our system from the same data sequence.

one with our exposure controller enabled. Figure 6 presents the LDR reconstruction obtained from the first sequence and the HDR reconstruction obtained from the second sequence. As before, the HDR reconstruction has smoother and more consistent textures. Furthermore, the dark objects in the scene (keyboard and phone) are properly exposed and more details are preserved.

In the third experiment we performed two scans of a washing machine, one with a fixed exposure time, and one with our exposure controller enabled. Both sequences were used to produce HDR reconstructions with our system. Figure 7 presents the visual appearance of the reconstructions and the confidence values associated with the surfels. Clearly, the


Figure 6. Top: LDR reconstruction obtained using ElasticFusion [20] with disabled camera AE and fixed exposure time. Bottom: HDR reconstruction obtained using our system with the custom exposure controller enabled.


Figure 7. HDR reconstructions obtained using our system, without (top row) and with (bottom row) the custom exposure controller. The left column shows the visual appearance of the reconstruction, the right column shows the color confidence of the surfels (color-coded, yellow means confident, dark violet means invalid color).

darker areas at the bottom and on the shelves were not properly exposed in the sequence with a fixed shutter time. Conversely, in the second sequence the exposure controller made sure that all parts of the scene were properly exposed.

7. Conclusions and future work

In this contribution we presented a study of HDR mapping with consumer RGB-D cameras. Extensions to the standard surfel-based SLAM system were described that lead to improved texture quality.

In our current implementation a number of design decisions (such as the exposure control rules) were made without proper empirical evaluation. The challenge, however, is to define suitable evaluation metrics.

It would be interesting to evaluate the influence of the improved color model and of error minimization in HDR color space on tracking performance.

Since our method is active, it is challenging to perform a fair comparison with "passive" HDR systems that use the cameras' built-in AE function. The same RGB-D sequence cannot be used, and it is almost impossible to reproduce the same trajectory without an involved robotic setup. One solution would be to use a synthetic dataset, where multiple frames with different exposure times may be rendered for each camera pose in the trajectory.

Reconstruction with HDR colors is a relatively new area, and there are no benchmark datasets. Engel et al. [6] recently published a dataset for monocular odometry with radiometric calibration of the camera. In a similar spirit, it may be useful to collect and publish an RGB-D dataset with such calibration.

Acknowledgments

The work presented in this paper has been funded by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement No. 600623 ("STRANDS") and No. 610532 ("SQUIRREL").

References

[1] S. V. Alexandrov, J. Prankl, M. Zillich, and M. Vincze. Calibration and Correction of Vignetting Effects with an Application to 3D Mapping. In Proc. of IROS, 2016.

[2] C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. D. Reid, and J. J. Leonard. Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age. arXiv preprint, June 2016.

[3] J. Chen, D. Bautembach, and S. Izadi. Scalable Real-time Volumetric Surface Reconstruction. ACM Transactions on Graphics, 32(4):1, 2013.

[4] A. Dai, M. Nießner, M. Zollhofer, S. Izadi, and C. Theobalt. BundleFusion: Real-time Globally Consistent 3D Reconstruction using On-the-fly Surface Re-integration. arXiv preprint, 2016.

[5] P. Debevec and J. Malik. Recovering High Dynamic Range Radiance Maps from Photographs. In Proc. of SIGGRAPH, pages 1–10, 1997.

[6] J. Engel, V. Usenko, and D. Cremers. A Photometrically Calibrated Benchmark For Monocular Visual Odometry. arXiv preprint, July 2016.

[7] S. W. Hasinoff, F. Durand, and W. T. Freeman. Noise-optimal Capture for High Dynamic Range Photography. In Proc. of CVPR, 2010.

[8] M. Keller, D. Lefloch, M. Lambers, S. Izadi, T. Weyrich, and A. Kolb. Real-time 3D Reconstruction in Dynamic Scenes using Point-based Fusion. In Proc. of 3DV, pages 1–8, 2013.

[9] S. J. Kim and M. Pollefeys. Robust Radiometric Calibration and Vignetting Correction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(4):562–576, 2008.

[10] K. Kirk and H. J. Andersen. Noise Characterization of Weighting Schemes for Combination of Multiple Exposures. In Proc. of BMVC, 2006.

[11] S. Li, A. Handa, Y. Zhang, and A. Calway. HDRFusion: HDR SLAM using a low-cost auto-exposure RGB-D sensor. arXiv preprint, April 2016.

[12] M. Meilland, C. Barat, and A. Comport. 3D High Dynamic Range Dense Visual SLAM and its Application to Real-time Object Re-lighting. In Proc. of ISMAR, pages 143–152, 2013.

[13] M. Meilland and A. I. Comport. On Unifying Key-frame and Voxel-based Dense Visual SLAM at Large Scales. In Proc. of IROS, pages 3677–3683, 2013.

[14] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon. KinectFusion: Real-time Dense Surface Mapping and Tracking. In Proc. of ISMAR, 2011.

[15] M. Niessner, M. Zollhofer, S. Izadi, and M. Stamminger. Real-time 3D Reconstruction at Scale Using Voxel Hashing. ACM Transactions on Graphics, 32(6):169:1–169:11, 2013.

[16] T. H. Oh, J. Y. Lee, Y. W. Tai, and I. S. Kweon. Robust High Dynamic Range Imaging by Rank Minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(6):1219–1232, June 2015.

[17] R. F. Salas-Moreno. Dense Semantic SLAM. PhD thesis, 2014.

[18] P. Sen and C. Aguerrebere. Practical High Dynamic Range Imaging of Everyday Scenes: Photographing the World as We See It with Our Own Eyes. IEEE Signal Processing Magazine, September 2016.

[19] T. Whelan, M. Kaess, H. Johannsson, M. Fallon, J. J. Leonard, and J. McDonald. Real-time Large-scale Dense RGB-D SLAM with Volumetric Fusion. The International Journal of Robotics Research, 34(4-5):598–626, 2014.

[20] T. Whelan, S. Leutenegger, R. Salas-Moreno, B. Glocker, and A. Davison. ElasticFusion: Dense SLAM Without A Pose Graph. In Proc. of RSS, 2015.

[21] E. Zhang, M. F. Cohen, and B. Curless. Emptying, Refurnishing, and Relighting Indoor Spaces. Proc. of SIGGRAPH Asia, 35(6), 2016.

[22] H. Zimmer, A. Bruhn, and J. Weickert. Freehand HDR Imaging of Moving Scenes with Simultaneous Resolution Enhancement. In Computer Graphics Forum, volume 30, pages 405–414, 2011.