

Real-time Graph-Based 3D Reconstruction of Sparse Feature Environments for Mobile Robot Applications

Daniel Gaspers1, Christoph Knorr1, Tobias Nickchen1, Daniel Nickchen2, Bärbel Mertsching1, Mahmoud A. Mohamed1

Abstract— In recent years, the demand for mobile robots in rescue and surveillance has increased considerably, as these systems reduce human risk and save workforce. A major remaining problem is the mapping of unknown environments for localization and navigation purposes. Common 2D mapping is not sufficient for 3D environments with multiple levels and 3D structures. Unfortunately, recent approaches for 3D reconstruction suffer from high hardware requirements to meet real-time constraints and from the accumulation of errors in the reconstruction result over time. Moreover, the loop-closing problem has not yet been solved satisfactorily. In this paper, we propose a new approach for real-time 3D reconstruction that meets the hardware constraints of mobile robots and is capable of detecting and closing loops to reduce errors. To this end, we combine and modify three state-of-the-art approaches into a 3D reconstruction system that also works in sparse feature environments. Each of the subsystems runs in a parallel thread accelerated by the GPU and can easily be replaced by another algorithm if necessary.

I. INTRODUCTION

In recent years, mobile robots have increasingly been used for surveillance, inspection, and rescue tasks. For autonomous as well as teleoperated usage, an artificial representation based on sensor data is needed for localization and navigation. The most common solution is a 2D map, usually based on distances to walls or obstacles. This approach fails if the explored environment is not flat but covers multiple depth levels. Nowadays, depth cameras like the ASUS Xtion [1] are used to produce RGBD data (an RGB image with a detailed depth image) of the environment. This data can be used to reconstruct the environment three-dimensionally and to provide 3D information usable for localization and mapping as well as for further tasks like terrain analysis or object detection.

The primary task of 3D reconstruction based on RGBD data is to register subsequent depth images into a global point-cloud representing the environment. Existing approaches either use depth registration algorithms like the Iterative Closest Point (ICP) [2] or feature tracking in RGB data [3]. Some algorithms run on the CPU, while others are accelerated by the GPU and are real-time capable. However, most approaches only work in environments with a high density of features. The most popular 3D reconstruction algorithm is the GPU-based KinectFusion [4], which allows

1 D. Gaspers, C. Knorr, T. Nickchen, B. Mertsching, and M. A. Mohamed are with GET Lab, Faculty of Electrical Engineering, Informatics and Mathematics, Paderborn University, 33098 Paderborn, Germany. 2 D. Nickchen is with the Fraunhofer Institute for Mechatronic Systems Design IEM, Paderborn. Corresponding author: M. A. Mohamed ([email protected]).

for real-time reconstruction and provides valuable results in small environments. Other approaches based on the idea of KinectFusion also work in larger environments but suffer from errors accumulated over time and distance. Thus, the map does not match when the camera returns to an already visited position after a loop through the environment. We propose a new real-time 3D reconstruction system that also works in larger environments and with sparse features. The modular approach can be used to extend any 3D reconstruction algorithm by loop detection and optimization and is suited for mobile robots due to its relatively low hardware requirements compared to other state-of-the-art systems.

The remainder of this paper is structured as follows. The current state of research is addressed in section II. Afterward, the new 3D reconstruction system and its implementation are presented in section III. Conducted experiments are described in section IV, and the results of the registration and optimization with the developed system follow in section V. The paper ends with the conclusion in section VI.

II. RELATED WORK

As each depth image is related to one camera position, 3D reconstruction is directly linked to camera tracking. On the one hand, the camera trajectory between two consecutive depth images can be computed, whereby the positions of the camera at the moments the depth images are taken can be derived. One approach is the ICP algorithm, which registers two overlapping point clouds or depth images by minimizing the sum of squared distances between corresponding points. On the other hand, camera positions can be determined using feature tracking either in the RGB or the depth image, and afterward the depth images can be merged accordingly (see, e.g., [3]).
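For reference, the point-to-point ICP objective minimized here can be written in the following standard form, where R and t denote the rotation and translation between the two clouds and (p_i, q_i) are the corresponding point pairs of the current iteration:

    E(R, t) = \sum_i \| R p_i + t - q_i \|^2

Point-to-plane variants such as [2] instead measure each residual along the surface normal at q_i.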

Many algorithms directly create fixed reconstructions of the whole environment, which cannot be optimized. This leads to a crucial problem: after adding a new depth image to the point-cloud, errors cannot be corrected anymore. Due to the accumulation of small errors between successive depth images over time, large gaps can occur in the reconstruction when the camera returns to an already visited location. Therefore, some approaches make use of a graph to represent the environment [5] [6] instead of a fixed 3D representation as one point-cloud. Each node of such a graph is related to a single depth image or at least a local reconstruction of the environment. The edges represent the transformations between these nodes and their uncertainties.


TABLE I
A SELECTION OF 3D RECONSTRUCTION ALGORITHMS

                   CPU         GPU
 no loop-closing   NICP        KinectFusion, InfiniTAM
 loop-closing      RGBD-SLAM   ElasticFusion

The local error along an edge can be corrected by recalculating the transformations based on new observations to decrease the global error of the reconstruction. Graph-based algorithms mainly consist of two parts: a front-end and a back-end. In the front-end, the graph is generated by adding local reconstructions as nodes and the corresponding edges relative to a given origin. The back-end detects loops in the traveled trajectory and optimizes the graph with respect to the minimum error in the transformations.
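As an illustration, the data layout of such a pose-graph can be sketched as follows (a minimal sketch with hypothetical names, not code from any of the cited systems):

    #include <vector>
    #include <Eigen/Geometry>

    // Minimal pose-graph layout: one node per local reconstruction,
    // one edge per measured relative transformation.
    struct Node {
        Eigen::Isometry3d pose;  // estimated global pose of the local reconstruction
        // the local map data (depth images or a point cloud) would be attached here
    };

    struct Edge {
        int from, to;                             // indices of the connected nodes
        Eigen::Isometry3d transform;              // measured relative transformation
        Eigen::Matrix<double, 6, 6> information;  // inverse covariance (certainty)
    };

    struct PoseGraph {
        std::vector<Node> nodes;
        std::vector<Edge> edges;  // odometry edges plus loop-closure edges
    };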

RGBD-Simultaneous Localization and Mapping (SLAM) [7] is an example of a graph-based approach using RGB and depth data. Due to its implementation on a CPU, the processing speed is deficient (about 3 Hz at a depth image size of VGA), and the reconstruction is not satisfying. These results could be confirmed in our tests. A selection of further promising approaches is arranged in Table I based on the processing unit used and the ability of loop-closing and optimization. Further details about these algorithms are explained in the following sections.

A. NICP

The Normal Iterative Closest Point (NICP) algorithm [8] is an extension of the basic ICP and registers two overlapping depth images. Beyond the distance between corresponding points, the normal and the curvature of a local surface are used. The processing speed for a resolution of QQVGA is reported to be 30 Hz [8]. We performed our tests with an image size of QVGA and reached 3 Hz (the test setup for this and the following algorithms is given in section IV), which is still too slow for real-time processing, although the registration result is satisfying. An advantage of NICP is that it, moreover, provides the corresponding covariance matrix describing the uncertainty of the solution.
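In condensed form, NICP augments each point p_i with its surface normal n_i and minimizes an information-weighted error over these extended residuals (a simplified rendering of the formulation in [8]):

    e_i(R, t) = ( R p_i + t - q_i,  R n_i^p - n_i^q ),    E(R, t) = \sum_i e_i^T \Omega_i e_i,

where the information matrices \Omega_i are derived from the local surface statistics and yield the covariance estimate mentioned above.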

B. KinectFusion

KinectFusion [4] is a GPU-based camera tracking and 3D reconstruction system. In the first step, the depth map conversion, the values of the raw depth image are converted from millimeters to meters. The second step is the camera tracking, which determines the camera pose difference between two subsequent images based on an ICP algorithm. Thus, the corresponding camera transformation with respect to the starting position is obtained. During the volumetric integration step, a single volumetric representation based on an octree is computed from the view of the known camera pose by calculating a running average of the information of all images. The volumetric representation is a Truncated Signed Distance Function (TSDF), which can be implemented efficiently and fast on a GPU. The TSDF stores the distance to the nearest surface for each voxel. Values larger than a specified threshold are truncated to save memory. Therefore, the surface of the scene is represented by the zero crossings of the TSDF. The developers of KinectFusion report a processing rate of 40 Hz up to 100 Hz depending on the octree resolution [9]. We could reach a processing speed of 15 Hz for a VGA image size. KinectFusion provides good results in environments with many features but has problems in areas with only large flat segments due to the use of the ICP algorithm. The original KinectFusion is suited only for the reconstruction of small areas due to the limited available GPU memory. Due to its popularity, KinectFusion has been reimplemented as open source [9], and many researchers have adopted the idea of the volumetric representation with a TSDF [5] [10]. Different extensions of KinectFusion overcome the limitation to small areas, e.g., by adding swapping between CPU and GPU memory [5].
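The running-average TSDF update described above can be illustrated with the following sketch (a simplified CPU version with hypothetical names; the actual systems implement this as a GPU kernel over all voxels):

    #include <algorithm>

    struct Voxel {
        float tsdf   = 1.0f;   // truncated signed distance to the nearest surface
        float weight = 0.0f;   // accumulated integration weight
    };

    // Integrate one observation into a voxel. 'sdf' is the signed distance of the
    // voxel to the observed surface along the camera ray; 'mu' is the truncation
    // threshold beyond which no surface information is stored.
    void integrate(Voxel& v, float sdf, float mu, float maxWeight = 128.0f) {
        if (sdf < -mu) return;                     // far behind the surface: no update
        const float d = std::min(1.0f, sdf / mu);  // truncate to [-1, 1]
        const float w = 1.0f;                      // weight of this observation
        v.tsdf   = (v.tsdf * v.weight + d * w) / (v.weight + w);  // running average
        v.weight = std::min(v.weight + w, maxWeight);
    }

The surface is then extracted at the zero crossings of the stored tsdf values.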

C. InfiniTAM

InfiniTAM [10] is a GPU-based approach, also using a TSDF known from KinectFusion for the volumetric representation of the environment. The authors focused on the development of a fast and efficient GPU implementation which is also suited for mobile devices. Therefore, existing volumetric integration methods are extended to improve the performance. The 3D reconstruction of the environment is stored as a TSDF in volumetric pixels (voxels) that are accessed via a hash-table, which makes the access efficient and fast. The processing of InfiniTAM consists of five steps:

1) Tracking for localization of the camera
2) Allocation for allocating new voxel blocks
3) Integration for updating the 3D world model
4) Swapping for transferring data from or to the GPU
5) Raycasting for rendering the world model from the current pose

The results of the InfiniTAM algorithm are satisfying compared to other algorithms. Images are processed at up to 523 Hz for an image size of VGA [10], and an accurate map of the environment is provided. In our experiments, we measured a processing rate of 200 Hz on a less powerful GPU than the InfiniTAM authors used. High reconstruction quality is reached in nearly all environments. Only if they consist of long flat segments with sparse features, like long corridors, does the method fail. In these cases, the ICP algorithm in the tracking step is not able to project new images correctly onto the current reconstruction. Another drawback is that the algorithm does not provide loop-closing, which leads to an accumulation of error in the reconstruction over time.
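The hash-table access pattern can be sketched as follows (block size and hash constants are illustrative, not taken from [10]):

    #include <cstddef>
    #include <unordered_map>

    struct BlockIndex {                 // integer coordinates of a voxel block
        int x, y, z;
        bool operator==(const BlockIndex& o) const {
            return x == o.x && y == o.y && z == o.z;
        }
    };

    struct BlockHash {                  // classic spatial hash over block coordinates
        std::size_t operator()(const BlockIndex& b) const {
            return std::size_t(b.x) * 73856093u ^ std::size_t(b.y) * 19349669u
                 ^ std::size_t(b.z) * 83492791u;
        }
    };

    struct Voxel { float tsdf, weight; };
    struct VoxelBlock { Voxel voxels[8 * 8 * 8]; };  // e.g., 8x8x8 voxels per block

    // Only blocks near observed surfaces are ever allocated, so memory grows with
    // the observed surface area rather than with the volume of the scene.
    std::unordered_map<BlockIndex, VoxelBlock, BlockHash> worldModel;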

D. ElasticFusion

ElasticFusion [11], also based on KinectFusion, utilizes the GPU and closes loops to optimize the global reconstruction. The results are therefore highly dependent on the environment of the camera. In office-like environments with many features, the results are superior to other approaches. However, in sparse feature environments, local loops are often detected erroneously, and the results worsen.


The algorithm processes images at 22 Hz for VGA resolution, which could be confirmed by our tests.

III. CONCEPT & IMPLEMENTATION

The evaluation of the existing 3D reconstruction algorithms has shown that CPU-based algorithms do not run in real-time. Furthermore, loop detection and optimization are essential factors for the reconstruction accuracy in large environments. ElasticFusion is the only system with integrated loop-closing for which the source code is provided. In our experiments, ElasticFusion needed many features to detect loops correctly. In sparse feature environments, the local loop-closing produces errors and leads to unusable results. The InfiniTAM algorithm, however, works at a high frame rate and produces appropriate results, especially regarding locally reconstructed areas. The only drawback is the accumulated error.

This has led to the decision to develop a graph-based framework that combines the advantages of several existing algorithms in one system which works in as many environments as possible. The idea is to create a pose-graph containing locally reconstructed point-clouds of the environment, each related to a graph node, connected with edges providing the transformation information. We introduce the term sub-cloud for these local reconstructions. Loop detection allows the optimization of the reconstruction to reduce the overall errors. A similar approach is proposed in [6]: local maps of the environment created by an extended version of the KinectFusion algorithm are stored as a TSDF in a graph, and if a loop is detected, the graph is optimized. However, the loop detection is not available at the moment; it can only be done manually.

The presented approach utilizes the high processing rate and the reconstruction quality of the InfiniTAM algorithm, which is essential to allow fast camera movements. An adaptation makes it possible to reconstruct local parts of the environment instead of creating only one global reconstruction. To extend this algorithm by a loop-closing and graph optimization feature, the g2o framework [12] has been used. g2o is a C++ open source framework for optimizing graph-based nonlinear error functions. g2o provides data types to create a pose-graph and allows its optimization. There are types for different 2D and 3D problems like SLAM or Bundle Adjustment (BA). The g2o types are easily extendable and can be used for a wide range of problems. A new node type was implemented to store local sub-clouds. To estimate the poses between the sub-clouds, the NICP has been selected, which provides not only a transformation but also a covariance matrix that represents the uncertainty of the transformation, which is later used for the optimization. As the NICP needs depth images, the first and last depth images of the sub-clouds are stored in the related pose-graph node. The system presented in [6] stores all the images because it optimizes the local maps besides global loop-closing. We assume the reconstruction quality of the InfiniTAM, and thus of the sub-clouds, to be highly accurate and precise.

Fig. 1. Pose-graph creation concept.

We, therefore, do not optimize sub-clouds and avoid storing all depth images. In comparison to most of the related approaches, which need a high-end desktop GPU, our system can be applied to a robot equipped with a notebook and a mobile Nvidia GPU. The workflow of the proposed 3D reconstruction system consists of four steps, the first three of which take place in the front-end, which is responsible for the registration of images and the creation of the pose-graph. The last step, in the back-end, detects loops and extends and optimizes the pose-graph. In the following, we explain these steps.

1) Create local point-clouds: The InfiniTAM is used to create local reconstructions periodically, depending on the traveled distance and the difference in angle with respect to the last local reconstruction. For the creation of each sub-cloud, the current pose in the InfiniTAM algorithm is reset so that each sub-cloud starts at the origin of its local coordinate frame. Each of these sub-clouds belongs to a specific pose in the global coordinate system represented by one node in the pose-graph. The sub-cloud C_i consists of M_i images registered by the InfiniTAM; M_i is not fixed and differs between sub-clouds. These sub-clouds and their transformation T_i1 from depth image I_i0 to depth image I_iMi-1 are fixed and cannot be optimized afterward. The covariance Σ_i1 of transformation T_i1 is set to zero because we assume the transformation has no error.

2) Compute the transformation between subsequent sub-clouds: To compute the transformation between subsequent sub-clouds, the last depth image of the i-th sub-cloud and the first depth image of the (i+1)-th sub-cloud are fed into the NICP. The result is an estimated transformation T_i2 between the sub-clouds and a covariance matrix Σ_i2, which describes its uncertainty.

3) Create the pose-graph: For each new sub-cloud provided by the InfiniTAM, a new node is added to the g2o framework's pose-graph. The transformation T_i,i+1 from this node to the previous one is calculated by multiplying T_i2 and T_i1 from the first two steps and is added as an edge to the pose-graph. The transformation's uncertainty is given by Σ_i,i+1 = Σ_i2 because Σ_i1 is assumed to be zero. The sub-cloud C_i is attached to its corresponding node N_i. The results of the first three steps are visualized in Fig. 1.
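For illustration, steps 1) to 3) map onto the g2o API roughly as follows; this sketch uses g2o's stock VertexSE3/EdgeSE3 types rather than the custom node type described above, and the solver setup is omitted:

    #include <Eigen/Geometry>
    #include <g2o/core/sparse_optimizer.h>
    #include <g2o/types/slam3d/vertex_se3.h>
    #include <g2o/types/slam3d/edge_se3.h>

    // Add one vertex per sub-cloud and connect it to its predecessor.
    void addSubCloudNode(g2o::SparseOptimizer& graph, int id,
                         const Eigen::Isometry3d& globalPose,      // initial estimate
                         const Eigen::Isometry3d& T,               // T_i,i+1 = T_i2 * T_i1
                         const Eigen::Matrix<double, 6, 6>& cov) { // Sigma_i,i+1 = Sigma_i2
        auto* v = new g2o::VertexSE3;
        v->setId(id);
        v->setEstimate(globalPose);
        graph.addVertex(v);

        if (id > 0) {                           // odometry edge to the previous node
            auto* e = new g2o::EdgeSE3;
            e->setVertex(0, graph.vertex(id - 1));
            e->setVertex(1, graph.vertex(id));
            e->setMeasurement(T);
            e->setInformation(cov.inverse());   // information = inverse covariance
            graph.addEdge(e);
        }
    }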

4) Enhance the accuracy of the global point-cloud by loop detection and optimization: Every time a node is added to the pose-graph, the back-end tries to close a loop to all existing nodes, except for a defined number of most recently added nodes, to avoid too many local loop closures. If the differences in position and orientation between two given nodes are below given thresholds (e.g., 0.3 m and 90°), the transformation between the first depth images of the two clouds is determined using the NICP. If the mean of the resulting covariance matrix is below a given threshold, a loop is considered detected, and an additional edge closing the loop is added to the pose-graph. Finally, the pose-graph is optimized by the g2o framework.
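The distance and angle test for loop candidates could look like this (a sketch; the threshold values follow the example above, and the subsequent NICP registration is not shown):

    #include <cmath>
    #include <Eigen/Geometry>

    // True if 'candidate' is close enough to 'current' to attempt a loop closure.
    bool isLoopCandidate(const Eigen::Isometry3d& current,
                         const Eigen::Isometry3d& candidate,
                         double maxDistance = 0.3,           // meters
                         double maxAngle    = M_PI / 2.0) {  // 90 degrees
        const double distance =
            (current.translation() - candidate.translation()).norm();
        const Eigen::AngleAxisd delta(
            current.rotation().transpose() * candidate.rotation());
        return distance < maxDistance && std::abs(delta.angle()) < maxAngle;
    }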

The concept is implemented as a multithreaded C++ application and library. Three different threads (handlers), one for each subsystem, are used in the software architecture. Accordingly, an InfiniTAMHandler, a NICPHandler, and a GraphHandler are implemented. The communication between the subsystems is realized via synchronized queues. The InfiniTAMHandler creates sub-clouds from the RGB and depth data provided by the Asus Xtion and stores these sub-clouds in a queue which can be accessed by the NICPHandler. The NICPHandler computes the transformation between two successive sub-clouds and stores the sub-clouds with the corresponding transformations and covariances in the next queue. Finally, the GraphHandler takes the data from this queue and adds it to the pose-graph as nodes and edges. To enable the usage of the system on a mobile robot, a Robot Operating System (ROS) [13] wrapper, which subscribes to RGB and depth images and publishes the reconstruction as a global map, has been implemented.
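The synchronized queues between the handlers can be realized with a standard mutex and condition-variable pattern; a minimal sketch (without shutdown handling) is given below:

    #include <condition_variable>
    #include <mutex>
    #include <queue>

    // Blocking FIFO queue for passing sub-clouds between the handler threads.
    template <typename T>
    class SyncQueue {
    public:
        void push(T item) {
            {
                std::lock_guard<std::mutex> lock(mutex_);
                queue_.push(std::move(item));
            }
            cond_.notify_one();                    // wake one waiting consumer
        }

        T pop() {                                  // blocks until an item is available
            std::unique_lock<std::mutex> lock(mutex_);
            cond_.wait(lock, [this] { return !queue_.empty(); });
            T item = std::move(queue_.front());
            queue_.pop();
            return item;
        }

    private:
        std::queue<T> queue_;
        std::mutex mutex_;
        std::condition_variable cond_;
    };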

IV. EXPERIMENTS

For the experiments, test datasets have been recorded with the Asus Xtion at VGA resolution at a constant frame rate of 30 Hz with synchronized RGB and depth images. The system is evaluated on a desktop computer and a notebook. The desktop system is equipped with an Intel Xeon W3520 CPU with 4 cores at 2.66 GHz, 12 GB of RAM, and a Nvidia GeForce GTX 580 GPU with 3 GB memory, while the notebook is equipped with an Intel Core i7-4710HQ CPU with 4 cores at 2.50 GHz, 16 GB of RAM, and a Nvidia GeForce GTX 860M with 2 GB memory. On both devices, the developed system can process the images faster than the camera delivers them, and thus the reconstruction result is precisely the same because no information is discarded.

Since the focus of this work is on the extension of an existing 3D reconstruction algorithm by loop detection and optimization, we present our experiments in two different environments. The first one is an artificial rescue scenario (4.8 m x 3.6 m x 1.2 m) as used in the RoboCup [14], which contains only a few features (see Fig. 2). The corresponding recorded dataset contains 3295 depth images. The second dataset was recorded in our lab (7.5 m x 4.0 m x 2.5 m) with a higher feature density (see Fig. 3) and consists of 2894 depth images. Fig. 4 shows an example of a sub-cloud which has been created during the reconstruction of the laboratory. The InfiniTAM algorithm gives good results for small areas, and thus the accumulated error in one sub-cloud is small and can be neglected.

Fig. 2. Image of the artificial rescue scenario.

Fig. 3. Image of our Lab.

The size of the sub-clouds is a parameter of our developed system. In the example shown in Fig. 4, a new sub-cloud is created when the camera is moved more than 0.3 m or rotated more than 90°. These parameters have yielded the best results for the lab scene. The creation of the reconstructed environment representations shown in Fig. 5 and Fig. 6 is explained in the attached video1. The results show that loop detection and optimization significantly improve the reconstruction quality. Details are explained in the following paragraphs.

A. Artificial Rescue Scenario

Fig. 5a shows the unoptimized global map after moving the robot twice through the rescue arena. The errors in the map due to the error accumulation over time and distance are visible. For example, on the bottom right of Fig. 5a there are two black buckets marked by a green circle.

1 http://getwww.uni-paderborn.de/research/videos/recon

Fig. 4. Sub-cloud created during the reconstruction of the Lab.


(a) Map before optimization. (b) Map after optimization.

Fig. 5. Reconstruction in the artificial rescue scenario.

A black bucket was reconstructed during the first round in the arena and a non-existing second one during the second round. When moving through the arena for the second time, the accumulated error has increased, resulting in the wrong perception of a second bucket. The same observation can be made at the bottom right edge of the arena or at the white wall at the top of the arena.

During the second round in the arena, loop-edges are automatically added to the graph. The loop detection in the implemented system does this by comparing the current position with the positions of other already existing nodes in the graph. In the example above, the two sub-clouds with the buckets are detected as candidates for a loop. The first and last images of the two sub-clouds are registered with the NICP algorithm. As the registration was successful and the covariance was below a certain threshold, an edge closing the loop could be added to the graph. The translation and rotation of the edge correspond to the displacement of the two buckets. Fig. 5b shows the optimized global map of the rescue scenario, yielding only one bucket. The map is optimized once after all loops have been detected. The shape of the reconstructed scenario is similar to the real scenario shown in Fig. 2.

The tests in the rescue scenario demonstrate that the implemented system can achieve good results in environments which provide only sparse features and textures. The graph optimization in combination with the loop detection significantly improves the result and provides consistent results after traveling more than one loop.

B. Laboratory

In the following, the results of the experiments in the laboratory, which is characterized by a much higher feature density than the previous scenario, are discussed. For data generation, the camera was moved clockwise, beginning with the field of view at the bottom wall and desk. Fig. 6a shows the unoptimized reconstruction result and Fig. 6b the optimized result. In an environment with dense features, the NICP algorithm gives better registration results with a small covariance.

In Fig. 6a the walls are not straight, and the angles of the corners are not equal to 90°. This results in a significant error at the end of the dataset. The last sub-cloud should be at the bottom left, but due to the error, it is positioned in the top left corner.

(a) Before optimization.

(b) After optimization.

Fig. 6. Reconstruction in the laboratory.

TABLE II
TIME MEASUREMENTS

                InfiniTAM   NICP    Search loop   Optimize graph   Map creation
Rescue Scenario
 Mean [s]       0.004       0.166   0.115         0.002            1.184
 Variance [s²]  0.000       0.001   0.031         -                -
 Min [s]        0.002       0.122   0.000         -                -
 Max [s]        0.008       0.226   0.458         -                -
Laboratory
 Mean [s]       0.005       0.184   0.093         0.002            1.249
 Variance [s²]  0.000       0.001   0.013         -                -
 Min [s]        0.002       0.141   0.000         -                -
 Max [s]        0.008       0.247   0.382         -                -

Fig. 6b shows the optimized reconstruction result. The improvement is clearly visible. The last sub-cloud is now in the bottom left corner and matches the sub-cloud from the first round. The walls at the bottom are linked together and are not curved anymore. Also, the angles in the corners are equal to 90°. The textures appear adequate, and it is even possible to see the chessboard leaning against the table at the top (see green circle in Fig. 6b). From the comparison of the optimized map (Fig. 6b) and the unoptimized map (Fig. 6a), it can be seen that the start position of the graph stays fixed while the other nodes are moved to minimize the error. This test has demonstrated that the implemented system achieves good results in office environments with dense features. Furthermore, it is visible that the graph optimization has enhanced the result.


V. DISCUSSION

As explained in section IV, the system provides satisfying reconstruction results in the rescue scenario and the office environment. Thus, we could show that the algorithm can handle environments with varying feature density.

The second important issue besides the quality of the results is the computation time and, accordingly, the real-time capability. Table II shows the computation times of several parts of the system, including variance, minimum, and maximum values. The first column depicts the time which InfiniTAM needs for adding one depth image to a sub-cloud. The average values equate to 250 Hz and 200 Hz, which is much faster than the 30 Hz frame rate of the Asus Xtion. Thus, the frames can be processed faster than the camera provides them. The second column indicates the time in which the NICP registers two sub-clouds. The mean values show that the system is capable of processing more than five sub-clouds per second. In the tested scenarios, this number of sub-clouds per second is not reached, so the queue between the InfiniTAMHandler and the NICPHandler never grows. Therefore, this part of the proposed system is also real-time capable.

The average times for searching a loop are 0.115 s (rescue scenario) and 0.093 s (laboratory). In this case, the minimum and maximum values have a larger variance because the time is proportional to the number of loops found for the current node. If no loop is found, the computation time is negligible. However, even if many loops are found for each new node, the queue between the NICPHandler and the GraphHandler does not grow. This indicates that the back-end is not lagging behind the front-end.

The time for the pose-graph optimization is 0.002 seconds, and creating the complete point-cloud of the reconstructed environment for the two test cases requires nearly one second. At the moment, both of these computations are only executed once, when the reconstruction is finished. The optimization and point-cloud creation could also be done more often, but the creation of the map does not require a high frame rate.

VI. CONCLUSION

The goal of this work was to develop a system which can reconstruct the various forms of 3D environments encountered by mobile robots. One main incentive was the fact that many state-of-the-art systems only work in environments with dense features or are not capable of real-time operation. We demonstrated that our approach combines the advantages of different approaches. As shown in section V, our system can reconstruct the environment in real-time and provides accurate results with low hardware requirements.

It uses the high speed of InfiniTAM combined with the pose-graph creation and optimization potential of g2o, which makes it possible to reduce accumulated errors by graph optimization. This differentiates the system from other solutions, especially in environments with sparse features. Another important aspect is the modularity of our approach, given by the division into three subsystems that work in parallel threads. Each of these three parts can be replaced by another approach if desired, for instance by a different front-end than InfiniTAM or a different graph-based back-end, making the system easily customizable for different requirements and capable of extending reconstruction approaches by a loop-closing feature. In the future, our system will be extended to work in the presence of moving objects [15] [16] [17].

REFERENCES

[1] H. Gonzalez-Jorge, B. Riveiro, E. Vazquez-Fernandez, J. Martinez-Sanchez, and P. Arias, "Metrological evaluation of Microsoft Kinect and Asus Xtion sensors", Measurement, vol. 46, no. 6, pp. 1800-1806, 2013.

[2] K.-L. Low, "Linear least-squares optimization for point-to-plane ICP surface registration", Chapel Hill, University of North Carolina, vol. 4, 2004.

[3] D. Nister, O. Naroditsky, and J. Bergen, "Visual odometry", in Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, vol. 1. IEEE, 2004, pp. 1-652.

[4] S. Izadi, D. Kim, O. Hilliges, D. Molyneaux, R. Newcombe, P. Kohli, J. Shotton, S. Hodges, D. Freeman, A. Davison et al., "KinectFusion: Real-time 3D reconstruction and interaction using a moving depth camera", in Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology. ACM, 2011, pp. 559-568.

[5] T. Whelan, M. Kaess, M. Fallon, H. Johannsson, J. Leonard, and J. McDonald, "Kintinuous: Spatially extended KinectFusion", 2012.

[6] R. Wagner, U. Frese, and B. Bauml, "Graph SLAM with signed distance function maps on a humanoid robot", in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2014, pp. 2691-2698.

[7] F. Endres, J. Hess, N. Engelhard, J. Sturm, D. Cremers, and W. Burgard, "An evaluation of the RGB-D SLAM system", in Robotics and Automation (ICRA), 2012 IEEE International Conference on. IEEE, 2012, pp. 1691-1696.

[8] J. Serafin and G. Grisetti, "NICP: Dense normal based point cloud registration", in Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on, 2015, pp. 742-749.

[9] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon, "KinectFusion: Real-time dense surface mapping and tracking", ISMAR, 2011.

[10] O. Kahler, V. A. Prisacariu, C. Y. Ren, X. Sun, P. H. S. Torr, and D. W. Murray, "Very high frame rate volumetric integration of depth images on mobile devices", IEEE Transactions on Visualization and Computer Graphics (Proceedings International Symposium on Mixed and Augmented Reality 2015), vol. 22, no. 11, 2015.

[11] T. Whelan, S. Leutenegger, R. F. Salas-Moreno, B. Glocker, and A. J. Davison, "ElasticFusion: Dense SLAM without a pose graph", Proc. Robotics: Science and Systems, Rome, Italy, 2015.

[12] R. Kummerle, G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard, "g2o: A general framework for graph optimization", in Robotics and Automation (ICRA), 2011 IEEE International Conference on. IEEE, 2011, pp. 3607-3613.

[13] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng, "ROS: An open-source robot operating system", in ICRA Workshop on Open Source Software, vol. 3, no. 3.2. Kobe, Japan, 2009, pp. 1-5.

[14] H. Kitano, S. Tadokoro, I. Noda, H. Matsubara, T. Takahashi, A. Shinjou, and S. Shimada, "RoboCup Rescue: Search and rescue in large-scale disasters as a domain for autonomous agents research", in IEEE SMC '99 Conference Proceedings, vol. 6, 1999, pp. 739-743.

[15] M. A. Mohamed and B. Mertsching, "TV-L1 optical flow estimation with image details recovering based on the modified census transform", in Proc. 8th ISVC, 2012, pp. 482-491.

[16] M. A. Mohamed, H. A. Rashwan, B. Mertsching, M. A. Garcia, and D. Puig, "Illumination-robust optical flow using a local directional pattern", IEEE Trans. Circuits Syst. Video Technol., vol. 24, 2014, pp. 1-9.

[17] M. A. Mohamed, C. Boddeker, and B. Mertsching, "Real-time moving objects tracking for mobile-robots using motion information", in IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR). IEEE, 2014, pp. 1-6.