6
Speeding Up Rao-Blackwellized Particle Filter SLAM with a Multithreaded Architecture Bruno D. Gouveia, David Portugal and Lino Marques Abstract— In this work we explore multiprocessor computer architectures to propose an effective method for solving the Simultaneous Localization and Mapping Problem. The pro- posed method makes use of multithreading to parallelize a Rao-Blackwellized Particle Filter approach. By applying the method in common computers found in robots, it is shown that a significant gain in efficiency can be obtained. Furthermore, the parallel method enables us to raise the number of particles up to values that would not be possible in a single threaded solution, thus gaining in localization precision and map accuracy. In order to analyze SLAM results, frequently used datasets by the robotics community were used, and a benchmarking metric was applied. I. I NTRODUCTION Simultaneous Localization and Mapping (SLAM) has gained increasing attention over the years, and today several approaches exist to address the problem with different levels of success. Despite the scientific advancements in this area, they usually accompany technological innovations, e.g., 2D SLAM reached higher levels with the introduction of 2D laser range finders (LRFs), intense research into Visual SLAM has been noted with the growing use of stereo cameras, and RGB-D SLAM research has been mostly driven by the proliferation of RGB-D sensors like the Microsoft Kinect. Still, modern day computer architectures have not been fully explored in this context and the absence of closed SLAM solutions based on parallel computing techniques is surprising. Beyond proposing new techniques to reduce sensor mea- surement noise and produce accurate representations, roboti- cists have increasingly made an effort to use efficient data structures in their methods, such as pose graphs, to reduce memory and processing requirements. In this paper, we propose to leverage the distinct processing units that the vast majority of modern computers have by distributing the computation load of a Rao-Blackwellized Particle Filter (RBPF) SLAM approach. Besides presenting a new parallel SLAM method, which leads to a more balanced utilization of the processing resources, we provide a performance and CPU load analysis on a modern single-board computer (SBC) and on a commonly used netbook with a limited computational capability. We show that the approach developed enables us to increase the overall CPU utilization when compared This work was partially carried out in the framework of TIRAMISU project (www.fp7-tiramisu.eu). This project is funded by the Euro- pean Community’s Seventh Framework Program (FP7/SEC/284747). B. Gouveia, D. Portugal and L. Marques are with the Institute of Systems and Robotics, Dept. Electrical and Computers Eng., University of Coimbra, 3030-290 Coimbra {bgouveia,davidbsp,lino} at isr.uc.pt to classical single threaded solution, therefore increasing localization and mapping performance. In the next section, seminal work on SLAM and efficient particle filter methods is reviewed, and in Section III, the multithreaded approach proposed is described. Afterwards, in section IV we discuss the results of applying the approach described in this paper, which is compared against a single threaded solution. Finally, the article ends with conclusions. II. RELATED WORK In the field of mobile robotics, solutions for the problem of SLAM rely on probabilistic frameworks, which account for sensor measurement noise and estimation uncertainty. Classical techniques make use of Kalman Filters (KFs) [1] and Particle Filters (PFs) [2] to incrementally compute joint posterior distributions over robot poses and landmarks. Along the years, several enhancements based on these ap- proaches have been described in the literature, such as Extended Kalman Filters (EKFs) and Rao-Blackwellized Particle Filters (RBPFs). In addition, graph-based algorithms, e.g. [3], have also gained popularity due to the efficiency gained when maintaining large-scale maps, which stems from discarding irrelevant measurements, and adopting graph optimization processes. Also popular nowadays are methods which take advantage of high scanning rates of modern day Light Detection And Ranging (LIDAR) technology. These methods rely heavily on scan matching of consecutive sensor readings, with com- bination of other techniques like multi-resolution occupancy grid maps [4] or dynamic likelihood field models for mea- surement [5]. Despite the evident advancements in research in the SLAM problem, a robot with such capabilities still has to be equipped with a modern computer to adequately handle the processing and memory requirements. Beyond the usage of optimized data structures and algo- rithms, some authors have turned to modern day computer architectures to handle heavy processing SLAM in real time. Multicore processors are more efficient in terms of energetic consumption since the speed requirements on each parallel unit are reduced, allowing for a reduction in voltage. For example, the authors in [6] propose a simple and effective scan-matching module that can be potentially used in several SLAM algorithms, which takes advantage of the Single Instruction, Multiple Data (SIMD) units available in modern processors. This allows the processor to execute an operation over a data vector in the same instruction time. Using this

Speeding up Rao-Blackwellized Particle Filter SLAM with a

Embed Size (px)

Citation preview

Speeding Up Rao-Blackwellized Particle Filter SLAMwith a Multithreaded Architecture

Bruno D. Gouveia, David Portugal and Lino Marques

Abstract— In this work we explore multiprocessor computerarchitectures to propose an effective method for solving theSimultaneous Localization and Mapping Problem. The pro-posed method makes use of multithreading to parallelize aRao-Blackwellized Particle Filter approach. By applying themethod in common computers found in robots, it is shown thata significant gain in efficiency can be obtained. Furthermore, theparallel method enables us to raise the number of particles up tovalues that would not be possible in a single threaded solution,thus gaining in localization precision and map accuracy. Inorder to analyze SLAM results, frequently used datasets bythe robotics community were used, and a benchmarking metricwas applied.

I. INTRODUCTION

Simultaneous Localization and Mapping (SLAM) hasgained increasing attention over the years, and today severalapproaches exist to address the problem with different levelsof success. Despite the scientific advancements in this area,they usually accompany technological innovations, e.g., 2DSLAM reached higher levels with the introduction of 2Dlaser range finders (LRFs), intense research into VisualSLAM has been noted with the growing use of stereocameras, and RGB-D SLAM research has been mostly drivenby the proliferation of RGB-D sensors like the MicrosoftKinect. Still, modern day computer architectures have notbeen fully explored in this context and the absence of closedSLAM solutions based on parallel computing techniques issurprising.

Beyond proposing new techniques to reduce sensor mea-surement noise and produce accurate representations, roboti-cists have increasingly made an effort to use efficient datastructures in their methods, such as pose graphs, to reducememory and processing requirements. In this paper, wepropose to leverage the distinct processing units that thevast majority of modern computers have by distributingthe computation load of a Rao-Blackwellized Particle Filter(RBPF) SLAM approach. Besides presenting a new parallelSLAM method, which leads to a more balanced utilization ofthe processing resources, we provide a performance and CPUload analysis on a modern single-board computer (SBC) andon a commonly used netbook with a limited computationalcapability. We show that the approach developed enablesus to increase the overall CPU utilization when compared

This work was partially carried out in the framework of TIRAMISUproject (www.fp7-tiramisu.eu). This project is funded by the Euro-pean Community’s Seventh Framework Program (FP7/SEC/284747).

B. Gouveia, D. Portugal and L. Marques are with the Institute ofSystems and Robotics, Dept. Electrical and Computers Eng., University ofCoimbra, 3030-290 Coimbra {bgouveia,davidbsp,lino} atisr.uc.pt

to classical single threaded solution, therefore increasinglocalization and mapping performance.

In the next section, seminal work on SLAM and efficientparticle filter methods is reviewed, and in Section III, themultithreaded approach proposed is described. Afterwards,in section IV we discuss the results of applying the approachdescribed in this paper, which is compared against a singlethreaded solution. Finally, the article ends with conclusions.

II. RELATED WORK

In the field of mobile robotics, solutions for the problemof SLAM rely on probabilistic frameworks, which accountfor sensor measurement noise and estimation uncertainty.Classical techniques make use of Kalman Filters (KFs)[1] and Particle Filters (PFs) [2] to incrementally computejoint posterior distributions over robot poses and landmarks.Along the years, several enhancements based on these ap-proaches have been described in the literature, such asExtended Kalman Filters (EKFs) and Rao-BlackwellizedParticle Filters (RBPFs). In addition, graph-based algorithms,e.g. [3], have also gained popularity due to the efficiencygained when maintaining large-scale maps, which stemsfrom discarding irrelevant measurements, and adopting graphoptimization processes.

Also popular nowadays are methods which take advantageof high scanning rates of modern day Light Detection AndRanging (LIDAR) technology. These methods rely heavilyon scan matching of consecutive sensor readings, with com-bination of other techniques like multi-resolution occupancygrid maps [4] or dynamic likelihood field models for mea-surement [5]. Despite the evident advancements in researchin the SLAM problem, a robot with such capabilities stillhas to be equipped with a modern computer to adequatelyhandle the processing and memory requirements.

Beyond the usage of optimized data structures and algo-rithms, some authors have turned to modern day computerarchitectures to handle heavy processing SLAM in real time.Multicore processors are more efficient in terms of energeticconsumption since the speed requirements on each parallelunit are reduced, allowing for a reduction in voltage. Forexample, the authors in [6] propose a simple and effectivescan-matching module that can be potentially used in severalSLAM algorithms, which takes advantage of the SingleInstruction, Multiple Data (SIMD) units available in modernprocessors. This allows the processor to execute an operationover a data vector in the same instruction time. Using this

approach, an average speedup1 of 3.5 compared to the non-SIMD version of the algorithm was observed. Additionally,the authors in [7] have proposed a Visual SLAM moduleimplemented in the Compute Unified Device Architecture(CUDA), which runs at a 15 Hz rate. Their approach per-forms sparse scene flow, real-time feature tracking, visualodometry, loop detection and global mapping in parallel.

Some effort has also been conducted to efficiently dis-tribute the computation load of classical particle filter imple-mentations on multiprocessing systems in a variety of appli-cations. Chitchian et al. [8] have proposed a NVIDIA Graph-ics Processing Unit (GPU) implementation of a particle filterframework for complex real-time estimation problems, usingboth CUDA and the Open Computing Language (OpenCL).This was applied in a complex robot arm application with48 state variables and using a classical particle filteringapproach with 1 million particles. The authors conclude thatit would not be possible to run the system in real-timewithout resorting to the new generation of GPUs with severalcores. Similarly, Par and Tosun [9] have addressed classicalPF-based Sequential Monte Carlo Localization of a vehicle ina known map with multicore and manycore processors. Theyhave used the Open Multi-Processing (OpenMP) program-ming model for the parallelization of the predict and updatephases of the PF in a multicore CPU, and also implementeda GPU version, obtaining speedup values of up to 4.7 and 75respectively. Perhaps, the work most closely related to ours isdescribed in [10], where a parallel implementation of a RBPFwas also addressed. The authors not only propose to assigndifferent particles among available processing elements inorder to compute their weights, but also to conduct theresampling process. Despite the potential of the work, it wasonly validated by tracking a maneuvering vehicle in Matlabsimulations.

Other related philosophies for distributing the computationload in robotic application are emerging. These are the casesof Cloud Robotics and Robotic Clusters. On one hand, in[11] robots performing visual SLAM query a Cloud serviceto run expensive steps of the algorithm and allocate storagespace, thus freeing the robot embedded computers from mostof the computation effort. However, this implies the contin-ued availability of the Cloud entity. On the other hand, [12]empowers heterogeneous robots with the ability of sharingtheir processing resources when solving complex collectiveproblems. This was applied to the problem of topologicalmap merging in a distributed multi-robot system. Thesephilosophies differ from the one presented in this paper inthe sense that they imply communication to/from externalprocessing units, and network latency is non negligible inreal-time tasks. Also, our approach does not suffer fromnetwork failures, since all processing units belong to thesame computing component.

In contrast to all previously mentioned works, we leveragenot only modern day computer architectures composed of

1In parallel computing, speedup refers to how much a parallel algorithmis faster than a corresponding sequential algorithm.

First Scan ? RegisterScan

N

Y

N

UpdateTreeWeights

UpdateMotionModel

ScanMatch

Resample

Y

UpdateTreeWeights

Start

End

End

End

linear distance > ɣ

or angular

distance > Θ ?

Fig. 1: Flowchart of the ProcessScan function of the GMap-ping algorithm, which is called every time a new scan isretrieved.

multiple CPU units by presenting a multithreaded RBPFapproach, but also apply it in a 2D SLAM application forthe first time as far as our knowledge goes. Furthermore, wecontribute to the state-of-the-art by quantitatively analyzingthe benefits of such approach in commonly used and well-known SLAM datasets.

III. MULTITHREADED RBPF SLAM

Recognized as the most widely used laser-based SLAMalgorithm in robots worldwide, GMapping is an implementa-tion of the RBPF SLAM approach presented by Grisetti et al.[13]. Being also a grid-based algorithm, it maintains an oc-cupancy grid map divided in cells, which represent whetherthe state of the corresponding space is occupied (e.g., anobstacle), free (open space) or unknown. Furthermore, sinceit uses a PF implementation, each particle encodes a pose ofthe robot and the corresponding laser scan, thus providingan hypothesis to the localization and mapping problem at agiven time step.

There are several advantages for choosing GMapping asthe key SLAM approach addressed in this study. Firstly, it isan open source ready-to-use algorithm available in the RobotOperating System (ROS).2 Secondly, it is well-established

2http://wiki.ros.org/slam_gmapping

(a) Single threaded GMapping with N = 5.

(b) Single threaded GMapping with N = 40.

Fig. 2: CPU load in the single threaded GMapping algorithmrunning on an Asus EeePC 1015 with different number ofparticles N .

with recognized performance as proven by recent works, e.g.[14]. Finally, the nature of the algorithm is particularly suit-able for parallelization, given that each particle is computedindependently and there is no data flow between them. Formore details on the algorithm the reader should refer to [13].

Every time a new laser scan is acquired by the robot, thealgorithm follows all the steps illustrated in the flowchartof Fig. 1. Using a profiling tool,3 it was possible to verifythat on average, 98.47% of the computation time of theProcessScan function is spent in the Scan Matching step.This occurs because scan matching involves comparing theset of data returned from the LRF with the 2D map ob-tained thus far, which is a process that is repeated for allthe N particles involved. Therefore, it is executed manytimes during localization and involves several mathematicaloperations, representing a high computational burden for anySLAM algorithm. Additionally, it should be noted that scanmatching occurs when the distance traveled by the robotsurpasses a predetermined linear threshold γ or an angularthreshold θ (cf. Fig. 1), which are important parameters ofthe algorithm. Hence, we expect that frequent computationpeaks will occur every time the condition is verified. Thisis in fact shown in Fig. 2a, where a run of the algorithm inan Asus eeePC 1015 netbook with N = 5, γ = 1.0m andθ = 0.5 rad is depicted.

While in Fig. 2a the algorithm is run with a low numberof particles, in Fig. 2b we present a similar chart withN = 40. In this situation, one cannot distinguish thecomputational peaks, because scan matching takes too long.

3Callgrind, available in the Valgrind distribution:http://valgrind.org

Fig. 3: CPU load in the Multithreaded GMapping algorithmwith N = 40 running on an Asus EeePC 1015.

Therefore, when it is necessary to proceed with the next scanmatches, the previous ones still have not finished processing,and consequently, the algorithm skips steps. This has atremendous impact on localization and mapping, as shownlater on. Thus, in this case the algorithm will only run in real-time if the robot moves slowly, giving the computer enoughtime to process laser scan matching. Despite the computerbeing subjected to overload, it is clear that the computationalresources are not being efficiently used, since the totalcomputational load never exceeds 50%. The reason behindthis is that the algorithm was not designed to run in parallel.More particularly, only one processing unit is scheduled forthe heavy scan matching task. Thus, we propose to distributethis task among the existing processing units.

As previously mentioned, after conducting a data depen-dence analysis it was verified that laser scan matching iscomputed independently for each particle, which representsthe best case scenario for parallelization, since there is noneed to share data between particles. This is known as anembarrasingly parallel problem. This study not only led us toparallelize this step, but also to keep the other steps as singlethreaded, since the overhead of parallelization together withtheir low computational demand would render the paralleliza-tion inefficient in these steps. Thus, our method divides theN particles in equally sized chunks and distributes them overthe available processor cores while keeping the rest of thealgorithm single threaded.

In order to implement the parallel architecture, we haveused OpenMP, by directly modifying the original GMap-ping package available for the ROS framework. OpenMPis supported by a large number of compilers on differentplatforms, providing a simple and elegant solution to thesingle threaded limitation of the original algorithm, withoutthe need to restructure and modify the existing code or useexternal libraries. This approach has the ability to autodetectthe number of available processor cores, also providing anoption to choose the number of worker threads. This way,the algorithm adapts itself to the host architecture, making anefficient use of the processing resources, as illustrated by Fig.3. Furthermore, this approach is not limited to GMapping,having also the potential to be used in any SLAM algorithmthat relies on laser scan matching in a similar particle-independent way.

The benefits of using a multithreaded version of the

TABLE I: Specifications of the Asus EeePC netbook and theOdroid X2 single-board computer used in the experiments.

TestedComputers

Asus EeePC 1015 PEM Odroid X2

CPU Intel AtomTM N550with Hyper-Threading

Exynos4412 PrimeARM Cortex-A9

Quad Core

# Cores 2 Physical (4 Virtual)Cores 4 Physical Cores

Clock Speed 1.5 GHz 1.7 GHzCache L2 1 MB 1 MB

RAM 1GB (DDR3) 2GB (DDR2)Storage 250GB, 5.400rpm 16GB, SD card

OperatingSystem Ubuntu 12.04, 64 bit Ubuntu-based Linaro

12.11, 32 bitApproximate

Cost $289 $135

GMapping algorithm are analyzed and discussed in the nextsection. Also, the code of this work is publicly available fordownload.4

IV. RESULTS AND DISCUSSION

Experimental results are presented and discussed in thissection. We propose to analyze the efficiency and perfor-mance gain when using multithreaded (MT) GMapping asproposed in this paper, comparing it to the single threaded(ST) one, which is available for ROS and widely used by thecommunity. Having this in mind, we have tested 3 datasetstypically used in SLAM benchmarking: ACES Building,Intel Research Lab, and Killian Court.5 These were testedin the two computer architectures depicted in Table I, bothin ST and MT mode. In addition, we have also conductedexperimental tests, increasing the number of particles by 30,starting with N = 30 and going up to N = 180, whilestopping at the point where GMapping returns an “out ofmemory” error. For each different configuration we run 5trials, resulting in a total of 250 trials of the followingcombinations:

1) dataset = {ACES Building, Intel Research Lab,Killian Court},

2) N = {30, 60, 90, ...},3) method = {ST, MT},4) computer = {Asus EeePC, Odroid X2}.We start by analyzing the average time taken for the Scan

Matching step of the algorithm in both the ST and MTmethod. Table II presents the average time in seconds overall of the 5 trials with different configuration. Note that thereare no results for the Asus EeePC 1015 for N > 90 in theACES Building and N > 60 in the Killian Court datasets,due to the 1GB memory limitation. On the other hand, theOdroid X2 was able to handle N ≤ 150 and N ≤ 90 forthe ACES Building and Killian Court datasets respectively.

4https://github.com/brNX/gmapping-openmp5http://kaspar.informatik.uni-freiburg.de/

˜slamEvaluation/datasets.php

TABLE II: Average Laser Scan Matching Processing Times(in seconds) over 5 trials for each different combination{dataset,N,method, computer}.

Asus EeePC Odroid X2Particles

(N)ST MT ST MT

ACESBuilding,Austin

≈ 56× 58 (m)

30 0.917679 0.476003 0.691008 0.31031060 1.784522 0.877043 1.365558 0.58085190 2.661630 1.286834 2.044120 0.846713120 - - 2.740810 1.104417150 - - 3.438892 1.360284

Intel ResearchLab, Seattle≈ 28.5× 28.5

(m)

30 1.007176 0.502663 0.735295 0.31315760 2.020911 0.947686 1.475900 0.59518890 3.010757 1.380351 2.201875 0.868716120 3.976068 1.804077 2.915877 1.128323150 5.007103 2.219280 3.657072 1.390523180 6.389959 2.724606 4.382634 1.643050

Killian Court,MIT

≈ 245× 200 (m)

30 0.844373 0.447086 0.665588 0.29770060 1.747692 0.867257 1.329213 0.56041090 - - 2.023825 0.829190

1

On the Intel Research Lab, both computers were able to runexperiments with 30 ≤ N ≤ 180.

The first immediate evidence is that scan matching takesless time to process in the Odroid X2 than in the AsusEeePC, as expected due to the former’s superior computationpower. Also, it is clear that multithreading accelerates thescan matching process by an approximate factor of 2. Infact, the acceleration observed in the Odroid X2 presentsan acceleration factor between [2.22, 2.68], while the oneobserved in the EeePC has a factor between [1.93, 2.34].The efficiency in the Odroid X2 is superior, possibly due toits more optimized architecture with 4 de facto processors,whereas the EeePC only has 2 real cores (4 threads).

As a consequence of the above facts, it is possible torun a MT method with several more particles than a STmethod in the same computer, thus gaining in localizationand mapping quality. A clear example of this is shown inthe results for the ACES Building in the Odroid X2, wherea ST method with 60 particles takes approximately the sameamount of time to process laser scan matching as the MTmethod with 150 particles. Furthermore, since multithreadingaccelerates the scan matching step it enables the robot thatis collecting data to move at higher speeds. When moving athigher speeds, the algorithm reaches the γ or the θ thresholdquicker. Performing SLAM in real-time without droppingdata is possible as long as the processing time is shorterthan the time between the thresholding condition is verified.

In the analysis presented before, we have only addressedthe time to process laser scan matching, which was theonly step of the algorithm that was parallelized. However,it is important to understand how the overall time to runthe algorithm has been accelerated with a multithreadingsolution. In parallel computing, it is common to use thespeedup metric to measure how fast a parallel algorithm is,when compared to the corresponding sequential algorithm.Thus, speedup υ is defined as:

υ =TST

TMT, (1)

1,3

1,5

1,7

1,9

2,1

2,3

2,5

2,7

30 60 90 120 150 180

Spe

ed

up

Number of Particles (N)

Odroid (Intel)

Odroid (Killian)

Odroid (Aces)

EeePC (Intel)

EeePC (Killian)

EeePC (Aces)

Fig. 4: Speedup obtained using the multithreaded method.

where TST represents the total processing time of the singlethreaded method, and TMT the total processing time ofthe multithreading method in the same exact conditions. InFig. 4, the evolution of the speedup metric with increasingnumber of particles is shown. The illustrated curves confirmthat the efficiency gain is higher when using the OdroidX2 single-board computer, where υ ∈ [2.08, 2.60], whilethe Asus EeePC 1015 has lower speedup, υ ∈ [1.45, 2.20].Moreover, speedup consistently increases with the processingload, i.e. the number of particles N . This is common in mul-tithreading, because the overhead of parallelization, such asthe creation and management of threads, is approximately thesame, independently of the size of the problem. Therefore,the ratio between overhead and workload decreases, when theprocessing load grows. As a result, the efficiency increaseswith the size of the problem.

The efficiency analysis that was conducted is fundamentalto assess our method. However, it is also important todiscuss the impact of the multithreading algorithm in terms ofperformance. Particularly in SLAM, performance is relatedwith localization accuracy and mapping precision.

Due to the acceleration observed in the scan matching step,it has been shown that the multithreading algorithm is ableto handle a number of particles that cannot necessarily behandled in single threading without dropping data, e.g. Fig.2b and Fig. 3. Hereupon, the performance obtained with theMT approach is expected to be superior in situations wherethe ST approach skips scan matching steps, having at leastthe same performance levels when this is not the case. Thisis clearly noticed in the ACES building using 90 particles,and in the Intel Research Lab using 150 particles (cf. Figures5a, 5b, 6a , and 6b).

Despite the evident increase of mapping quality with mul-tithreading, visually inspection of the resulting maps does notallow a correct comparison. So, the need to precisely evaluatethe results asks for a more accurate method - a quantitativescale. In this work, we make use of the benchmarking metricpresented in [15] to understand the impact of multithreadingin SLAM performance. This metric evaluates the accuracyof the poses in robot trajectories during data acquisition.It uses only relative relations between poses and does notrely on a global reference frame, which even allows tocompare algorithms with different estimation techniques and

sensor modalities. Moreover, using this metric, the erroris not cumulative, instead it is isolated by comparing thedisplacement of the differences between estimated poses ofthe robot to the ground truth relation between real poses,which were manually measured by the authors in [15] forcommonly used datasets.

Taking the example of the maps in Figures 5 and 6, thetranslational error and the angular error were calculated. Inthe rightmost charts (cf. Figures 5c and 6c), one can see theevolution of the translational error over the different rela-tions, and Table III presents the mean error in translation androtation with the corresponding standard deviation. Resultsdemonstrate that the errors of the MT method are smallerthan those of the ST method, both for the translationaland rotational error. The peaks in the evolution chart showsituations where relations measured by the ST method havean unusual high error, which explains why the resultingmaps generated were only consistent locally, but tend to beinconsistent as a whole.

V. CONCLUSIONS

This work presents a multithreaded architecture to speedup a 2D RBPF SLAM approach, widely used by the Roboticscommunity. We started by conducting an initial study wherewe identified scan matching as the most costly step ofthe algorithm in terms of computation. After describingthe implementation details, several experimental tests wereconducted using two different computer architectures andrunning the RBPF algorithm with increasing number ofparticles in commonly used benchmarking datasets. We haveanalyzed and compared the single threaded approach againstour multithreaded implementation, and have shown that theprocessing time drastically decreases, as the computationalresources are used in a much more efficient way. Further-more, the multithreaded architecture allows us to increasethe number of particles in the algorithm by a factor of 2or even more. Besides the gain in computation efficiency,we have also compared SLAM performance in both meth-ods by adopting a recognized benchmarking metric by thecommunity. Results have shown that multithreading leadsto increases in localization and mapping when the numberof particles cannot be fully handled by the single threadedmethod.

In addition to the method and the quantitative analysispresented, the authors openly provide the code to the com-munity in the hope that it will be useful. Even though othermethods were not tested in this paper, the authors suggestthat the approach can be used in the future to speed up anySLAM algorithm that relies in laser scan matching.

ACKNOWLEDGMENT

The authors would like to thank Eurico Pedrosa andNuno Lau (University of Aveiro, Portugal) for the codethat converts the log files of frequently used datasets in therobotics community to ROS.

(a) ST GMapping with N = 90. (b) MT GMapping with N = 90.

0

5

10

15

20

25

30

35

40

45

50

0 250 500 750 1000

Tran

slat

ion

al E

rro

r (m

)

# Relations

TranslationalError (MT)

TranslationalError (ST)

(c) Translational Error with N = 90.

Fig. 5: Maps obtained and translational error using the single threaded and the multithreaded method in the ACES Buildingdataset.

(a) ST GMapping with N = 150. (b) MT GMapping with N = 150.

02468

101214161820

0 250 500 750 1000 1250 1500 1750

Tran

slat

ion

al E

rro

r (m

)

# Relations

TranslationalError (MT)

TranslationalError (ST)

(c) Translational Error with N = 150.

Fig. 6: Maps obtained and translational error using the single threaded and the multithreaded method in the Intel ResearchLab dataset.

TABLE III: Mean translational error εtrans and mean rotational error εrot.

ST MTεtrans (m) εrot (deg) εtrans (m) εrot (deg)

ACES Building (N = 90) 0.197 ± 2.107 3.754 ± 13.099 0.052 ± 0.127 1.222 ± 1.662Intel Research Lab (N = 150) 0.102 ± 0.760 14.476 ± 21.991 0.029 ± 0.042 2.947 ± 2.415

REFERENCES

[1] G. Dissanayake, P. Newman, S. Clark, H.F. Durrant-Whyte, M. Csorba,“A Solution to the Simultaneous Localization and Map Building(SLAM) Problem”. In IEEE Transactions on Robotics and Automation,17 (3), June 2001.

[2] M. Montemerlo, S. Thrun, D. Koller, B. Wegbreit, “FastSLAM: AFactored Solution to the Simultaneous Localization and MappingProblem”. In AAAI National Conf. on Artificial Intelligence, 2002.

[3] S. Thrun, M. Montemerlo, “The GraphSLAM Algorithm With Appli-cations to Large-Scale Mapping of Urban Structures”. In Proc. of theInt. Journal on Robotics Research, 2005.

[4] S. Kohlbrecher, J. Meyer, O. Von Stryk, U. Klingauf, “A Flexible andScalable SLAM System with Full 3D Motion Estimation”. In Proc. ofthe Int. Symp. on Safety, Security and Rescue Robotics, Nov. 2011.

[5] E. Pedrosa, N. Lau, A. Pereira, “Online SLAM Based on a Fast Scan-Matching Algorithm”. In Progress in Artificial Intelligence, LectureNotes in Computer Science, Editors: L. Correia, L. P. Reis, J. Cascalho,Vol. 8154, 295-306, Springer, 2013.

[6] O. El Hamzaoui, B. Steux, “A Fast Scan Matching for Grid-basedLaser SLAM using Streaming SIMD Extensions”. In Proc. of the Int.Conf. Control, Automation, Robotics and Vision, Singapore, Dec. 2010.

[7] B. Clipp, J. Lim, J. Frahm, M. Pollefeys, “Parallel, Real-Time VisualSLAM”. In Proc. of the 2010 Int. Conf. on Intelligent Robots andSystems (IROS 2010), 3961-3968, Taipei, Taiwan, October 2010.

[8] M. Chitchian, A.S. van Amesfoort, A. Simonetto, T. Keviczky, H.J.Sips, “Particle Filters on Multi-Core Processors”. Delft University ofTechnology, The Netherlands, Tech. Report PDS-2012-001, Feb. 2012.

[9] K. Par, O. Tosun, “Parallelization of Particle Filter based Localizationand Map Matching Algorithms on Multicore/Manycore Architectures”.In Proc. of IEEE Intelligent Vehicles Symp., Baden, Germany, 2011.

[10] M.S. Miah, M. Bolic, “Parallel Implementation of Modified Rao-Blackwellised Particle Filter”, Computer Architecture Research Group,University of Ottawa, Technical Report, 2009.

[11] L. Riazuelo, J. Civera, J.M.M. Montiel, “C2TAM: A Cloud frameworkfor cooperative tracking and mapping”. In Robotics and AutonomousSystem, 2014.

[12] A. Marjovi, S. Choobdar, L. Marques, “Robotic Clusters: Multi-robot systems as computer clusters - A topological map mergingdemonstration”. In Robotics and Autonomous System, Vol. 60, 2012.

[13] G. Grisetti, C. Stachniss, W. Burgard, “Improved Techniques forGrid Mapping With Rao-Blackwellized Particle Filters”. In IEEETransactions on Robotics, 23(1), Feb. 2007.

[14] J. Machado Santos, D. Portugal and R. P. Rocha, “An Evaluation of 2DSLAM Techniques Availabile in Robot Operating System”. In Proc.of the 2013 Int. Symposium on Safety, Security and Rescue Robotics(SSRR 2013), Linkoping, Sweden, Oct 21-26, 2013.

[15] R. Kummerle, B. Steder, C. Dornhege, M. Ruhnke, G. Grisetti,C. Stachniss, A. Kleiner, “On Measuring the Accuracy of SLAMAlgorithms”. In Autonomous Robots, 27(4), 2009.