

The Hilti SLAM Challenge Dataset

Michael Helmberger1, Kristian Morin1, Nitish Kumar1, Danwei Wang2, Yufeng Yue3, Giovanni Cioffi4, Davide Scaramuzza4

Abstract— Accurate and robust pose estimation is a fundamental capability for autonomous systems to navigate, map and perform tasks. In particular, construction environments pose challenging problems for Simultaneous Localization and Mapping (SLAM) algorithms due to sparsity, varying illumination conditions, and dynamic objects. Current academic research in SLAM is focused on developing more accurate and robust algorithms, for example by fusing different sensor modalities. To help this research, we propose a new dataset, the Hilti SLAM Challenge Dataset. The sensor platform used to collect this dataset contains a number of visual, lidar and inertial sensors which have all been rigorously calibrated. All data is temporally aligned to support precise multi-sensor fusion. Each dataset includes accurate ground truth to allow direct testing of SLAM results. Raw data as well as intrinsic and extrinsic sensor calibration data from twelve datasets in various environments is provided. Each environment represents common scenarios found in building construction sites in various stages of completion.

SUPPLEMENTARY MATERIAL

The dataset as well as the documentation is available at https://www.hilti-challenge.com.

I. INTRODUCTION

Robots on construction sites promise improved safety of workers, task productivity and high quality data capture [1]. Although some dangerous tasks (such as concrete chain sawing) are prime work to automate, the more repetitive and ergonomically difficult tasks (such as overhead drilling and installation) result in many worker injuries. Construction robotics offers a way to remove this worker hazard, while also improving task scheduling and progress monitoring. To achieve that goal, however, automation requires a wide array of technologies and techniques to perceive, map and navigate through the environment.

With the introduction of GNSS and INS augmented systems, high performance outdoor positioning and navigation solutions are widely available [2]. Continuing that trend to indoor or complex outdoor environments, however, remains a challenge. To bridge the gap, autonomous positioning systems rely on a fusion of sensors and techniques. But these hardware platforms can be a barrier to high quality research due to cost and integration complexity.

*This work was supported by Hilti AG, Schaan, Liechtenstein
1 Hilti AG, Schaan, Liechtenstein
2 School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
3 School of Automation, Beijing Institute of Technology, China
4 Robotics and Perception Group, Dept. of Informatics, University of Zurich, and Dept. of Neuroinformatics, University of Zurich and ETH Zurich, Switzerland

Fig. 1. Left: The full sensor stick resting on a bipod. Right: (1) ADIS16445 IMU, (2) AlphaSense, (3) Ouster OS0-64 lidar, (4) Livox MID70, (5) prism for the total station.

Many types of indoor and outdoor spaces remain sparsely explored in research due to a lack of reliable test data with accurate reference information. Previous efforts to collect and distribute high quality multi-sensor data have resulted in significant improvements and insights in the research areas of visual odometry, SLAM and sensor fusion [3], [4], [5], [6], [7].

In order to support the growing body of experience in various indoor and mixed environments, we have created a suite of sensors (representing different categories of sensing techniques) with attention paid to precise time synchronization and calibration. With this sensor platform, we have collected datasets in various locations with different practical deficiencies seen in real-world scenarios (Fig. 2), along with accurate ground truth for effective testing and evaluation. Whereas previous datasets have focused on automotive vehicle motion, airborne UAV motion, or provided expansive simulations, this dataset targets real spaces captured with handheld or robot-like motion, using the latest in commercially available sensing technology. With the use of redundant sensors, this dataset also provides a direct comparison of sensor performance in different environments, which can be informative for future system designs. We believe that these collection scenarios capture a wide array of robotic and reality capture use cases that may not be as well represented in previous data. Our aim is to stimulate research on robust indoor positioning, mapping and navigation, with particular application to construction environments.


Fig. 2. Locations where the datasets have been captured: (a) Basement, (b) Campus, (c) Construction Site, (d) IC Office, (e) Lab, (f) Office Mitte, (g) Parking, (h) RPG Tracking Area.

II. SENSOR SETUP

Our sensor suite (the 'Phasma' stick, Fig. 1) consists of 3 categories of sensing modalities, along with multiple sensors operating with different ranges and noise levels. These include:

A. Passive Visual

AlphaSense by Sevensense

The visual data is collected from an array of rigidly mounted 1.3MP global shutter cameras. This module consists of 5 wide field-of-view cameras mounted to give an approximate 270 deg continuous field of view. Within this configuration a stereo camera is also present. Images are synchronously collected at 10Hz.

B. Active Optical

Ouster OS0-64

Long range point cloud data is collected by the 360 deg scanning lidar sensor. This unit has a scan repetition rate of 10Hz and a point data rate of 1,300,000 points/second. Ranges are recorded from 0.3 to 50m, with the lowest-noise returns typically beyond 1m. Range accuracy is 1.5-5cm.

Livox MID70

This unit is a lidar sensor with a 70 deg circular field of view and a non-repeating scan pattern. Point data rate is 100,000 points/second. Ranges are recorded from 0.02 to 200m, with typical returns between 1 and 50m. Range accuracy is 2-5cm.

1 https://www.sevensense.ai/product/alphasense-position
2 https://ouster.com/products/os0-lidar-sensor/
3 https://www.livoxtech.com/mid-70

C. Inertial Sensors

Analog Devices ADIS16445

This IMU is rigidly mounted to the AlphaSense module. It is a high performing MEMS based sensor with relatively low noise and sensor bias drift rates. The data from this IMU is tightly timestamped to the AlphaSense timing system. Data is collected at 800Hz.

Bosch BMI085

This IMU is embedded in the AlphaSense module. It provides a modest level of performance in terms of noise and bias stability. The data from this IMU is tightly timestamped to the AlphaSense timing system. Data is collected at 200Hz.

InvenSense ICM-20948

This IMU is embedded in the Ouster lidar. It provides a more modest level of performance than the ADIS16445 in terms of noise and bias stability. The data from this IMU is tightly timestamped to the Ouster timing system. Data is collected at 100Hz.

D. Ground Truth System

For testing and validation, 2 systems are used to capture ground truth:

Total Station
A survey grade prism is attached to the Phasma stick. This is tracked by the Hilti PLT300 automated total station. Most datasets are collected in a "stop 'n go" fashion, where the total station makes a precise measurement to the prism during the 'stop' periods. Range measurements to the static prism have 3mm accuracy. Total station range and angle measurements are processed to generate XYZ position information.

Optical Tracking
Optical tracking targets are attached to the Phasma stick. When operated in a motion capture space, the multiple targets allow for the direct computation of a 6DOF pose. Those datasets have a position accuracy of <1mm and are collected at 200Hz.

4 https://www.analog.com/en/products/adis16445.html
5 https://www.bosch-sensortec.com/products/motion-sensors/imus/bmi085/
6 https://invensense.tdk.com/products/motion-tracking/9-axis/icm-20948
7 https://www.hilti.com/c/CLS_MEA_TOOL_INSERT_7127

Fig. 3. Time synchronization between modules. Gyro plots on the IMUs align to <1ms. The left image shows the angular velocity for the dataset LAB Survey 2; the right image shows a zoom-in.

III. DATA SYNCHRONIZATION AND LOGGING

In a dynamic multi-sensor system, time synchronization between the sensors is critical in order to make the best use of sensor fusion. Special care was given to synchronization in the Phasma stick to ensure maximum performance:

AlphaSense, Bosch IMU and ADIS IMU
The AlphaSense manages time synchronization at the hardware level via an FPGA implementation. Camera times are computed to the mid-exposure pulse (MEP). IMU data is time tagged on arrival at the FPGA data bus. Overall time synchronization between the cameras and IMUs is <1ms.

Ouster lidar and InvenSense IMU
The Ouster module includes an integrated IMU. The Ouster point data and IMU are hardware synchronized to the Ouster internal clock. Time synchronization between the two is <1ms.

Cross Module Synchronization
Synchronization between the modules (AlphaSense, Ouster, Livox) is provided by the Precision Time Protocol (PTP) [8]. Each module is attached via a wired Ethernet cable to the data logging device, which also hosts the PTP master clock. With this setup, the time alignment between the modules is observed to be <1ms, as shown in Fig. 3. For verification we adopted the approach from [9] and used optimization tools over the correlation signal of the gyroscope data.
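The gyroscope-correlation check is straightforward to reproduce. The sketch below illustrates the idea only (it is not the authors' actual tooling), assuming two angular velocity streams that have already been resampled to a common rate and roughly overlap in time:

```python
import numpy as np

def estimate_clock_offset(gyro_a, gyro_b, rate_hz):
    """Rough clock offset between two gyroscope streams via cross-correlation.

    gyro_a, gyro_b: (N, 3) and (M, 3) angular velocities, resampled to the
    same rate rate_hz. Returns an offset dt such that b(t) ~ a(t + dt).
    """
    # Use the rotation-invariant magnitude so the relative sensor orientation
    # does not matter.
    a = np.linalg.norm(gyro_a, axis=1)
    b = np.linalg.norm(gyro_b, axis=1)
    a -= a.mean()
    b -= b.mean()
    # The peak of the full cross-correlation gives the best lag in samples.
    corr = np.correlate(a, b, mode="full")
    lag = int(np.argmax(corr)) - (len(b) - 1)
    return lag / rate_hz
```

A sub-sample refinement around the correlation peak (the role of the optimization tools mentioned above) is needed to verify offsets at the millisecond level, since the raw correlation is only resolved to one sample period.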

Data logging occurs on a dedicated computer attached to the Phasma stick. The logging computer runs Ubuntu 18.04 with ROS running during capture. Sensing modules are connected to the data logger and the data streams are directly recorded into ROS bag files.

IV. CALIBRATION

Along with time synchronization, sensor intrinsic and extrinsic calibrations are critical to achieve the highest system performance. In our setup, extensive calibration was undertaken to align the various optical and inertial systems. Intrinsic sensor calibration was conducted by each respective manufacturer. For the passive camera systems the procedure was a standard checkerboard calibration; for the active systems proprietary calibration models were computed and corrections applied to the data at the time of capture.
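For illustration only, a standard checkerboard intrinsic calibration of a single camera can be sketched with OpenCV as below. The board size, square size and image folder are assumptions rather than the manufacturer's actual procedure, and the wide field-of-view lenses would in practice call for a fisheye/equidistant model (e.g. cv2.fisheye) instead of the plain pinhole model used here:

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)      # assumed inner-corner count of the checkerboard
square_size = 0.03    # assumed square size in meters
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_size

obj_points, img_points, img_size = [], [], None
for path in sorted(glob.glob("calib_images/*.png")):   # assumed image folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img_size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# Pinhole intrinsics K and distortion coefficients from all detected boards.
rms, K, dist, _, _ = cv2.calibrateCamera(obj_points, img_points, img_size, None, None)
print(f"reprojection RMS: {rms:.3f} px\nK =\n{K}")
```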

The reference point of the body frame of the Phasma stick is defined at the centre of the AlphaSense Bosch IMU. All other sensors are transformed back to this point. Spatial offsets were determined from the CAD model of the Phasma design and refined within a calibration procedure; rotation offsets between the sensors were computed in the calibration process.

The extrinsic calibration between the motion capture markers and the AlphaSense Bosch IMU was performed using the hand-eye calibration toolbox [10].

Calibration files, CAD models and sensor noise parameters are provided in the supplementary material. The rosbags contain the Transformation (TF) tree [11] with all transformations between the sensors.
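Because every bag carries the TF tree, the extrinsics can be read directly from the static transform topic. A minimal sketch using the ROS 1 Python API is shown below; the bag filename is an assumption, and the topic name follows Table II (it may be recorded with or without a leading slash):

```python
import rosbag

# Assumed bag filename; the static transforms are published once on the TF topic.
with rosbag.Bag("LAB_Survey_2.bag") as bag:
    for _, msg, _ in bag.read_messages(topics=["/tf_static", "tf_static"]):
        for tf in msg.transforms:
            t, q = tf.transform.translation, tf.transform.rotation
            print(f"{tf.header.frame_id} -> {tf.child_frame_id}: "
                  f"t=({t.x:.3f}, {t.y:.3f}, {t.z:.3f}) m, "
                  f"q=({q.x:.4f}, {q.y:.4f}, {q.z:.4f}, {q.w:.4f})")
```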

V. DATASETS

Data was collected under various conditions in indoor and mixed indoor-outdoor environments. The data shows practical challenges in different stages of construction. Challenges include variable lighting, limited features and/or highly reflective and transparent surfaces.

TABLE I
Absolute trajectory error in meters for 3 SLAM algorithms: ORBSLAM2 [12] using stereo images, A-LOAM [13] using lidar, and SVO2.0 [14] using stereo images and IMU as input. Depending on the scene, a different sensor is advantageous, which highlights the need for fusion of all sensing modalities.

            Basement 1   Basement 4   Campus 2   Construction Site 2   Lab Survey 2   RPG Drone Testing Area
A-LOAM      0.162        0.288        13.332     6.395                 0.088          2.838
SVO2.0      0.813        2.598        8.941      2.992                 0.082          1.927
ORBSLAM2    x            10.306       x          x                     x              x

Data Descriptions (see Fig. 2):

(a) Basement
Data was collected in a windowless room (approx. 20x40m). No natural light; mixed illumination brightness. Concrete space with building infrastructure. Basement 1 is a short and easy path, and for Basement 3 and Basement 4 we mounted the sensor platform on a moving base instead of operating it handheld. Basement 3 and 4 also make it possible to exploit the loop closure capabilities of SLAM systems.

(b) Campus
Data was collected outdoors in a courtyard setting (approx. 40x60m). Good natural lighting with high illumination. Mixed features with building structure and natural flora.

(c) Construction Site
Mostly outdoors with some covered areas (approx. 40x80m). Strong natural light with high illumination. Unfinished natural surfaces with limited features above the ground plane.

(d) IC Office
Indoor space with many windows and reflective surfaces (approx. 10x70m). Mix of natural and artificial light. Strong illumination at the windows, modest illumination indoors.

(e) Lab
Indoor space dominated by large windows (approx. 10x10m). Strong natural light and reflective surfaces. OptiTrack 6DOF ground truth.

(f) Office Mitte
Indoor space in a finished office building (approx. 30x50m). Mix of natural and artificial light. Lots of building structure.

(g) Parking
Mix of indoor and outdoor space (approx. 100x100m). Parking garage from the top floor to a lower floor. Lighting varies from extremely bright to modest darkness. Ground plane structure on the top floor; lots of building structure on the lower floor.

(h) RPG Tracking Area
Indoor test facility (approx. 30x30m). Mostly artificial light with some natural. Single large room with a random motion path throughout. Vicon 6DOF ground truth.

VI. DATASET FORMAT

Datasets are stored in binary format as rosbags which contain images and IMU measurements using the standard sensor_msgs/Image and sensor_msgs/Imu message types, respectively. The Ouster data uses the sensor_msgs/PointCloud2 format, while the Livox data is stored in the custom livox_ros_driver/CustomMsg message type to avoid losing information. Fig. 4 shows an example of the camera and lidar data from the Lab Survey 2 dataset. Reference/ground truth data is given in a separate file for each dataset, with the filename indicating the reference source (e.g. Construction_Site_prism.txt means the ground truth is in the prism frame). Rosbag contents are listed in Table II.
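As an example of consuming the raw data with the ROS 1 Python API, the sketch below converts the front camera stream to OpenCV images and collects the ADIS IMU samples. The bag filename and the grayscale encoding are assumptions; the topic names follow Table II:

```python
import numpy as np
import rosbag
from cv_bridge import CvBridge

bridge = CvBridge()
images, imu = [], []

with rosbag.Bag("Construction_Site_2.bag") as bag:   # assumed filename
    for topic, msg, _ in bag.read_messages(
            topics=["/alphasense/cam0/image_raw", "/alphasense/imu_adis"]):
        stamp = msg.header.stamp.to_sec()
        if topic == "/alphasense/cam0/image_raw":
            # Front facing camera 1, assumed to be 8-bit grayscale.
            images.append((stamp, bridge.imgmsg_to_cv2(msg, "mono8")))
        else:
            # ADIS IMU at 800Hz: angular velocity [rad/s], acceleration [m/s^2].
            w, a = msg.angular_velocity, msg.linear_acceleration
            imu.append((stamp, w.x, w.y, w.z, a.x, a.y, a.z))

imu = np.asarray(imu)
print(f"{len(images)} images, {len(imu)} IMU samples")
```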

VII. EVALUATION

The evaluation of the datasets is based on the absolute trajectory error (ATE) [15] after SE(3) alignment of the ground truth with the estimated trajectory. The correspondences are matched based on their timestamps. Scenarios collected with a motion capture system are compared with their XYZ position components only; for the other datasets the intermittent XYZ total station observations are compared. An example of a SLAM comparison with motion capture ground truth is depicted in Fig. 5 for the dataset LAB Survey 2. An example of what established approaches can achieve on a few datasets with total station ground truth is shown in Table I.
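For reference, this metric boils down to a nearest-timestamp association, a closed-form rigid SE(3) alignment, and an RMSE over the residuals. The sketch below illustrates that pipeline (it is not the exact evaluation code of [15]); the 20ms matching tolerance is an assumption:

```python
import numpy as np

def ate_rmse(t_gt, p_gt, t_est, p_est, max_dt=0.02):
    """ATE RMSE after SE(3) alignment of the estimate to the ground truth.

    t_gt, t_est: sorted timestamps [s]; p_gt, p_est: (N, 3) positions [m].
    Ground-truth samples are matched to the nearest estimate within max_dt.
    """
    # Nearest-timestamp association of each ground-truth sample.
    idx = np.searchsorted(t_est, t_gt)
    idx = np.clip(idx, 1, len(t_est) - 1)
    idx = idx - (np.abs(t_est[idx - 1] - t_gt) < np.abs(t_est[idx] - t_gt))
    ok = np.abs(t_est[idx] - t_gt) <= max_dt
    Q, P = p_gt[ok], p_est[idx[ok]]

    # Closed-form rigid alignment (Kabsch/Umeyama without scale):
    # R, t minimize sum ||R p_i + t - q_i||^2.
    mq, mp = Q.mean(axis=0), P.mean(axis=0)
    H = (P - mp).T @ (Q - mq)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mq - R @ mp

    err = (P @ R.T + t) - Q
    return float(np.sqrt((err ** 2).sum(axis=1).mean()))
```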

ORBSLAM2 (which uses stereo images) is not able to produce meaningful results in any of the handheld datasets because it immediately loses track of features due to fast rotational movements. The only dataset where ORBSLAM2 does not fail is Basement 4, where the Phasma stick was mounted on a moving platform. The other tested algorithms, A-LOAM and SVO2.0, are more robust with respect to initialization and tracking. However, depending on the scene texture and scene geometry, either the lidar-based A-LOAM or SVO2.0, which uses stereo cameras and the IMU, performs better.

Fig. 4. Example from the dataset LAB Survey 2: images from the 5 cameras and the Ouster OS0 point cloud.

Fig. 5. Example comparison of SVO2 [14] using stereo and IMU and LOAM [13] using the Ouster lidar. ORBSLAM2 failed to produce meaningful results. Dataset: Lab Survey 2.

VIII. KNOWN ISSUES

Despite careful design and execution of the data collection experiments, we are aware of different issues which pose additional challenges for processing and limit the achievable accuracy when comparing to ground truth. These include:

• Clock Drift and Offset: The clocks of the motion-capture system and the data logging computer are not hardware-synchronized. We used an Ethernet connection and time-of-arrival time stamping to keep the offset to a minimum; however, we observed a difference of around 1-3ms between the two clocks.

• Some frames in the lidar and camera data have been dropped due to high load on the controller.

IX. CONCLUSION

In this paper we have described a new public dataset captured with a highly redundant multi-sensor platform. Our goal is to improve the use of SLAM algorithms in construction robotics to assist in task automation and execution. This data captures a series of real-world examples collected with current sensing technologies and high quality time synchronization. Based on the results shown, it is clear that for SLAM to be used in real-world construction use cases such as progress monitoring or surveying, robustness and accuracy have to improve significantly. We hope to expand this data offering into various other environments, to further spur research on positioning and navigation issues commonly encountered in indoor and mixed environments.

ACKNOWLEDGMENTS

The authors are grateful to the whole team from Sevensense for their continuous support and helpful discussions, and to IVISO for the calibration verification using the tool Camcalib.

REFERENCES

[1] Hilti AG. (2020). "Hilti Jaibot," [Online]. Available: https://www.hilti.ca/content/hilti/W1/CA/en/business/business/trends/jaibot.html (accessed: 12.09.2021).

[2] K.-P. Schwarz, M. Chapman, M. Cannon, and P. Gong, "An integrated INS/GPS approach to the georeferencing of remotely sensed data," Photogrammetric Engineering and Remote Sensing, vol. 59, no. 11, pp. 1667-1674, 1993.

[3] A. Geiger, P. Lenz, and R. Urtasun, "Are we ready for autonomous driving? The KITTI Vision Benchmark Suite," in Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

[4] W. Wang, D. Zhu, X. Wang, Y. Hu, Y. Qiu, C. Wang, Y. Hu, A. Kapoor, and S. Scherer, "TartanAir: A dataset to push the limits of visual SLAM," in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020.

[5] M. Burri, J. Nikolic, P. Gohl, T. Schneider, J. Rehder, S. Omari, M. W. Achtelik, and R. Siegwart, "The EuRoC micro aerial vehicle datasets," The International Journal of Robotics Research, 2016. DOI: 10.1177/0278364915620033.

[6] J. Delmerico, T. Cieslewski, H. Rebecq, M. Faessler, and D. Scaramuzza, "Are we ready for autonomous drone racing? The UZH-FPV drone racing dataset," in IEEE Int. Conf. Robot. Autom. (ICRA), 2019.

[7] E. Mueggler, H. Rebecq, G. Gallego, T. Delbruck, and D. Scaramuzza, "The event-camera dataset and simulator: Event-based data for pose estimation, visual odometry, and SLAM," The International Journal of Robotics Research, vol. 36, no. 2, pp. 142-149, Feb. 2017, ISSN: 1741-3176. DOI: 10.1177/0278364917691115.

[8] "IEEE standard for a precision clock synchronization protocol for networked measurement and control systems," IEEE Std 1588-2008 (Revision of IEEE Std 1588-2002), pp. 1-269, 2008. DOI: 10.1109/IEEESTD.2008.4579760.

[9] M. Faizullin, A. Kornilova, A. Akhmetyanov, and G. Ferrer, "Twist-n-Sync: Software clock synchronization with microseconds accuracy using MEMS-gyroscopes," Sensors, vol. 21, no. 1, 2021, ISSN: 1424-8220. DOI: 10.3390/s21010068.

[10] F. Furrer, M. Fehr, T. Novkovic, H. Sommer, I. Gilitschenski, and R. Siegwart, "Evaluation of combined time-offset estimation and hand-eye calibration on robotic datasets," in Field and Service Robotics: Results of the 11th International Conference, R. Siegwart and M. Hutter, Eds. Cham: Springer International Publishing, 2017, ISBN: 978-3-319-67361-5.


TABLE II
Topics and types in the rosbag

Topic                          Type                          Description
/alphasense/cam0/image_raw     sensor_msgs/Image             front facing camera 1
/alphasense/cam1/image_raw     sensor_msgs/Image             front facing camera 2
/alphasense/cam2/image_raw     sensor_msgs/Image             upward facing camera
/alphasense/cam3/image_raw     sensor_msgs/Image             right facing camera
/alphasense/cam4/image_raw     sensor_msgs/Image             left facing camera
/alphasense/imu                sensor_msgs/Imu               Bosch IMU, 200Hz
/alphasense/imu_adis           sensor_msgs/Imu               ADIS16445, 800Hz
/livox/lidar                   livox_ros_driver/CustomMsg    Livox MID70
/os_cloud_node/imu             sensor_msgs/Imu               InvenSense, 100Hz
/os_cloud_node/points          sensor_msgs/PointCloud2       Ouster OS0-64
tf_static                      tf2_msgs/TFMessage            all transforms between frames

[11] T. Foote, "tf: The transform library," in 2013 IEEE Conference on Technologies for Practical Robot Applications (TePRA), IEEE, Apr. 2013. DOI: 10.1109/tepra.2013.6556373.

[12] R. Mur-Artal and J. D. Tardos, "ORB-SLAM2: An open-source SLAM system for monocular, stereo and RGB-D cameras," IEEE Transactions on Robotics, vol. 33, no. 5, pp. 1255-1262, 2017. DOI: 10.1109/TRO.2017.2705103.

[13] J. Zhang and S. Singh, "LOAM: Lidar odometry and mapping in real-time," in Robotics: Science and Systems Conference (RSS), pp. 109-111, Jan. 2014.

[14] C. Forster, Z. Zhang, M. Gassner, M. Werlberger, and D. Scaramuzza, "SVO: Semidirect visual odometry for monocular and multicamera systems," IEEE Transactions on Robotics, vol. 33, no. 2, pp. 249-265, 2017. DOI: 10.1109/TRO.2016.2623335.

[15] Z. Zhang and D. Scaramuzza, "A tutorial on quantitative trajectory evaluation for visual(-inertial) odometry," in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018, pp. 7244-7251. DOI: 10.1109/IROS.2018.8593941.