
Sensor Fusion for GNSS Denied Navigation

Kailash Krishnaswamy*, Sara Susca*, Rob McCroskey*, Pete Seiler*, Jan Lukas†, Ondrej Kotaba†, Vibhor Bageshwar*, Subhabrata Ganguli*

Honeywell Advanced Technology, * 3660 Technology Dr, Minneapolis, MN 55418

† Holandska 2 & 4, Brno 639 00, Czech Republic

Abstract We present technologies that are being developed to address the need for a navigation solution in the absence of Global Navigation Satellite Systems (GNSS) measurements. The navigation system uses sensors such as vision systems, RADARS, and LIDARS together with feature extraction, matching, and motion estimation algorithms. We present experimental results of using the Scale Invariant Feature Transform, Speeded Up Robust Features, and modified Harris feature extraction algorithms and compare their performance. We also present methods to extract lines and planes that can aid in navigation. For motion estimation we present results for Visual Odometry as well as Simultaneous Localization and Mapping navigation. We verify the algorithms experimentally, both in a real-time Linux framework and offline. We also present ongoing work on vision-integrated navigation in an Attitude and Heading Reference System as well as an extended Kalman filter framework. All the methods we present in this paper are incremental navigation methods.

1. INTRODUCTION
An increasing number of applications seek navigation solutions that do not rely on GNSS for periodic aiding. Personal Indoor Navigation and Urban Navigation are two examples. Personal navigation requires localization aiding in indoor environments, usually for dismounted personnel. The objective is to track one's position w.r.t. other likewise-equipped personnel in closed and obstructed spaces. In this application, a navigation system could be without GNSS completely over extended periods of time. If triangulation using radio beacons is the mode of localization, the navigation system may have intermittent localization failures due to multipath or severe obstructions such as concrete walls. Urban navigation requires localization aiding that prevents an IMU from drifting during intermittent GNSS outages. The objective is to localize accurately w.r.t. the local environment despite outages caused by multipath or constellation occlusion in cluttered surroundings. In such an application, a navigation system should be able to detect GNSS intermittency and blend other aids appropriately.

We would like to differentiate between two types of navigation: incremental and absolute. Incremental navigation is the ability to localize (position and attitude) with sensors that measure incremental motion. Sensors such as gyros, accelerometers, and odometers are used to mechanize incremental navigation systems. Absolute navigation is the ability to localize (position and attitude) with sensors that measure the localization state directly. Sensors such as Global Navigation Satellite Systems (GNSS), magnetometers, and radio beacons are used to mechanize absolute navigation systems. The utility of incremental navigation is limited by the drift in the localization estimate. In most applications, localization needs to be drift-less to be useful; hence the benefit of, and need for, absolute navigation sensors to aid the incremental navigation sensors. However, absolute position sensors, such as GNSS, need infrastructure and are not guaranteed to be available under some key environmental and terrain conditions. Absolute heading sensors are affected by local magnetic anomalies. In some cases, GNSS and magnetometer performance deteriorates in similar environments, leading to a 'black out' in absolute localization aiding. Examples of such environments include indoors, urban canyons, under foliage, and operation on a ferro-magnetic structure. This issue with localization aiding has prompted many researchers to develop alternative localization sensors and methods. Examples of such sensor systems are stereo and monocular vision systems, Light Detection and Ranging sensors (LIDARS), and RADARS. Such sensors have been used primarily as incremental navigation systems. There is also an increasing number of prototype systems that detect Signals of Opportunity, such as Wi-Fi and radio, and use triangulation / trilateration to localize a platform. Such systems have been used as absolute navigation systems. The fundamental problem with any of the above systems is that they are not robust to operating conditions; the localization solution can fail more often than a GNSS signal, for example. A useful property of the above systems is that most of them have complementary error properties. Vision systems are precise in determining attitude information (~1 milli-deg or less) but are poor at ranging features in the environment. LIDAR systems are only as good as the encoder (scanning) or resolution (flash) for attitude (~10 milli-rad) but are very good at ranging.



Systems based on Signals of Opportunity help provide absolute aiding that bounds the drift accrued by incremental navigation systems. The complementary error properties of these GNSS-denied localization solutions are the motivation for the work presented in this paper. We present technology development at the individual sensor level that has the potential to yield stand-alone GNSS-denied navigation solutions. These navigation solutions can be integrated to provide a robust GNSS-denied navigation solution. This paper is organized as follows. In Section 2, we describe the fundamental methods used to extract features from an intensity or range image. We discuss the benefits of extracting features of various types: points, lines, and planes. In Section 3, we briefly describe the methods that we currently use to match features. In Section 4, we describe the methods that we have successfully implemented for motion estimation using the range and intensity images, and we describe the merits and demerits of these methods. In Section 5, we present implementation results of the algorithms mentioned in the previous sections. In Section 6, we briefly mention our ongoing efforts to integrate the various vision-based sensors with a conventional navigation solution. We end the paper with conclusions and some closing remarks in Section 7.

2. FEATURE EXTRACTION
In this section we discuss algorithms we have developed to extract different types of features from different types of sensors. In particular, we discuss point, line, and plane feature extraction.

2a. Points - SIFT, SURF, Harris
This section describes point feature extraction algorithms that we developed for intensity images. The methods considered in this paper are SIFT (Scale Invariant Feature Transform, [1]), SURF (Speeded-Up Robust Features, [2]), and a modified Harris, [3, 4]. The ultimate requirement was to be able to navigate in environments with sufficient contrast without GNSS measurements. SIFT and SURF are feature extraction algorithms that apply specific spatial filters (Differences of Gaussians and Laplacians, respectively) at multiple scales of an image to extract features, [1], [2]. The output of the SIFT and SURF feature extraction algorithms is a list of feature locations and normalized descriptor vectors of length 128 (SIFT), or 64 or 128 (SURF). All SURF results were obtained using the length-64 descriptor vector. The standard Harris feature extraction algorithm uses gradient filters to extract features, [3, 4]. The output is a list of feature pixel locations. A descriptor vector was created by taking the pixel intensities in a window around a feature and normalizing the resulting vector to unity. The length of this descriptor vector is 121 for the selected window size of 11×11 pixels.
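As an illustration of how such a descriptor can be formed, the following is a minimal NumPy sketch, assuming a grayscale image and a Harris corner location; the function name and border handling are ours, not the paper's implementation.

```python
import numpy as np

def patch_descriptor(image, row, col, half=5):
    """Build a descriptor for one corner: the 11x11 intensity window around
    (row, col), flattened and normalized to unit length (121 elements for
    half = 5). Returns None if the window falls outside the image."""
    h, w = image.shape
    if row - half < 0 or col - half < 0 or row + half >= h or col + half >= w:
        return None
    patch = image[row - half:row + half + 1, col - half:col + half + 1]
    vec = patch.astype(np.float64).ravel()
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else None
```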

Fig. 1 shows a typical intensity image of a parking lot scene used for the stereo odometry experiments, with the Harris feature locations overlaid.

Figure 1 Intensity image with extracted Harris feature points

We applied the above three intensity-image feature extraction algorithms to range images. In doing so, the detection of point features becomes complicated. It can be clearly seen in Fig. 1 that most Harris feature points are detected in texture, and those feature points would not be visible in range images. It is also important to note that edges and corners in a range image represent points where the range changes abruptly. This means that the 3D coordinates measured at those points are numerically ill-conditioned and hence very sensitive to errors in matched feature locations between multiple images. In order to resolve this issue it is important to select features in range images that have a distinct 'signature'. In other words, given a range image R(i, j), we are interested in points where the gradient of the range,

∇R(i, j) = [ ∂R/∂i , ∂R/∂j ](i, j),

changes its direction but its magnitude remains relatively small. If we express the gradient vector in polar coordinates, ∇R(i, j) = [ρ, θ](i, j), such points can be detected as feature points in phase, θ, or amplitude, ρ.
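To make the criterion concrete, here is a minimal NumPy sketch of one way to flag such points: compute the range gradient, convert it to polar form (ρ, θ), and keep pixels where the direction turns sharply between neighboring pixels while the magnitude stays small. The function name and thresholds are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def range_feature_candidates(R, mag_max=0.05, dtheta_min=np.pi / 4):
    """Flag pixels of a range image R where the gradient direction turns
    sharply while the gradient magnitude stays small (thresholds are
    illustrative). Returns a boolean mask the same shape as R."""
    dRdi, dRdj = np.gradient(R.astype(np.float64))
    rho = np.hypot(dRdi, dRdj)              # gradient magnitude (amplitude)
    theta = np.arctan2(dRdj, dRdi)          # gradient direction (phase)
    # Direction change between horizontally adjacent pixels, wrapped to [0, pi]
    dtheta = np.abs(np.diff(theta, axis=1))
    dtheta = np.minimum(dtheta, 2 * np.pi - dtheta)
    mask = np.zeros_like(R, dtype=bool)
    mask[:, 1:] = (dtheta > dtheta_min) & (rho[:, 1:] < mag_max)
    return mask
```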

2b. Lines - Edge detection
This section describes an algorithm that we have developed for edge detection using a monocular camera. The requirement was to guide a vehicle autonomously along the boundary of a region. We experimented with standard edge detection algorithms such as Canny edge detection, [16]. The performance was poor due to the particular operating environment and false selection of edges. The algorithm we subsequently developed benefited from combining signal and image processing methods with vehicle dynamics. In particular, the algorithm consists of the following steps (a simplified sketch follows the list):
1. Image segmentation – Region of interest selection based on extrinsic calibration of the camera.
2. Directional pre-filtering – Filter the image to accentuate pixels parallel to the edge.
3. Edge detection – Standard edge detection using Canny, for example.
4. Shadow edge removal – Identify edges selected in step 3 that are associated with a shadow, using a histogram-based threshold computed at run time.
5. Dynamic prediction window – Dynamically determine a sub-region of interest based on the current edge estimate and vehicle motion.
6. Measurement fusion – The above steps produce edge estimates throughout the image; a Kalman filter is used to fuse these estimates and determine the location of the edge at the point of interest in the image.
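A minimal sketch of steps 1-4, assuming OpenCV and an 8-bit grayscale input; steps 5 and 6 (the prediction window and Kalman fusion) are omitted, and the kernel and thresholds are illustrative rather than the values used in the experiments.

```python
import cv2
import numpy as np

def detect_boundary_edges(gray, roi, shadow_percentile=10):
    """Simplified sketch of steps 1-4 above: crop the region of interest,
    apply a directional pre-filter, run Canny, and suppress edge pixels in
    the darkest (shadow-like) part of the intensity histogram."""
    r0, r1, c0, c1 = roi                      # step 1: ROI from extrinsic calibration
    sub = gray[r0:r1, c0:c1]

    # Step 2: directional pre-filter (horizontal derivative accentuates
    # pixels along a roughly vertical boundary).
    kernel = np.array([[-1, 0, 1],
                       [-2, 0, 2],
                       [-1, 0, 1]], dtype=np.float32)
    filtered = cv2.convertScaleAbs(cv2.filter2D(sub, cv2.CV_16S, kernel))

    # Step 3: standard Canny edge detection on the filtered sub-image.
    edges = cv2.Canny(filtered, 50, 150)

    # Step 4: shadow edge removal via a run-time histogram threshold.
    shadow_level = np.percentile(sub, shadow_percentile)
    edges[sub < shadow_level] = 0
    return edges
```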

The algorithm was tested at 17 Hz on a general-purpose Linux computer in a relevant environment. Figure 2 is one frame from a run-time video of the algorithm.

Figure 2 Line feature extraction (red line)

2c. Planes
This section describes a plane extraction algorithm that we developed specifically for flash LIDARs, which provide both intensity and range images. The requirement was to be able to navigate in environments where at least two non-coplanar planes are detected in every frame. The fundamental principle of the algorithm is to compare the normal vectors of a virtual surface fit in the neighborhood of every pixel in a 3D point cloud. A local pixel curvature is determined by calculating the arithmetic mean of the angles between the center pixel and the pixels in the neighborhood. The neighborhood of a pixel is a plane if this pixel curvature is less than a specified threshold. To ensure that only pixels with valid range information are evaluated, additional thresholding is then performed, this time on the intensity of the reflected light. The orientation of the plane is determined by the projection of the calculated normal onto the three principal axes. Illustrations of the output of the algorithm are shown in Fig. 3 and Fig. 4.
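The following NumPy sketch illustrates one possible reading of this test for an organized point cloud: estimate a normal at every pixel, take the mean angle between the center normal and the neighborhood normals as the local curvature, and keep pixels whose curvature is small and whose intensity return is strong enough. The thresholds, neighborhood size, and normal estimation are our assumptions, not the paper's implementation.

```python
import numpy as np

def plane_mask(points, intensity, curv_thresh=0.1, int_thresh=20.0, half=2):
    """Flag planar pixels of an organized point cloud `points` (H, W, 3)
    with a matching intensity image. A pixel is planar when the mean angle
    between its normal and the normals in its (2*half+1)^2 neighborhood is
    below curv_thresh (radians) and its intensity return exceeds int_thresh.
    Thresholds are illustrative."""
    # Per-pixel normals from finite differences of the organized cloud.
    du = np.gradient(points, axis=1)
    dv = np.gradient(points, axis=0)
    normals = np.cross(du, dv)
    normals /= np.linalg.norm(normals, axis=2, keepdims=True) + 1e-12

    H, W, _ = points.shape
    mask = np.zeros((H, W), dtype=bool)
    for r in range(half, H - half):
        for c in range(half, W - half):
            if intensity[r, c] < int_thresh:
                continue                     # reject weak / invalid returns
            nbr = normals[r - half:r + half + 1, c - half:c + half + 1]
            cosang = np.clip(nbr.reshape(-1, 3) @ normals[r, c], -1.0, 1.0)
            if np.mean(np.arccos(cosang)) < curv_thresh:
                mask[r, c] = True
    return mask
```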

Figure 3 Parking lot scene. Top row: (a) intensity image, (b) range image from the SR3000, (c) extracted planes scale 1. Bottom row: (d) extracted planes scale 2, (e) extracted planes scale 4, (f) extracted planes scale 8

Figure 4 Desk scene. Top row: (a) intensity image, (b) range image from the SR3000, (c) extracted planes scale 1. Bottom row: (d) extracted planes scale 2, (e) extracted planes scale 4, (f) extracted planes scale 8

In this section we described point, line, and plane feature extraction methods. For point feature extraction we used standard algorithms such as SIFT, SURF, and Harris. We modified Harris to help us match features more robustly, as will be described in the next section. The primary issue we faced in extracting SIFT and SURF features is the computational complexity of the algorithms, which leads to large feature extraction time budgets in a real-time navigation solution. We also described our method of extracting edges. The edge extraction method cannot be generalized to extracting lines in a 3D image. We finally described a geometric method to extract planes. The algorithm is currently not optimized for plane extraction from one frame to the next, leading to larger than required time budgets; we are working towards a real-time implementation. Once features are extracted in successive frames, the next step in navigation is to match them appropriately. A correct set of matches is one that preserves a unique motion estimate between a set of features at time t1 and a set of features at time t2.

3. FEATURE MATCHING
Matching point features between two images is performed by taking the descriptor associated with each feature in one image and looking for the best match among the feature descriptors in the second image. For SIFT, SURF, and Harris features, the descriptor is a normalized vector containing information about the image region around the feature (Section 2a). The matching algorithm first calculates the Euclidean distance between the descriptors of each feature in the first image (I1) and each feature in the second image (I2). A feature in image I1 is matched to the feature in I2 with the minimum distance between the descriptor vectors. In addition, the ratio between the minimum distance and the second-smallest distance must be below a threshold, to reduce matches between points with very similar descriptors.
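A minimal NumPy sketch of this nearest-neighbor matching with the distance-ratio test; the function name and the ratio value are illustrative assumptions, and the stereo epipolar and disparity checks described next would be applied to the surviving matches.

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.8):
    """Match each feature in image I1 to its nearest neighbor in I2 by
    Euclidean distance between descriptor vectors, keeping a match only
    if the nearest distance is below `ratio` times the second-nearest
    distance. desc1 is (N1, D), desc2 is (N2, D) with N2 >= 2; returns a
    list of (index_in_I1, index_in_I2) pairs."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, best))
    return matches
```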



Finally, in order to reduce the number of false matches for left/right stereo pairs, two additional quality criteria are applied: the epipolar and disparity constraints. To impose the epipolar constraint, we require the difference of the row indices of the two features to be within a certain threshold. To impose the disparity constraint, we require the difference of the right and left column indices of the matched features to be negative. We are currently working on methods to match line and plane features. Most methods in the literature on line and plane matching use an alternate sensor aid to assist with the matching. Adequate caution needs to be exercised when matching using other sensor aids so as not to corrupt the matching with aiding-sensor noise. While this seems benign, the real complication arises when such measurements are fused in a Kalman filter. Suppose a sensor A was used to match features from an image measured by sensor B. The motion estimated from B will now be correlated with sensor A, and hence the Kalman filter needs to be modified to fuse two sensors with correlated noise properties. In this section we presented methods for matching point features. The next step in the process of navigation is to estimate motion between a set of point features matched at time t1 and the corresponding set of point features at time t2. In the next section we discuss the various methods that we developed and tested to estimate motion using point features only.

4. MOTION ESTIMATION
In this section we present three algorithms that we developed to estimate the motion of a moving sensor:
1. Absolute orientation – a classic algorithm to estimate the motion between two sets of 3D point clouds, [5, 6]
2. Hybrid eight-point algorithm – a modification of the classic eight-point algorithm, which is used to estimate the attitude change between two image sensor poses, [7, 8]
3. Simultaneous Localization And Mapping (SLAM) – an algorithm that localizes in a global frame using a map built by fusing successive measurements, [9]

All three algorithms assume that appropriate intensity / range image point features have already been matched. Absolute orientation and the Hybrid eight-point algorithm estimate motion between successive frames only. These algorithms incorporate neither memory of position estimates for features that are currently not observed nor a dynamic motion model. SLAM, on the other hand, can track the same features over multiple frames even if they are not seen in some frames. This helps navigate a sensor with better accuracy and improves motion estimates over time, [9]. Another limitation of Absolute orientation and the Hybrid eight-point algorithm is that motion is estimated using a least-squares minimization.

This makes the estimated motion very sensitive to outliers, i.e., sets of point features that do not preserve a unique motion. We use the RANdom SAmple Consensus (RANSAC) algorithm in both cases to perform a first outlier rejection step, [10]. A second outlier rejection step is then performed in a refinement algorithm. We will now explain the three motion estimation algorithms in more detail.

4a. Absolute orientation
Absolute orientation is a closed-form solution to the 2-norm minimization problem of relative motion between two sets of matched 3D point clouds, [6]. Consider two sets of measurements Pi and Qi, for i = 1, ..., M, of a set of M 3D landmarks, where M ≥ 3. The two sets of measurements may be taken by a sensor in two different poses. The motion between poses can be expressed by the pair (R, T), where R is the direction cosine matrix denoting the rotation and T is the translation. The objective is to estimate the motion (R, T) by minimizing the error:

Σi ||ei|| = Σi ||Qi – (R Pi + T)||,

where Pi, Qi, R, and T are defined in the appropriate coordinate frames. The solution of this minimization problem is known in closed form, [6]. Outliers can greatly affect the solution of Absolute orientation because the method is a least-squares minimization. In our experiments we observed up to 70% outliers in many cases. Hence the algorithm, if implemented without any outlier rejection, leads to very poor performance, Fig. 5. We use RANSAC in conjunction with Absolute orientation to reject outliers and improve the performance of motion estimation. RANSAC is set up to randomly select three points (the minimal subset) from the matched 3D points Pi and Qi to generate the motion hypotheses. Every hypothesis is then scored, and the hypothesis with the highest score is considered a crude estimate of the true motion that should be further refined. Scoring is the process of determining whether a candidate set of points meets the error metric within a certain tolerance. Note that only three points were used to compute R and T; therefore, a lot more information is still hidden in the 3D point clouds. Different refinement algorithms have been used in the literature with the objective of determining the R and T that minimize a certain cost function. Typically the cost function is nonlinear and nonconvex in R and T, so an iterative algorithm is used to compute the optimal estimate of the motion. The best hypothesis obtained with RANSAC is used as the initial condition of such algorithms, [7, 8, 10]. The performance of Absolute orientation with RANSAC and refinement is shown in Fig. 15 and discussed later.
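A compact sketch of this pipeline, assuming matched 3D point arrays P and Q: the closed-form rigid fit is written here with an SVD (it yields the same least-squares optimum as the quaternion solution of [6]), and RANSAC draws three-point hypotheses, scores them by inlier count, and refits on the best consensus set. The trial count and tolerance are illustrative.

```python
import numpy as np

def fit_rigid(P, Q):
    """Closed-form least-squares rigid motion (R, T) with Q ~ R @ P + T,
    via SVD of the cross-covariance. P and Q are (N, 3) matched points."""
    Pc, Qc = P.mean(axis=0), Q.mean(axis=0)
    H = (P - Pc).T @ (Q - Qc)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    T = Qc - R @ Pc
    return R, T

def ransac_absolute_orientation(P, Q, trials=200, tol=0.10):
    """Absolute orientation inside RANSAC: draw minimal 3-point subsets,
    score each motion hypothesis by counting matches whose residual is
    below `tol` (meters), and refit on the best inlier set."""
    best_inliers = np.zeros(len(P), dtype=bool)
    rng = np.random.default_rng(0)
    for _ in range(trials):
        idx = rng.choice(len(P), size=3, replace=False)
        R, T = fit_rigid(P[idx], Q[idx])
        resid = np.linalg.norm(Q - (P @ R.T + T), axis=1)
        inliers = resid < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        return None, best_inliers            # no consensus found
    return fit_rigid(P[best_inliers], Q[best_inliers]), best_inliers
```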



Figure 5 Dead reckoning of LIDAR motion estimation obtained with Absolute orientation (blue dots) with no outlier rejection. The black solid line is ground truth. The solid red line represents the portion of the trajectory with dead reckoning

4b. Hybrid eight-point algorithm
Absolute orientation allows the reconstruction of motion from two sets of matched 3D point clouds. It is easy to see that the accuracy of the recovered motion will depend on the accuracy of the 3D point clouds. It is possible to recover the motion (up to a scale) between two poses independent of the 3D information, provided an image of the two scenes is available. Traditionally, the eight-point algorithm is amongst those used when only a monocular camera is available and a 3D point cloud cannot be constructed. Since motion is estimated using only matched image features (as opposed to matched 3D features), a greater level of accuracy can be achieved, especially in attitude determination (as we will show in Section 5a), [8]. The inputs to the eight-point algorithm are the coordinates of two sets of matched features (at least eight) between two frames. From the two sets of points the essential matrix can be estimated, and finally the attitude direction cosine matrix R and the direction of translation T between the two frames can be computed, [5, 7]. The accuracy of R and T is a function of the accuracy of the feature locations. The limitation of the eight-point algorithm is that the scene scale, and therefore the translation, cannot be recovered unless more information about the scene is available. When a LIDAR or a stereo camera is used, the Hybrid eight-point algorithm can be combined with other algorithms to estimate R more accurately, along with the direction of motion T and the scale of the scene. As said before, we execute the Hybrid eight-point algorithm in a RANSAC framework to perform outlier rejection. To determine how many hypotheses should be generated, the following formula can be used:

L = log(1 – pout) / log(1 – pin^N),

where pin is the estimated fraction of inliers, pout is the desired probability of exiting RANSAC with a correct solution, L is the number of necessary trials, and N is the minimum number of points necessary to run the pose estimation algorithm, [10]. For Absolute orientation N = 3, while for the Hybrid eight-point algorithm N = 8. It is easy to see that L can be much larger for the Hybrid eight-point algorithm than for Absolute orientation; it can therefore become computationally very intensive to generate enough hypotheses.
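As a concrete illustration with assumed numbers (not the paper's), a 50% inlier fraction and a 99% success requirement give about 35 hypotheses for the three-point case versus about 1,177 for the eight-point case:

```python
import math

def ransac_trials(p_in, p_out, N):
    """Number of RANSAC hypotheses L for inlier fraction p_in, desired
    success probability p_out, and minimal subset size N."""
    return math.ceil(math.log(1.0 - p_out) / math.log(1.0 - p_in ** N))

# Illustrative values: 50% inliers, 99% success probability.
print(ransac_trials(0.5, 0.99, 3))   # ~35 for Absolute orientation
print(ransac_trials(0.5, 0.99, 8))   # ~1177 for the Hybrid eight-point algorithm
```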

4c. SLAM
SLAM is an algorithm that uses relative range measurements from the platform to matched features to enable the platform to perform two tasks [11], [12]. First, the SLAM algorithm builds a map of its operating environment. Second, the SLAM algorithm estimates the platform's navigation solution within this operating environment. The navigation solution can be defined by specifying the relative position, velocity, and orientation of two reference frames. The two reference frames used in this application are a platform-fixed body frame and a North-East-Down (NED) frame with fixed origin. We will refer to this NED frame as the navigation frame. In general, the SLAM algorithm requires a) a set of sensors to measure the relative range to features; b) a stochastic system consisting of a dynamic model, a measurement model, and sensor measurement models; and c) a filter that blends the observations of the feature positions relative to the platform. The state vector corresponding to the dynamic model includes both the platform's kinematic state (the navigation solution) and the positions of all observed features. The measurement model relates observations of the feature positions in view of a sensor at a specific measurement period to the navigation solution. We have selected an extended Kalman filter (EKF) to implement the SLAM algorithm. The EKF, under various assumptions on the stochastic system, can be viewed as a model-based algorithm that recursively estimates the statistics of the navigation solution and the landmark states [13]. A block diagram illustrating the SLAM algorithm is shown in Fig. 6.

Figure 6 SLAM Architecture
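To make the structure of Fig. 6 concrete, here is a toy, linear-Gaussian stand-in for the filter written in NumPy: the state stacks the vehicle position and the landmark positions (the map), a prediction step propagates the vehicle, and a correction step fuses relative landmark observations. Matching, RANSAC outlier rejection, and on-the-fly landmark initialization are omitted, and nothing here is the paper's actual filter.

```python
import numpy as np

def slam_demo(controls, measurements, n_lmk, q=0.01, r=0.05):
    """Toy linear SLAM: 2D position-only vehicle, static landmarks, and
    relative-position measurements z_i = lmk_i - vehicle + noise.
    controls[k] is the commanded vehicle displacement at step k;
    measurements[k] is an (n_lmk, 2) array of relative landmark positions."""
    n = 2 + 2 * n_lmk
    x = np.zeros(n)                       # [vehicle xy, landmark xy's]
    P = np.eye(n) * 100.0                 # large initial landmark uncertainty
    P[:2, :2] = np.eye(2) * 1e-6          # vehicle starts at a known origin
    F = np.eye(n)                         # identity dynamics (static landmarks)
    Q = np.zeros((n, n))
    Q[:2, :2] = np.eye(2) * q             # process noise on the vehicle only
    H = np.zeros((2 * n_lmk, n))          # stacked relative-position model
    for i in range(n_lmk):
        H[2*i:2*i+2, :2] = -np.eye(2)
        H[2*i:2*i+2, 2+2*i:4+2*i] = np.eye(2)
    R = np.eye(2 * n_lmk) * r
    for u, z in zip(controls, measurements):
        # PREDICTION: move the vehicle by the commanded displacement.
        x[:2] += u
        P = F @ P @ F.T + Q
        # CORRECTION: fuse the stacked relative landmark observations.
        y = z.ravel() - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ y
        P = (np.eye(n) - K @ H) @ P
    return x, P
```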



The EKF provides several advantages in estimating the navigation solution compared to the Absolute orientation and Hybrid eight-point algorithms. First, the EKF computes the variance of the estimated navigation solution and of the feature positions in the navigation frame. Second, the measurement errors of the relative feature observations can be incorporated into the EKF; therefore, observations of features with smaller measurement errors can be given a larger weight than observations of features with higher measurement errors. Third, the EKF provides an estimate of the navigation solution based on all current and previous feature observations, the positions of all observed features in the navigation frame (not simply the currently observed features), and the estimation errors of both the navigation solution and the feature positions. One of the fundamental disadvantages of SLAM compared to Absolute orientation or the Hybrid eight-point algorithm is that the algorithm complexity grows with the number of features seen over time, in other words, with the size of the map. Any hardware that can implement SLAM at an acceptable data rate can implement Absolute orientation or the Hybrid eight-point algorithm at a much faster data rate. In this section we discussed methods for motion estimation using point features. We presented methods that both use and do not use memory (of past motion and environment) in order to determine a navigation solution. There are benefits to both classes of methods. Navigation methods with memory (states) are more difficult to stabilize and less robust to measurement errors, but they induce less drift. Navigation methods without memory are more robust but induce more drift. An ideal implementation would combine the two ideologies, which has led many researchers to consider Visual Odometry with Bundle Adjustment.

5. IMPLEMENTATION

In this section we present implementation results of the algorithms discussed in Section 4. We begin with a discussion of sensors and algorithms and identify the importance of pairing sensors with appropriate algorithms based on sensor error properties. This is followed by experimental results on the performance of some of the sensors that we have used. We then present details of the hardware platform on which we have tested the various algorithms. Finally, we present the performance of the motion estimation algorithms discussed in this paper.

5a. Sensors v/s algorithms
We have tested the previously mentioned algorithms on monocular, stereo, and LIDAR sensors. While all the algorithms are sensor agnostic, they also suffer from being sensor agnostic: the algorithms do not exploit the error properties of a particular sensor in order to determine a solution.

Specifically, the eight-point algorithm, typically used with monocular cameras, exploits the location of features in the image frame. As a result, the error in the motion estimate is due to 1) matching errors between frames and 2) pixel resolution. Fig. 7 plots the performance of the eight-point algorithm as a function of pixel size and focal length.

Figure 7 Performance of the eight-point algorithm: attitude error versus field of view, for a fixed focal length of 4 mm and for a fixed pixel size of 5 microns

For any focal length, f, the angular resolution of the sensor is given by the ratio of the pixel size to the focal length. For example, for a focal length of f = 10 mm and a pixel size of 5 microns, the sensor resolution is approximately 0.03 deg. Hence, the error induced by the eight-point algorithm is of the order of the angular resolution of the sensor. The Absolute orientation algorithm is based on triangulation of feature points, or location of feature points in the 3D frame. It can be paired with a stereo rig or a 3D LIDAR-like sensor. It is important to underline that the performance of a stereo motion estimation system depends not only on 1) pixel size and 2) lens focal length, but also on 3) the stereo baseline (the displacement between the two monocular cameras of a stereo system).
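A one-line check of that resolution figure (the function is ours, just arithmetic):

```python
import math

def angular_resolution_deg(pixel_size_m, focal_length_m):
    """Instantaneous field of view of one pixel, in degrees."""
    return math.degrees(math.atan2(pixel_size_m, focal_length_m))

# 5 micron pixels with a 10 mm focal length give roughly 0.03 deg.
print(angular_resolution_deg(5e-6, 10e-3))   # ~0.0286
```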

Figure 8 Performance of the stereo based Absolute Orientation algorithm: attitude error per frame versus pixel size (focal length = 4 mm) and versus focal length (pixel size = 5 microns), for stereo baselines of 0.75 m, 1.0 m, and 1.25 m



The top plot of Fig. 8 illustrates the attitude error dependence on pixel size for a 4 mm lens focal length and three different stereo baselines (indicated on the plot). Notice that in all three cases the stereo system performs much worse than a monocular system (compare with the results in Fig. 7). This is because triangulation is used to estimate the 3D feature vectors, which are then used to estimate motion. Triangulation induces a range error on each feature that is a quadratic function of the range of the feature. Moreover, this range error affects the bearing and azimuth of the feature vector by virtue of the pin-hole model of the sensor. These additional sources of error (in addition to the feature match error) are the reason why attitude estimation using a stereo system is more erroneous than using a monocular system. The bottom plot of Fig. 8 illustrates the attitude error dependence on lens focal length. Laser-based sensors are not as susceptible to ambient lighting conditions. From the perspective of features, lasers are known to provide environmental range information with much better accuracy than stereo systems. In this section, we provide performance results of 3D laser systems as a comparison to stereo systems. 2D laser systems do not provide enough information every measurement cycle to be useful for precision navigation in unrestricted space; if one were to assume motion in a plane, then 2D laser systems may be used for navigation. However, laser systems consume a lot more power, are more expensive, and weigh significantly more than vision systems.
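The quadratic growth mentioned above follows from the usual first-order stereo triangulation error model; the sketch below uses that textbook relation with illustrative numbers (it is not the paper's error budget).

```python
def stereo_range_error(z, baseline, focal_length, pixel_size, disp_err_px=0.5):
    """First-order stereo range error: for disparity d = f*b/(z*p) pixels,
    a disparity error of disp_err_px pixels maps to a range error of about
    z**2 * p * disp_err_px / (f * b), i.e. quadratic in range. All lengths
    in meters; disp_err_px = 0.5 px is illustrative."""
    return (z ** 2) * pixel_size * disp_err_px / (focal_length * baseline)

# Example: 4 mm lens, 5 micron pixels, 1.0 m baseline, half-pixel match error.
for z in (5.0, 10.0, 20.0):
    print(z, stereo_range_error(z, 1.0, 4e-3, 5e-6))
```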

Figure 9 Performance of a LIDAR triangulation based attitude estimation algorithm: heading and roll/pitch error for a range accuracy of 0.05% of range (top), and attitude error versus the range accuracy of the laser beam (% of range) for an angular accuracy of 0.05 deg (bottom)

Unlike stereo vision systems, where the range resolution of features degrades quadratically with range, laser systems provide significantly better range accuracy over the domain of operation. The range accuracy of laser systems is usually limited by the spot size of the laser beam, which is due to beam dispersion. There are laser systems that can provide cm-level accuracy at ranges of 100 m or more and angular accuracy of 0.1 deg.

The top plot of Fig. 9 illustrates the attitude performance that can be achieved when a laser system is used to estimate motion. The growth in attitude error that we observe for laser systems (top plot of Fig. 9) indicates that the triangulation method (also used for stereo processing) may not be the best choice for computing motion with a LIDAR. Notice also that there is a significant difference in the degradation of roll/pitch versus heading. This is because the laser bearing accuracy affects heading additively, whereas the laser azimuth accuracy affects roll and pitch via the direction cosines of the feature vector. In the top plot, it is assumed that the range accuracy of the laser is 0.05% of the range to the features. In summary, we believe that it is important to pair a sensor appropriately with a motion estimation algorithm. Ideally, if neither sensors nor algorithms had heterogeneous error properties, we could pair algorithms with sensors arbitrarily. In practice, for example, a sensor that is poor in ranging but accurate in elevation and azimuth should be paired with a triangulation algorithm for positioning (position based on angles to feature points).

5b. Sensor performance
In all our experiments, we validate the model assumed for a sensor. The monocular, stereo, and LIDAR sensors that we used obeyed the pin-hole model after rectification of the images obtained by the sensor. Rectification is the process of calibrating out the nonlinearities of a lens and correcting for the extrinsic parameters of multiple cameras. However, our calibration experiments with the Swissranger flash LIDAR [14] yielded some surprising results. We performed two experiments that were set up as follows:
1. the LIDAR was fixed in one location and multiple snapshots were taken with a changing scene, and
2. the LIDAR was moved along a square of side 20 cm without change in attitude, and a snapshot was taken at each vertex of the square path.

In both cases we extracted and matched features Pi and Qi across two images. Once we verified that the 3D point clouds were properly matched we evaluated the errors:

Ei = (Pi – Qi) – T

for every point in the 3D point clouds. The error Ei represents the difference between the measured translation (Pi – Qi) and the ground-truth translation T. The three histograms in Fig. 10 show the error distribution for the first experimental setup in the x, y, and z directions. The error along the z direction (forward range) is quite large, and definitely larger than the errors along the x (left) and y (up) axes. This is surprising given the specifications of the sensor, [14]. During testing we realized that the performance of the LIDAR can degrade enormously when the objects being observed are black. We observed that the measurements of black objects are very noisy and that the error Ei can be as large as 2 m within the operating range of the sensor.



These measurements are not represented in Figs. 10 and 11, because black objects were not detected as feature points in those experiments. The histograms in Fig. 11 show the error distribution for the second experimental setup in the x, y, and z directions. Once again, the error along the z direction is larger than the errors along the x and y axes. The distributions of the measurement errors of the second experimental setup are slightly wider than those of the first. This degradation in performance could be due to the fact that in the second experiment the LIDAR was moved, and therefore the angle of reflection to the objects changed.

Figure 10 Histograms of the LIDAR measurement error in the image plane (x and y axes) and in the forward direction (z axis) for the first experiment.

Figure 11 Histograms of the LIDAR measurement error in the image plane (x and y axes) and in the forward direction (z axis) for the second experiment.

5c. Platform
The test platform used for data collection (in some cases) and real-time testing (in other cases) consists of a calibrated stereo pair of color 640x480 cameras from The Imaging Source (model DFK 21BF04), an integrated Honeywell HG1700 IMU with a 2 cm RTK GPS system, and a signal generator, all mounted on a push cart. A laptop computer is used for data collection, synchronization, and monitoring. Left and right camera images are synchronized using the external trigger capability of the cameras along with the signal generator. The camera images are synchronized to the navigation information using timestamps collected on the images and navigation updates.

We implemented Absolute orientation using SURF features in real time; the results are shown in Fig. 13. The software we used is a Linux executable that acquires and stores images, extracts and matches features between the left and right images, determines the 3D locations of features between two successive frames, and determines the change in position and orientation based on the changes in the positions of these features. All input data is stored and can be played back by the same software executable.

5d. Results
In this section we discuss results of motion estimation using various combinations of the sensors and algorithms presented in the previous sections of this paper. All experiments were conducted in a parking lot with static and moving objects. In all the following plots, any references to ground truth or 'Nav' are equivalent.

5d1. Camera navigation – SIFT/SURF v/s Harris, SLAM
Fig. 12 shows a comparison between SIFT, SURF, and Harris features for several variables. The number of features is a raw count of the number of features extracted from the left image. The number of matches is the raw number of matches between the left and right image, while the number of good matches is the total number of matches that pass the epipolar and disparity constraints described in Section 3. Finally, the number of RANSAC matches is the total number of inliers, described in Section 4a, that are used to estimate the motion. The excess of SIFT features comes from feature points found on the pavement, but these features do not pass the matching criteria, giving roughly the same number of matches between SIFT and SURF.

Figure 12 Plot of the raw number of features in the left image, matched features between the left and right image, left/right matches that pass the epipolar and disparity criteria, and the number of features considered as inliers by the RANSAC navigation solution

Fig. 13 shows the ground truth as well as the dead reckoning of the camera motion estimate. We compare the performance of Absolute orientation, as described in Section 4a, when using the three feature extractors (SIFT, SURF, and Harris features).



The starting point of the path is at the upper right, with a 180 degree right turn taken at the bottom left of the plot. In general, Absolute orientation performs well when paired with all three feature extractors. The worst performance occurs during the turn while using the Harris features. This corresponds roughly to frame 150 in Fig. 12, where only a small number of matched features pass the RANSAC selection. The drift after the turn is caused by an error in the heading estimate while turning.

Figure 13 Plot of dead reckoning of the camera estimated motion using SIFT, SURF, and Harris features, compared to the Nav solution

Fig. 14 shows the performance of the SLAM architecture described in Fig. 6. The ground truth is shown in black and is the same as 'Nav' in Fig. 13. The trajectory computed using the SLAM EKF is shown in red. The estimated trajectory is shown for two sets of time periods; for both, the EKF is initialized using the ground truth. The time and measurement update rates for the EKF were at the same frequency. The features for the RANSAC algorithm were extracted using SURF. The estimated trajectory for both sets of frames converged such that the error of the estimated cart position was less than 0.2 m within 10 measurement updates. The number of matched features from the RANSAC algorithm in the set of frames between 108 and 172 (in between the two segments) dropped significantly. This loss of matched features within the intermediate set of frames caused a bias in the estimated cart navigation solution.

Figure 14 SLAM EKF estimated trajectory (red) and ground truth (black)

5d2. LIDAR navigation – Absolute Orientation v/s Hybrid eight-point
In this section we present results of navigation using the Swissranger LIDAR.

Figure 15 LIDAR motion estimation using Harris features with Absolute Orientation (blue dots) and the Hybrid eight-point algorithm (blue crosses). The black solid line is ground truth. The red solid line represents where dead reckoning was performed

Fig. 15 shows the offline implementation of Absolute Orientation (blue dots) and the Hybrid eight-point algorithm (blue crosses) using Harris features. Fig. 16 shows the offline implementation results of Absolute Orientation and the Hybrid eight-point algorithm using SIFT features.

Figure 16 LIDAR motion estimation using SIFT features with Absolute Orientation (blue dots) and the Hybrid eight-point algorithm (blue crosses). The black solid line is ground truth. The red solid line represents where dead reckoning was performed

In this test, the Hybrid eight-point algorithm outperforms Absolute Orientation. This is because the 3D point clouds generated by the LIDAR are particularly noisy; the flash LIDAR performs quite poorly in outdoor tests compared to indoor tests. When the test shown in Fig. 15 and Fig. 16 was performed, the RTK GPS signal was poor (varying number of satellites) and exhibited quite large resets, which led us to consider only certain time periods of the whole trajectory (solid red).

6. INTEGRATION

In this section we present our past work on the integration of a vision system and an IMU. We developed two methods of integration: a deterministic complementary filter framework and an EKF framework. The block diagram of the complementary filter is shown in Fig. 17. The integration method is very similar to a standard AHRS architecture.



Figure 17 Vision aided AHRS (Attitude and Heading Reference System). The velocity reference V_ref is V_vis, V_gps, or their convex combination.

Figure 18 Vision aided AHRS integration results: North-East trajectories for the vision-aided AHRS (AHRS_VIS), the GPS-aided AHRS (AHRS_GPS), and the GPS reference

Unfortunately, the AHRS algorithm requires time-synchronized measurements, as can be seen from the block diagram in Fig. 17. Vision processing is computationally intensive, and measurements are not always guaranteed to be available instantaneously. The experimental results of the vision-integrated AHRS are shown in Fig. 18. The GPS sensor used for this experiment had a position accuracy of 1.5 m CEP. We also integrated the vision system in a modified EKF framework. The integration method included an out-of-sequence modification in order to account for sensor measurements that are valid for a previous epoch. One method of handling such measurements is to carry as many delay states in the Kalman filter as the ratio of the sensor time delay to the filter sample time period. Such a method can increase the computational cost of the integrated navigation system. Fig. 19 shows the performance of an out-of-sequence, delayed-vision integrated Kalman filter. It is assumed that the vision measurements are delayed by one filter sample time. The GPS sensor used for these experiments had a position accuracy of 1.5 m CEP.
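A minimal 1D sketch of that delay-state idea, assuming a single sample of delay: the state carries a copy of the previous epoch's position, and the late vision measurement is applied against that copy. The model, noise values, and names are illustrative, not the filter used in the experiments.

```python
import numpy as np

def delayed_vision_kf(z_vision, dt=0.02, q=0.1, r=0.05):
    """1D Kalman filter with state [position, velocity, delayed position].
    The transition matrix copies the current position into the delay slot,
    so a vision position measurement that is one sample late is fused
    against the delayed state. Noise values are illustrative."""
    F = np.array([[1.0, dt, 0.0],
                  [0.0, 1.0, 0.0],
                  [1.0, 0.0, 0.0]])      # third row copies current position
    H = np.array([[0.0, 0.0, 1.0]])      # vision observes the delayed position
    Q = np.diag([0.0, q, 0.0]) * dt
    R = np.array([[r]])
    x = np.zeros(3)
    P = np.eye(3)
    for z in z_vision:
        # Time update: propagate and shift the position copy into the delay slot.
        x = F @ x
        P = F @ P @ F.T + Q
        # Measurement update with the one-sample-late vision position.
        y = np.array([z]) - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ y
        P = (np.eye(3) - K @ H) @ P
    return x, P
```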

Figure 19 Vision aided EKF integration results: North-East trajectories for the IMU+vision EKF and the GPS reference

7. CONCLUSIONS

In this paper we presented technical results from our pursuit of GNSS-denied navigation. The paper presented the development of incremental navigation methods and their integration in EKF and AHRS frameworks for vision-based sensors (LIDAR and optical). The navigation method comprises feature extraction, matching, and motion estimation. Motion estimation methods using on-board sensors typically offer an incremental localization solution, which drifts over time. A true replacement for GNSS will need to be an absolute localization solution, since such a solution can guarantee drift-less position and attitude information. Current methods for drift-less localization involve a priori maps and correlation against the map. Map-based solutions would need to adapt to a non-homogeneously mapped Earth.

REFERENCES
[1] D.G. Lowe. "Distinctive image features from scale-invariant keypoints". International Journal of Computer Vision, Vol. 60, No. 2, 2004, pp. 91-110.
[2] H. Bay, T. Tuytelaars, L. Van Gool. "SURF: Speeded Up Robust Features". Proceedings of the Ninth European Conference on Computer Vision, May 2006.
[3] C. Harris, M. Stephens. "A Combined Corner and Edge Detector". Proceedings of The Fourth Alvey Vision Conference, Manchester, UK, August-September 1988, pp. 147-151.
[4] P.D. Kovesi. "MATLAB and Octave Functions for Computer Vision and Image Processing". School of Computer Science & Software Engineering, The University of Western Australia, available from http://www.csse.uwa.edu.au/~pk/research/matlabfns/.
[5] R. Hartley. "In defense of the eight-point algorithm". IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, Issue 6, June 1997, pp. 580-593.
[6] B.K.P. Horn. "Closed-form solution of absolute orientation using unit quaternions". Journal of the Optical Society of America A, Vol. 4, Issue 4, 1987, pp. 629-642.
[7] R. Hartley, A. Zisserman. "Multiple View Geometry in Computer Vision". New York, NY: Cambridge University Press, 2003.
[8] D. Nister. "An Efficient Solution to the Five-Point Relative Pose Problem". IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, Issue 6, June 2004, pp. 756-770.
[9] J. Andrade-Cetto, A. Sanfeliu. "Environment Learning for Indoor Mobile Robots: A Stochastic State Estimation Approach to Simultaneous Localization and Map Building". Springer Tracts in Advanced Robotics, 2006.
[10] M.A. Fischler, R.C. Bolles. "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography". Communications of the ACM, Vol. 24, June 1981, pp. 381-395.
[11] H. Durrant-Whyte, T. Bailey. "Simultaneous Localisation and Mapping (SLAM): Part I The Essential Algorithms". IEEE Robotics and Automation Magazine, June 2006, pp. 99-108.
[12] M.W.M. Gamini Dissanayake et al. "A Solution to the Simultaneous Localization and Map Building (SLAM) Problem". IEEE Transactions on Robotics and Automation, Vol. 17, No. 3, June 2001, pp. 229-241.
[13] J.L. Crassidis, J.L. Junkins. "Optimal Estimation of Dynamic Systems". Boca Raton, Florida: Chapman and Hall, 2004.
[14] Mesa Imaging AG. "SwissRanger SR-3000 - miniature 3D time-of-flight range camera". http://www.mesa-imaging.ch/prodviews.php.
[15] R.C. Hayward, D. Gebre-Egziabher, M. Schwall, J.D. Powell, J. Wilson. "Inertially aided GPS based attitude heading reference system (AHRS) for general aviation aircraft". Proceedings of ION GPS, September 1997, Kansas City, MO, USA, pp. 289-298.
[16] J. Canny. "A Computational Approach To Edge Detection". IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 8, 1986, pp. 679-714.