
Original Article

Robust cooperative visual localization with experimental validation for unmanned aerial vehicles

    Abdelkrim Nemra and Nabil Aouf

    Abstract

This article aims to present an adaptive and robust cooperative visual localization solution based on stereo vision systems. With the proposed solution, a group of unmanned vehicles, either aerial or ground, will be able to construct a large reliable map and localize themselves precisely in this map without any user intervention. For this cooperative localization and mapping problem, a robust nonlinear H∞ filter is adapted to ensure robust pose estimation. In addition, a robust approach for feature extraction and matching based on an adaptive scale invariant feature transform stereo constrained algorithm is implemented to build a large consistent map. Finally, a validation of the proposed solution is presented and discussed using simulation and experimental data.

    Keywords

    Visual SLAM, data fusion, feature extraction, cooperation

    Date received: 2 May 2012; accepted: 25 September 2012

    Introduction

Self-localization of unmanned vehicles (UVs) is still considered a fundamental problem in autonomous vehicle navigation. The problem has been tackled in the recent past for different platforms, and a number of efficient techniques have been proposed, as presented in Cox and Wilfong,1 Toledo-Moreo et al.,2 Leonard and Durrant-Whyte,3 Toth4 and references therein. More recently, driven by the interest of military applications and planetary exploration, researchers have turned their attention towards the localization of vehicles moving in hostile and unknown environments.5,6 In these cases, a map of the environment is not available and hence a more complex simultaneous localization and mapping (SLAM) problem must be faced and solved. The vehicles used in these applications, ranging from aerial to ground vehicles, have to collect information from the environment through their sensors and at the same time must localize themselves with respect to the map they are building. Extensive research works employing extended Kalman filters (EKF) and particle filters7,8 have been reported in the literature to address several aspects of the vehicle SLAM problem.9 However, most of these research works concern the case of a single vehicle exploring an environment through landmark detection.

In many applications related to the defence industry, a single sensing platform may not be sufficient to precisely collect data or to create maps of an unknown or partially known environment. Currently, and more so in the future, there are a number of applications under development in which distributed sensing systems are deployed to gather information about remotely monitored environments. Fleets of autonomous underwater vehicles (AUVs),10 unmanned aerial vehicles (UAVs)11 and unmanned ground vehicles (UGVs)12 have lately been proposed for applications ranging from environmental monitoring through surveillance to defence. These systems require the ability to share information across a wide variety of platforms and to fuse information from different sources into a consistent scene view.13 Deploying multiple vehicles into an environment and providing them with a mechanism for sharing information can provide higher data rates, increase robustness and minimize system failures.

Autonomous navigation of multiple vehicles has introduced the problem of cooperative simultaneous localization and mapping (C-SLAM), which has very recently attracted the interest of many researchers.

Proc IMechE Part G: J Aerospace Engineering 0(0) 1–19. © IMechE 2012. DOI: 10.1177/0954410012466006. uk.sagepub.com/jaero

C-SLAM is performed when multiple vehicles share navigation and perception sensing information in order to improve their own position estimates beyond what is possible with a single vehicle. Simple collective navigation has been demonstrated in simulation using multiple cartographer vehicles that randomly explore the environment.14 Støy15 performs simple relative localization between collaborators using directional beacons. Other research work16,17 presented challenges in terms of localization performance for C-SLAM and its growing uncertainties; simulation and experimental validation were conducted in this later work to support the presented analysis. In addition, techniques based on entropy minimization18 and information theory19 were also developed in the past for the C-SLAM problem.

To date, typical SLAM approaches have been using laser range sensors to build maps in two and three dimensions.20–22 Recently, the interest in using cameras as main sensors in SLAM has increased, and some authors have been concentrating on building 3D maps using visual information obtained from cameras, and more specifically stereo cameras.23,24 These approaches are usually denoted as visual SLAM. The reasons for this interest stem from:

i. Stereo vision systems are typically less expensive than laser range systems.

ii. Vision systems provide a great quantity of information and allow us to integrate other techniques into the robot, such as face or object recognition.

iii. Most laser-based systems provide 2D information from the environment, whereas stereo vision systems are able to provide a more complete 3D representation of the space.

To date, only a little effort has been made in the field of multi-vehicle visual SLAM, named cooperative visual SLAM (C-VSLAM), which considers the case where several vehicles move within their environment and build their map cooperatively.25,26 This challenging visual absolute localization/mapping problem is different from research carried out recently to achieve relative navigation in multi-vehicle systems.27 In this article, we concentrate on this C-VSLAM problem and propose a solution that allows us to build a visual map using a set of reliable and a priori unknown visual observations obtained by a team of unmanned vehicles.

Some solutions for C-SLAM,28 and more specifically C-VSLAM,29,30 are based on the EKF. However, the latter is very sensitive to outliers, and the lower bound for the map accuracy, as presented in Dissanayake et al.,31 is violated due to errors introduced during the EKF linearization process, producing inconsistent estimates.31,32 Achieving C-SLAM,33,34 and more specifically C-VSLAM,26,30 based on a particle filter estimation scheme has a number of drawbacks related to the computation time that make it not very suitable for hard real-time applications such as airborne navigation. Although there has been some very recent progress in filters that approximate the nonlinear SLAM problem,35,36 and lately C-SLAM,37 by means of what is called square root information smoothing, issues related to the efficient retrieval of marginal covariances, together with results obtained on only a few data sets, show that there is still a way to go to bring such techniques to meet real-time aerospace visual navigation requirements.

In this article, based on a robust nonlinear H∞ (NH∞) sensor fusion algorithm, we propose to solve the C-VSLAM problem for UVs navigating in unknown natural environments, under realistic conditions and with experimental validation. An effective stereo observation model and a map management approach are also proposed to ensure a reduced computation time while maintaining a suitable correlation between features, which is very important for C-VSLAM loop closing detection. The proposed map management is suitable for practical and real-time considerations for the studied vehicles' C-VSLAM problem, since the size of the filtered system state vector is kept reasonable.

The outline of this article is as follows: the Single visual SLAM section develops the single UAV VSLAM process and observation model based on a stereo camera system. In the Cooperative VSLAM section, the general architecture of the cooperative UAV VSLAM is presented. The advantage of using multiple UAV VSLAM with the loop closure concept is introduced in the Feature observation and matching section. In the Simulation results section, simulation results using realistic scenarios are presented and a comparison between single UAV and multiple UAV VSLAM is made. Finally, in the Experimental results section, experimental results of the C-VSLAM are presented and compared with the single VSLAM.

    Single visual SLAM

The authors introduced a single VSLAM algorithm for an aerial unmanned vehicle, as shown in Ergin Ozkucur and Levent Akın,30 using the EKF and the NH∞ filter as in Nemra and Aouf.32,38 In both algorithms, the map feature locations and the vehicle position, velocity and attitude are estimated using relative observations between the vehicle and each feature, obtained using a stereo vision system. In this article, we adopt the NH∞ estimation technique to solve the VSLAM problem rather than the EKF estimation technique. The former presents more robustness and consistency compared to the EKF-based VSLAM solution.7

In this section, we briefly cover the basics of the NH∞-based VSLAM algorithm for a single aerial vehicle. More details can be found in Nemra and Aouf.32


State vector

The estimated state vector x_k at time k contains the 3D vehicle position (p^n), velocity (v^b), Euler angles Ψ^n = (φ, θ, ψ) and the N 3D feature locations (m^n_i) in the environment, i = 1, ..., N. In our work, the features are the keypoints detected by the scale invariant feature transform (SIFT) algorithm and represent the minima and maxima of the difference of Gaussian image.40 The superscript n indicates that the vector is referenced in the navigation frame, and b indicates that the vector is referenced in the body frame. The body frame is the same as the vehicle frame. The navigation frame is fixed with respect to the surface of the Earth and is placed arbitrarily within the operating area of the vehicle, with the x-axis pointing north, the y-axis pointing east and the z-axis pointing down towards the centre of the Earth.

    Process model

The state estimate x_k at time k is predicted forward in time by integrating the inertial measurement unit (IMU) readings using an inertial frame mechanization of the inertial navigation equations as

\[
\begin{bmatrix} p^n_k \\ v^b_k \\ \Psi^n_k \end{bmatrix} =
\begin{bmatrix}
p^n_{k-1} + C^n_b\, v^b_k\, \Delta t \\
v^b_{k-1} + \left( f^b_k + v^b_{k-1} \times \omega^b_k + (C^n_b)^T g^n \right) \Delta t \\
\Psi^n_{k-1} + E^n_b\, \omega^b_k\, \Delta t
\end{bmatrix}
\quad (1)
\]

where f^b and ω^b are the body-frame referenced vehicle accelerations and rotation rates from the IMU and g^n is the acceleration due to gravity. C^n_b and E^n_b are the direction cosine matrix and the rotation rate transformation matrix from the body to the navigation frame, respectively (Figure 1(a)). More details about the process model are given in Nemra and Aouf32 and Ronnback.39 Feature locations are assumed to be stationary in the navigation frame, and thus the process model for the position of the jth feature is

\[
m^n_{j,k} = m^n_{j,k-1} \quad (2)
\]

The problem with the navigational solution provided by the inertial navigation system (INS) (Figure 1(b)) is that it drifts with time, as in most other dead reckoning systems. The drift rate of the inertial position is typically a cubic function of time. Small errors in the gyros accumulate in the angle estimates, which in turn misrepresent gravitational acceleration as vehicle acceleration and result in quadratic velocity and cubic position errors.
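As a concrete illustration of the prediction stage, the following is a minimal Python sketch of one integration step of equations (1) and (2). It assumes a ZYX Euler convention and an NED navigation frame; the function names are illustrative, not the authors' implementation.

```python
import numpy as np

def euler_to_dcm(phi, theta, psi):
    """Direction cosine matrix C_b^n (body to navigation), assuming a ZYX Euler convention."""
    cph, sph = np.cos(phi), np.sin(phi)
    cth, sth = np.cos(theta), np.sin(theta)
    cps, sps = np.cos(psi), np.sin(psi)
    return np.array([
        [cth * cps, sph * sth * cps - cph * sps, cph * sth * cps + sph * sps],
        [cth * sps, sph * sth * sps + cph * cps, cph * sth * sps - sph * cps],
        [-sth,      sph * cth,                   cph * cth]])

def euler_rate_matrix(phi, theta):
    """E_b^n: maps body rotation rates to Euler angle rates (standard form assumed)."""
    cph, sph = np.cos(phi), np.sin(phi)
    cth, tth = np.cos(theta), np.tan(theta)
    return np.array([
        [1.0, sph * tth, cph * tth],
        [0.0, cph,      -sph],
        [0.0, sph / cth, cph / cth]])

def ins_predict(p_n, v_b, eta, f_b, w_b, dt, g_n=np.array([0.0, 0.0, 9.81])):
    """One step of the process model (1): position p^n, body velocity v^b, Euler angles eta."""
    phi, theta, _ = eta
    C_nb = euler_to_dcm(phi, theta, eta[2])            # body -> navigation
    E_nb = euler_rate_matrix(phi, theta)
    p_n = p_n + C_nb @ v_b * dt                        # position update
    v_b = v_b + (f_b + np.cross(v_b, w_b) + C_nb.T @ g_n) * dt   # velocity update
    eta = eta + E_nb @ w_b * dt                        # attitude update
    # Feature positions (equation (2)) are left unchanged by the prediction.
    return p_n, v_b, eta
```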

Nonlinear H∞ filter for the VSLAM problem

Many researchers have studied NH∞ optimal estimators, such as Shaked and Berman,40 Petersen and Savkin,41 Basar and Bernhard42 and Einicke and White.43 Our H∞ filtering procedure is similar to that of Einicke and White.43 The NH∞ filter attempts to estimate the states given in equation (3) while satisfying the H∞ performance criterion with respect to norm-bounded uncertainties ε_i (the higher-order linearization terms should be norm bounded as ‖χ_i‖ ≤ ε_i).32

The goal of the H∞ filter is to bound the ratio of the estimation error energy to the energy of the input noises w_k and v_k by some γ², where γ corresponds to the worst-case (least favourable) noise.32

    Figure 1. UAV with IMU and stereo cameras. (a) Different frames for the UAV system and (b) INS architecture.

    UAV: unmanned aerial vehicle; IMU: inertial measurement unit.


The estimation process uses the process and observation models in a nonlinear filter to estimate the state x_k from the observations y_k

\[
\begin{aligned}
x_k &= f(x_{k-1}, u_{k-1}) + g(x_{k-1})\, w_{k-1} \\
y_k &= h(x_k) + v_k
\end{aligned}
\quad (3)
\]

where f is the discrete version of equation (1) (in addition to the elements of the landmark states), g is a nonlinear function, x_k is the state at time step k, w_k is an additive process noise, y_k is the observation made at time k and v_k is an additive observation noise. The objective of the filtering technique is to estimate x_k using the available observations y_k.

The nonlinear aerial vehicle model and observation model can be expanded about the filtered and predicted estimates \hat{x}_{k|k} and \hat{x}_{k|k-1} as

\[
x_k \approx f(\hat{x}_{k-1|k-1}, u_k) + \nabla f_{x_k}\,(x_k - \hat{x}_{k|k}) + \chi_1(x_k - \hat{x}_{k|k}) + \left[ \nabla f_{w_k} + \chi_2(x_k - \hat{x}_{k|k}) \right] w_k \quad (4)
\]

\[
y_k \approx h(\hat{x}_{k|k-1}, u_k) + \nabla h_{x_k}\,(x_k - \hat{x}_{k|k-1}) + \chi_3(x_k - \hat{x}_{k|k-1}) + v_k \quad (5)
\]

where ∇f_{x_k} is the Jacobian of f evaluated at \hat{x}_{k-1|k-1}, ∇f_{w_k} is the Jacobian of f with respect to w_k evaluated at \hat{x}_{k-1|k-1}, ∇h_{x_k} is the Jacobian of h evaluated at \hat{x}_{k|k-1}, and the χ_i represent the higher-order terms of the Taylor series expansion. These higher-order terms are norm bounded as ‖χ_i‖ ≤ ε_i.

The state and observation models may be rewritten as

\[
\begin{aligned}
x_{k+1} &= F_k x_k + G_k w_k + \delta_k + \chi_1(\tilde{x}_{k|k}) + \chi_2(\tilde{x}_{k|k})\, w_k \\
y_k &= H_k x_k + v_k + \varphi_k + \chi_3(\tilde{x}_{k|k-1})
\end{aligned}
\quad (6)
\]

with F_k = ∇f(\hat{x}_{k|k}), G_k = ∇f_w(\hat{x}_{k|k}), H_k = ∇h(\hat{x}_{k|k-1}), δ_k = f(\hat{x}_{k|k}) − F_k \hat{x}_{k|k} and φ_k = h(\hat{x}_{k|k-1}) − H_k \hat{x}_{k|k-1}, so that

\[
x_{k+1} = F_k x_k + G_k w_k + \delta_k + \Upsilon_k \quad (7)
\]

\[
y_k = H_k x_k + v_k + \varphi_k + \mathrm{X}_k \quad (8)
\]

where

\[
\Upsilon_k = \chi_1(\tilde{x}_{k|k}) + \chi_2(\tilde{x}_{k|k})\, w_k \quad (9)
\]

and

\[
\mathrm{X}_k = \chi_3(\tilde{x}_{k|k-1}) \quad (10)
\]

These inputs must satisfy the norm bounds

\[
\| \Upsilon_k \|_2^2 \leq \varepsilon_1^2 \| \tilde{x}_{k|k} \|_2^2 + \varepsilon_2^2 \| w_k \|_2^2 \quad (11)
\]

and

\[
\| \mathrm{X}_k \|_2^2 \leq \varepsilon_3^2 \| \tilde{x}_{k|k-1} \|_2^2 \quad (12)
\]

Instead of solving for the nonlinear process and observation models, which contain the extra terms Υ_k and X_k that are not used in the EKF formulation, the following scaled H∞ problem is considered

\[
x_{k+1} = F_k x_k + c_w\, G_k w_k + \delta_k \quad (13)
\]

\[
y_k = H_k x_k + c_v\, v_k + \varphi_k \quad (14)
\]

where c_w² = 1 + γ²ε₁² + γ²ε₃² and c_v² = c_w²(1 − ε₁²)^{-1}. This final form, equations (13) and (14), has the same structure as the EKF,7 except that the error covariance correction of the linear H∞ filter is used, with the process and observation noises w_k and v_k scaled by c_w and c_v.

Nonlinear H∞ filter

The final NH∞ formulation is then written in a predictor-corrector scheme.

Initialization:

\[
\hat{Q}_k = c_w^2\, Q_k, \qquad \hat{R}_k = c_v^2\, R_k
\]

Predictor:

\[
\hat{x}_{k|k-1} = f(\hat{x}_{k-1|k-1}, u_{k-1}, 0)
\]

\[
P_{k|k-1} = F_{k-1} P_{k-1|k-1} F_{k-1}^T + G_{k-1} \hat{Q}_k G_{k-1}^T
\]

Corrector:

\[
\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \left( y_k - H_k \hat{x}_{k|k-1} \right)
\]

\[
K_k = P_{k|k-1} H_k^T \left( H_k P_{k|k-1} H_k^T + \hat{R}_k \right)^{-1}
\]

\[
\Theta_{k|k} = \begin{bmatrix} P_{k|k-1} - \gamma^2 I & P_{k|k-1} H_k^T \\ H_k P_{k|k-1} & H_k P_{k|k-1} H_k^T + \hat{R}_k \end{bmatrix}
\]

\[
P_{k|k} = P_{k|k-1} - P_{k|k-1} \begin{bmatrix} I & H_k^T \end{bmatrix} \Theta_{k|k}^{-1} \begin{bmatrix} I \\ H_k \end{bmatrix} P_{k|k-1}
\]

    Cooperative VSLAM

In the multi-vehicle VSLAM problem proposed in this study, the estimated state vector becomes the position, velocity and attitude of the multiple vehicles, together with the positions of the point features in the environment. In this section, we extend the single-vehicle VSLAM presented above to the multiple-vehicle case.


The state vector and Jacobian matrices are given as follows.

State vector:

\[
X_k = \begin{bmatrix} X_{uav_1,k} \\ \vdots \\ X_{uav_M,k} \\ m^n_{1,k} \\ m^n_{2,k} \\ \vdots \\ m^n_{N,k} \end{bmatrix}
\]

Jacobian matrix of f w.r.t. X:

\[
F_{sys} = \begin{bmatrix}
F_{uav_1,k} & 0_{9\times 9} & \cdots & 0_{9\times 3N} \\
0_{9\times 9} & \ddots & & \vdots \\
\vdots & & F_{uav_M,k} & 0_{9\times 3N} \\
0_{3N\times 9} & \cdots & 0_{3N\times 9} & I_{3N\times 3N}
\end{bmatrix}
\]

Jacobian matrix of f w.r.t. w_k:

\[
G_{sys} = \begin{bmatrix}
G_{uav_1} & 0_{9\times 6} & \cdots & 0_{9\times 6} \\
0_{9\times 6} & \ddots & & \vdots \\
0_{9\times 6} & \cdots & & G_{uav_M} \\
0_{3N\times 6} & 0_{3N\times 6} & \cdots & 0_{3N\times 6}
\end{bmatrix}
\]

Jacobian matrix of h w.r.t. X:

\[
H_{sys} = \begin{bmatrix} 0_{4\times 9(i-1)} & H_{uav_i} & 0_{4\times 9(M-i)} & 0_{4\times 3(j-1)} & H_{feature_j} & 0_{4\times 3(N-j)} \end{bmatrix}
\]

for the ith vehicle and the jth feature.

Jacobian matrix of h w.r.t. v_k: V_{sys} = I_{4\times 4}.
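As an illustration of this block structure, the sketch below assembles F_sys, G_sys and one observation row block of H_sys with NumPy/SciPy. The per-vehicle blocks (9×9 F_uav, 9×6 G_uav, 4×9 H_uav, 4×3 H_feature) are assumed to come from the single-vehicle model; the helper names are illustrative.

```python
import numpy as np
from scipy.linalg import block_diag

def assemble_system_jacobians(F_uav, G_uav, n_features):
    """Build the cooperative Jacobians from per-vehicle blocks.

    F_uav: list of M arrays (9x9); G_uav: list of M arrays (9x6).
    Feature states (3 per feature) are stationary, so their block is the
    identity in F_sys and zero in G_sys.
    """
    M = len(F_uav)
    F_sys = block_diag(*F_uav, np.eye(3 * n_features))
    G_sys = np.vstack([block_diag(*G_uav),
                       np.zeros((3 * n_features, 6 * M))])
    return F_sys, G_sys

def observation_jacobian_block(H_uav_i, H_feat_j, i, j, M, N):
    """H_sys row block (4 x (9M + 3N)) for the ith vehicle observing the jth feature
    (i and j are zero-based here)."""
    H = np.zeros((4, 9 * M + 3 * N))
    H[:, 9 * i: 9 * i + 9] = H_uav_i                    # 4x9 vehicle block
    H[:, 9 * M + 3 * j: 9 * M + 3 * j + 3] = H_feat_j   # 4x3 feature block
    return H
```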

Figure 2 shows a diagram of our cooperative VSLAM concept. The essential step in this architecture is the observation model, as good cooperation is obtained when the shared region among the UAVs is large.

To detail our C-VSLAM concept, assume we have N UAVs navigating in an outdoor environment, and let M_i be the number of features observed by the ith UAV at time t = k. The C-VSLAM algorithm runs centrally at the ground station and communicates the position and the map to each UAV as follows:

At t = 0, the UAV positions, velocities and orientations are known, as well as the covariance matrix. During navigation, each UAV observes a set of features that can be divided into three types (Figure 3):

Type 1: feature re-observed (has already been observed by the same UAV_i).

Type 2: feature re-observed (has already been observed by another UAV_j, j ≠ i).

Type 3: new feature observed for the first time.

Features of Type 1 and Type 2 are used to update the map and the UAV states, whereas features of Type 3 are initialized and added to the map using the inverse observation model.31 When a new feature is observed by more than one UAV, it is initialized more accurately (red feature in Figure 3).

    Figure 2. Cooperative VSLAM architecture.

    VSLAM: visual simultaneous localization and mapping.


The C-VSLAM algorithm based on the NH∞ filter is summarized below.
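The algorithm listing itself does not survive in this copy of the article; as a stand-in, the following Python-style sketch reconstructs the centralized loop from the description given above. All names (`filt`, `map_`, `z.descriptor`, ...) are illustrative assumptions, not the authors' implementation.

```python
def cvslam_step(state, P, imu_or_odo, stereo_obs_per_uav, filt, map_):
    """One cycle of the centralized C-VSLAM loop run at the ground station."""
    # Step 1: predict every vehicle state with its own inertial/odometric data (process model).
    state, P = filt.predict(state, P, imu_or_odo)

    # Step 2: process the stereo observations reported by each UAV.
    for uav_id, observations in stereo_obs_per_uav.items():
        for z in observations:                       # z holds (u1, v1, u2, v2) and a descriptor
            match = map_.match(z.descriptor)         # data association against the global map
            if match is not None:
                # Type 1 or Type 2: feature already in the map (seen by this or another UAV)
                state, P = filt.update(state, P, z, uav_id, match.index)
            else:
                # Type 3: new feature, initialized with the inverse observation model
                state, P, _ = map_.add_feature(state, P, z, uav_id)

    # Step 3: broadcast the updated poses and map back to the vehicles.
    return state, P
```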

    Feature observation and matching

Computer vision has become vital for automatic perception of the environment, especially for mapping in autonomous vehicle applications. A stereo system is an attractive source of information for machine perception because it leads to direct range measurements and, unlike monocular approaches, does not merely infer depth or orientation through the use of photometric and statistical assumptions.

The observation model proposed in this article is based on stereo vision cameras and is detailed in Nemra and Aouf.32 The observation model, linking the perceived visual landmarks to the SLAM state vector, is given by

\[
\begin{aligned}
u_1 &= \frac{m^{c1}_{11} x^n_{m_i} + m^{c1}_{12} y^n_{m_i} + m^{c1}_{13} z^n_{m_i} + m^{c1}_{14}}
            {m^{c1}_{31} x^n_{m_i} + m^{c1}_{32} y^n_{m_i} + m^{c1}_{33} z^n_{m_i} + m^{c1}_{34}} \\[4pt]
v_1 &= \frac{m^{c1}_{21} x^n_{m_i} + m^{c1}_{22} y^n_{m_i} + m^{c1}_{23} z^n_{m_i} + m^{c1}_{24}}
            {m^{c1}_{31} x^n_{m_i} + m^{c1}_{32} y^n_{m_i} + m^{c1}_{33} z^n_{m_i} + m^{c1}_{34}} \\[4pt]
u_2 &= \frac{m^{c2}_{11} x^n_{m_i} + m^{c2}_{12} y^n_{m_i} + m^{c2}_{13} z^n_{m_i} + m^{c2}_{14}}
            {m^{c2}_{31} x^n_{m_i} + m^{c2}_{32} y^n_{m_i} + m^{c2}_{33} z^n_{m_i} + m^{c2}_{34}} \\[4pt]
v_2 &= \frac{m^{c2}_{21} x^n_{m_i} + m^{c2}_{22} y^n_{m_i} + m^{c2}_{23} z^n_{m_i} + m^{c2}_{24}}
            {m^{c2}_{31} x^n_{m_i} + m^{c2}_{32} y^n_{m_i} + m^{c2}_{33} z^n_{m_i} + m^{c2}_{34}}
\end{aligned}
\quad (19)
\]

where (x^n_{m_i}, y^n_{m_i}, z^n_{m_i})^T are the coordinates of the landmark m_i in the navigation frame (north-east-down, NED), and m^{c1}_{ij} and m^{c2}_{ij} are the projection matrix components of the right and left cameras, respectively.32
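Equation (19) is simply two pinhole projections written component-wise. The sketch below evaluates it for a landmark expressed in the navigation frame, assuming the 3×4 projection matrices of the two cameras are already available.

```python
import numpy as np

def stereo_observation(m_n, P_right, P_left):
    """Predicted measurement (u1, v1, u2, v2) of a landmark m_n = (x, y, z) in the
    navigation frame, equation (19). P_right and P_left are the 3x4 projection
    matrices denoted m^{c1} and m^{c2} in the text."""
    m_h = np.append(m_n, 1.0)              # homogeneous landmark coordinates
    def project(P):
        s = P @ m_h                        # (s*u, s*v, s)
        return s[0] / s[2], s[1] / s[2]
    u1, v1 = project(P_right)              # camera 1 (right)
    u2, v2 = project(P_left)               # camera 2 (left)
    return np.array([u1, v1, u2, v2])
```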

    Adaptive SIFT

Although the SIFT algorithm44 has many advantages, the detected keypoints are still not efficient enough, especially for the visual SLAM problem. The SIFT algorithm uses a constant factor k between scales; however, a large value of k yields few keypoints, which is a disadvantage for the VSLAM problem, especially if these keypoints are unstable. A small value of k implies the detection of a lot of keypoints, which is also a problem for VSLAM. Clearly, it is not obvious how to find a suitable value of k for all kinds of images.

Lowe44 proposes k = 1.6 as an adequate value, which is efficient only for images with an appropriate amount of texture. For example, in low-frequency images (seas, deserts, ...) only a few features are detected. On the other hand, in high-frequency images (spatial, building, ...) the SIFT algorithm leads to a lot of features, which is not suitable for VSLAM.


As a solution to this problem, an adaptive scale representation with an adaptive factor k was introduced to form our adaptive scale invariant feature transform (ASIFT),32,44 which is used in this work. The scale factor is estimated based on the energy of the DoG image.
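The article does not give the exact adaptation rule for k, so the sketch below only illustrates the idea under stated assumptions: measure the energy of a DoG image and map high energy (strongly textured images) to a larger k and low energy to a smaller k. The thresholds and the linear mapping are placeholders, not the authors' values.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def adaptive_scale_factor(image, sigma=1.6, k_min=1.2, k_max=2.0,
                          e_low=1e-4, e_high=1e-2):
    """Illustrative ASIFT-style choice of the scale factor k from DoG energy.

    Assumes an 8-bit grayscale image. The mapping is a placeholder heuristic:
    high DoG energy (textured image) -> larger k (fewer scales/keypoints);
    low energy -> smaller k (more keypoints).
    """
    img = image.astype(float) / 255.0
    dog = gaussian_filter(img, 2 * sigma) - gaussian_filter(img, sigma)
    energy = np.mean(dog ** 2)                                  # DoG "energy"
    t = np.clip((energy - e_low) / (e_high - e_low), 0.0, 1.0)  # normalize to [0, 1]
    return k_min + t * (k_max - k_min)
```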

    Stereo vision constraints

In the last decade, a lot of computer vision work, such as stereo ego-motion and image mosaicking, has been proposed based on the extraction and matching of invariant features in stereo images or in successive images.45

Suppose p_i and p_j are two keypoints in image i and image j, with descriptors D_i and D_j, respectively. The distance between the two descriptors is given by the Euclidean distance

\[
\mathrm{Distance}(i, j) = \left\| D_i - D_j \right\|_2 = \sqrt{ (D_i - D_j)^T (D_i - D_j) }
\]

It is proposed that p_k is the correspondent of p_i if and only if

\[
\begin{aligned}
D_{min1} &= \mathrm{Distance}(i, k) = \min_{j = 1 \ldots M} \mathrm{Distance}(i, j) \quad \text{(the lowest distance)} \\
D_{min2} &= \min_{j = 1 \ldots M,\; j \neq k} \mathrm{Distance}(i, j) \quad \text{(the second lowest distance)} \\
D_{min1} &< \mathrm{DistRatio} \times D_{min2}
\end{aligned}
\]

where M is the number of keypoints in the second image and DistRatio ∈ (0, 1] is a factor used to avoid the problem of similar descriptors. For example, DistRatio = 1 means that any keypoint p_i is assigned a correspondent p_k even if D_{min1} ≈ D_{min2} (the descriptors of the two best candidates are almost equally similar). This kind of association is not suitable because it is very sensitive to noise. To increase the robustness of our association approach, we should decrease the value of DistRatio.

This way of finding feature correspondences is computationally heavy, and many useless distances are calculated. For example, if we assume M_i features in image I_i and M_j features in image I_j, then the complexity of the matching algorithm using the Euclidean distance is M_i × M_j × (128 + 127) summations and M_i × M_j × 128 multiplications. This huge number of operations, which must run between two frames, makes the matching algorithm time-consuming and not very suitable for real-time applications.

In this part of our work, we propose an approach to reduce the complexity of the feature matching process using stereo vision constraints. Rather than looking for feature correspondences in the entire image, the area of search is reduced to regions of interest (ROIs) limited by the horizontal and vertical disparities (HD and VD, respectively).

    Figure 3. Features in C-VSLAM algorithm.

    C-VSLAM: cooperative visual simultaneous localization and mapping.


To find the correspondent of a feature p_i(x_i, y_i) in image I_i, we look in the region of I_j centred at (x_i, y_i) and limited by the maximum vertical and horizontal disparities, VD and HD respectively, as shown in Figure 4. Good feature matching is obtained with a significant reduction of the computation time, as shown in Figure 5 with DistRatio = 0.1.
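The disparity window can be applied as a cheap pre-filter before the ratio test above; HD_max and VD_max are the bounds derived in the next subsection, and only the surviving candidates need descriptor comparisons.

```python
import numpy as np

def candidates_in_roi(x_i, y_i, keypoints_j, hd_max, vd_max):
    """Indices of keypoints in image I_j lying inside the disparity window centred
    on (x_i, y_i); keypoints_j is an (M, 2) array of (x, y) positions."""
    dx = np.abs(keypoints_j[:, 0] - x_i)
    dy = np.abs(keypoints_j[:, 1] - y_i)
    return np.where((dx <= hd_max) & (dy <= vd_max))[0]
```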

    Disparity analysis

Maximum horizontal disparity HD_max. HD_max can be estimated as follows. Assume a point M with coordinates (x, y, z) in the world frame, whose projections on the left and right image planes are (u_l, v_l) and (u_r, v_r), respectively. f is the focal distance and b is the baseline (the distance between the two cameras). By triangle similarity, as shown in Figure 6, we get

\[
\frac{f}{z} = \frac{u_l}{x} = \frac{u_r}{x - b} \quad (20)
\]

\[
\frac{f}{z} = \frac{u_l - u_r}{b} \quad (21)
\]

Since the disparity is defined by

\[
d = u_l - u_r \quad (22)
\]

    Figure 4. Features correspondence.

    Figure 5. Feature matching.

Figure 6. Triangulation principle.


from equations (21) and (22) we obtain

\[
z = \frac{f\, b}{d} \quad (23)
\]

From equation (23), for calibrated stereo cameras the depth z of the point M depends only on the disparity. From the same equation, we can conclude that the maximum disparity is linked to the minimum depth, HD_max = d_max = f b / z_min, where z_min is the depth of the nearest point observed by both cameras (left and right).

Maximum vertical disparity VD_max. The estimation of the vertical disparity is based on epipolar geometry. Assume a feature p_l with coordinates (u_l, v_l) in the left image and its correspondent p_r with coordinates (u_r, v_r) in the right image. The epipolar constraint is given by

\[
p_r^T F\, p_l = 0 \quad (24)
\]

where F is the fundamental matrix, given by

\[
F = (M_r^{-1})^T E\, M_l^{-1} \quad (25)
\]

M_l and M_r are the intrinsic parameter matrices of the left and right cameras, respectively, and E is the essential matrix.46

If we write

\[
F = \begin{bmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{bmatrix}
\]

then equation (24) becomes

\[
u_l u_r f_{11} + u_l v_r f_{21} + u_l f_{31} + v_l u_r f_{12} + v_l v_r f_{22} + v_l f_{32} + u_r f_{13} + v_r f_{23} + f_{33} = 0 \quad (26)
\]

and the epipolar line is given by

\[
v_r = A\, u_r + B \quad (27)
\]

\[
A = -\frac{u_l f_{11} + v_l f_{12} + f_{13}}{u_l f_{21} + v_l f_{22} + f_{23}} \quad (28)
\]

\[
B = -\frac{u_l f_{31} + v_l f_{32} + f_{33}}{u_l f_{21} + v_l f_{22} + f_{23}} \quad (29)
\]

Since

\[
u_l - \frac{HD_{max}}{2} \leq u_r \leq u_l + \frac{HD_{max}}{2} \quad (30)
\]

we have

\[
A \left( u_l - \frac{HD_{max}}{2} \right) + B \leq A u_r + B \leq A \left( u_l + \frac{HD_{max}}{2} \right) + B \quad (31)
\]

\[
A \left( u_l - \frac{HD_{max}}{2} \right) + B \leq v_r \leq A \left( u_l + \frac{HD_{max}}{2} \right) + B \quad (32)
\]

We then conclude that the maximum vertical disparity is given by

\[
VD_{max} = A \left( u_l + \frac{HD_{max}}{2} \right) - A \left( u_l - \frac{HD_{max}}{2} \right) \quad (33)
\]

\[
VD_{max} = A \cdot HD_{max} \quad (34)
\]
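Given the fundamental matrix, the per-feature bound of equations (27) to (34) can be evaluated directly; a minimal sketch, assuming F is known from calibration:

```python
import numpy as np

def max_vertical_disparity(u_l, v_l, F, hd_max):
    """VD_max = |A| * HD_max for the epipolar line v_r = A*u_r + B associated with
    the left keypoint (u_l, v_l), F being the fundamental matrix (equations (27)-(34))."""
    p_l = np.array([u_l, v_l, 1.0])
    a, b, c = F @ p_l                  # epipolar line a*u_r + b*v_r + c = 0 in the right image
    A = -a / b                         # slope of v_r = A*u_r + B
    return abs(A) * hd_max
```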

An example using DistRatio = 0.2 is illustrated in Figures 7 to 9. Figure 7 shows the features extracted using adaptive SIFT with a disparity window. A landmark from the left image gets only 6 candidates rather than 245 (the number of features extracted in the left image). Figures 8 and 9 show the results of feature matching.

If the left and right images are rectified, the matching problem becomes easier because in this case the vertical disparity VD = 0. Thus, the search area for a feature is reduced to a one-dimensional space limited by the horizontal disparity HD.

    Simulation results

    This section presents simulation results of C-VSLAMinvolving two UAVs. Each UAV has its own IMUand stereo vision cameras. The C-VSLAM algorithmis simulated to run centrally at the command stationwhile communicating the position and the map toeach UAV.

Figure 7. Feature extraction: (a) right image, 241 landmarks; (b) left image, 245 landmarks.


In Figure 10, the curves on the left (right) side show the position of UAV1 (UAV2) along the x-, y- and z-axes. As can be seen, the X-position (Figure 10(a) and (d)) and Y-position (Figure 10(b) and (e)) are estimated with significant accuracy. This can be explained by the fact that cameras, or a stereo vision system, can provide precise bearing information. This is not completely the case for the range information, where the stereo vision system provides a less accurate Z-position, as shown in Figure 10(c) and (f). In these figures, we can also observe the effect of loop closing detection on UAV1 at t = 200 s, as well as the precision improvement obtained when UAV2 visits features already visited by UAV1, at t1 = 80 s and t2 = 150 s.

Figure 11 shows the trajectories of the two UAVs in the x- and y-axes. While UAV1 (red) closes its loop at t = 200 s (dashed square: the red UAV visits the same area for the second time), UAV2 (blue) does not make any loop closure but visits many features already visited by UAV1 (dashed ellipses: the blue UAV visits areas already visited by the red UAV).

Figure 12 presents the evolution of the uncertainties of six features from the global map. As shown, the uncertainty of each feature decreases with time. At t = 200 s a significant decrease of the uncertainty is observed, which is explained by the loop closing detection that improves the consistency of the estimator at that time.

Figure 13 shows a comparison between single and cooperative UAV VSLAM in simulation. The estimate of the UAV1 Z-position with single VSLAM, even though it is still much better than the INS position, exhibits an increasing error with time if no loop closure is detected. On the other hand, the estimate of the UAV1 Z-position with cooperative VSLAM is more accurate.

    Experimental results

    Experimental indoor results

Experimental validation was arranged using cooperative mobile robots (Pioneer 3-AT). Using ground vehicles (3 DOF) rather than aerial vehicles (6 DOF) does not compromise the generality of the validation process.

    Figure 8. Features correspondence.

Figure 9. 64 good matches.



Figure 10. UAV positions. Left: XYZ position of UAV1 in the navigation frame; right: XYZ position of UAV2 in the navigation frame.

    UAV: unmanned aerial vehicle.


Figure 11. UAV1 and UAV2 true, INS and corrected positions.

    UAV: unmanned aerial vehicle; INS: inertial navigation system.


    Figure 12. Feature uncertainties.


The use of ground vehicles for the experimental validation was dictated by the unavailability of a multiple aerial vehicle setup. In this experiment, the odometer pose, instead of full UAV INS information, was fused with the stereo vision data. Artificial noise was added to the odometry data of the robots to emulate the drift that appears in odometry sensors with time and over long trajectories, which is otherwise negligible here, especially for indoor experiments.

Figure 14 shows the state variables (X, Y and θ) of the two mobile robots considered in this experiment. Our vehicles have three degrees of freedom rather than six as for aerial vehicles. However, the observation system is still the same as presented in the Feature observation and matching section.

Figure 15 presents the interface that is built and run centrally on our command computer, based on the communication data transmitted by the two cooperating robots. This figure contains the left and right images observed by the first and the second robot, with the extracted features (green points) and tracked features (red points). At the bottom are the estimated trajectories of each robot: odometer position (red), true position (dashed) and C-VSLAM position (green). Statistics on the number of extracted and associated features for both robots are given in the bottom left corner of the interface.

Figure 16 shows the experimental pose estimation results of the two cooperative robots navigating indoors.

    0 50 100 150 200 250

    14

    12

    10

    8

    6

    4

    2

    0

    2

    4

    6

    8

    Time (s)

    UAV

    Z1 p

    ositi

    on (m

    )

    True positionINS positionSingle UAV Visual SLAM positionMultiple UAV Visual SLAM position

    Figure 13. Z estimation single VSLAM vs C-VSLAM.

    VSLAM: visual simultaneous localization and mapping; C-VSLAM: cooperative visual simultaneous localization and mapping.

    Figure 14. Mobile robot representation in navigation frame.


Good pose estimation (position X, Y and orientation θ) is obtained by the C-VSLAM algorithm compared with the odometer pose for the two robots.

A comparison between experimental single VSLAM and C-VSLAM is given in Figure 17. As shown, the navigation positions (X and Y) estimated by single VSLAM are less accurate compared with the C-VSLAM estimates, which maintain a good precision. In fact, a zoom on the first instants of the robot travels shows that C-VSLAM and single VSLAM are close to each other. However, after a short distance travelled, the C-VSLAM navigation results outperform the single-vehicle results. This is mainly due to the fact that sharing visual features augments the reliability and robustness of the estimation process by reducing perception uncertainties. In addition, the effect of loop closing is apparent for both single VSLAM and C-VSLAM when looking carefully at Figure 17 around time t = 70 s.

    Experimental outdoor results

Experimental outdoor validation was also conducted in order to verify the robustness of our approach in a scenario where the environment conditions are not controlled.

Figure 18 shows the pose estimation results using experimental data from two mobile robots navigating in an outdoor environment. As can be seen, good pose estimation (position X, Y and orientation θ) is obtained by the C-VSLAM algorithm compared with the odometer pose.

Experimental results of the image mosaics built using the C-VSLAM algorithm in the outdoor environment are presented in Figures 19 to 21. Figures 19 and 20 show the image mosaics built using mobile robots 1 and 2, respectively. Figure 21 shows the image mosaic built on the command computer using both robots' scene perceptions. From these figures, it is clear that cooperating robots have access to a larger map than each robot alone, which is very important for exploring large areas in outdoor environments.

    Conclusion

In this article, we proposed a robust approach to solve the cooperative airborne VSLAM problem based on the development of a full stereo camera observation model. An adaptive SIFT feature extractor followed by fast matching under stereo vision constraints was introduced to construct a large map. Robust C-VSLAM was implemented based on the NH∞ filter and compared with single VSLAM. The proposed C-VSLAM solution was validated using simulation and experimental data, and good and very promising results were obtained. Future work will include extending this cooperative strategy to a distributed estimation framework with an experimental setup based on a set of cooperative UAVs.

    Figure 15. Computer real time interface for C-VSLAM.

    C-VSLAM: cooperative visual simultaneous localization and mapping.


Figure 16. Robot positions: XY position of Robot 1 and XY position of Robot 2 in the navigation frame.


Figure 17. XY mobile robot localization: single VSLAM vs cooperative VSLAM.

VSLAM: visual simultaneous localization and mapping.


Figure 19. Image mosaic constructed by Robot 1.

Figure 20. Image mosaic constructed by Robot 2.

Figure 21. Image mosaic constructed by Robots 1 and 2.

Figure 18. Robot positions. Left: XY position of Robot 1 in the navigation frame; right: XY position of Robot 2 in the navigation frame.


Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

References

1. Cox IJ and Wilfong GT. Autonomous robot vehicles. New York: Springer-Verlag, 1990.
2. Toledo-Moreo R, Zamora-Izquierdo MA, Ubeda-Minarro B, et al. High-integrity IMM-EKF-based road vehicle navigation with low-cost GPS/SBAS/INS. IEEE Trans Intell Trans Syst 2007; 8(3).
3. Leonard JJ and Durrant-Whyte HF. Directed sonar sensing for mobile navigation. Boston: Kluwer Academic Publishers, 1992.
4. Toth C. Sensor integration in airborne mapping. IEEE Trans Instrum Meas 2002; 51(6): 1367–1373.
5. Lacroix S, Mallat A, Chatilla R, et al. Rover self localization in planetary-like environments. In: 5th international symposium on artificial intelligence, robotics and automation in space, Noordwijk, The Netherlands, June 1999.
6. Olson CF. Probabilistic self-localization for mobile robots. IEEE Trans Robot Automat 2000; 16(1): 55–66.
7. Ollero A and Merino L. Control and perception techniques for aerial robotics. Ann Rev Control 2004; 28(2): 167–178.
8. Montemerlo M, Thrun S, Koller D, et al. FastSLAM: a factored solution to the simultaneous localization and mapping problem. In: Proceedings of the AAAI national conference on artificial intelligence, Edmonton, Canada, 2002.
9. Grabowski R, Navarro-Serment LE, Paredis CJJ, et al. Heterogeneous teams of modular robots for mapping and exploration. Autonom Robot 2000; 8(3): 293–308.
10. Singh H, Catipovic J, Eastwood R, et al. An integrated approach to multiple AUV communications, navigation and docking. In: Proceedings of the IEEE oceanic engineering society, Fort Lauderdale, Florida, USA, 1996, pp.59–64.
11. Kim JH, DeFilipps JM, Impert NP, et al. ATM network based integrated battle space simulation with multiple AWACS-fighter platforms. In: Proceedings of the IEEE military communications conference, vol. 1, Boston, 1998, pp.101–107.
12. Cohen W. Adaptive mapping and navigation by teams of simple robots. Robot Autonom Syst 1996; 18: 411–434.
13. Nettleton EW, Durrant-Whyte HF, Gibbens PW, et al. Multiple platform localisation and map building. In: Proceedings of sensor fusion and decentralized control in robotic systems III, vol. 4196, Boston, 2000, pp.337–347.
14. Cohen W. Adaptive mapping and navigation by teams of simple robots. Robot Autonom Syst 1996; 18: 411–434.
15. Støy K. Using situated communication in distributed autonomous mobile robots. In: Seventh Scandinavian conference on artificial intelligence (SCAI'01), Amsterdam, 2001.
16. Mourikis AI and Roumeliotis SI. Performance analysis of multirobot cooperative localization. IEEE Trans Robot 2006; 22(4): 666–681.
17. Roumeliotis SI and Rekleitis IM. Propagation of uncertainty in cooperative multirobot localization: analysis and experimental results. Autonom Robot 2004; 17(1): 41–54.
18. Burgard W, Fox D and Thrun S. Active mobile robot localization by entropy minimization. In: Proceedings of the second euromicro workshop on advanced mobile robots (EUROBOT'97), Brescia, 1997.
19. Rocha R, Dias J and Carvalho A. Cooperative multi-robot systems: a study of vision-based 3-D mapping using information theory. Robot Autonom Syst 2005; 53(3–4): 282–311.
20. Stachniss C, Grisetti G, Hahnel D, et al. Improved Rao-Blackwellized mapping by adaptive sampling and active loop-closure. In: Proceedings of the workshop on self-organization of adaptive behaviour (SOAVE), Ilmenau, Germany, 2004, pp.1–15.
21. Triebel R and Burgard W. Improving simultaneous mapping and localization in 3D using global constraints. In: Proceedings of the national conference on artificial intelligence (AAAI), Pittsburgh, Pennsylvania, 2005.
22. Grejner-Brzezinska DA, Toth CK, Sun HX, et al. A robust solution to high-accuracy geolocation: quadruple integration of GPS, IMU, pseudolite, and terrestrial laser scanning. IEEE Trans Instrum Meas 2011; 60(11): 3694–3708.
23. Schleicher D, Bergasa LM, Ocana M, et al. Real-time hierarchical outdoor SLAM based on stereovision and GPS fusion. IEEE Trans Intell Trans Syst 2009; 10(3).
24. Sappa AD, Dornaika F, Ponsa D, et al. An efficient approach to onboard stereo vision system pose estimation. IEEE Trans Intell Trans Syst 2008; 9(3): 476–490.
25. Wu M, Huang F, Wang L, et al. Cooperative multi-robot monocular-SLAM using salient landmarks. In: IEEE international Asia conference on informatics in control, automation and robotics, Bangkok, 2009, pp.151–155.
26. Gil A, Reinoso O, Ballesta M, et al. Multi-robot visual SLAM using a Rao-Blackwellized particle filter. Robot Autonom Syst 2010; 58(1): 68–80.
27. Martinelli A, Pont F and Siegwart R. Multi-robot localization using relative observations. In: IEEE international conference on robotics and automation, Barcelona, Spain, 2005.
28. Trawny N and Barfoot T. Optimized motion strategies for cooperative localization of mobile robots. In: Proceedings of the IEEE international conference on robotics and automation, vol. 1, New Orleans, USA, 2004, pp.1027–1032.
29. Wu M, Huang F, Wang L, et al. Cooperative multi-robot monocular-SLAM using salient landmarks. In: Proceedings of the international Asia conference on informatics in control, automation and robotics, Bangkok, Thailand, 2009, pp.151–155.



30. Ergin Ozkucur N and Levent Akın H. Cooperative multi-robot map merging using fast-SLAM. In: Proceedings of the RoboCup international symposium 2009, vol. 5949, Berlin/Heidelberg: Springer, 2009, pp.449–460.
31. Dissanayake MWMG, Newman P, Clark S, et al. A solution to the simultaneous localization and map building (SLAM) problem. IEEE Trans Robot Automat 2001; 17(3): 229–241.
32. Nemra A and Aouf N. Robust airborne 3D visual simultaneous localization and mapping with observability and consistency analysis. J Intell Robot Syst, DOI: 10.1007/s10846-008-9306-6.
33. Fox D, Burgard W, Kruppa H, et al. A probabilistic approach to collaborative multi-robot localization. Autonom Robot 2000; 8: 325–344.
34. Thrun S. A probabilistic online mapping algorithm for teams of mobile robots. Int J Robot Res 2001; 20(5): 335–363.
35. Dellaert F and Kaess M. Square root SAM: simultaneous localization and mapping via square root information smoothing. Int J Robot Res 2006; 25(12): 1181–1203.
36. Kaess M, Ranganathan A and Dellaert F. iSAM: incremental smoothing and mapping. IEEE Trans Robot 2008; 24(6): 1365–1378.
37. Andersson LAA and Nygards J. C-SAM: multi-robot SLAM using square root information smoothing. In: IEEE international conference on robotics and automation, Pasadena, CA, USA, 2008, pp.2798–2805.
38. Nemra A and Aouf N. Robust nonlinear filtering for INS/GPS UAV localization. In: IEEE Mediterranean conference on control and automation, Ajaccio, June 2008, pp.695–702.
39. Ronnback S. Development of an INS/GPS navigation loop for an UAV. Master's thesis, Lulea Tekniska Universitet, 2000.
40. Shaked U and Berman N. H∞ nonlinear filtering of discrete-time processes. IEEE Trans Signal Process 1995; 43: 2205–2209.
41. Petersen I and Savkin A. Robust Kalman filtering for signals and systems with large uncertainties. Boston: Birkhauser, 1999.
42. Basar T and Bernhard P. H∞ optimal control and related minimax design problems: a dynamic game approach. Boston: Birkhauser, 1991.
43. Einicke G and White L. Robust extended Kalman filtering. IEEE Trans Signal Process 1999; 47(9): 2596–2599.
44. Lowe DG. Distinctive image features from scale-invariant keypoints. Int J Comput Vision 2004; 60(2): 91–110.
45. Nemra A and Aouf N. Robust feature extraction and correspondence for UAV map building. In: IEEE Mediterranean conference on control and automation, 2009.
46. Ma Y, Soatto S, Kosecka J, et al. An invitation to 3-D vision: from images to geometric models. Interdisciplinary Applied Mathematics. New York: Springer, 2003.
