
Hindawi Publishing Corporation, EURASIP Journal on Advances in Signal Processing, Volume 2009, Article ID 387308, 18 pages, doi:10.1155/2009/387308

    Research Article

Vision-Based Unmanned Aerial Vehicle Navigation Using Geo-Referenced Information

    Gianpaolo Conte and Patrick Doherty

Artificial Intelligence and Integrated Computer System Division, Department of Computer and Information Science, Linköping University, 58183 Linköping, Sweden

    Correspondence should be addressed to Gianpaolo Conte, [email protected]

    Received 31 July 2008; Revised 27 November 2008; Accepted 23 April 2009

    Recommended by Matthijs Spaan

This paper investigates the possibility of augmenting an Unmanned Aerial Vehicle (UAV) navigation system with a passive video camera in order to cope with long-term GPS outages. The paper proposes a vision-based navigation architecture which combines inertial sensors, visual odometry, and registration of the on-board video to a geo-referenced aerial image. The vision-aided navigation system developed is capable of providing high-rate and drift-free state estimation for UAV autonomous navigation without the GPS system. Due to the use of image-to-map registration for absolute position calculation, drift-free position performance depends on the structural characteristics of the terrain. Experimental evaluation of the approach based on offline flight data is provided. In addition, the architecture proposed has been implemented on-board an experimental UAV helicopter platform and tested during vision-based autonomous flights.

Copyright © 2009 G. Conte and P. Doherty. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

    1. Introduction

One of the main concerns which prevents the use of UAV systems in populated areas is the safety issue. State-of-the-art UAVs are still not able to guarantee an acceptable level of safety to convince aviation authorities to authorize the use of such a system in populated areas (except in rare cases such as war zones). There are several problems which have to be solved before unmanned aircraft can be introduced into civilian airspace. One of them is GPS vulnerability [1].

Autonomous unmanned aerial vehicles usually rely on a GPS position signal which, combined with inertial measurement unit (IMU) data, provides high-rate and drift-free state estimation suitable for control purposes. Small UAVs are usually equipped with low-performance IMUs due to their limited payload capabilities. In such platforms, the loss of the GPS signal even for a few seconds can be catastrophic due to the high drift rate of the IMU installed on-board. The GPS signal becomes unreliable when operating close to obstacles due to multipath reflections. In addition, jamming of GPS has arisen as a major concern for users due to the availability of GPS jamming technology on the market.

Therefore UAVs which rely blindly on a GPS signal are quite vulnerable to malicious actions. For this reason, this paper proposes a navigation system which can cope with GPS outages. The approach presented fuses information from inertial sensors with information from a vision system based on a passive monocular video camera. The vision system replaces the GPS signal, combining position information from visual odometry and geo-referenced imagery. Geo-referenced satellite or aerial images must be available on-board the UAV beforehand or downloaded in flight. The growing availability of high-resolution satellite images (e.g., provided by Google Earth) makes this topic very interesting and timely.

The vision-based architecture developed is depicted in Figure 2 and is composed of an error dynamics Kalman filter (KF) that estimates the navigation errors of the INS and a separate Bayesian filter named point-mass filter (PMF) [2] which estimates the absolute position of the UAV on the horizontal plane, fusing together visual odometry and image registration information. The 2D position estimated from the PMF, together with barometric altitude information obtained from an on-board barometer, is used as a position measurement to update the KF.


    Figure 1: The Rmax helicopter.

Odometry and image registration are complementary position information sources. The KLT feature tracker [3] is used to track corner features in the on-board video image from subsequent frames. A homography-based odometry function uses the KLT results to calculate the distance traveled by the UAV. Since the distance calculated by the odometry is affected by drift, a mechanism which compensates for the drift error is still needed. For this purpose information from geo-referenced imagery is used.

The use of reference images for aircraft localization purposes is also investigated in [4]. A reference image matching method which makes use of the Hausdorff distance on contour image representations is explored. That work focuses mainly on the image processing issues and gives less emphasis to the architectural and fusion schemes which are the focus of this paper.

An emerging technique to solve localization problems is Simultaneous Localization and Mapping (SLAM). The goal of SLAM is to localize a robot in the environment while mapping it at the same time. Prior knowledge of the environment is not required. In SLAM approaches, an internal representation of the world is built on-line in the form of a landmark database. Such a representation is then used for localization purposes. For indoor robotic applications SLAM is already a standard localization technique. More challenging is the use of such a technique in large outdoor environments. Some examples of SLAM applied to aerial vehicles can be found in [5–7].

There also exist other kinds of terrain navigation methods which are not based on aerial images but on terrain elevation models. A direct measurement of the flight altitude relative to the ground is required. Matching the ground elevation profile, measured with a radar altimeter for example, to an elevation database allows for aircraft localization. A commercial navigation system called TERrain NAVigation (TERNAV) is based on such a method. The navigation system has been implemented successfully on some military jet fighters and cruise missiles. In the case of small UAVs, and more specifically for unmanned helicopters, this method does not appear to be appropriate. Compared to jet fighters, UAV helicopters cover short distances at very low speed, and so the altitude variation is quite poor in terms of allowing ground profile matching.

Figure 2: Sensor fusion architecture. (Block diagram: subsystem 1 comprises the inertial sensors, the INS mechanization, the GPS input, and the 12-state Kalman filter producing the UAV state; subsystem 2 comprises the video camera, the visual odometer, the image matching module with the geo-referenced image database, the barometric sensor, and the point-mass filter providing the position update, altitude, and attitude inputs.)

Advanced cruise missiles implement a complex navigation system based on GPS, TERNAV, and Digital Scene Matching Area Correlation (DSMAC). During the cruise phase the navigation is based on GPS and TERNAV. When the missile approaches the target, the DSMAC is used to increase the position accuracy. The DSMAC matches the on-board camera image to a reference image using correlation methods.

The terrain relative navigation problem is of great interest not only for terrestrial applications but also for future space missions. One of the requirements for the next lunar mission in which NASA/JPL is involved is to autonomously land within 100 meters of a predetermined location on the lunar surface. Traditional lunar landing approaches based on inertial sensing do not have the navigational precision to meet this requirement. A survey of the different terrain navigation approaches can be found in [8], where methods based on passive imaging and active range sensing are described.

The contribution of this work is to explore the possibility of using a single video camera to measure both relative displacement (odometry) and absolute position (image registration). We believe that this is a very practical and innovative concept. The sensor fusion architecture developed combines vision-based information with inertial information in an original way, resulting in a lightweight system with real-time capabilities. The approach presented is implemented on a Yamaha Rmax unmanned helicopter and tested on-board during autonomous flight-test experiments.

    2. Sensor Fusion Architecture

The sensor fusion architecture developed in this work is presented in Figure 2 and is composed of several modules. It can work in GPS modality or in vision-based modality if the GPS is not available.


A traditional KF (subsystem 1) is used to fuse data from the inertial sensors (3 accelerometers and 3 gyros) with a position sensor (GPS, or the vision system in case the GPS is not available). An INS mechanization function performs the time integration of the inertial sensors while the KF function estimates the INS errors. The KF is implemented in the error dynamics form and uses 12 states: the 3-dimensional position error, the 3-dimensional velocity error, 3 attitude angle errors, and 3 accelerometer biases. The error dynamics formulation of the KF is widely used in INS/GPS integration [9–11]. The estimated errors are used to correct the INS solution.

The vision system (subsystem 2) is responsible for computing the absolute UAV position on the horizontal plane. Such a position is then used as a measurement update for the KF. The visual odometry computes the helicopter displacement by tracking a number of corner features in the image plane using the KLT algorithm [3]. The helicopter displacement is computed incrementally from consecutive image frames. Unfortunately the inherent errors in the computation of the feature locations accumulate when old features leave the frame and new ones are acquired, producing a drift in the UAV position. Such drift is less severe than the position drift derived from a pure integration of typical low-cost inertial sensors for small UAVs, as will be shown later in the paper. The image matching module provides drift-free position information which is integrated in the scheme through a grid-based Bayesian filtering approach. The matching between the reference and the on-board image is performed using the normalized cross-correlation algorithm [12].

The architecture components will be described in detail in the following sections. Section 3 describes the homography-based visual odometry, Section 4 presents the image registration approach used, and Section 5 describes the sensor fusion algorithms.

    3. Visual Odometry

Visual odometry for aerial navigation has been an object of great interest during the last decade [13–16]. The work in [13] has shown how visual odometry is capable of stabilizing an autonomous helicopter in hovering conditions. In that work the odometry was based on a template matching technique where the matching between subsequent frames was obtained through sum of squared differences (SSD) minimization of the gray scale values of the templates. The helicopter's attitude information was taken from an IMU sensor. Specialized hardware was expressly built for this experiment in order to properly synchronize attitude information with the video frame. Most of the recent work [14–16] on visual odometry for airborne applications is based on homography estimation under a planar scene assumption. In this case the relation between points of two images can be expressed as x2 ≈ Hx1, where x1 and x2 are the corresponding points of the two images 1 and 2 expressed in homogeneous coordinates, and H is the 3 × 3 homography matrix. The symbol ≈ indicates that the relation is valid up to a scale factor.

A point is expressed in homogeneous coordinates when it is represented by equivalence classes of coordinate triples (kx, ky, k), where k is a multiplicative factor. The camera rotation and displacement between two camera positions, c1 and c2, can be computed from the homography matrix decomposition [17]:

\[ H = K\left(R^{c2}_{c1} + \frac{1}{d}\,\vec{t}^{\,c2}\,\vec{n}^{\,c1\,T}\right)K^{-1}, \qquad (1) \]

where K is the camera calibration matrix determined with a camera calibration procedure, t^c2 is the camera translation vector expressed in the camera 2 reference system, R^c2_c1 is the rotation from camera 1 to camera 2, n^c1 is the unit normal vector to the plane being observed, expressed in the camera 1 reference system, and d is the distance of the principal point of camera 1 to the plane.

The visual odometry implemented in this work is based on robust homography estimation. The homography matrix H is estimated from a set of corresponding corner features being tracked from frame to frame. H can be estimated using the direct linear transformation [17] with a minimum number of four feature points (the features must not be collinear). In practice the homography is estimated from a higher number of corresponding feature points (50 or more), and the random sample consensus (RANSAC) algorithm [18] is used to identify and then discard incorrect feature correspondences. RANSAC is an efficient method to determine, among the set of tracked features C, the subset Ci of inlier features and the subset Co of outliers, so that the estimate of H is unaffected by wrong feature correspondences. Details on robust homography estimation using the RANSAC approach can be found in [17].
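As an illustration of this step, the following minimal Python sketch (not the authors' on-board C implementation) estimates the homography from tracked correspondences with OpenCV's RANSAC-based estimator; the array names and the reprojection threshold are assumptions.

```python
import numpy as np
import cv2

def estimate_homography(pts_prev, pts_curr, reproj_thresh=3.0):
    """Robust homography from corresponding points (N x 2 arrays, N >= 4)."""
    pts_prev = np.asarray(pts_prev, dtype=np.float32)
    pts_curr = np.asarray(pts_curr, dtype=np.float32)
    # RANSAC discards wrong correspondences (outliers) before the final estimate.
    H, inlier_mask = cv2.findHomography(pts_prev, pts_curr,
                                        method=cv2.RANSAC,
                                        ransacReprojThreshold=reproj_thresh)
    inliers = inlier_mask.ravel().astype(bool)
    return H, inliers
```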

The feature tracker used in this work is based on the KLT algorithm. The algorithm selects a number of features in an image according to a "goodness" criterion described in [19]. Then it tries to re-associate the same features in the next image frame. The association is done by minimizing the sum of squared differences over patches taken around the features in the second image. Such an association criterion gives good results when the feature displacement is not too large. Therefore it is important that the algorithm has a low execution time: the faster the algorithm, the smaller the interframe feature displacement and the more successful the association process.
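A corresponding sketch of the tracking step, again only illustrative, uses OpenCV's Shi-Tomasi corner selector and pyramidal Lucas-Kanade tracker (an SSD-minimizing KLT implementation); the parameter values are placeholders.

```python
import cv2

def track_klt(prev_gray, curr_gray, max_features=150):
    """Select corners in the previous frame and re-associate them in the current one."""
    p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_features,
                                 qualityLevel=0.01, minDistance=10)
    # Association by SSD minimisation over small patches around each feature.
    p1, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, p0, None,
                                                winSize=(21, 21), maxLevel=3)
    ok = status.ravel() == 1
    return p0.reshape(-1, 2)[ok], p1.reshape(-1, 2)[ok]
```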

In Figure 3 the RANSAC algorithm has been applied to a set of features tracked in two consecutive frames. Figure 3(a) shows the feature displacements computed by the KLT algorithm, while in Figure 3(b) the set of outlier features has been detected and removed using RANSAC.

Once the homography matrix has been estimated, it can be decomposed into its rotation and translation components in the form of (1) using singular value decomposition, as described in [17]. However, homography decomposition is a poorly conditioned problem, especially when using cameras with a large focal length [20]. The problem arises also when the ratio between the flight altitude and the camera displacement is high. For this reason it is recommended to use interframe rotation information from other sensor sources if available.


Figure 3: (a) displays the set of features tracked with the KLT algorithm. In (b) the outlier feature set has been identified and removed using the RANSAC algorithm.

The odometry presented in this work makes use of interframe rotation information coming from the KF. The solution presented in the remainder of this section makes use of the knowledge of the terrain slope; in other words, the direction of the normal vector of the terrain is assumed to be known. In the experiments presented here the terrain will be considered non-sloped. Information about the terrain slope could also be extracted from a database if available.

The goal then is to compute the helicopter translation in the navigation reference system from (1). The coordinate transformation between the camera and the INS sensor is realized with a sequence of rotations. The translation between the two sensors is neglected since the linear distance between them is small. The rotation sequence (2) aligns the navigation frame (n) to the camera frame (c) passing through intermediate reference systems named the helicopter body frame (b) and the camera gimbal frame (g) as follows:

\[ R^{n}_{c} = R^{n}_{b}\, R^{b}_{g}\, R^{g}_{c}, \qquad (2) \]

where R^n_b is given by the KF, R^b_g is measured by the pan/tilt camera unit, and R^g_c is constant since the camera is rigidly mounted on the pan/tilt unit. A detailed description of the reference systems and rotation matrices adopted here can be found in [21]. The camera projection model used in this work is a simple pin-hole model. The transformation between the image reference system and the camera reference system is realized through the calibration matrix K:

\[ K = \begin{bmatrix} 0 & f_x & o_x \\ -f_y & 0 & o_y \\ 0 & 0 & 1 \end{bmatrix}, \qquad (3) \]
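The fragment below is a sketch, under assumed angle conventions rather than the paper's exact definitions, of how (2) and (3) would be evaluated numerically; the focal lengths, image center, and pan/tilt parameterisation are illustrative.

```python
import numpy as np

def rot_z(angle_rad):
    """Elementary rotation about the z axis (placeholder for the full Euler chain)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def calibration_matrix(fx, fy, ox, oy):
    """Calibration matrix K with the axis convention of equation (3)."""
    return np.array([[0.0,  fx,  ox],
                     [-fy, 0.0,  oy],
                     [0.0, 0.0, 1.0]])

# R_nb from the Kalman filter, R_bg from the pan/tilt unit, R_gc fixed mounting.
R_nb = np.eye(3)                      # body attitude (identity as a placeholder)
R_bg = rot_z(np.deg2rad(10.0))        # example pan angle of 10 degrees
R_gc = np.eye(3)
R_nc = R_nb @ R_bg @ R_gc             # equation (2)
K = calibration_matrix(fx=800.0, fy=800.0, ox=192.0, oy=144.0)
```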

where fx and fy are the camera focal lengths in the x and y directions, and (ox, oy) is the center point of the image in pixels. The camera's lens distortion compensation is not applied in this work. If c1 and c2 are two consecutive camera positions, then considering the following relationships:

\[ R^{c2}_{c1} = R^{c2}_{n}\, R^{n}_{c1}, \qquad \vec{t}^{\,c2} = R^{c2}_{n}\,\vec{t}^{\,n}, \qquad \vec{n}^{\,c1} = R^{c1}_{n}\,\vec{n}^{\,n} \qquad (4) \]

and substituting in (1) gives

\[ H = K R^{c2}_{n}\left(I + \frac{1}{d}\,\vec{t}^{\,n}\,\vec{n}^{\,n\,T}\right)R^{n}_{c1} K^{-1}. \qquad (5) \]

Considering that the rotation matrices are orthogonal (in such a case the inverse coincides with the transpose), (5) can be rearranged as

\[ \vec{t}^{\,n}\,\vec{n}^{\,n\,T} = d\left({R^{c2}_{n}}^{T} K^{-1} H^{*} K\, {R^{n}_{c1}}^{T} - I\right), \qquad (6) \]

where, in order to eliminate the scale ambiguity, the homography matrix has been normalized with the ninth element of H as follows:

\[ H^{*} = \frac{H}{H_{3,3}}. \qquad (7) \]

The distance to the ground d can be measured with an on-board laser altimeter. In the case of flat ground, the differential barometric altitude, measured with an on-board barometric sensor between the take-off and the current flight position, can be used. Since in our environment the terrain is considered to be flat, \(\vec{n}^{\,n} = [0, 0, 1]^T\). Indicating with RHS the right-hand side of (6), the north and east helicopter displacements computed by the odometry are

\[ t^{n}_{north} = RHS_{1,3}, \qquad t^{n}_{east} = RHS_{2,3}. \qquad (8) \]
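Putting (6)-(8) together, a minimal sketch of the displacement computation is given below; it assumes the flat-terrain normal n^n = [0, 0, 1], with the rotation matrices taken from the KF as described above. Function and argument names are illustrative only.

```python
import numpy as np

def displacement_from_homography(H, K, R_c2_n, R_n_c1, d):
    """North/east displacement from the homography, equations (6)-(8).

    H      : 3x3 homography between consecutive frames
    K      : camera calibration matrix, equation (3)
    R_c2_n : rotation from the navigation frame to camera frame 2
    R_n_c1 : rotation from camera frame 1 to the navigation frame
    d      : distance to the ground plane in meters
    """
    H_star = H / H[2, 2]                           # scale normalisation, equation (7)
    rhs = d * (R_c2_n.T @ np.linalg.inv(K) @ H_star @ K @ R_n_c1.T - np.eye(3))  # (6)
    return rhs[0, 2], rhs[1, 2]                    # t_north, t_east, equation (8)
```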

To conclude this section, some considerations about the planar scene assumption will be made. Even if the terrain is not sloped, altitude variations between the ground level and roof top or tree top level are still present in the real world. The planar scene assumption is widely used for airborne applications; however a simple calculation can give a rough idea of the error being introduced.

For a UAV in a level flight condition with the camera pointing perpendicularly downward, the 1D pin-hole camera projection model can be used to make a simple calculation:

\[ \Delta x_p = \frac{f\,\Delta x_c}{d} \qquad (9) \]

with f being the camera focal length, d the distance to the ground plane, Δx_p the pixel displacement in the camera image of an observed feature, and Δx_c the computed odometry camera displacement. If the observed feature is not on the ground plane but at a height δd above it, the true camera displacement Δx_tc can be expressed as a function of the erroneous camera displacement Δx_c as

\[ \Delta x_{tc} = \Delta x_c\left(1 - \frac{\delta d}{d}\right) \qquad (10) \]

with ε = δd/d being the odometry error due to the depth variation. Then, if δd is the typical roof top height of an urban area, the higher the flight altitude d, the smaller the error ε. Considering an equal number of features picked on the real ground plane and on the roof tops, the reference homography plane can be considered to lie at the average roof height over the ground, and the odometry error can be divided by 2. If a UAV is flying at an altitude of 150 meters over an urban area with a typical roof height of 15 meters, the odometry error derived from neglecting the height variation is about 5%.

    4. Image Registration

Image registration is the process of overlaying two images of the same scene taken at different times, from different viewpoints, or by different sensors. The registration geometrically aligns two images (the reference and sensed images). Image registration has been an active research field for many years, and it has a wide range of applications. A literature review on image registration can be found in [22, 23]. In this context it is used to extract global position information for terrain relative navigation.

Image registration is performed with a sequence of image transformations, including rotation, scaling, and translation, which bring the sensed image to overlay precisely with the reference image. In this work, the reference and sensed images are aligned and scaled using the information provided by the KF. Once the images are aligned and scaled, the final match consists in finding the 2D translation which overlays the two images.

Two main approaches can be adopted for image registration: correlation-based matching and pattern matching. In correlation-based matching the sensed image is placed at every pixel location in the reference image, and a similarity criterion is then adopted to decide which location gives the best fit. In pattern matching approaches, on the other hand, salient features (or landmarks) are detected in both images, and the registration is obtained by matching the set of features between the images. Both methods have advantages and disadvantages.

Methods based on correlation can be implemented very efficiently and are suitable for real-time applications. They can also be applied in areas with no distinct landmarks. However, they are typically more sensitive to differences between the sensed and reference images than pattern matching approaches.

Methods based on pattern matching do not use image intensity values directly. The patterns are information at a higher level, typically represented with geometrical models. This property makes such methods suitable for situations where the terrain presents distinct landmarks which are not affected by seasonal changes (i.e., roads, houses). If recognized, even a small landmark can make a large portion of terrain unique. This characteristic makes these methods quite dissimilar from correlation-based matching, where small details in an image have low influence on the overall image similarity. On the other hand, these methods work only if there are distinct landmarks in the terrain. In addition, a pattern detection algorithm is required before any matching method can be applied. A pattern matching approach which does not require geometrical models is the Scale Invariant Feature Transform (SIFT) method [24]. The reference and sensed images can be converted into feature vectors which can be compared for matching purposes. Knowledge about the altitude and orientation of the camera relative to the terrain is not required for matching. Correlation methods are in general more efficient than SIFT because they do not require a search over image scale. In addition, SIFT features do not have the capability to handle variations of illumination conditions between the reference and sensed images [8, 25].

This work makes use of a matching technique based on a correlation approach. The normalized cross-correlation of intensity images [12] is utilized. Before performing the cross-correlation, the sensed and reference images are scaled and aligned. The cross-correlation is the last step of the registration process, and it provides a measure of similarity between the two images.

The image registration process is represented in the block diagram in Figure 5. First, the sensed color image is converted into gray-scale, and then it is transformed to the same scale as the reference image. Scaling is performed by converting the sensed image to the resolution of the reference image. The scale factor s is calculated using (11):

\[ \begin{pmatrix} s_x \\ s_y \end{pmatrix} = \begin{pmatrix} \dfrac{1}{f_x} \\[8pt] \dfrac{1}{f_y} \end{pmatrix} d \cdot I_{res}, \qquad (11) \]

where d, as for the odometry, is the UAV ground altitude, and I_res is the resolution of the reference image. The alignment is performed using the UAV heading estimated by the KF.

After the alignment and scaling steps, the cross-correlation algorithm is applied. If S is the sensed image and I is the reference image, the expression for the two-dimensional normalized cross-correlation is

\[ C(u,v) = \frac{\sum_{x,y}\left[S(x,y) - \mu_S\right]\left[I(x-u,\,y-v) - \mu_I\right]}{\sqrt{\sum_{x,y}\left[S(x,y) - \mu_S\right]^2 \sum_{x,y}\left[I(x-u,\,y-v) - \mu_I\right]^2}}, \qquad (12) \]

where μ_S and μ_I are the average intensity values of the sensed and the reference image, respectively. Figure 4 depicts a typical cross-correlation result between a sensed image taken from the Rmax helicopter and a restricted view of the reference image of the flight-test site.
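The following unoptimised NumPy transcription of (12) makes explicit what the similarity measure computes (here μ_I is taken over the overlapped region, as in standard normalised cross-correlation); in practice an efficient implementation such as OpenCV's cv2.matchTemplate with the TM_CCOEFF_NORMED method evaluates the same quantity.

```python
import numpy as np

def ncc_map(S, I):
    """C(u, v) of equation (12) for every placement of the sensed image S
    (h x w) inside the reference window I (H x W), with H >= h and W >= w."""
    h, w = S.shape
    H, W = I.shape
    S0 = S - S.mean()
    C = np.zeros((H - h + 1, W - w + 1))
    for u in range(H - h + 1):
        for v in range(W - w + 1):
            patch = I[u:u + h, v:v + w]
            I0 = patch - patch.mean()
            denom = np.sqrt((S0 ** 2).sum() * (I0 ** 2).sum())
            C[u, v] = (S0 * I0).sum() / denom if denom > 0 else 0.0
    return C
```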


Figure 4: (a) depicts the normalized cross-correlation results, and (b) depicts the sensed image matched to the reference image.

Figure 5: Image registration schematic. (The on-board (sensed) image and the reference image are converted to gray-scale; the sensed image is scaled using the altitude and the reference image resolution and aligned using the heading; the reference image is cropped around the predicted position according to the position uncertainty; normalized cross-correlation then produces the correlation map.)

Image registration is performed only on a restricted window of the reference image. The UAV position predicted by the filter is used as the center of the restricted matching area. The purpose is to disregard low-probability areas in order to increase the computational efficiency of the registration process. The window size depends on the position uncertainty estimated by the filter.
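An end-to-end sketch of the pipeline of Figure 5 is shown below, with OpenCV standing in for the actual on-board implementation; the scaling follows (11), the rotation uses the KF heading, the search window is cut around the predicted position, and all function and parameter names are assumptions.

```python
import numpy as np
import cv2

def register(sensed_bgr, reference_gray, fx, fy, d, i_res,
             heading_rad, predicted_px, half_window):
    """Return the correlation map of the sensed image over the restricted window."""
    gray = cv2.cvtColor(sensed_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    # Scale the sensed image to the reference image resolution, equation (11).
    sx, sy = d * i_res / fx, d * i_res / fy
    scaled = cv2.resize(gray, None, fx=sx, fy=sy, interpolation=cv2.INTER_AREA)
    # Align with the reference image using the heading estimated by the KF.
    h, w = scaled.shape
    rot = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), np.rad2deg(heading_rad), 1.0)
    aligned = cv2.warpAffine(scaled, rot, (w, h))
    # Crop the reference image around the predicted position; the window size
    # would be chosen from the position uncertainty estimated by the filter.
    cx, cy = predicted_px
    window = reference_gray[cy - half_window:cy + half_window,
                            cx - half_window:cx + half_window].astype(np.float32)
    return cv2.matchTemplate(window, aligned, cv2.TM_CCOEFF_NORMED)
```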

    5. Sensor Fusion Algorithms

In this work, the Bayesian estimation framework is used to fuse data from different sensor sources and estimate the UAV state. The state estimation problem has been split into two parts. The first part is represented by a standard Kalman filter (KF) which estimates the full UAV state (position, velocity, and attitude); the second part is represented by a point-mass filter (PMF) which estimates the absolute 2D UAV position. The two filters are interconnected as shown in Figure 2.

5.1. Bayesian Filtering Recursion. A nonlinear state-space model with additive process noise v and measurement noise e can be written in the following form:

\[ \vec{x}_t = f\left(\vec{x}_{t-1}, \vec{u}_{t-1}\right) + \vec{v}, \qquad \vec{y}_t = h\left(\vec{x}_t, \vec{u}_t\right) + \vec{e}. \qquad (13) \]

If the state vector x is n-dimensional, the recursive Bayesian filtering solution for such a model is

\[ p\left(\vec{x}_t \mid \vec{y}_{1:t-1}\right) = \int_{\mathbb{R}^n} p_v\left(\vec{x}_t - f\left(\vec{x}_{t-1}, \vec{u}_{t-1}\right)\right) p\left(\vec{x}_{t-1} \mid \vec{y}_{1:t-1}\right) d\vec{x}_{t-1}, \qquad (14) \]

\[ \alpha_t = \int_{\mathbb{R}^n} p_e\left(\vec{y}_t - h\left(\vec{x}_t, \vec{u}_t\right)\right) p\left(\vec{x}_t \mid \vec{y}_{1:t-1}\right) d\vec{x}_t, \qquad (15) \]

\[ p\left(\vec{x}_t \mid \vec{y}_{1:t}\right) = \alpha_t^{-1}\, p_e\left(\vec{y}_t - h\left(\vec{x}_t, \vec{u}_t\right)\right) p\left(\vec{x}_t \mid \vec{y}_{1:t-1}\right). \qquad (16) \]

The aim of the recursion (14)–(16) is to estimate the posterior density p(x_t | y_{1:t}) of the filter. Equation (14) represents the time update while (16) is the measurement update. p_v and p_e represent the PDFs of the process and measurement noise, respectively.

From the posterior p(x_t | y_{1:t}), the minimum variance (MV) estimate of the system state and its uncertainty are computed from (17) and (18), respectively, as follows:

\[ \hat{\vec{x}}^{MV}_t = \int_{\mathbb{R}^n} \vec{x}_t\, p\left(\vec{x}_t \mid \vec{y}_{1:t}\right) d\vec{x}_t, \qquad (17) \]

\[ P_t = \int_{\mathbb{R}^n} \left(\vec{x}_t - \hat{\vec{x}}^{MV}_t\right)\left(\vec{x}_t - \hat{\vec{x}}^{MV}_t\right)^T p\left(\vec{x}_t \mid \vec{y}_{1:t}\right) d\vec{x}_t. \qquad (18) \]

In general, it is very hard to compute the analytical solution of the integrals in (14)–(18). If the state-space model (13) is linear, the noises v and e are Gaussian, and the prior p(x_0) is normally distributed, the closed-form solution is represented by the popular Kalman filter.


The vision-based navigation problem addressed in this work is non-Gaussian, and therefore a solution based only on a Kalman filter cannot be applied. The problem is non-Gaussian since the image registration likelihood does not have a Gaussian distribution (see Figure 4(a)).

The filtering solution explored in this work is represented by a standard Kalman filter which fuses inertial data with an absolute position measurement. The position measurement comes from a 2-state point-mass filter which fuses the visual odometry data with absolute position information coming from the image registration module (Figure 2). The approach has given excellent results during field trials, but it has some limitations, as will be shown later.

5.2. Kalman Filter. The KF implemented has the standard structure used in airborne navigation systems and is used to estimate the error states of an integrated navigation system.

The linear state-space error dynamics model used in the KF is derived from a perturbation analysis of the motion equations [26] and is represented by the model (19)-(20), where system (19) represents the process model and system (20) represents the measurement model:

\[
\begin{pmatrix} \delta\dot{\vec{r}}^{\,n} \\ \delta\dot{\vec{v}}^{\,n} \\ \dot{\vec{\varepsilon}}^{\,n} \end{pmatrix}
=
\begin{bmatrix}
F_{rr} & F_{rv} & 0 \\[4pt]
F_{vr} & F_{vv} & \begin{pmatrix} 0 & -a_d & a_e \\ a_d & 0 & -a_n \\ -a_e & a_n & 0 \end{pmatrix} \\[4pt]
F_{er} & F_{ev} & -\left(\vec{\omega}^{\,n}_{in}\times\right)
\end{bmatrix}
\begin{pmatrix} \delta\vec{r}^{\,n} \\ \delta\vec{v}^{\,n} \\ \vec{\varepsilon}^{\,n} \end{pmatrix}
+
\begin{bmatrix} 0 & 0 \\ R^{n}_{b} & 0 \\ 0 & -R^{n}_{b} \end{bmatrix}\vec{u},
\qquad (19)
\]

\[
\begin{pmatrix} \varphi_{ins} - \varphi_{vision} \\ \lambda_{ins} - \lambda_{vision} \\ h_{ins} - \left(\Delta h_{baro} + h_0\right) \end{pmatrix}
=
\begin{bmatrix} I & 0 & 0 \end{bmatrix}
\begin{pmatrix} \delta\vec{r}^{\,n} \\ \delta\vec{v}^{\,n} \\ \vec{\varepsilon}^{\,n} \end{pmatrix}
+ \vec{e},
\qquad (20)
\]

where δr^n = (δφ, δλ, δh)^T are the latitude, longitude, and altitude errors; δv^n = (δv_n, δv_e, δv_d)^T are the north, east, and down velocity errors; ε^n = (ε_N, ε_E, ε_D)^T are the attitude errors in the navigation frame; u = (δf^T_acc, δω^T_gyro) are the accelerometer and gyro noises; e represents the measurement noise; ω^n_in is the rotation rate of the navigation frame; a_n, a_e, a_d are the north, east, and vertical acceleration components; and F_xx are the elements of the system's dynamics matrix.

The KF implemented uses 12 states including 3 accelerometer biases as mentioned before. However, the state-space model (19)-(20) uses only 9 states, without accelerometer biases. The reason for using a reduced representation is to allow the reader to focus only on the part of the model relevant for this section. The accelerometer biases are modeled as first-order Markov processes. The gyro biases can be modeled in the same way, but they were not used in this implementation. The acceleration elements a_n, a_e, and a_d are left in explicit form in the system (19) as they will be needed for further discussion.
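For concreteness, a simplified numerical sketch of the 9-state error filter (19)-(20) is given below. It discretises the continuous error dynamics with a first-order approximation and keeps only the velocity/attitude coupling block and the noise mapping of (19); position errors are kept in metres so that the position/velocity block reduces to the identity, the earth-rate terms are neglected as discussed in the text, and the remaining F blocks are omitted for brevity. All noise levels are placeholders rather than the tuned on-board values.

```python
import numpy as np

def skew(a):
    an, ae, ad = a
    return np.array([[0.0, -ad,  ae],
                     [ ad, 0.0, -an],
                     [-ae,  an, 0.0]])

def kf_predict(x, P, R_nb, accel_n, dt, q_acc=1e-2, q_gyro=1e-4):
    """Propagate the 9-state error vector (position, velocity, attitude errors)."""
    F = np.zeros((9, 9))
    F[0:3, 3:6] = np.eye(3)          # position error driven by velocity error
    F[3:6, 6:9] = skew(accel_n)      # velocity error driven by attitude error, (19)
    Phi = np.eye(9) + F * dt         # first-order discretisation
    G = np.zeros((9, 6))
    G[3:6, 0:3] = R_nb               # accelerometer noise mapping
    G[6:9, 3:6] = -R_nb              # gyro noise mapping
    Q = G @ np.diag([q_acc] * 3 + [q_gyro] * 3) @ G.T * dt
    return Phi @ x, Phi @ P @ Phi.T + Q

def kf_update(x, P, residual, r_pos=2.0**2, r_alt=4.0**2):
    """Position measurement update, equation (20); residual = INS minus vision/baro."""
    Hm = np.hstack([np.eye(3), np.zeros((3, 6))])
    R = np.diag([r_pos, r_pos, r_alt])
    S = Hm @ P @ Hm.T + R
    Kg = P @ Hm.T @ np.linalg.inv(S)
    x = x + Kg @ (residual - Hm @ x)
    I_KH = np.eye(9) - Kg @ Hm
    P = I_KH @ P @ I_KH.T + Kg @ R @ Kg.T
    return x, P
```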

It can be observed how the measurements coming from the vision system in the form of latitude and longitude (φ_vision, λ_vision) are used to update the KF. The altitude measurement update is done using the barometric altitude information from an on-board pressure sensor. In order to compute the altitude in the WGS84 reference system required to update the vertical channel, an absolute reference altitude measurement h_0 needs to be known. For example, if h_0 is taken at the take-off position, the barometric altitude variation Δh_baro relative to h_0 can be obtained from the pressure sensor, and the absolute altitude can finally be computed. This technique works if the environmental static pressure remains constant. The state-space system, as represented in (19)-(20), is fully observable. The elements of the matrices F_er, F_ev and (ω^n_in ×) are quite small as they depend on the Earth rotation rate and the rotation rate of the navigation frame due to the Earth's curvature. These elements are influential only in the case of a very sensitive IMU or high-speed flight conditions. For the flight conditions and the typical IMU used in a small UAV helicopter, such elements are negligible, and therefore observability issues might arise for the attitude angles. For example, in the case of hovering flight conditions or, more generally, in the case of nonaccelerated horizontal flight conditions, the elements a_n and a_e are zero. It can also be observed that the angle error ε_D (the third component of ε^n) is then not observable. In the case of small pitch and roll angles, ε_D corresponds to the heading angle error. Therefore, for helicopters that are supposed to maintain the hovering flight condition for an extended period of time, an external heading aiding (i.e., a compass) is required. On the other hand, the pitch and roll angles are observable since the vertical acceleration a_d is usually different from zero (except in the case of free fall, but this is an extreme case).

It should be mentioned that the vision system uses the attitude angles estimated by the KF for the computation of the absolute position, as can be observed in Figure 2 (subsystem 2); this can be interpreted as a linearization of the measurement update equation of the KF. This fact might have implications for the estimation of the attitude angles since an information loop is formed in the scheme. The issue will be discussed later in this section.

5.3. Point-Mass Filter. The point-mass filter (PMF) is used to fuse measurements coming from the visual odometry system and the image matching system. The PMF computes the solution to the Bayesian filtering problem on a discretized grid [27]. In [2] such a technique was applied to a terrain navigation problem where a digital terrain elevation model was used instead of a digital 2D image as in the case presented here. The PMF is particularly suitable for this kind of problem since it handles general probability distributions and nonlinear models.

The problem can be represented with the following state-space model:

\[ \vec{x}_t = \vec{x}_{t-1} + \vec{u}^{\,odom}_{t-1} + \vec{v}, \qquad (21) \]

\[ p\left(\vec{y}_t \mid \vec{x}_t, \vec{z}_t\right). \qquad (22) \]


Equation (21) represents the process model, where x is the two-dimensional state (north-east), v is the process noise, and u^odom is the position displacement between time t−1 and t computed from the visual odometry (u^odom was indicated with t in Section 3). For the process noise v, a zero-mean Gaussian distribution with a standard deviation of σ = 2 m is used. Such a value has been calculated through a statistical analysis of the odometry using Monte Carlo simulations not reported in this paper.

The observation model is represented by the likelihood (22), which represents the probability of measuring y_t given the state vector x_t and the parameter vector z_t. The latter is given by z_t = (φ̂, θ̂, ψ̂, d). The first three components are the attitude angles estimated by the Kalman filter, while d is the ground altitude measured from the barometric sensor. A certainty equivalence approximation has been used here since the components of the vector z_t are taken as deterministic values. The measurement likelihood (22) is computed from the cross-correlation (12), and its distribution is non-Gaussian. The distribution given by the cross-correlation (12) represents the correlation of the on-board camera view with the reference image. In general, there is an offset between the position matched on the reference image and the UAV position due to the camera view angle. The offset must be removed in order to use the cross-correlation as a probability distribution for the UAV position. The attitude angles and the ground altitude are used for this purpose.
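The step from the correlation map to the grid likelihood could be sketched as follows; the offset convention (a nominally downward-looking camera displaced by roughly d·tan θ and d·tan φ on the ground) and the clipping of negative correlations are illustrative assumptions, not the exact on-board procedure.

```python
import numpy as np

def correlation_to_likelihood(corr, pitch, roll, d, grid_res, floor=1e-6):
    """Turn a correlation map into p(y | x, z) over the PMF grid."""
    # Approximate offset (in grid cells) between the matched camera footprint
    # and the UAV ground position, computed from the attitude and altitude.
    off_north = int(round(d * np.tan(pitch) / grid_res))
    off_east = int(round(d * np.tan(roll) / grid_res))
    lik = np.clip(corr, 0.0, None) + floor       # keep a strictly positive density
    lik = np.roll(lik, shift=(-off_north, -off_east), axis=(0, 1))
    return lik / lik.sum()
```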

As discussed earlier, in the PMF approximation the continuous state-space is discretized over a two-dimensional grid of limited size, and so the integrals are replaced with finite sums over the grid points. The grid used in this application is uniform with N points and resolution δ. The Bayesian filtering recursion (14)–(18) can be approximated as follows:

\[ p\left(\vec{x}_t(k) \mid \vec{y}_{1:t-1}\right) = \sum_{n=1}^{N} p_v\left(\vec{x}_t(k) - \vec{u}^{\,odom}_{t-1} - \vec{x}_{t-1}(n)\right) p\left(\vec{x}_{t-1}(n) \mid \vec{y}_{1:t-1}\right)\delta^2, \qquad (23) \]

\[ \alpha_t = \sum_{n=1}^{N} p\left(\vec{y}_t(n) \mid \vec{x}_t(n), \vec{z}_t\right) p\left(\vec{x}_t(n) \mid \vec{y}_{1:t-1}\right)\delta^2, \qquad (24) \]

\[ p\left(\vec{x}_t(k) \mid \vec{y}_{1:t}\right) = \alpha_t^{-1}\, p\left(\vec{y}_t(k) \mid \vec{x}_t(k), \vec{z}_t\right) p\left(\vec{x}_t(k) \mid \vec{y}_{1:t-1}\right), \qquad (25) \]

\[ \hat{\vec{x}}^{MV}_t = \sum_{n=1}^{N} \vec{x}_t(n)\, p\left(\vec{x}_t(n) \mid \vec{y}_{1:t}\right)\delta^2, \qquad (26) \]

\[ P_t = \sum_{n=1}^{N} \left(\vec{x}_t(n) - \hat{\vec{x}}^{MV}_t\right)\left(\vec{x}_t(n) - \hat{\vec{x}}^{MV}_t\right)^T p\left(\vec{x}_t(n) \mid \vec{y}_{1:t}\right)\delta^2. \qquad (27) \]

Before computing the time update (23), the grid points are translated according to the displacement calculated from the odometry. The convolution (23) requires N^2 operations and is computationally the most expensive operation of the recursion. In any case, due to the problem structure, the convolution has separable kernels. For this class of two-dimensional convolutions there exist efficient algorithms which reduce the computational load. More details on this problem can be found in [2].
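A minimal sketch of one PMF cycle (23)-(26) on the uniform grid is given below; the shift-then-convolve time update exploits the separable Gaussian kernel mentioned above, and the grid handling (np.roll, border effects ignored) is a simplification of the on-board implementation.

```python
import numpy as np

def gaussian_kernel(sigma, res, half_width=4):
    x = np.arange(-half_width, half_width + 1) * res
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def pmf_step(p_prev, shift_cells, likelihood, sigma=2.0, res=1.0):
    """One recursion of the point-mass filter on a 2D grid.

    p_prev      : prior PDF on the grid
    shift_cells : odometry displacement (north, east) in grid cells
    likelihood  : p(y_t | x_t, z_t) evaluated on the same grid
    """
    # Time update (23): translate by the odometry displacement, then convolve
    # with the Gaussian process noise using its separable kernel.
    pred = np.roll(p_prev, shift=shift_cells, axis=(0, 1))
    k = gaussian_kernel(sigma, res)
    pred = np.apply_along_axis(lambda row: np.convolve(row, k, mode='same'), 0, pred)
    pred = np.apply_along_axis(lambda row: np.convolve(row, k, mode='same'), 1, pred)
    # Measurement update (24)-(25): weight by the likelihood and normalise.
    post = pred * likelihood
    post /= post.sum()
    # Minimum-variance position estimate (26), expressed in grid coordinates.
    n_idx, e_idx = np.indices(post.shape)
    estimate = np.array([(n_idx * post).sum(), (e_idx * post).sum()]) * res
    return post, estimate
```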

Figure 6 depicts the evolution of the filter's probability density function (PDF). From the prior (t_0), the PDF evolves, developing several peaks. The images and flight data used were collected during a flight-test campaign, and for the calculation a 200 × 200 grid of 1 meter resolution was used.

The interface between the KF and the PMF is realized by passing the latitude and longitude values computed by the PMF to the KF measurement equation (20). The PMF position covariance is computed with (27), and it could be used in the KF for the measurement update. However, an empirically determined covariance for the KF measurement update has been preferred. Constant values of 2 meters for the horizontal position uncertainty and 4 meters for the vertical position uncertainty (derived from the barometric altitude sensor) have been used.

5.4. Stability Analysis from Monte Carlo Simulations. In this section the implications of using the attitude angles estimated by the KF to compute the vision measurements will be analyzed. As mentioned before, since the vision measurements are used to update the KF, an information loop is created which could limit the performance and stability of the filtering scheme. First, the issue will be analyzed using a simplified dynamic model. Then, the complete KF in closed loop with the odometry will be tested using Monte Carlo simulations. The velocity error dynamics implemented in the KF are derived from a perturbation of the following velocity dynamic model [28]:

\[ \dot{\vec{v}}^{\,n} = R^{n}_{b}\,\vec{a}^{\,b} - \left(2\vec{\omega}^{\,n}_{ie} + \vec{\omega}^{\,n}_{en}\right)\times\vec{v}^{\,n} + \vec{g}^{\,n} \qquad (28) \]

with a^b representing the accelerometer output, ω^n_ie and ω^n_en the Earth rotation rate and the navigation frame rotation rate, which are negligible as discussed in Section 2, and g^n the gravity vector. Let us analyze the problem for the 1D case. The simplified velocity dynamics for the x^n axis is

\[ \dot{v}^{n}_{x} = a^{b}_{x}\cos\theta + a^{b}_{z}\sin\theta, \qquad (29) \]

where θ represents the pitch angle. Suppose that the helicopter is in hovering but the estimated pitch angle begins to drift due to the gyro error. An apparent velocity will be observed due to the use of the attitude angles in the measurement equation from the vision system. Linearizing the model around the hovering condition gives, for the velocity dynamics and observation equations,

\[ \dot{v}^{n}_{x} = a^{b}_{x} + a^{b}_{z}\,\theta, \qquad (30) \]

\[ h\,\dot{\theta} = v^{n}_{x}, \qquad (31) \]

where h is the ground altitude measured by the barometric sensor.

Figure 6: Evolution of the filter's probability density function (PDF) at times t_0, t_0+1, t_0+4, and t_0+8. The capability of the filter to maintain the full probability distribution over the grid space can be observed.

The coupling between the measurement equation (31) and the dynamic equation (30) can be observed (θ is in fact used to disambiguate the translation from the rotation in the measurement equation). Substituting (31) in (30) gives

\[ h\,\ddot{\theta} = a^{b}_{x} + a^{b}_{z}\,\theta. \qquad (32) \]

Equation (32) is a second-order differential equation similar to the equation of a spring-mass dynamic system (a^b_z has a negative value), where the altitude h plays the role of the mass. One should expect that the error introduced by the information loop has an oscillatory behavior which changes with the altitude. Since this is an extremely simplified representation of the problem, the behavior of the Kalman filter in closed loop with the odometry was tested through Monte Carlo simulations.
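A short numerical illustration of (32), assuming hovering with a^b_x = 0 and a^b_z = −g, shows how the natural period of the induced oscillation grows with altitude (h plays the role of the mass); the altitude values are examples only.

```python
import numpy as np

g = 9.81                                          # gravity [m/s^2]
for h in (30.0, 100.0, 500.0):                    # example ground altitudes [m]
    # h * theta_ddot = -g * theta  ->  undamped oscillator, period 2*pi*sqrt(h/g)
    period = 2.0 * np.pi * np.sqrt(h / g)
    print(f"altitude {h:5.0f} m -> oscillation period ~ {period:4.1f} s")
```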

Gaussian-distributed accelerometer and gyro data were generated together with randomly distributed features in the images. Since the analysis has the purpose of finding the effects of the attitude coupling problem, the following procedure was applied. First, Kalman filter results were obtained using the noisy inertial data and a constant position measurement update. Then, the KF was fed with the same inertial data as in the previous case, but the measurement update came from the odometry instead (the features in the image were not corrupted by noise since the purpose was to isolate the coupling effect).

The KF results for the first case (constant update) were considered as the reference, and the results for the second case were then compared to the reference ones. What is shown in the plots is the difference in the estimated pitch angle (Figure 7) and roll angle (Figure 8) between the two cases at different altitudes (obviously the altitude influences only the results of the closed-loop case).

For altitudes up to 100 meters the difference between the two filter configurations is less than 0.5 degrees. However, increasing the flight altitude makes the oscillatory behavior of the system more evident, with increasing amplitude (as expected from the previous simplified analysis). A divergent oscillatory behavior was observed from an altitude of about 700 meters.

This analysis shows that the updating method used in this work introduces an oscillatory dynamics in the filter state estimation. The effect of such dynamics has a low impact for altitudes below 100 meters, while it becomes more severe at larger altitudes.

Figure 7: Difference between the pitch angle estimated by the KF with a constant position update and the KF updated using the odometry position, for altitudes of 30, 60, and 90 m (a) and 200, 300, and 500 m (b). It can be noticed how increasing the flight altitude makes the oscillatory behavior of the system more evident, with increasing amplitude.

    6. UAV Platform

The proposed filter architecture has been tested on real flight-test data and on-board an autonomous UAV helicopter. The helicopter is based on a commercial Yamaha Rmax UAV helicopter (Figure 1). The total helicopter length is about 3.6 m. It is powered by a 21 hp two-stroke engine, and it has a maximum take-off weight of 95 kg. The avionics system was developed at the Department of Computer and Information Science at Linköping University and has been integrated in the Rmax helicopter. The platform developed is capable of fully autonomous flight from take-off to landing. The sensors used in this work consist of an inertial measurement unit (three accelerometers and three gyros) which provides the helicopter's acceleration and angular rate along the three body axes, a barometric altitude sensor, and a monocular CCD video camera mounted on a pan/tilt unit. The avionics system is based on 3 embedded computers. The primary flight computer is a PC104 PentiumIII 700 MHz. It implements the low-level control system, which includes the control modes (take-off, hovering, path following, landing, etc.), sensor data acquisition, and the communication with the helicopter platform. The second computer, also a PC104 PentiumIII 700 MHz, implements the image processing functionalities and controls the camera pan/tilt unit. The third computer, a PC104 Pentium-M 1.4 GHz, implements high-level functionalities such as path planning and task planning.

Figure 8: Difference between the roll angle estimated by the KF with a constant position update and the KF updated using the odometry position, for altitudes of 30, 60, and 90 m (a) and 200, 300, and 500 m (b). It can be noticed how increasing the flight altitude makes the oscillatory behavior of the system more evident, with increasing amplitude.

Network communication between the computers is physically realized with RS232C serial lines and Ethernet. Ethernet is mainly used for remote login and file transfer, while the serial lines are used for hard real-time networking.

    7. Experimental Results

In this section the performance of the vision-based state estimation approach described will be analyzed. Experimental evaluations based on offline real flight data as well as online on-board test results are presented. The flight-tests were performed in an emergency services training area in the south of Sweden.

The reference image of the area used for this experiment is an orthorectified aerial image with 1 meter/pixel resolution and submeter position accuracy. The video camera sensor is a standard CCD analog camera with approximately a 45-degree horizontal angle of view. The camera frame rate is 25 Hz, and the images are reduced to half resolution (384 × 288 pixels) at the beginning of the image processing pipeline to reduce the computational burden. During the experiments, the video camera was looking downward and fixed with respect to the helicopter body. The PMF recursion was computed in all experiments on an 80 × 80 meter grid of one meter resolution. The IMU used is provided by the Yamaha Motor Company and integrated in the Rmax platform. Table 1 provides the available specifications of the sensors used in the experiment.


Table 1: Available characteristics of the sensors used in the navigation algorithm.

Sensor           Output rate   Resolution   Bias
Accelerometers   66 Hz         1 mg         13 mg
Gyros            200 Hz        0.1 deg/s    —

Figure 9: (a) displays the comparison between the flight path computed with the point-mass filter (red), the odometry (dashed white), and the GPS reference (blue), with the path corners 1–5 marked. (b) shows the PMF result with the position uncertainty represented with ellipses along the path.

Figure 10: (a) and (b) depict the north and east velocity components estimated by the Kalman filter, while (d) depicts the horizontal velocity error magnitude. In (c) the position error comparison between the PMF, the odometry, and the stand-alone INS is given.

Figure 11: Estimated Kalman filter attitude angles (roll, pitch, and heading) with error plots.

Figures 13 and 14 show the results based on the same flight data set, but with the vision-based navigation system initialized with 5 degrees of error in the pitch, roll, and heading angles. It can be observed in Figure 14 how the pitch and roll angles converge rapidly to the right estimate (confirming the assumptions and results demonstrated in Section 5), while the heading shows a tendency to converge in the long term; in practice, the navigation algorithm was between 5 and 10 degrees off-heading during the whole test. It is interesting to notice from Figure 13 the effects of the off-heading condition. The odometry is rotated by approximately the heading angle error. The PMF solution is degraded due to the increased odometry error and due to the fact that there is an image misalignment error between the sensed and reference images. Despite the image misalignment, the PMF position estimate closes the loop at the right location (point 5) thanks to the robustness of the image cross-correlation with respect to small image misalignments. It appears that the image correlation was especially robust around corner 4 of the path. This fact is verified by the sudden decrease of the position uncertainty which can be observed in Figure 13.

7.2. Real-Time on-Board Flight-Test Results. Flight-test results of the vision-based state estimation algorithm implemented on-board the Rmax helicopter platform will be presented here. The complete navigation architecture is implemented on two on-board computers in the C language. Subsystem 1 (Figure 2) is implemented on the primary flight computer (PFC, a PC104 PentiumIII 700 MHz) and runs at a 50 Hz rate. Subsystem 2 is implemented on the image processing computer (IPC, also a PC104 PentiumIII 700 MHz) and runs at about a 7 Hz rate. The image processing is implemented using the Intel OpenCV library. The real-time data communication between the two computers is realized using an RS232C serial line.

Figure 12: Comparison between the odometry calculation using a maximum of 50 tracked features (left column) and a maximum of 150 tracked features (right column). The comparison also shows the effects of the RANSAC algorithm on the odometry error.

Figure 13: Vision-based navigation algorithm results in off-heading error conditions. The navigation filter is initialized with a 5-degree error in pitch, roll, and heading. While the pitch and roll angles converge rapidly to the right estimate, the heading does not, affecting the odometry and the image cross-correlation performed in off-heading conditions.

Figure 14: Estimated Kalman filter attitude angles with an initial 5-degree error added to the roll, pitch, and heading angles. It can be observed how the pitch and roll angles converge quite rapidly, while the heading shows only a weak convergence tendency.

[Figure 15 appears here: flight path with waypoints 1-4, showing the PMF estimate and the GPS reference.]

Figure 15: Real-time on-board position estimation results. Comparison between the flight path computed with the PMF (red) and the GPS reference (blue).

The data sent from the PFC to the IPC is the full vision-based KF state, while the data sent from the IPC to the PFC is the latitude and longitude position estimated by the PMF, which is used to update the KF.
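The paper states what is exchanged over the serial link but not the wire format. The following C sketch shows one plausible message layout for this PFC/IPC exchange; all field names, types, and rates in the comments are assumptions made only for illustration.

```c
/* Hypothetical message layout for the PFC <-> IPC RS232C link; the paper
 * specifies only the content (full KF state one way, PMF lat/lon the other),
 * so everything below is an assumption, not the actual on-board format. */
#include <stdint.h>

#pragma pack(push, 1)
typedef struct {              /* PFC -> IPC, sent at the 50 Hz KF rate      */
    uint32_t timestamp_ms;    /* time of validity                           */
    double   lat, lon, alt;   /* KF position (deg, deg, m)                  */
    float    vn, ve, vd;      /* NED velocity (m/s)                         */
    float    roll, pitch, yaw;/* attitude (rad)                             */
} KfStateMsg;

typedef struct {              /* IPC -> PFC, sent at the ~7 Hz vision rate  */
    uint32_t timestamp_ms;    /* image timestamp the fix refers to          */
    double   lat, lon;        /* PMF horizontal position estimate (deg)     */
    float    std_east, std_north; /* 1-sigma uncertainty (m)                */
} PmfFixMsg;
#pragma pack(pop)
```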

The on-board flight-test results are displayed in Figures 15, 16, and 17. The odometry ran with a maximum of 50 features tracked in the image and without RANSAC, as the latter was not yet implemented on-board at the time of the experiment. The helicopter was manually put into hovering mode at an altitude of about 55 meters above the ground (position 1 of Figure 15). The vision-based navigation algorithm was then initialized from the ground station. Subsequently, the helicopter was switched from manual into autonomous flight, with the control system taking the helicopter state from the vision-based KF. The helicopter was commanded to fly from position 1 to position 2 along a straight-line path. Observe that during path segment 1-2 the vision-based solution (red line) resembles a straight line, since the control system was closing the loop on the vision-based data.


[Figure 16 appears here. Panels (a)/(b): north and east velocity components (m/s versus seconds) for the vision-based Kalman filter and the INS/GPS reference. Panel (c): horizontal position error magnitude (meters) for the PMF and the odometry. Panel (d): horizontal velocity error magnitude (m/s).]

Figure 16: Real-time on-board results. (a) and (b) show the north and east velocity components estimated by the Kalman filter, while (d) displays the horizontal velocity error magnitude. In (c) the position error comparison between the PMF and the odometry is given.

The path actually flown by the helicopter is instead the blue one (the GPS reference). The helicopter was then commanded to fly a spline path (segment 2-3). With the helicopter at hovering point 3, the autonomous flight was aborted and the helicopter was taken back into manual flight. The reason for the abort is that, in order for the helicopter to exit the hovering condition and enter a new path segment, a stable hover must first be reached. This is a safety check programmed into a low-level state machine which coordinates the flight-mode switching, and the hovering-stable condition is fulfilled when the helicopter speed is less than 1 m/s. As can be seen from the velocity error plot in Figure 16, this is a rather strict requirement which is at the border of the vision-based velocity estimation accuracy. In any case, the vision-based algorithm was left running on-board while the helicopter was flown manually until position 4. The vision-based solution remained rather close to the GPS position during the whole path. The on-board solution also confirms what was observed during the offline tests for the attitude angles (Figure 17): a stable pitch and roll estimate and an unstable heading estimate. Considering the error in the heading estimate, the PMF position results (Figure 15) confirm a certain degree of robustness of the image matching algorithm with respect to the image misalignment error.
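The hovering-stable gate described above is simple enough to state in code. The C sketch below captures only the 1 m/s speed check quoted in the text; the actual state machine belongs to the platform's low-level control software and is not listed in the paper, so the names and structure here are assumptions.

```c
/* Sketch of the hover-stability gate: the flight-mode state machine allows
 * the transition from hover to a new path segment only once the estimated
 * horizontal speed drops below 1 m/s.  Illustrative only. */
#include <math.h>

#define HOVER_STABLE_SPEED 1.0   /* m/s, threshold quoted in the text */

typedef enum { MODE_HOVER, MODE_PATH_FOLLOWING } FlightMode;

static FlightMode try_enter_path(FlightMode mode, double vn, double ve)
{
    double speed = sqrt(vn * vn + ve * ve);   /* horizontal speed from the KF state */
    if (mode == MODE_HOVER && speed < HOVER_STABLE_SPEED)
        return MODE_PATH_FOLLOWING;           /* hover is stable: allow the switch  */
    return mode;                              /* otherwise remain in hover          */
}
```

With the vision-based velocity error occasionally approaching 1 m/s (Figure 16(d)), such a gate sits at the edge of what the estimator can reliably satisfy, which is why the transition out of hover failed at point 3.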

8. Conclusions and Future Work

The experimental results of this work confirm the validity of the proposed approach. A vision-based sensor fusion architecture which can cope with short- and long-term GPS outages has been proposed and tested on-board an unmanned helicopter. Since the performance of the terrain matching algorithm depends on the structural properties of the terrain, it would be wise to classify the reference images beforehand according to navigability criteria. In [29–31], methods for selecting scene matching areas suitable for terrain navigation are proposed.
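As a much cruder stand-in for the dedicated scene-matching-area selection methods of [29–31], one could score each tile of the reference image by its grey-level variance, on the assumption that nearly featureless tiles will correlate poorly. The C sketch below illustrates that idea only; it is not part of the proposed system.

```c
/* Crude illustration of pre-classifying reference imagery by "navigability":
 * score a tile of the geo-referenced image by its grey-level variance.
 * Low-variance tiles are assumed to be poor candidates for image matching. */
static double tile_variance(const unsigned char *img, int img_w,
                            int x0, int y0, int tw, int th)
{
    double mean = 0.0, var = 0.0;
    int n = tw * th;
    for (int y = 0; y < th; ++y)
        for (int x = 0; x < tw; ++x)
            mean += img[(y0 + y) * img_w + (x0 + x)];
    mean /= n;
    for (int y = 0; y < th; ++y)
        for (int x = 0; x < tw; ++x) {
            double d = img[(y0 + y) * img_w + (x0 + x)] - mean;
            var += d * d;
        }
    return var / n;   /* higher variance = more structure to match against */
}
```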

In the future, a compass will be used as an external heading aid to the KF, which should solve the heading estimation problem encountered here.


[Figure 17 appears here. Panels (a)/(c)/(e): roll, pitch, and heading angles (degrees versus seconds) for the vision-based Kalman filter and the INS/GPS reference. Panels (b)/(d)/(f): the corresponding roll, pitch, and heading errors.]

Figure 17: Real-time on-board results. Estimated Kalman filter attitude angles with error plots.

    Acknowledgment

The authors would like to acknowledge their colleagues Piero Petitti, Mariusz Wzorek, and Piotr Rudol for the support given to this work. This work is supported in part by the National Aeronautics Research Program NFFP04-S4202, the Swedish Foundation for Strategic Research (SSF) Strategic Research Center MOVIII, the Swedish Research Council Linnaeus Center CADICS, and LinkLab - Center for Future Aviation Systems.

    References

[1] C. James, et al., "Vulnerability assessment of the transportation infrastructure relying on the global positioning system," Tech. Rep., Volpe National Transportation Systems Center, US Department of Transportation, August 2001.

[2] N. Bergman, Recursive Bayesian estimation, navigation and tracking applications, Ph.D. dissertation, Department of Electrical Engineering, Linköping University, Linköping, Sweden, 1999.

[3] C. Tomasi and T. Kanade, "Detection and tracking of point features," Tech. Rep. CMU-CS-91-132, Carnegie Mellon University, Pittsburgh, Pa, USA, April 1991.

[4] D.-G. Sim, R.-H. Park, R.-C. Kim, S. U. Lee, and I.-C. Kim, "Integrated position estimation using aerial image sequences," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 1, pp. 1–18, 2002.

[5] R. Karlsson, T. B. Schön, D. Törnqvist, G. Conte, and F. Gustafsson, "Utilizing model structure for efficient simultaneous localization and mapping for a UAV application," in Proceedings of the IEEE Aerospace Conference, Big Sky, Mont, USA, March 2008.

[6] J. Kim and S. Sukkarieh, "Real-time implementation of airborne inertial-SLAM," Robotics and Autonomous Systems, vol. 55, no. 1, pp. 62–71, 2007.

[7] T. Lemaire, C. Berger, I.-K. Jung, and S. Lacroix, "Vision-based SLAM: stereo and monocular approaches," International Journal of Computer Vision, vol. 74, no. 3, pp. 343–364, 2007.

[8] A. E. Johnson and J. F. Montgomery, "Overview of terrain relative navigation approaches for precise lunar landing," in Proceedings of the IEEE Aerospace Conference, Big Sky, Mont, USA, March 2008.


[9] P. Maybeck, Stochastic Models, Estimation and Control: Volume 1, Academic Press, New York, NY, USA, 1979.

[10] M. Svensson, Aircraft trajectory restoration by integration of inertial measurements and GPS, M.S. thesis, Linköping University, Linköping, Sweden, 1999.

[11] E.-H. Shin, Accuracy improvement of low cost INS/GPS for land applications, M.S. thesis, University of Calgary, Calgary, Canada, 2001.

[12] W. K. Pratt, Digital Image Processing, John Wiley & Sons, New York, NY, USA, 2nd edition, 1991.

[13] O. Amidi, An autonomous vision-guided helicopter, Ph.D. dissertation, Carnegie Mellon University, Pittsburgh, Pa, USA, 1996.

[14] N. Frietsch, O. Meister, C. Schlaile, J. Seibold, and G. Trommer, "Vision based hovering and landing system for a VTOL-MAV with geo-localization capabilities," in AIAA Guidance, Navigation and Control Conference and Exhibit, Honolulu, Hawaii, USA, August 2008.

[15] S. S. Mehta, W. E. Dixon, D. MacArthur, and C. D. Crane, "Visual servo control of an unmanned ground vehicle via a moving airborne monocular camera," in Proceedings of the American Control Conference, pp. 5276–5281, Minneapolis, Minn, USA, June 2006.

[16] F. Caballero, L. Merino, J. Ferruz, and A. Ollero, "A visual odometer without 3D reconstruction for aerial vehicles. Applications to building inspection," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 4673–4678, Barcelona, Spain, 2005.

[17] R. Hartley and A. Zisserman, Multiple View Geometry, Cambridge University Press, Cambridge, UK, 2nd edition, 2003.

[18] M. A. Fischler and R. C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.

[19] J. Shi and C. Tomasi, "Good features to track," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '94), pp. 593–600, Seattle, Wash, USA, June 1994.

[20] E. Michaelsen, M. Kirchhoff, and U. Stilla, "Sensor pose inference from airborne videos by decomposing homography estimates," in International Archives of Photogrammetry and Remote Sensing, M. O. Altan, Ed., vol. 35, pp. 1–6, 2004.

[21] D. B. Barber, J. D. Redding, T. W. McLain, R. W. Beard, and C. N. Taylor, "Vision-based target geo-location using a fixed-wing miniature air vehicle," Journal of Intelligent and Robotic Systems, vol. 47, no. 4, pp. 361–382, 2006.

[22] L. G. Brown, "A survey of image registration techniques," ACM Computing Surveys, no. 24, pp. 326–376, 1992.

[23] B. Zitova and J. Flusser, "Image registration methods: a survey," Image and Vision Computing, vol. 21, no. 11, pp. 977–1000, 2003.

[24] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.

[25] N. Trawny, A. I. Mourikis, S. I. Roumeliotis, et al., "Coupled vision and inertial navigation for pin-point landing," in NASA Science and Technology Conference, 2007.

[26] K. R. Britting, Inertial Navigation Systems Analysis, John Wiley & Sons, New York, NY, USA, 1971.

[27] R. S. Bucy and K. D. Senne, "Digital synthesis of nonlinear filters," Automatica, no. 7, pp. 287–298, 1971.

[28] R. M. Rogers, Applied Mathematics in Integrated Navigation Systems, American Institute of Aeronautics & Astronautics, Reston, Va, USA, 2000.

[29] G. Zhang and L. Shen, "Rule-based expert system for selecting scene matching area," in Intelligent Control and Automation, vol. 344, pp. 546–553, Springer, Berlin, Germany, 2006.

[30] G. Cao, X. Yang, and S. Chen, "Robust matching area selection for terrain matching using level set method," in Proceedings of the 2nd International Conference on Image Analysis and Recognition (ICIAR '05), vol. 3656 of Lecture Notes in Computer Science, pp. 423–430, Toronto, Canada, September 2005.

[31] E. L. Hall, D. L. Davies, and M. E. Casey, "The selection of critical subsets for signal, image, and scene matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 2, no. 4, pp. 313–322, 1980.
