
Control and estimation algorithms for the stabilization of VTOL UAVs from mono-camera measurements

H. de Plinval¹, A. Eudes², P. Morin²

¹ ONERA-DCSD, Toulouse, France, henry.de [email protected]
² Institut des Systèmes Intelligents et de Robotique, Université Pierre et Marie Curie, CNRS, UMR 7222, Paris, France, [email protected]

June 17, 2014

Abstract

This paper concerns the control of Vertical Take-Off and Landing (VTOL) Unmanned Aerial Vehicles (UAVs) based on exteroceptive measurements obtained from a mono-camera vision system. By assuming the existence of a locally planar structure in the field of view of the UAV's camera, the so-called homography matrix can be used to represent the vehicle's motion between two views of the structure. In this paper we report recent results on both the problem of homography estimation from the fusion of visual and inertial data, and the problem of feedback stabilization of VTOL UAVs from homography measurements.

1 Introduction

Obtaining a precise estimate of the vehicle's position is a major issue in aerial robotics. GPS is a very popular sensor in this context and has been used extensively with VTOL UAVs, especially for navigation via waypoints. Despite recent progress of this technology, especially in terms of precision, many applications cannot be addressed with GPS as the only position sensor. First, GPS is not available indoors and it can also be masked in some outdoor environments. Then, most inspection applications require a relative localization with respect to (w.r.t.) the environment, rather than an absolute localization as provided by GPS. Finally, evolving in dynamic environments also requires relative localization capabilities. For all these reasons, it is important to develop control strategies based on exteroceptive sensors that can provide relative position information w.r.t. the local environment. Examples of such sensors are provided by cameras, lasers,

radars, etc. Cameras are interesting sensors to use with small UAVs because they are light, low-cost, and provide rich information about the environment at a relatively high frequency. Precise 3D relative position information is best obtained from a stereo vision system with a "long" baseline (i.e., a large distance between the optical centers of the cameras). In this case, available feedback controllers that require position errors as inputs can be used. Using a mono-camera system is more challenging because the depth information cannot be recovered instantaneously (i.e., from a single measurement). Nevertheless, a mono-camera system can be preferred in some applications due to its compactness, or because the distance between the camera and the environment is so large that even a stereo system would provide poor depth information.

This paper concerns the control of VTOL UAVs from mono-camera measurements. We assume the existence of a locally planar structure in the environment. This assumption is restrictive but it is relevant in practice because i) many man-made buildings are locally planar, and ii) when the distance between the camera and the environment is large, the planarity assumption can be satisfied locally in first approximation even though the environment is not perfectly planar (e.g., as in the case of ground observation at relatively high altitude). Based on two camera views of this planar structure, it is well known in computer vision that one can compute the so-called homography matrix, which embeds all the displacement information between these two views [15]. This matrix can be estimated without any specific knowledge of the planar structure (such as its size or orientation). Therefore, it is suitable for the control of UAVs operating in unknown environments. Homography-based stabilization of VTOL UAVs raises two important issues. The first one is the estimation of the homography matrix itself. Several algorithms have been developed in the computer vision community to obtain such an estimate (see, e.g., [15, 1]). Recently, IMU-aided fusion algorithms have been proposed to cope with the noise and robustness limitations of homography estimation algorithms based on vision data only [16, 9]. The second issue concerns the design of stabilizing feedback laws. The homography associated with two views of a planar scene is directly related to the cartesian displacement (in both position and orientation) between these two views, but this relation depends on unknown parameters (normal and distance to the scene). Such uncertainties significantly complicate the design and stability analysis of feedback controllers. This is all the more true since VTOL UAVs are usually underactuated systems, with high-order dynamic relations between the vehicle's position and the control input. For example, horizontal displacement is related to the roll and pitch control torques via fourth-order systems.
For this reason, most existing control strategies based on homography measurements make additional assumptions on the environment, i.e., knowledge of the normal to the planar scene [20, 21, 18, 14]. This simplifies the control design and stability analysis since, in this case, the vehicle's cartesian displacement (rotation, and position up to an unknown scale factor) can be extracted from the homography measurement.

This paper reports recent results by the authors and co-authors on both the problem of homography estimation via the fusion of inertial and vision data [16, 9], and the design of feedback controllers based on homography measurements [5, 7]. The paper is organized as follows. Preliminary background and notation are given in Section 2. Feedback control algorithms are presented in Section 3 and homography estimation algorithms in Section 4. Finally, some implementation issues are discussed in Section 5.

2 Background

In this section we review background on both the dynamics of VTOL UAVs and the homography matrix associated with two camera images of a planar scene. Let us start by defining the control problem addressed in this paper.

2.1 Control problem

Figure 1 illustrates the visual servoing problem addressed in this paper. A VTOL UAV is equipped with a mono-camera. A reference image of a planar scene T, obtained with the UAV located at a reference frame R∗, is available. From this reference image and the current image, obtained from the current UAV location (frame R), the objective is to design a control law that asymptotically stabilizes R to R∗. Note that asymptotic stabilization is possible only if R∗ corresponds to a possible equilibrium, i.e., in the absence of wind the thrust direction associated with R∗ must be vertical.

2.2 Dynamics of VTOL UAVs

We consider the class of thrust-propelled underactuated vehicles consisting of rigid bodies moving in 3D space under the action of one body-fixed force control and full torque actuation [13]. This class contains most VTOL UAVs (quadrotors, ducted fans, helicopters, etc.). Being essentially interested here in hovering stabilization, throughout the paper we neglect aerodynamic forces acting on the vehicle's main body. Assuming that R∗ is a NED (North-East-Down) frame (see Fig. 1), the dynamics of these systems is described by the following well-known equations:

m p̈ = −T R b3 + m g b3
Ṙ = R S(ω)
J ω̇ = J ω × ω + Γ        (1)

with p the position vector of the vehicle's center of mass expressed in R∗, R the rotation matrix from R to R∗, ω the angular velocity vector of R w.r.t. R∗ expressed in R, S(·) the matrix-valued function associated with the cross product, i.e. S(x)y = x × y, ∀x, y ∈ ℝ³, m the mass, T the thrust control input, b3 = (0, 0, 1)ᵀ, J the inertia matrix, Γ the torque control input, and g the gravity constant.
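As a concrete illustration, the model (1) can be integrated numerically. The following NumPy sketch (function names and the forward-Euler scheme are ours, not from the paper) performs one integration step:

```python
import numpy as np

def S(x):
    """Skew-symmetric matrix such that S(x) @ y = np.cross(x, y)."""
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

def step(p, v, R, omega, T, Gamma, m, J, g, dt):
    """One forward-Euler step of the rigid-body model (1).

    p, v  : position and velocity of the center of mass, expressed in R*
    R     : rotation matrix from the body frame to R*
    omega : angular velocity expressed in the body frame
    """
    b3 = np.array([0.0, 0.0, 1.0])
    a = (-T * (R @ b3) + m * g * b3) / m            # m p_ddot = -T R b3 + m g b3
    omega_dot = np.linalg.solve(J, np.cross(J @ omega, omega) + Gamma)
    p = p + dt * v
    v = v + dt * a
    R = R @ (np.eye(3) + dt * S(omega))             # first-order attitude update
    omega = omega + dt * omega_dot
    # re-orthonormalize R to compensate integration drift
    U, _, Vt = np.linalg.svd(R)
    return p, v, U @ Vt, omega
```

At hover, T = mg with zero torque keeps the vehicle at rest, which gives a quick sanity check of the signs in (1).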

Figure 1: Problem scheme

2.3 Homography matrix and monocular vision

With the notation of Figure 1, consider a point P ∈ T and denote by X∗ the coordinates of this point in R∗. In R∗, the plane T is defined as {X∗ ∈ ℝ³ : n∗ᵀX∗ = d∗}, with n∗ the coordinates in R∗ of the unit vector normal to T and d∗ the distance between the origin of R∗ and the plane. Let us now denote by X the coordinates of P in the current frame R. One has

X∗ = RX + p and therefore,

X = Rᵀ X∗ − Rᵀ p
  = Rᵀ X∗ − Rᵀ p [(1/d∗) n∗ᵀ X∗]
  = (Rᵀ − (1/d∗) Rᵀ p n∗ᵀ) X∗
  = H X∗        (2)

with

H = Rᵀ − (1/d∗) Rᵀ p n∗ᵀ        (3)
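These relations are easy to check numerically. The sketch below (the example pose and plane values are ours) builds H from (3) and verifies X = HX∗ for a point on the plane:

```python
import numpy as np

def rot_z(a):
    """Rotation about the third axis by angle a."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# arbitrary pose of the current frame R w.r.t. the reference frame R*
R = rot_z(0.3)
p = np.array([0.2, -0.1, 0.4])

# plane {X* : n*^T X* = d*} expressed in R*
n_star = np.array([0.0, 0.0, 1.0])
d_star = 2.0

H = R.T - np.outer(R.T @ p, n_star) / d_star    # equation (3)

# a point on the plane: with n* = b3, X* = (x, y, d*)
X_star = np.array([0.5, -0.7, d_star])
X = R.T @ (X_star - p)                           # change of frame: X* = R X + p

assert np.allclose(H @ X_star, X)                # equation (2)
```

The key step is that (1/d∗) n∗ᵀX∗ = 1 for every point of the plane, which is why the assertion holds exactly.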

The matrix H could be determined by matching the 3D coordinates, in the reference and current camera frames, of points of the planar scene. Cameras do not provide these 3D coordinates, however: only the 2D projective coordinates of P on the respective image planes are available. More precisely, the 2D projective coordinates of P in the reference and current camera planes are respectively given by

µ∗ = K X∗/z∗ ,  µ = K X/z

where z∗ and z denote the third coordinates of X∗ and X respectively (i.e., the coordinates along the camera optical axis), and K is the calibration matrix of the camera. Combining (2) with these projection equations yields

µ = G µ∗        (4)

with

G ∝ KHK−1

where ∝ denotes equality up to a positive scalar factor. The matrix G ∈ ℝ³ˣ³, defined up to a scale factor, is called the uncalibrated homography matrix. It can be computed by matching the projections, onto the reference and current camera planes, of points of the planar scene. If the camera calibration matrix K is known, then the matrix H can be deduced from G up to a scale factor, i.e., K⁻¹GK = αH. As a matter of fact, the scale factor α corresponds to the median singular value of the matrix K⁻¹GK: α = σ₂(K⁻¹GK) (see, e.g., [15, Pg. 135]). Therefore, α can be computed together with the matrix H. Another interesting matrix is

H̄ = det(H)^(−1/3) H = η H        (5)


Indeed, det(H̄) = 1, so that H̄ belongs to the Special Linear Group SL(3). We will see further that this property can be exploited for homography filtering and estimation purposes. Let us finally remark that η³ = d∗/d.
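In code, recovering H (and its SL(3) representative) from an uncalibrated measurement G might look as follows; a NumPy sketch under the assumption that K is known and G stems from a true planar homography (the function name and the calibration values used in testing are ours):

```python
import numpy as np

def calibrated_homography(G, K):
    """Recover H from the uncalibrated homography G (known only up to scale).

    K^-1 G K = alpha * H with alpha = sigma_2(K^-1 G K), the median singular
    value; Hbar = det(H)^(-1/3) H is the SL(3) representative, det(Hbar) = 1.
    """
    A = np.linalg.inv(K) @ G @ K
    alpha = np.linalg.svd(A, compute_uv=False)[1]   # singular values sorted, [1] = median
    H = A / alpha
    Hbar = H / np.cbrt(np.linalg.det(H))            # equation (5)
    return H, Hbar
```

This relies on the fact that the median singular value of a true planar homography is 1, so the median singular value of A = K⁻¹GK is exactly the unknown scale.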

3 Feedback Control Design

We present in this section two classes of feedback control laws for the asymptotic stabilization of VTOL UAVs based on homography measurements of the form H defined by (3). The first class consists of control laws affine w.r.t. the homography matrix components. These control laws ensure local asymptotic stabilization under very mild assumptions on the observed scene. The second class consists of nonlinear control laws that ensure large stability domains under stronger assumptions on the scene.

3.1 Linear control laws

The main difficulty in homography-based stabilization comes from the mixing of position and orientation information in the homography matrix components, as shown by relation (3). If the normal vector n∗ is known, then one can easily extract from H the rotation matrix and the position vector up to the scale factor 1/d∗. When n∗ is unknown, however, this extraction is no longer possible and one has to deal with this mixing of information. The control laws presented here rely on the possibility of extracting from H partially decoupled position and rotation information. This is shown by the following result, first proposed in [6].

Proposition 1 Let ē = Me with

M = [ 2I3      S(m∗) ]
    [ −S(m∗)   I3    ]  ,   e = [ ep ]
                                [ eΘ ]        (6)

and

ep = (I3 − H) m∗ ,  eΘ = vex(Hᵀ − H) ,  m∗ = b3 = (0, 0, 1)ᵀ        (7)

where vex(·) is the inverse of the S(·) operator: vex(S(x)) = x, ∀x ∈ ℝ³. Let Θ = (φ, θ, ψ)ᵀ denote any parametrization of the rotation matrix R such that R ≈ I3 + S(Θ) around R = I3 (e.g., Euler angles). Then,

1. (p, R) ↦ ē defines a local diffeomorphism around (p, R) = (0, I3). In particular, ē = 0 if and only if (p, R) = (0, I3).

2. In a neighborhood of (p, R) = (0, I3),

ē = L [ p ] + O²(p, Θ) ,   L = [ Lp   0  ]
      [ Θ ]                    [ LpΘ  LΘ ]        (8)

with LpΘ = S((α∗, β∗, 0)ᵀ),

Lp = [ c∗  0   α∗  ]        LΘ = [ 1  0  0 ]
     [ 0   c∗  β∗  ]  ,          [ 0  1  0 ]
     [ 0   0   2c∗ ]             [ 0  0  2 ]

α∗, β∗ the (unknown) constant scalars defined by n∗ = d∗(α∗, β∗, c∗)ᵀ, c∗ = 1/‖X∗‖, and O²(p, Θ) terms of order two at least.

Eq. (8) shows the rationale behind the definition of ē: at first order, components ē₁, ē₂, ē₃ contain information on the translation vector p only, while components ē₄, ē₅, ē₆ contain decoupled information on the orientation (i.e., LΘ is diagonal), corrupted by components of the translation vector. Although the decoupling of position and orientation information in the components of ē is not complete, it is sufficient to define asymptotically stabilizing control laws, as shown below.
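As an illustration, the error vector of Proposition 1 is straightforward to compute from a homography measurement. A NumPy sketch (the function names are ours):

```python
import numpy as np

def S(x):
    """Skew-symmetric matrix: S(x) @ y = np.cross(x, y)."""
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

def vex(A):
    """Inverse of S: vex(S(x)) = x, for an antisymmetric matrix A."""
    return np.array([A[2, 1], A[0, 2], A[1, 0]])

def homography_error(H):
    """Error vector of Proposition 1, equations (6)-(7), with m* = b3."""
    m_star = np.array([0.0, 0.0, 1.0])
    e_p = (np.eye(3) - H) @ m_star          # position-dominated part
    e_Theta = vex(H.T - H)                   # orientation-dominated part
    M = np.block([[2.0 * np.eye(3), S(m_star)],
                  [-S(m_star), np.eye(3)]])
    return M @ np.concatenate([e_p, e_Theta])
```

At the reference pose, H = I3 and the error vanishes, consistent with item 1 of the proposition.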

Let ēp ∈ ℝ³ (resp. ēΘ ∈ ℝ³) denote the first (resp. last) three components of ē, i.e. ē = (ēpᵀ, ēΘᵀ)ᵀ. The control design relies on a dynamic extension of the state vector defined as follows:

ν̇ = −K7 ν − ēp        (9)

with K7 a diagonal gain matrix. The variable ν copes with the lack of measurement of the time derivative of ē. The control design is presented through the following theorem.

Theorem 1 Assume that the target is not vertical and the camera frame is identical to R (as shown on Fig. 1). Let

T = m (g + k1 ē₃ + k2 ν₃)
Γ = −J K3 (ω − ωd)        (10)


with

ωd = −(K4/g)(g ēΘ + b3 × γd)
γd = −K5 ēp − K6 ν        (11)

Then,

1. Given any upper bound c∗M > 0, there exist diagonal gain matrices Ki = Diag(kiʲ), i = 3, …, 7; j = 1, 2, 3, and scalar gains k1, k2, such that the control law (10) makes the equilibrium (p, R, v, ω, ν) = (0, I3, 0, 0, 0) of the closed-loop system (1)-(9)-(10)-(11) locally exponentially stable for any value of c∗ ∈ (0, c∗M].

2. If the diagonal gain matrices Ki and scalar gains k1, k2 make the closed-loop system locally exponentially stable for c∗ = c∗M, then local exponential stability is guaranteed for any value of c∗ ∈ (0, c∗M].

This result calls for several remarks.

1) The control computation only requires the knowledge of H (via ē) and ω. Thus, it can be implemented with a very minimal sensor suite consisting only of a mono-camera and gyrometers.

2) This result does not address the case of a vertical target. That case can be addressed as well, with the same kind of technique and stability result. Such an extension can be found in [7], together with several other generalizations of Theorem 1.

3) Since c∗ = 1/‖X∗‖ and ‖X∗‖ ≥ d∗, a sufficient condition for c∗ ∈ (0, c∗M] is that d∗ ≥ 1/c∗M. Thus, Property 1 ensures that stabilizing control gains can be found given any lower bound on the distance between the reference pose and the observed planar target. This is a very weak requirement from an application point of view. Property 2 is also a very strong result since it implies that, in order to find stabilizing control gains for any c∗ ∈ (0, c∗M], it is sufficient to find stabilizing control gains for c∗ = c∗M. This is a much easier task, which can be achieved with classical linear control tools. In particular, by using the Routh-Hurwitz criterion, explicit stability conditions on the control gains can be derived (see [7] for more details).
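Put together, one update of the control scheme (9)-(11) can be sketched as follows. This is a minimal NumPy sketch: the data layout, gain dictionary, and Euler discretization of (9) are ours, and the gain-tuning conditions discussed above are not encoded here.

```python
import numpy as np

def control_step(e_bar, omega, nu, m, J, g, gains, dt):
    """One step of the homography-based linear controller (9)-(11).

    e_bar : 6-vector error (e_p, e_Theta) computed from the homography
    omega : angular velocity measured by the gyrometers
    nu    : state of the dynamic extension (9)
    gains : dict with scalars k1, k2 and 3x3 diagonal matrices K3..K7
    """
    e_p, e_Theta = e_bar[:3], e_bar[3:]
    nu = nu + dt * (-gains["K7"] @ nu - e_p)            # dynamic extension (9)
    T = m * (g + gains["k1"] * e_bar[2] + gains["k2"] * nu[2])   # thrust, (10)
    b3 = np.array([0.0, 0.0, 1.0])
    gamma_d = -gains["K5"] @ e_p - gains["K6"] @ nu      # (11)
    omega_d = -(gains["K4"] / g) @ (g * e_Theta + np.cross(b3, gamma_d))
    Gamma = -J @ gains["K3"] @ (omega - omega_d)         # torque, (10)
    return T, Gamma, nu
```

At the equilibrium (ē, ω, ν) = 0, the sketch returns T = mg and Γ = 0, the hover condition.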

3.2 Nonlinear control laws

Theorem 1 shows that homography-based stabilizing control laws can be designed from very limited a priori information (essentially, a lower bound on the distance to the scene at the desired configuration and the scene planarity property). A weakness of this stability result, however, is the lack of knowledge of the size of the stability domain. Under some assumptions on the scene orientation, it is possible to derive stabilizing control laws with explicit (and large) stability domains. A first case of practical interest is when the target is horizontal. In this case, the normal vector to the scene is known, and the extraction from H of the orientation, and of the position up to a scale factor, allows one to use available nonlinear control laws with large stability domains. Another interesting scenario for applications is when the target is vertical. This case is more challenging since knowing that the scene is vertical does not completely specify its orientation. We present below a nonlinear feedback control law to address this case.

First, let us remark that n∗₃ = 0 when the scene is vertical. Indeed, the normal vector to the scene is horizontal and the reference frame R∗ is associated with an equilibrium configuration, so that its third basis vector is vertical (pointing downward). Then, it follows from (3) that

σ := Hb2 × Hb3 − Hb1 = Rᵀ M(n∗/d∗) p
γ := g Hb3 = g Rᵀ b3        (12)

with M(τ) = τ₁ I3 + S(τ₂ b3). These relations show that one can extract from H decoupled information in terms of position and orientation. Compared to the result given in Proposition 1, this result is stronger since the decoupling is complete and it holds without any approximation. On the other hand, it is limited to a vertical scene. Note that γ corresponds to the components of the gravity vector in the body frame. This vector, which is used in conventional control schemes based on cartesian measurements, is typically estimated from the accelerometer and gyrometer measurements of an IMU, assuming small accelerations of the UAV [17].
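The identity (12) can be checked numerically for any pose and any vertical plane. A short NumPy sketch (the example pose and plane values are ours):

```python
import numpy as np

def S(x):
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

def rot(axis, angle):
    """Rodrigues' rotation formula."""
    a = axis / np.linalg.norm(axis)
    K = S(a)
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * K @ K

g = 9.81
b1, b2, b3 = np.eye(3)

# vertical scene: n*_3 = 0
n_star = np.array([np.cos(0.2), np.sin(0.2), 0.0])
d_star = 3.0
R = rot(np.array([0.2, -0.3, 0.9]), 0.4)
p = np.array([0.3, -0.2, 0.1])
H = R.T - np.outer(R.T @ p, n_star) / d_star     # equation (3)

sigma = np.cross(H @ b2, H @ b3) - H @ b1
gamma = g * (H @ b3)

tau = n_star / d_star
M = tau[0] * np.eye(3) + S(tau[1] * b3)
assert np.allclose(sigma, R.T @ M @ p)           # sigma = R^T M(n*/d*) p
assert np.allclose(gamma, g * R.T @ b3)          # gamma = g R^T b3
```

The second identity is immediate since n∗₃ = 0 makes Hb3 = Rᵀb3; the first follows from b2 × b3 = b1 and the cross-product rule Ra × Rb = R(a × b).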

Eq. (12) leads us to address the asymptotic stabilization of UAVs from pose measurements of the form σ = RᵀMp, γ = gRᵀb3, where M is an unknown positive definite matrix. We further assume that the velocity measurements ω and v = Rᵀṗ are also available. The variable v can be estimated, e.g., via optical flow algorithms [10, 11, 9]. In most studies on feedback control of underactuated UAVs, it is assumed that M is the identity matrix, so that the relation between the measurement function and the cartesian coordinates is perfectly known. Several control design methods ensuring semi-global stability of the origin of System (1) have been proposed in this case (see, e.g., [19, 13]). We show below that similar stability properties can be guaranteed in the case of uncertainties on the matrix M. To this end, let us introduce some notation.

For any square matrix M, Ms := (M + Mᵀ)/2 and Ma := (M − Mᵀ)/2 respectively denote the symmetric and antisymmetric parts of M. Given a smooth function f defined on an open set of ℝ, its derivative is denoted as f′. Given δ = [δm; δM] with 0 < δm < δM, we introduce the saturating function

satδ(τ) = 1                                                 if τ ≤ δm²
satδ(τ) = δM/√τ − (δM − δm)² / (√τ (√τ + δM − 2δm))         if τ > δm²        (13)

Note that τ ↦ τ satδ(τ²) defines a classical saturation function, in the sense that it is the identity function on [0, δm] and is upper-bounded by δM.
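The saturating function (13) is easy to implement and check numerically. A NumPy sketch (parameter values are ours):

```python
import numpy as np

def sat(tau, dm, dM):
    """Saturating function (13) with parameters delta = [dm, dM], 0 < dm < dM."""
    if tau <= dm**2:
        return 1.0
    r = np.sqrt(tau)
    return dM / r - (dM - dm)**2 / (r * (r + dM - 2.0 * dm))

# tau -> tau * sat(tau^2) is the identity on [0, dm] and upper-bounded by dM
dm, dM = 1.0, 2.0
xs = np.linspace(0.0, 50.0, 2000)
ys = np.array([x * sat(x**2, dm, dM) for x in xs])
print(np.allclose(ys[xs <= dm], xs[xs <= dm]), bool(ys.max() <= dM))
# prints: True True
```

Indeed x · sat(x², δm, δM) = δM − (δM − δm)²/(x + δM − 2δm) for x > δm, which is continuous at x = δm and increases towards δM.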

We can now state the main result of this section (see [5] for more details, generalizations, and the proof). By a standard time-separation argument commonly used for VTOL UAVs, we assume that the orientation control variable is the angular velocity ω instead of the torque Γ (i.e., once a desired angular velocity ωd has been defined, a torque control input Γ that ensures convergence of ω to ωd is typically computed through a high-gain controller).

Theorem 2 Let satδ and satδ̄ denote two saturating functions. Assume that M is positive definite and consider any gain values k1, k2 > 0 such that

k2² λmin(Ms) > k1 ‖Ma‖ ‖M‖ C ,  C := supτ (satδ(τ) + 2τ |sat′δ(τ)|)
k2 δm > k1
k1 + k2 δM < g        (14)

Define a dynamic augmentation:

ν̇ = ν × ω − k3 (ν − σ) ,  k3 > 0        (15)

together with the control (T, ω) such that:

ω1 = −k4 |µ| µ2 / (|µ| + µ3)² − (1/|µ|²) µᵀ S(b1) Rᵀ dµ̄/dt
ω2 = k4 |µ| µ1 / (|µ| + µ3)² − (1/|µ|²) µᵀ S(b2) Rᵀ dµ̄/dt
T = m µ3        (16)

where µ, µ̄, and the feedforward term Rᵀ dµ̄/dt are given by

µ := γ + k1 satδ(|ν|²) ν + k2 satδ̄(|v|²) v
µ̄ := R µ
Rᵀ dµ̄/dt = −k1 k3 [satδ(|ν|²) I3 + 2 sat′δ(|ν|²) ννᵀ](ν − σ)
           + k2 [satδ̄(|v|²) I3 + 2 sat′δ̄(|v|²) vvᵀ](γ − u b3)

with u := T/m the thrust per unit mass.

Then,

i) there exists k3,m > 0 such that, for any k3 > k3,m, the equilibrium (ν, p, ṗ, γ) = (0, 0, 0, gb3) of the closed-loop system (1)-(15)-(16) is asymptotically stable and locally exponentially stable, with convergence domain given by {(ν, p, ṗ, γ)(0) : µ(0) ≠ −|µ(0)| b3}.

ii) if Ms and Ma commute, the same conclusion holds with the first inequality in (14) replaced by:

k2² λmin(Ms) > k1 ‖Ma‖ (‖Ma‖ supτ satδ(τ) + ‖Ms‖ supτ 2τ |sat′δ(τ)|)        (17)

Let us comment on the above result. It follows from (14) that

|k1 satδ(|ν|²) ν + k2 satδ̄(|v|²) v| ≤ k1 + k2 δM < g = |γ|

This guarantees that µ(0) ≠ −|µ(0)| b3 whenever g b3ᵀ R(0) b3 > −(k1 + k2 δM). As a consequence, the only limitation on the convergence domain concerns the initial orientation error: there is no limitation on the initial position/velocity errors. Note also that the limitation on the initial orientation error is not very strong. Note that ω3, which controls the yaw dynamics, is not involved in this objective. Thus, it can be freely chosen. In practice, however, some choices are better than others (see below for more details).

Application to the visual servoing problem: From (12), Theorem 2 applies directly with M = M(n∗/d∗) = (n∗₁/d∗) I3 + S((n∗₂/d∗) b3). In this case one verifies that the stability conditions (14)-(17) are equivalent to the following:

n∗₁ > 0
k1, k2 > 0
k2 δm > k1
k1 + k2 δM < g
n∗₁ d∗ k2² > k1 |n∗₂| (|n∗₂| + 2n∗₁/(3√3))        (18)

Note that the first condition, which ensures that M is positive definite, essentially means that the camera is "facing" the target at the reference pose. This is a very natural assumption from an application point of view. When (loose) bounds are known for d∗, dmin ≤ d∗ ≤ dmax, and n∗₁ ≥ n1min, then, recalling that |n∗| = 1, the last condition of (18) can be replaced by:

n1min dmin k2² > k1 (1 + 2/(3√3))        (19)
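These gain conditions are simple enough to check programmatically. A small helper (ours, not from the paper) evaluating the sufficient conditions for a vertical target:

```python
import numpy as np

def gains_feasible(k1, k2, dm, dM, n1_min, d_min, g=9.81):
    """Check the sufficient stability conditions for a vertical target:
    positivity, k2*dm > k1, k1 + k2*dM < g, and the bound (19) obtained
    from the loose knowledge n*_1 >= n1_min and d* >= d_min."""
    return (k1 > 0.0 and k2 > 0.0
            and k2 * dm > k1
            and k1 + k2 * dM < g
            and n1_min * d_min * k2**2 > k1 * (1.0 + 2.0 / (3.0 * np.sqrt(3.0))))
```

For example, k1 = 0.5, k2 = 1.0 with δ = [1, 2], n1_min = 0.5, d_min = 2 m satisfies all four conditions, while raising k1 to 1.5 violates k2 δm > k1.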

The yaw degree of freedom is not involved in the stabilization objective. On the other hand, it matters for keeping the target inside the field of view of the camera. We propose to use the following control law:

ω3 = k5 H21        (20)

Upon convergence of the position, velocity, roll and pitch angles due to the other controls, the yaw dynamics will be close to ψ̇ ≈ −k5 sin ψ, thus ensuring the convergence of ψ to zero unless ψ is initially equal to π (a case contradicting the visibility assumption). Another nice feature of this yaw control is that it vanishes when H21 = 0, i.e., when the target is seen, from the yaw perspective, as it should be at the end of the control task. This means that the controller tries to reduce the yaw angle only when the position/velocity errors have been significantly reduced.

4 Homography estimation

Obtaining in real time a good estimate of the homography matrix is a key issue for the implementation of the stabilization algorithms presented above. In this section we first briefly review existing computer vision algorithms for estimating the homography matrix. Then, we focus on the use of inertial measurements to improve and speed up the estimation process.

4.1 Computer vision methods

There are two main classes of vision algorithms for computing the homography matrix between two images of the same planar scene:

1. Interest-point-based methods

2. Intensity-based methods

In the first case, the homography matrix is recovered from point correspondences between the two images in a purely geometric way. A first step consists in the detection of interest points. The correspondences can then be estimated by matching (with interest point detection and descriptors) or by KLT tracking (based on intensity). From these correspondences the homography matrix is recovered with algorithms such as the DLT [12], most of the time coupled with robust estimation techniques like RANSAC or M-estimators in order to reject false matches. For more details on interest-point-based methods, the reader is referred to [12].
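For reference, the core of the DLT applied to the homography case can be sketched in a few lines of NumPy, without the coordinate normalization and RANSAC layers that a robust implementation would add:

```python
import numpy as np

def dlt_homography(q_ref, q_cur):
    """Estimate G (up to scale) from n >= 4 point correspondences via the DLT.

    q_ref, q_cur : (n, 2) arrays of matched pixel coordinates such that
    q_cur ~ G q_ref in homogeneous coordinates.
    """
    rows = []
    for (x, y), (u, v) in zip(q_ref, q_cur):
        # each correspondence contributes two linear constraints on the
        # stacked entries of G (cross-product formulation)
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    _, _, Vt = np.linalg.svd(A)
    G = Vt[-1].reshape(3, 3)      # right null-space vector = homography entries
    return G / G[2, 2]            # fix the arbitrary scale
```

With exact correspondences the smallest singular vector of A recovers G exactly; with noisy data it gives the least-squares solution, which is why robust wrappers (RANSAC, M-estimators) are layered on top.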

In the second case, the homography matrix is estimated by trying to align two images (the reference image or "template" T and the current image I). This is done, e.g., by defining a transformation (usually called "warping") from the reference image to the current image, wρ : q∗ ↦ q = wρ(q∗), where q∗ denotes a pixel in the reference image, q a pixel in the current image, and ρ is a parameterization of the homography matrix, for example a parameterization of the Lie algebra of SL(3). This definition leads to an optimization problem which is solved numerically. The problem consists in minimizing w.r.t. ρ a measure of the distance between the reference image T = {T(q∗)} and the transform of the image I by the warping: {I(wρ(q∗))}. The cost function of the optimization problem varies with the proposed method, but most of the time it essentially boils down to a sum, over the image's pixels, of the distance between the pixel intensities in the two images. Usually, the optimization process only provides the optimal solution locally, i.e., provided the distance between the two images is small enough. One way to improve the convergence of this type of method is to rely on Gaussian pyramids [4]. In this case, the template image is smoothed by a Gaussian filter and recursively down-sampled by a factor of two to form a pyramid of images, with the template image at the bottom and the smallest image at the top. The visual method is then successively applied at each level of the pyramid, from top to bottom. Thus, large movements are kept small in pixel space and the convergence domain of the method is improved.

In this paper we focus on two estimation algorithms of this second class: the ESM algorithm (Efficient Second-order Minimization) [3] and the IC algorithm (Inverse Compositional) [2]. Table 5.2 summarizes the main features of both methods. The main interest of the IC method is that it allows a lot of precomputation based on the reference image. Indeed, the Jacobian matrix J of the cost function is computed from the template image, i.e., it depends neither on the current image nor on the homography parameterization ρ. Thus, the inverse of JᵀJ can also be precomputed. At each iteration, only the computation of the intensity error and matrix multiplications are needed. By contrast, the ESM is a second-order method that uses both the current image gradient and the template image to find the best quadratic approximation of the cost function. Therefore, each iteration of the optimization is more expensive than for the IC method. As a counterpart, the convergence rate of the method is faster.

4.2 IMU-aided homography estimation

Cameras and IMUs are complementary sensors. In particular, the camera frame rate is relatively low (around 30 Hz) and, in addition, vision data processing can take a significant amount of time, especially on small UAVs with limited computation power. By contrast, IMUs provide data at high frequency and this information can be processed quickly. Since IMUs are always present on UAVs for control purposes, it is thus natural to exploit them to improve the homography estimation process. We present in this section nonlinear observers recently proposed in [16] to fuse a vision-based homography estimate with IMU data. This fusion is performed on the Special Linear Lie group SL(3) associated with the homography representation (5), i.e., det(H̄) = 1 (with a slight abuse of notation, H below denotes this SL(3) representative). This allows one to exploit Lie group invariance properties in the observer design. We focus on two specific observers.

The first observer considered is based on the general form of the kinematics on SL(3):

Ḣ = −X H        (21)

with H ∈ SL(3) and X ∈ sl(3). The observer is given by

d/dt Ĥ = −Ad_Ĥ ( X̂ − k1 P(H̃(I3 − H̃)) ) Ĥ
d/dt X̂ = −k2 P(H̃(I3 − H̃))        (22)

with Ĥ ∈ SL(3), X̂ ∈ sl(3), H̃ = ĤH⁻¹, and P(·) the projection onto sl(3) (P(M) = M − (tr(M)/3) I3). It is shown in [16] that this observer ensures almost global asymptotic stability of (I3, 0) for the estimation error (H̃, X̃) = (ĤH⁻¹, X̂ − X) (i.e., asymptotic convergence of the estimates to the original variables), provided that X is constant (see [16, Th. 3.2] for details). Although this condition is seldom satisfied in practice, this observer provides a simple solution to the problem of filtering homography measurements. Finally, note that this observer uses homography measurements only.
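A discrete-time sketch of observer (22) in NumPy. This is our Euler discretization, assuming the error convention H̃ = ĤH⁻¹, P(M) = M − (tr(M)/3)I3, and Ad_Ĥ(A) = ĤAĤ⁻¹ (so that Ad_Ĥ(A)Ĥ = ĤA); gain values and the SL(3) re-projection step are ours:

```python
import numpy as np

def proj_sl3(M):
    """Projection onto sl(3): remove the trace."""
    return M - (np.trace(M) / 3.0) * np.eye(3)

def observer_step(H_hat, X_hat, H_meas, k1, k2, dt):
    """One Euler step of the SL(3) observer (22).

    H_hat, X_hat : current estimates (H_hat in SL(3), X_hat in sl(3))
    H_meas       : vision-based homography measurement
    """
    H_tilde = H_hat @ np.linalg.inv(H_meas)
    corr = proj_sl3(H_tilde @ (np.eye(3) - H_tilde))
    H_hat = H_hat + dt * (-(H_hat @ (X_hat - k1 * corr)))   # Ad_H(A) H = H A
    X_hat = X_hat + dt * (-k2 * corr)
    # re-project H_hat onto SL(3) to compensate integration drift
    return H_hat / np.cbrt(np.linalg.det(H_hat)), X_hat
```

With a constant measurement the correction vanishes exactly at Ĥ = H and X̂ = 0, so the estimate settles on the measured homography while filtering transients.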

A second observer, which explicitly takes into account the kinematics of the camera motion, is proposed in [16]. With the notation of Section 3, recall


that the kinematics of the camera frame are given by

\dot{R} = R S(\omega), \qquad \dot{p} = R v    (23)

With this notation, one can show that the group velocity X in (21) is given by

X = S(\omega) + \frac{v n^T}{d} - \frac{n^T v}{3d} I_3 = S(\omega) + \eta^3 P(Y), \qquad \text{with } Y = \frac{v n^T}{d^*}    (24)
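For illustration, the group velocity of equation (24) can be computed directly from ω, v, n and d; the sketch below (numpy, illustrative names) also makes explicit that X is trace-free, i.e. an element of sl(3).

```python
import numpy as np

def S(w):
    """Skew-symmetric matrix such that S(w) @ u = np.cross(w, u)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def group_velocity(omega, v, n, d):
    """Group velocity X of equation (24):
    X = S(omega) + v n^T / d - (n^T v)/(3 d) I3, an element of sl(3)."""
    vn = np.outer(v, n) / d
    return S(omega) + vn - (np.trace(vn) / 3.0) * np.eye(3)
```

Since S(ω) is skew-symmetric and the trace of v nᵀ/d is removed, tr(X) = 0 by construction.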

The following observer of H and Y is proposed in [16]:

\dot{\hat{H}} = -\mathrm{Ad}_{\hat{H}}\left(S(\omega) + \eta^3 P(\hat{Y}) - k_1 P(\tilde{H}(I_3 - \tilde{H}))\right)\hat{H}
\dot{\hat{Y}} = \hat{Y} S(\omega) - k_2 \eta^3 P(\tilde{H}(I_3 - \tilde{H}))    (25)

with Ĥ ∈ SL(3), Ŷ ∈ R^{3×3}, and H̃ = ĤH⁻¹. Conditions under which the estimates (Ĥ, Ŷ) almost globally converge to (H, Y) are given in [16, Cor. 5.5]. These conditions essentially reduce to the following: i) ω is persistently exciting, and ii) v is constant. The hypothesis of persistent excitation on the angular velocity is used to demonstrate the convergence of Ŷ to Y. In the absence of persistent excitation, Ŷ converges only to Y + a(t)I₃ with a(t) ∈ R, but the convergence of Ĥ to H still holds. The hypothesis of constant v is a strong assumption. Asymptotic stability of the observer for constant v, however, guarantees that the observer can provide accurate estimates when v is slowly time varying with respect to the filter dynamics. This will be illustrated later in the paper and experimentally verified.

4.3 Architecture and data synchronization

Implementation of the above observers from IMU and camera data follows a classical prediction/correction estimation scheme. The quality of this implementation requires careful handling of data acquisition and communication. Synchronization and/or timestamping of the two sensor data streams is instrumental in obtaining high-quality estimates. If the two sensors are synchronized, timestamping may be ignored provided that the communication delay is short enough and no data loss occurs. Discrete-time implementation of the observers can then be made with a fixed sampling rate. If the sensors are not synchronized, it is necessary to timestamp the data as close to the sensor output as possible, and to deal with possibly variable sampling rates.

Figure 2 gives a possible architecture for the interactions between the estimator and the sensors (vision and IMU). The homography prediction obtained from IMU data is used to initialize the vision algorithm. Once a new image has been processed, the obtained vision estimate, considered as a measurement, is used to correct the filter's homography estimate. Due to the significant duration of the vision processing w.r.t. the IMU sampling rate, this usually requires re-applying the prediction process via IMU data from the moment of the image acquisition. This leads us to maintain two states of the same estimator (see Figure 2): the real-time estimator, obtained from the last homography measurement and IMU data, and a post-processed estimator which is able to correct a posteriori the homography estimates from the time of the last vision data acquisition to the time this data was processed.
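The two-state scheme above can be sketched as follows (Python; illustrative: `predict` and `correct` stand for the observer's prediction and correction steps, and the state type is left generic). On arrival of a delayed vision measurement, the post-processed state is advanced to the image acquisition time, corrected, and the buffered IMU predictions are re-applied to rebuild the real-time estimate.

```python
from collections import deque

class TwoStateEstimator:
    """Sketch of the two-state prediction/correction scheme (illustrative)."""

    def __init__(self, predict, correct, initial_state):
        self.predict = predict            # state, imu -> state
        self.correct = correct            # state, measurement -> state
        self.post_state = initial_state   # post-processed estimate
        self.state = initial_state        # real-time estimate
        self.buffer = deque()             # timestamped IMU data since last correction

    def on_imu(self, t, imu):
        self.buffer.append((t, imu))
        self.state = self.predict(self.state, imu)

    def on_vision(self, t_acquired, h_meas):
        # advance the post-processed state up to the image acquisition time
        while self.buffer and self.buffer[0][0] <= t_acquired:
            _, imu = self.buffer.popleft()
            self.post_state = self.predict(self.post_state, imu)
        # correct with the (delayed) vision measurement
        self.post_state = self.correct(self.post_state, h_meas)
        # re-apply the remaining IMU predictions to rebuild the real-time estimate
        self.state = self.post_state
        for _, imu in self.buffer:
            self.state = self.predict(self.state, imu)
```

With scalar placeholders (prediction adds the IMU increment, correction resets to the measurement), the re-application of buffered IMU data after a delayed correction is easy to check.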

4.4 Experimental setup

We make use of a sensor suite consisting of an xSens MTiG IMU working at a frequency of 200 [Hz], and an AVT Stingray 125B camera that provides 40 images of 800 × 600 [pixel] resolution per second. The camera and the IMU are synchronized. The camera uses wide-angle lenses (focal length 1.28 [mm]). The target is placed on a surface parallel to the ground and is printed on a 376 × 282 [mm] sheet of paper to serve as a reference for the visual system. The reference image is 320 × 240 [pixel], so the distance d* can be determined as 0.527 [m]. The video sequence presented in the accompanying video is 1321 frames long and contains high velocity motion (rotations up to 5 [rad/s], translations, scaling changes) and occlusions. In particular, a complete occlusion of the


Figure 2: Visuo-inertial method scheme and sensor measurement processing timeline

pattern occurs shortly after t = 10 [s]. Four images of the sequence are presented in Figure 3. A "ground truth" of the correct homography for each frame of the sequence has been computed through a global estimation of the homography by SIFT followed by the ESM algorithm. If the pattern is lost, we reset the algorithm with the ground-truth homography. The sequence is used at different sampling rates to obtain more challenging sequences and evaluate the performance of the proposed filters.

For both filters (22) and (25), the estimation gains have been chosen as k1 = 25 and k2 = 250. Following the notation of the description available at http://esm.gforge.inria.fr/ESM.html, the ESM algorithm is used with the following parameter values: prec = 2, iter = 50.

4.5 Tracking quality

In this section we measure the quantitative performance of the different estimators. This performance is reflected by the number of frames for which the homography is correctly estimated. We use the correlation score computed by the visual method to discriminate between well and badly estimated frames. A first tracking quality indicator is the percentage of well estimated frames. This indicator is labelled "%track". Another related criterion concerns the number of time-sequences for which estimation is successful. For that, we define a track as a continuous time-sequence during which the pattern is correctly tracked. We provide the number of tracks in the sequence (label "nb track") as well as the mean and maximum track length. Table 1 presents the results obtained for the full sequence at different sampling rates (40 [Hz], 20 [Hz], 10 [Hz]).

Figure 3: Four images of the sequence at 20 [Hz]: pattern position at previous frame (green), vision estimate (blue), and prediction of the filterIMU (red).
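The indicators of Table 1 can be computed from a per-frame success flag (derived here from the correlation score) as in the following sketch (Python/numpy, illustrative).

```python
import numpy as np

def track_stats(ok):
    """Compute %track, number of tracks, and mean/max track length from a
    boolean per-frame success array `ok` (True = correctly estimated frame)."""
    ok = np.asarray(ok, dtype=bool)
    pct = 100.0 * ok.mean()
    # split into maximal runs of consecutive True values ("tracks")
    lengths = []
    run = 0
    for good in ok:
        if good:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    nb = len(lengths)
    mean_len = float(np.mean(lengths)) if lengths else 0.0
    max_len = max(lengths) if lengths else 0
    return pct, nb, mean_len, max_len
```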

The ESMonly estimator works well at 40 [Hz], since about 94% of the sequence is correctly tracked, but performance decreases rapidly as the distance between images grows (72% at 20 [Hz], and only 39% at 10 [Hz]). It must be noted that the ESM estimator parameters are tuned for speed rather than performance, with real-time applications in mind.

The filternoIMU estimator outperforms the ESMonly filter on the sequence at 40 [Hz]. Tracks are on average twice as long and many losses of the pattern are avoided (11 tracks versus 19 for ESMonly). At 20 [Hz] the performance is still better, but the difference between the two solutions narrows. At 10 [Hz] the filter degrades performance.

The filterIMU tracks almost all of the sequence at both 40 [Hz] and 20 [Hz]. There is just one tracking failure, which occurs around time t = 10 [s] due to the occlusion of the visual target. The improvement provided by the IMU is clearly demonstrated. At 10 [Hz], the performance deteriorates significantly, but this filter


still outperforms the other ones.

Let us finally remark that this performance is obtained despite the fact that the assumption of constant velocity in the body frame (upon which the filter stability analysis relies) is violated.

Frame rate    Method        %track   nb track   mean track length   max track length
40 [Hz]       ESMonly       94.31    19         65.36               463
(1321 img)    FilternoIMU   97.74    11         114.27              607
              FilterIMU     98.78    2          646.5               915
20 [Hz]       ESMonly       72.38    59         8.0                 89
(660 img)     FilternoIMU   80.5     52         10.17               94
              FilterIMU     97.42    2          321.5               456
10 [Hz]       ESMonly       38.79    46         2.78                27
(330 img)     FilternoIMU   32.36    58         1.72                4
              FilterIMU     58.66    59         3.27                27

Table 1: Rate of good tracking for different frame rates and methods: percentage of well estimated frames, number of tracks, and mean and maximum track length over the sequence

5 Computational aspects

Implementing vision algorithms on small UAVs is still a challenge today. Computational optimization is often necessary in order to reach real-time performance (e.g. vision processing at about 10-20 Hz). In this section, we discuss some possible approaches to speed up the vision processing for the homography estimation problem considered here.

5.1 Computational optimization

Two types of optimization can be considered. The first one concerns the optimal use of the computing power. It consists, e.g., in computation parallelization (SIMD instructions, GPU, multi-processor/core), fixed-point computation, or cache optimization. This type of optimization does not affect the accuracy of the vision algorithm. The second type of optimization concerns the vision algorithm itself and the possibilities to lower its computational cost. This may affect the accuracy of the vision algorithm output. Both types of optimization have been utilized here: SIMD (Single Instruction Multiple Data) for computing power optimization, and pixel selection for vision algorithm optimization.

SIMD instructions allow one to process data in packets. With SSE (x86 processors) and NEON (ARM processors), a single instruction can process four floating-point values. Using these instructions with careful data alignment can therefore theoretically improve performance by a factor of four. This theoretical figure is limited by load/store operations and memory (cache) transfer issues. This optimization is only applied to the computation-intensive parts of the program, such as intensity gradient computation, image warping, and Jacobian estimation.

One approach to speed up dense vision algorithms is to use only the pixels that provide effective information for the minimization process. Indeed, the lower the number of pixels, the lower the computational cost. There are many ways to select good pixels for the pixel intensity minimization between two images [8]. One approach consists in using only pixels with a strong gradient, since their intensity errors provide position/orientation information, contrary to image parts with no intensity gradient. In the experimental results reported below, we used the best 2500 pixels.
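A minimal sketch of this gradient-based pixel selection (Python/numpy; illustrative, with the pixel budget as a parameter):

```python
import numpy as np

def select_pixels(template, n_best=2500):
    """Select the n_best pixels with the strongest gradient magnitude in the
    template; only these pixels enter the intensity minimization."""
    gy, gx = np.gradient(template.astype(float))
    mag = gx ** 2 + gy ** 2
    # indices of the strongest gradients, in decreasing order of magnitude
    flat = np.argsort(mag.ravel())[::-1][:n_best]
    return np.unravel_index(flat, template.shape)   # (rows, cols) arrays
```

On an image with a single bright pixel, the selected locations are exactly its four neighbours, where the central-difference gradient is non-zero.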

5.2 Evaluation

We report in this section experimental results obtained with both the ESM and IC methods. For each method, we use the same stopping criteria for the optimization: the maximal number of steps per scale is 30 and the stop error is 10^-3. The number of scales in the pyramid is four.

Table 2 provides the mean frame time (in ms) and mean performance (percentage of correctly estimated homographies) of the different combinations of optimizations and methods on the sequence at 40 Hz (see experimental setup). The computation is done on a desktop PC (Intel(R) Core(TM) i7-2600K CPU @ 3.40 GHz) and the same results are provided for an embedded platform (Odroid U2) based on an Exynos4412 Prime 1.7 GHz ARM Cortex-A9 quad-core processor.

With SIMD, the performance gain ranges from 1.7x to 3.0x on x86 and from 1.17x to 1.7x on ARM. With pixel selection the gain is larger: from 1.3x to 2.1x for ESM


and from 1.3x to 9x for IC. In the end, the ratio between the fastest and the slowest configuration is 13.6x, with a loss of 22% of correctly tracked frames.

Conclusion

We have presented recent stabilization and estimation algorithms for the stabilization of VTOL UAVs based on mono-camera and IMU measurements. The main objective is to rely on a minimal sensor suite while requiring as little information on the environment as possible. The estimation algorithms have already been evaluated experimentally. The next step is to conduct full experiments on a UAV with both stabilization and estimation algorithms running on-board. This work is currently in progress. Possible extensions of the present work are numerous, e.g. the use of accelerometers to improve the homography estimation and/or the stabilization, or the extension of this work to possibly non-planar scenes.

Acknowledgement: A. Eudes and P. Morin have been supported by the "Chaire d'excellence en Robotique RTE-UPMC".

References

[1] A. Agarwal, C.V. Jawahar, and P.J. Narayanan. A survey of planar homography estimation techniques. Technical Report IIIT/TR/2005/12, IIIT, 2005.

[2] S. Baker and I. Matthews. Lucas-Kanade 20 years on: A unifying framework. International Journal of Computer Vision, 56(3):221-255, 2004.

[3] S. Benhimane and E. Malis. Homography-based 2D visual tracking and servoing. International Journal of Robotics Research, 26(7):661-676, 2007.

[4] J.R. Bergen, P. Anandan, K.J. Hanna, and R. Hingorani. Hierarchical model-based motion estimation. In Computer Vision - ECCV'92, pages 237-252. Springer, 1992.

[5] H. de Plinval, P. Morin, and P. Mouyon. Nonlinear control of underactuated vehicles with uncertain position measurements and application to visual servoing. In American Control Conference (ACC), pages 3253-3259, 2012.

[6] H. de Plinval, P. Morin, P. Mouyon, and T. Hamel. Visual servoing for underactuated VTOL UAVs: a linear homography-based approach. In IEEE Conference on Robotics and Automation (ICRA), pages 3004-3010, 2011.

[7] H. de Plinval, P. Morin, P. Mouyon, and T. Hamel. Visual servoing for underactuated VTOL UAVs: a linear homography-based framework. International Journal of Robust and Nonlinear Control, 2013.

[8] F. Dellaert and R. Collins. Fast image-based tracking by selective pixel integration. In Proceedings of the ICCV Workshop on Frame-Rate Vision, pages 1-22, 1999.

[9] A. Eudes, P. Morin, R. Mahony, and T. Hamel. Visuo-inertial fusion for homography-based filtering and estimation. In IEEE/RSJ Int. Conference on Intelligent Robots and Systems (IROS), pages 5186-5192, 2013.

[10] V. Grabe, H.H. Bulthoff, and P. Robuffo Giordano. On-board velocity estimation and closed-loop control of a quadrotor UAV based on optical flow. In IEEE Conf. on Robotics and Automation (ICRA), 2012.

[11] V. Grabe, H.H. Bulthoff, and P. Robuffo Giordano. A comparison of scale estimation schemes for a quadrotor UAV based on optical flow and IMU measurements. In IEEE/RSJ Int. Conference on Intelligent Robots and Systems (IROS), pages 5193-5200, 2013.

[12] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision, volume 2. Cambridge Univ Press, 2000.


                                ESM                           IC
Machine   Pixel selection   Without SIMD   With SIMD   Without SIMD   With SIMD
PC        No                60.0 (94)      20.0 (94)   73.0 (81)      29.5 (81)
PC        Yes               27.0 (86)      15.0 (86)   7.5 (72)       4.4 (72)
odroid    No                347 (94)       202 (94)    409 (81)       314 (81)
odroid    Yes               165 (85)       140 (86)    53 (72)        45 (73)

Table 2: Visual method performance: time (in ms) and accuracy (in %) for the different combinations of optimization and platform.

Minimization objective (both methods): min_ρ Σ_{q*} [T(q*) − I(w_ρ(q*))]²
Step minimization objective:
  ESM: min_δρ Σ_{q*} (T(q*) − I(w_{ρ+δρ}(q*)))²
  IC:  min_δρ Σ_{q*} (T(w_δρ(q*)) − I(w_ρ(q*)))²
Effective computation (both methods): δρ = (JᵀJ)⁻¹ Jᵀ (T(q*) − I(w_ρ(q*)))
Jacobian J:
  ESM: (1/2)(∇T + ∇I) ∂w/∂ρ |_ρ
  IC:  ∇T ∂w/∂ρ |_0
Uses current image gradient (∇I): ESM yes, IC no
Uses template gradient (∇T): ESM yes, IC yes

Table 3: Visual method summary

[13] M.D. Hua, T. Hamel, P. Morin, and C. Samson. A control approach for thrust-propelled underactuated vehicles and its application to VTOL drones. IEEE Trans. on Automatic Control, 54:1837-1853, 2009.

[14] F. Le Bras, T. Hamel, R. Mahony, and A. Treil. Output feedback observation and control for visual servoing of VTOL UAVs. International Journal of Robust and Nonlinear Control, 21:1-23, 2010.

[15] Y. Ma, S. Soatto, J. Kosecka, and S.S. Sastry. An Invitation to 3-D Vision: From Images to Geometric Models. Springer-Verlag, 2003.

[16] R. Mahony, T. Hamel, P. Morin, and E. Malis. Nonlinear complementary filters on the special linear group. International Journal of Control, 85:1557-1573, 2012.

[17] P. Martin and E. Salaun. The true role of accelerometer feedback in quadrotor control. In IEEE Conf. on Robotics and Automation, pages 1623-1629, 2010.

[18] N. Metni, T. Hamel, and F. Derkx. A UAV for bridges inspection: Visual servoing control law with orientation limits. In 5th Symposium on Intelligent Autonomous Vehicles (IAV 04), 2004.

[19] J.-M. Pflimlin, P. Soueres, and T. Hamel. Position control of a ducted fan VTOL UAV in crosswind. 80:666-683, 2007.

[20] O. Shakernia, Y. Ma, T. Koo, and S. Sastry. Landing an unmanned air vehicle: Vision based motion estimation and nonlinear control. Asian Journal of Control, 1(3):128-145, 1999.


[21] D. Suter, T. Hamel, and R. Mahony. Visual servo control using homography estimation for the stabilization of an X4-flyer. 2002.
