
Robotics and Autonomous Systems 55 (2007) 667–674, www.elsevier.com/locate/robot

Instantaneous robot self-localization and motion estimation with omnidirectional vision

Libor Spacek∗, Christopher Burbridge

Department of Computer Science, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, UK

Available online 24 May 2007

Abstract

This paper presents two related methods for autonomous visual guidance of robots: localization by trilateration, and interframe motion estimation. Both methods use coaxial omnidirectional stereopsis (omnistereo), which returns the range r to objects or guiding points detected in the images. The trilateration method achieves self-localization using r from the three nearest objects at known positions.

The interframe motion estimation is more general, being able to use any features in an unknown environment. The guiding points are detected automatically on the basis of their perceptual significance, and thus they need not carry special markings or be placed at known locations.

The interframe motion estimation does not require previous motion history, making it well suited for detecting acceleration (within a 20th of a second) and thus supporting dynamic models of the robot's motion, which will gain in importance when autonomous robots achieve useful speeds.

An initial estimate of the robot's rotation ω (the visual compass) is obtained from the angular optic flow in an omnidirectional image. A new noniterative optic flow method has been developed for this purpose. Adding ω to all observed (robot-relative) bearings θ gives true bearings towards objects (relative to a fixed coordinate frame).

The rotation ω and the r, θ coordinates obtained at two frames for a single fixed point at an unknown location are sufficient to estimate the translation of the robot. However, a large number of guiding points are typically detected and matched in most real images. Each such point provides a solution for the robot's translation. The solutions are combined by a robust clustering algorithm, Clumat, that reduces rotation and translation errors.

Simulator experiments are included for all the presented methods. Real images obtained from an autonomously moving Scitos G5 robot were used to test the interframe rotation and to show that the presented vision methods are applicable to real images in real robotics scenarios.

Keywords: Self-localization; Motion estimation; Omnidirectional vision; Omnistereo; Omniflow

1. Introduction

Amongst all robot sensors available, vision provides the most information, at the greatest range and often with the greatest accuracy.

Navigation and environment sensing are essential for autonomous mobile robots. The ability to quickly estimate position in an environment is often crucial. Omnidirectional vision offers detailed information about the entire surroundings and as such is ideally suited for use in robot localization.

Omnidirectional vision sensors have been constructed in many different ways. Tan et al. [13] use a pyramid of mirrors,



and point multiple cameras at the pyramid. This configuration offers high resolution and the possibility of a single viewpoint, but it is not isotropic and the registration and physical arrangement of the cameras can be difficult. Rotating cameras and mirrors were used by Kang and Szeliski [6] and Ishiguro et al. [5]. However, difficulties were encountered with the registration and the motion delay of the camera. Wide-angle and fish-eye lenses have been used by Swaminathan and Nayar [12] and Shah and Aggarwal [9]. Satisfactory methods of autocalibration and rectification of the distortions of true fish-eye lenses have only recently been developed by Micusik and Pajdla [7].

Catadioptric omnidirectional sensors use a mirror and a camera. The mirror is rotationally symmetrical and the camera points at the mirror along its rotational axis. One


major advantage that this design has over some of the other omnidirectional sensors is that there is no need to wait for a moving camera or to synchronize multiple cameras. This makes it better suited to dynamic environments and mobile robotics.

There are several types of mirror that can be used. We use the conical mirror. This has the benefit of producing no radial distortion or loss of radial resolution, as is produced by hyperbolic mirrors. However, conical mirrors produce multiple effective viewpoints, which was until recently seen as problematic. Spacek [11] shows that a single effective viewpoint is not actually required for a correct perspective projection and image unwarping.

We use a simulator capable of accurately modelling omnidirectional vision using ray-tracing techniques [2]. This allows us to compare our results accurately against the ground truth. We created a simulated robot containing two vertically aligned catadioptric omnidirectional sensors. The ray-tracing methods reconstruct the images as seen through the mirrors, including their inherent pixellation and resolution errors.

We first deploy omnistereo for range finding. Then we use the distance to the objects in a simple landmark-based localization method, using trilateration from the three closest objects at known positions.

We then develop an optic flow method to estimate the rotation of the robot and, using it, we show how to calculate the motion of the robot between two frames from only a single arbitrary fixed point at an unknown location.

Finally, we demonstrate the extensibility of the developed methods to real robotics with experiments that use a real robot set up in the same way as the original simulated robot.

2. Omnistereo range finding

The range r to objects can be calculated using a pair of omnidirectional images in much the same way as in classical forward-looking stereo vision. The formula for calculating the range [11] is:

r = \frac{v s}{h_{i1} - h_{i2}} - d, \quad (1)

where d is the distance from the camera lens to the tip of the mirror, s is the distance between the mirrors, v is the distance of the image behind the lens of the camera, and h_{i1} - h_{i2} is the radial disparity between the imaged object positions in the two images. See Fig. 1. The distance to an object can only be computed if the object lies in the common region. The common region for our mirrors has a vertical field of view of 45° and a horizontal field of view of 360°, which compares very favourably with classical stereopsis.

To calibrate v and convert between pixels and mm, v is calculated as in [11]:

v = \left( \frac{d}{R} + 1 \right) r_m, \quad (2)

where R is the radius of the mirror, and r_m is the radius of the mirror in the image.

Fig. 1. Diagram showing the vertically aligned coaxial omnistereo configuration.

Fig. 2. The virtual lab. Six coloured pillars are positioned around the arena at known locations for landmark-based robot localization.

These formulae apply to conical mirrors with a 90° angle at their apex. Other mirrors do not produce a cylindrical projection. However, similar formulae can be obtained for all circularly symmetrical mirrors. The registration issues connected with aligning the line of sight to the mirror axis are discussed in [11] and in [3].
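To make the two formulae concrete, the following Python sketch evaluates Eqs. (1) and (2) directly. The function names and the numerical values in the usage example are ours, chosen only for illustration; they are not calibration data from the paper.

```python
def image_distance(d_mm, R_mm, r_m_px):
    """Eq. (2): image distance v (in pixels) from the lens-to-mirror-tip
    distance d and mirror radius R (both in mm), and the mirror radius
    r_m measured in the image (pixels)."""
    return (d_mm / R_mm + 1.0) * r_m_px


def omnistereo_range(disparity_px, v_px, s_mm, d_mm):
    """Eq. (1): range r to an object from the radial disparity
    h_i1 - h_i2 (pixels) between the upper and lower mirror images."""
    if disparity_px <= 0:
        raise ValueError("object must lie in the common region (positive disparity)")
    # v and the disparity are both in pixels, so their ratio is dimensionless;
    # the result carries the units of s and d (mm here).
    return (v_px / disparity_px) * s_mm - d_mm


# Hypothetical calibration and measurement values, for illustration only.
v = image_distance(d_mm=50.0, R_mm=40.0, r_m_px=200.0)
print(omnistereo_range(disparity_px=18.0, v_px=v, s_mm=300.0, d_mm=50.0))
```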

2.1. Experiments

In our initial experiments we used the simulated arena of 6 m by 4 m shown in Fig. 2. In all experiments reported in this paper the robots were configured for vertically aligned omnidirectional stereo as described here and shown in Fig. 1.

The robot was programmed with a random wandering, obstacle-avoiding behaviour. This was run for 450 frames and all images were saved alongside the actual robot trajectory. These frames were then processed to calculate the distance to the pillars and the estimated position of the robot in the arena.

In order to achieve obstacle avoidance, a map of the arena was programmed into the robot motion controller. The controller drives the robot forwards, checking the map, until it is at a pillar or a wall. It then randomly turns either left or right


Fig. 3. The trajectory of the robot overlaid onto an overhead image of the arena. Measurements are in simulator units, where one unit is 50 mm.

Fig. 4. The image of the upper mirror superimposed on the lower. The small circular markers show the points matched by the omnistereo algorithm.

for a random duration and continues. The trajectory of the robot is shown in Fig. 3.

The radial disparity between the upper and lower mirror can be seen in Fig. 4. The algorithm for measuring the disparity in these early experiments relies on colour matching along radial lines. Of course, for real scenes, more sophisticated correspondence matching is usually required.
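As a rough illustration of colour matching along radial lines, the sketch below assumes the upper and lower views have already been resampled into polar (angle, radius) images centred on the mirror axis, and uses a plain Euclidean colour threshold; these are our simplifying assumptions, not details given in the paper.

```python
import numpy as np


def radial_disparity(upper_polar, lower_polar, theta_idx, target_rgb, tol=30.0):
    """Radial disparity of a coloured feature along one radial line.

    upper_polar, lower_polar: arrays of shape (n_angles, n_radii, 3) holding the
    polar-resampled upper and lower mirror images.  theta_idx selects one radial
    line (one bearing).  Returns h_i1 - h_i2 in pixels, or None if the target
    colour is not found in both images.
    """
    def first_match(polar_img):
        line = polar_img[theta_idx].astype(float)                      # (n_radii, 3)
        dist = np.linalg.norm(line - np.asarray(target_rgb, float), axis=1)
        hits = np.nonzero(dist < tol)[0]
        return int(hits[0]) if hits.size else None                     # innermost matching radius

    h1, h2 = first_match(upper_polar), first_match(lower_polar)
    if h1 is None or h2 is None:
        return None
    return h1 - h2
```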

The range to each of the six coloured pillars in each video frame was calculated by measuring the disparity of the top of the pillar between the corresponding upper and lower images, and applying Eqs. (1) and (2) above. The following table shows a summary of the range results:

Pillar          Max. error (mm)   Mean error (mm)
Red                  136.6             21.95
Green                124.8             25.42
Blue                 256.9             51.97
Cyan                 250.1             62.93
Magenta              110.5             23.17
Yellow                75.28            20.14
Mean                 159.03            34.26
Std. dev. (s)         76.05            18.37

Fig. 5. Graph of the actual distance to the yellow pillar (solid line), and the computed distance (dotted line).

Fig. 6. Circles are drawn around three objects. The robot lies within the triangle of uncertainty.

The graph of distance against frame number for the yellow pillar is presented in Fig. 5. When it is not possible to calculate the distance to an object, for example if it is too close or occluded, this is recorded on the graph as a zero value. It is apparent from the graph that as the distance of the object increases, the absolute accuracy of the computed distance decreases. This is also shown by the mean errors for the blue and cyan pillars being far greater than those of the others, these two pillars being the furthest from the robot. Also, as the top of the pillar gets closer to the centre of the mirror, the horizontal (angular) image resolution decreases.

The conclusion of this experiment is that the mean errors in the measured range to all objects were 0.86 ± 0.46% of the smaller arena dimension.

3. Self-localization using trilateration

The calculated ranges of the three nearest objects at known locations are used to fix the robot's position at the point of intersection of three circles around the objects; see Fig. 6. Three well-separated objects are required for a unique solution. Since the ranges of objects contain some errors, a triangle of uncertainty will form. The centre of this triangle is recorded as the robot's estimated position.
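The paper does not spell out the trilateration step as an algorithm, so the Python sketch below is only one plausible reading of it: intersect the range circles pairwise, use the third range to resolve each two-fold ambiguity, and take the centroid of the resulting triangle of uncertainty as the position estimate. All names are ours.

```python
import math


def circle_intersections(c0, r0, c1, r1):
    """Return the 0, 1 or 2 intersection points of two circles."""
    (x0, y0), (x1, y1) = c0, c1
    d = math.hypot(x1 - x0, y1 - y0)
    if d == 0 or d > r0 + r1 or d < abs(r0 - r1):
        return []                                      # circles do not intersect usefully
    a = (r0**2 - r1**2 + d**2) / (2 * d)               # distance from c0 to the chord
    h = math.sqrt(max(r0**2 - a**2, 0.0))              # half-length of the chord
    xm, ym = x0 + a * (x1 - x0) / d, y0 + a * (y1 - y0) / d
    if h == 0.0:
        return [(xm, ym)]
    dx, dy = h * (y1 - y0) / d, -h * (x1 - x0) / d
    return [(xm + dx, ym + dy), (xm - dx, ym - dy)]


def trilaterate(landmarks, ranges):
    """Position estimate from three known landmarks and their measured ranges.

    For each pair of range circles, keep the intersection point that best agrees
    with the third measured range; the centroid of the chosen points (the
    'triangle of uncertainty') is returned as the estimate.
    """
    pts = []
    for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
        cands = circle_intersections(landmarks[i], ranges[i], landmarks[j], ranges[j])
        if not cands:
            continue
        lx, ly = landmarks[k]
        pts.append(min(cands, key=lambda p: abs(math.hypot(p[0] - lx, p[1] - ly) - ranges[k])))
    if not pts:
        return None
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
```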

The graph in Fig. 7 shows the error in the robot's localization for the first 180 frames. The actual trajectory and the calculated trajectory of the robot are shown in Fig. 8.


Fig. 7. Graph of the error in the predicted position of the robot.

Fig. 8. Graph of the actual robot trajectory and the computed robot trajectory.

The accuracy of the trilateration depends entirely on the accuracy of the measured range. Therefore the errors are again of the order of 1% of the objects' range. By autonomously choosing the nearest objects, and therefore the most accurate measurements, the robot was able to calculate its location with a maximum error of 130 mm and a mean error of 29.97 mm, or 0.75% of the smaller of the arena dimensions.

4. Optic flow

The optic flow is computed directly (non-iteratively) in the regions shown in Fig. 9: the flows in the directions of the edge gradients of two adjacent edge pixels are calculated and combined to recover the true flow.

Let the image function be f(x, y). Carry out edge finding with a first-derivative edge finder capable of returning normalized image function gradients g = ∇f(x, y) at all points of significant curvature and significant edge contrast [10].

The Horn and Schunck [4] optic flow, Eq. (3), is derived from the assumption that the grey-level values of imaged points do not change as the points move across an image:

f(x, y, t) = f(x + dx, y + dy, t + dt).

Then using only the first-order terms of the Taylor series gives:

-\frac{\partial f}{\partial t} \simeq v \cdot g, \quad (3)

Fig. 9. The image sectors used to calculate the optic flow, shown on two superimposed successive frames.

where v = (v_x, v_y) = (dx/dt, dy/dt) is the optic flow vector to be found and · denotes the scalar product of two vectors. Dividing both sides of Eq. (3) by |g| (the magnitude of g), we obtain the magnitude v_g of the flow in the direction of the unit vector g:

v_g = v \cdot g \simeq -\frac{\partial f / \partial t}{|g|}. \quad (4)

Let two edge elements (edgels) with different orientations have unit gradient vectors u = (u_x, u_y) and w = (w_x, w_y) respectively. Such edgels are typically found on opposite sides of a point c of high curvature. Substituting these known unit gradients for g in Eq. (4), we get:

v_u = v \cdot u, \quad v_w = v \cdot w, \quad (5)

where v_u and v_w are known constants, obtained by evaluating the right-hand side of Eq. (4). Applying u and w at c, we can thus estimate the optic flow vector v at c by solving Eq. (5) directly for v_x and v_y, provided that

d = u_x w_y - u_y w_x \neq 0:

v_x = \frac{v_u w_y - v_w u_y}{d}, \quad v_y = \frac{v_w u_x - v_u w_x}{d}. \quad (6)
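Eqs. (4)-(6) amount to solving a 2 × 2 linear system per high-curvature point. A direct Python transcription might look as follows; the temporal derivatives and gradient magnitudes are assumed to come from the edge finder, and the function and parameter names are ours.

```python
def flow_from_edgels(u, w, dfdt_u, dfdt_w, grad_mag_u, grad_mag_w, eps=1e-9):
    """Recover the optic flow vector v = (vx, vy) at a high-curvature point c.

    u, w           : unit gradient vectors (ux, uy) and (wx, wy) of the two edgels.
    dfdt_u, dfdt_w : temporal image derivatives at the two edgels.
    grad_mag_*     : gradient magnitudes |g| at the two edgels.
    """
    ux, uy = u
    wx, wy = w
    vu = -dfdt_u / grad_mag_u          # Eq. (4) applied at the first edgel
    vw = -dfdt_w / grad_mag_w          # Eq. (4) applied at the second edgel
    d = ux * wy - uy * wx              # determinant in Eq. (6)
    if abs(d) < eps:
        return None                    # (nearly) parallel edgels: flow not recoverable
    vx = (vu * wy - vw * uy) / d       # Eq. (6)
    vy = (vw * ux - vu * wx) / d
    return vx, vy
```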

5. Interframe robot rotation

The idea of using omnidirectional images to detect the rotation (visual compass) and possibly also the translation of a robot has been investigated by Binding and Labrosse [1] in the context of appearance-based vision, whereby images are typically added up in the radial direction to obtain a simple overall measure of luminance. We take a different approach which concentrates on the positions and image velocities of fine image features. Our approach is in principle capable of greater accuracy and thus can be applied to interframe rotations. Omnidirectional vision again supports this method well, as the guiding points remain in view.

In order to create a more generally applicable localization than the trilateration, the rotation of the robot needs to be


Fig. 10. The errors in estimated interframe rotation of the simulated robot.

separated from its translation. This can be achieved by using the optic flow, where the v_θ component of the flow vectors gives the approximate rotation.

The robot's rotation ω is estimated by averaging the angular flow components over the front and back regions of the omnidirectional image:

\omega = \overline{\left( \frac{v_\theta}{r} \right)},

where the v_θ component of the optic flow vectors is found by converting v to polar coordinates centred on the mirror. The reason for leaving out the side regions is that they are more affected by the translatory flows. The translatory flows on the sides appear as strong local v_θ flows with opposite signs. However, they cannot simply be cancelled out, because they also depend on the distance of the various viewed objects.
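In code, this visual compass reduces to converting each flow vector to polar coordinates about the mirror centre and averaging v_θ/r over the front and back sectors. The sketch below is our own minimal version; the sector half-width and the assumption that the front direction lies at angle 0 (and the back at π) in image coordinates are illustrative choices, not values from the paper.

```python
import math


def visual_compass(points, flows, centre, sector_half_width=math.radians(45)):
    """Estimate the interframe rotation omega (radians per frame).

    points : (x, y) image positions where the optic flow was measured.
    flows  : corresponding flow vectors (vx, vy).
    centre : image coordinates of the mirror centre.
    Only points within +/- sector_half_width of the front (angle 0) and back
    (angle pi) directions are used, since the side sectors carry stronger
    translatory flow.
    """
    cx, cy = centre
    samples = []
    for (x, y), (vx, vy) in zip(points, flows):
        dx, dy = x - cx, y - cy
        r = math.hypot(dx, dy)
        if r < 1e-6:
            continue
        theta = math.atan2(dy, dx)
        if min(abs(theta), math.pi - abs(theta)) > sector_half_width:
            continue                                              # skip the side sectors
        v_theta = -math.sin(theta) * vx + math.cos(theta) * vy    # tangential component
        samples.append(v_theta / r)
    return sum(samples) / len(samples) if samples else None
```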

5.1. Experiments

250 frames were generated at 20 frames per second, with the robot having random velocities for the left and right wheels. The rotational velocity was restricted to 20° per second, and the translational velocity was restricted to 50 cm/s. A graph of the error in the estimation of the rotation of the robot is shown in Fig. 10, and a graph showing the cumulative error of the robot heading based only on the visual compass is shown in Fig. 11. It is apparent that the optic flow method tends to systematically underestimate the robot rotation. Nevertheless, the Pearson product–moment correlation coefficient between the actual change in angle of the robot and the estimated change in angle over 250 frames was found to be 0.94, showing a very strong correlation.

6. Interframe robot translation

Having estimated the rotation of the robot using optic flow, the translation can now be calculated as follows:

– At time step t, choose a guiding point in the lower catadioptric image.

– Calculate the range r_0 and the bearing angle θ_0 to the point by finding its location in the upper mirror and using the coaxial omnistereo as described earlier.

– At time step t + 1, calculate the new range to the point, r_1, and the new bearing angle θ_1. See Fig. 12.

Fig. 11. The cumulative error in the simulated robot heading.

Fig. 12. The translation and rotation of the robot between two frames, t = 0 and t = 1.

– Calculate the rotation of the robot ω between frames t and t + 1.

– Compute the translation (t_x, t_y) of the robot from Eqs. (7) and (8):

t_x = x_0 - \cos(\omega)\, x_1 - \sin(\omega)\, y_1, \quad (7)

t_y = y_0 + \sin(\omega)\, x_1 - \cos(\omega)\, y_1. \quad (8)

These equations assume rotation about the z axis (vertical axis) only. The full 3D motion, using homogeneous coordinates, is given by:

p_1 = \begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} p_0, \quad (9)

where the column vectors p_0 and p_1 are the positions of the same guidance point in the two Cartesian frames and r_{11} to r_{33} are the elements of a 3D rotation matrix. Solving this more general case necessitates estimating the rotations about each of the three axes (the Euler angles).
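For the planar case, the procedure above and Eqs. (7) and (8) translate directly into a few lines of Python. The sign conventions for ω and the bearings follow the equations as printed; the function name is ours.

```python
import math


def interframe_translation(r0, theta0, r1, theta1, omega):
    """Translation (tx, ty) of the robot between two frames, from Eqs. (7)-(8).

    (r0, theta0), (r1, theta1): range and bearing of the same fixed guiding
    point in the first and second frame (robot-relative polar coordinates).
    omega: interframe rotation estimated by the visual compass.
    """
    x0, y0 = r0 * math.cos(theta0), r0 * math.sin(theta0)
    x1, y1 = r1 * math.cos(theta1), r1 * math.sin(theta1)
    tx = x0 - math.cos(omega) * x1 - math.sin(omega) * y1    # Eq. (7)
    ty = y0 + math.sin(omega) * x1 - math.cos(omega) * y1    # Eq. (8)
    return tx, ty
```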


7. Clustering with Clumat

Although only one point is required in order to estimate the translation of the robot, by using multiple points multiple solutions for the translation can be found. These solutions are combined in order to acquire a more accurate estimate of both the translation and the rotation of the robot.

We developed a clustering algorithm, Clumat, underpinned by the mathematical model that gives it fast convergence. The novel aspect of Clumat is that, unlike ordinary k-means clustering or random sample consensus (RANSAC) algorithms, Clumat does not rely only on the statistical properties of the candidate solutions. The outliers are removed automatically as in ordinary k-means clustering, but this is followed at each iteration by using the mathematical model, in this case Eqs. (7) and (8), to repopulate the solution space. In particular, having rejected some translatory outliers, we recompute the mean rotation of the robot by solving Eqs. (7) and (8) for ω, and then recompute the individual translations for the remaining points. This is iterated until no more outliers beyond a certain distance from the mean remain. The final mean rotation and translation are returned as the results.
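The paper gives no pseudocode for Clumat, so the loop below is only our reconstruction of the description above, in Python: per-point translations from Eqs. (7) and (8), rejection of translations far from the mean, re-estimation of ω from the surviving points and the mean translation, and repetition until no more outliers are removed. The rejection threshold and the circular-mean re-estimation of ω are our assumptions.

```python
import math


def clumat(points, omega_init, max_iter=10, reject_factor=2.0):
    """Combine per-point motion estimates (a sketch of the Clumat idea).

    points     : list of ((r0, theta0), (r1, theta1)) observations of guiding
                 points in two frames (robot-relative polar coordinates).
    omega_init : initial rotation from the visual compass.
    Returns the final estimate (omega, (tx, ty)).
    """
    def cart(r, theta):
        return r * math.cos(theta), r * math.sin(theta)

    omega, inliers = omega_init, list(points)
    for _ in range(max_iter):
        # Per-point translation estimates for the current omega, Eqs. (7)-(8).
        sols = []
        for (r0, t0), (r1, t1) in inliers:
            x0, y0 = cart(r0, t0)
            x1, y1 = cart(r1, t1)
            sols.append((x0 - math.cos(omega) * x1 - math.sin(omega) * y1,
                         y0 + math.sin(omega) * x1 - math.cos(omega) * y1))
        mx = sum(s[0] for s in sols) / len(sols)
        my = sum(s[1] for s in sols) / len(sols)
        # Reject solutions lying too far from the mean translation.
        dists = [math.hypot(sx - mx, sy - my) for sx, sy in sols]
        mean_dist = sum(dists) / len(dists)
        keep = [p for p, dd in zip(inliers, dists) if dd <= reject_factor * mean_dist]
        if len(keep) == len(inliers):
            break                                     # no more outliers: converged
        inliers = keep
        # Re-solve Eqs. (7)-(8) for omega using the mean translation (circular mean).
        ss = cc = 0.0
        for (r0, t0), (r1, t1) in inliers:
            x0, y0 = cart(r0, t0)
            x1, y1 = cart(r1, t1)
            a, b = x0 - mx, y0 - my
            w = math.atan2(a * y1 - b * x1, a * x1 + b * y1)
            ss, cc = ss + math.sin(w), cc + math.cos(w)
        omega = math.atan2(ss, cc)
    return omega, (mx, my)
```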

In order to test Clumat, only the simple coloured pillars are again used as the guiding points to calculate the range. Crucially, we no longer need to know their locations in order to track the robot. In real-environment applications any object points can be used, as long as they can be reliably located in two frames.

In this simulation experiment there are at most six estimates for (t_x, t_y). In real environments there will be several hundred, offering much better clustering opportunities for Clumat.

The motion estimation algorithm was tested on each pair of frames generated for the previous rotation tests. The error was defined as the magnitude (the Euclidean norm) of the difference between the robot's true translation vector and the estimated translation vector. Fig. 13 shows the interframe translations suggested by the pillars for one pair of frames and the mean translation after outliers were eliminated. The final translation error (9 mm) represents a significant improvement over the errors before the clustering, which are typically twice as large. Given more points, the accuracy should improve further.

8. Experiments with a real robot

To extend this work from simulation to real-world robotics, we mounted two 90° mirrors on a Scitos G5 mobile robot in the same arrangement as in our earlier simulation. This is shown in Fig. 14, and an example of the omnidirectional images produced is shown in Fig. 15. This robot was then used to perform rotation estimation experiments in the same way as in the previous simulated rotation estimation experiments, this time using the real Robot Arena at the University of Essex. The results for the estimated robot heading compared to the actual heading are similar to the results obtained in simulation. Errors in the rotation estimates are shown in Fig. 16. A graph of the actual heading and the calculated heading for the real robot experiment is shown in Fig. 17.

Fig. 13. The translations suggested by each pillar and the mean translation after two iterations of clustering.

Fig. 14. DaX, the Scitos G5 mobile robot with two coaxially mounted conical mirrors and two FireWire cameras.

9. Conclusion

The results show that omnidirectional stereo vision is clearly viable for range finding: the error in distance averages about 2 cm for objects about 2 m away, i.e. about 1% of the distance. Different mirrors will cover different ranges. A 120° conical mirror, for example, will show the ground around the robot rather than objects in the distance. The accuracy achieved is generally better than that of sonar sensors. In many situations vision ranging is also more favourable than laser range finders, as more information is provided: the object is identified as well as its distance measured, avoiding problems such as perceptual aliasing.

Omnidirectional vision is ideal for robot localization as objects do not go out of sight when turning. The use of accurate omnistereo made fast localization by simple trilateration possible.


Fig. 15. An example of an omnidirectional image using a conical mirror on our mobile robot.

Fig. 16. The errors in estimated interframe rotation of the real robot.

Fig. 17. The cumulative error in the robot heading using the real robot. The ground truth obtained by an external VICON tracking system (solid line) versus the computed heading (dotted line).

The speed of this method is limited only by the processing time: it could run at subframe rates. It would not be realistic to measure the robot's motion occurring within 50 ms by GPS, for instance. Fig. 7 shows that the error in position remained roughly constant. This result compares favourably with localization by dead reckoning, where the error accumulates with time.

Localization experiments were performed using a simulator in order to make careful comparisons against the ground truth (the exact true translation and rotation of the robot). The use of the simulator instead of real scenes made certain aspects of the localization experiments less demanding, in particular the automatic selection and matching of the guidance points. On the other hand, there were fewer guidance points available, which limited the effectiveness of the Clumat clustering and hence the accuracy of the interframe translation results.

The trilateration localization relied on three objects at known locations. Object positions can be fixed automatically by employing a simultaneous localization and mapping (SLAM) scheme, such as the one proposed in [8], or others. Nevertheless, this restricting assumption was removed by the subsequent interframe rotation and translation methods, which yielded similar results.

All of these methods are equally applicable to longer time intervals and motions. We have chosen the interframe case as the most challenging.

In order to address the issue of real robotics, an interframe rotation experiment with a real robot and real images was performed, resulting in a close agreement with the equivalent rotation experiments on the simulator. This indicates that the remaining interframe translation methods may also be extended to real robots, which is to be our future work.

References

[1] D. Binding, F. Labrosse, Visual local navigation using warped panoramic images, in: 2006 Towards Autonomous Robotic Systems, TAROS'06, 2006, pp. 19–26.

[2] C. Burbridge, L. Spacek, Omnidirectional vision simulation and robot localisation, in: 2006 Towards Autonomous Robotic Systems, TAROS'06, 2006, pp. 37–44.

[3] G. Geyer, K. Daniilidis, Paracatadioptric camera calibration, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (4) (2002) 1–10.

[4] B.K.P. Horn, B.G. Schunck, Determining optical flow, Artificial Intelligence 17 (1981) 185–203.

[5] H. Ishiguro, M. Yamamoto, S. Tsuji, Omni-directional stereo, IEEE Transactions on Pattern Analysis and Machine Intelligence 14 (2) (1992) 257–262.

[6] S.B. Kang, R. Szeliski, 3-D scene data recovery using omnidirectional multibaseline stereo, International Journal of Computer Vision 25 (2) (1997) 167–183.

[7] B. Micusik, T. Pajdla, Structure from motion with wide circular field of view cameras, IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (7) (2006) 1135–1149.

[8] K. Rushant, L. Spacek, An autonomous vehicle navigation system using panoramic vision techniques, in: Proceedings of the International Symposium on Intelligent Robotics Systems, ISIRS, Tata McGraw-Hill, 1998, pp. 275–282.

[9] S. Shah, J.K. Aggarwal, Mobile robot navigation and scene modeling using stereo fish-eye lens system, Machine Vision and Applications 10 (4) (1997) 159–173.

[10] L. Spacek, Edge detection and motion detection, Image and Vision Computing 4 (1) (1986) 43–56.

[11] L. Spacek, A catadioptric sensor with multiple viewpoints, Robotics and Autonomous Systems 51 (1) (2005) 3–15.


[12] R. Swaminathan, S.K. Nayar, Nonmetric calibration of wide-angle lenses and polycameras, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (10) (2000) 1172–1178.

[13] K.-H. Tan, H. Hua, N. Ahuja, Multiview panoramic cameras using mirror pyramids, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (7) (2004) 941–946.

Dr. Libor Spacek is a tenured academic in Computer Science at the University of Essex, UK, specialising in Computer Vision.

He graduated with a B.Sc. in Mathematics and obtained his M.Sc., with distinction, in Artificial Intelligence. He was awarded his Ph.D. in 1986 for the thesis entitled "The Detection of Contours and their Visual Motion", under the supervision of Prof. J.M. Brady.

He has published widely on generally applicable Vision methods: motion computation, occlusion analysis, edge detection and texture analysis. He has supervised a number of Ph.D. students in the area of Face Recognition and has published research papers on Face Recognition and Machine Learning.

Dr. Spacek's latest research interests are centred on Geometric Algebra, Omnidirectional Vision, and Visual Guidance for Robotics.

Christopher Burbridge is currently a first-year Ph.D. student at the University of Essex, supervised by Dr. Libor Spacek.

After studying Computer Science at undergraduate level, he did an M.Sc. in Embedded Systems and Robotics at the University of Essex. His Ph.D. work is in the field of omnidirectional vision and robot navigation.