Hindawi Publishing Corporation
Journal of Electrical and Computer Engineering
Volume 2013, Article ID 374165, 15 pages
http://dx.doi.org/10.1155/2013/374165

Research Article

Monocular Vision SLAM for Indoor Aerial Vehicles

Koray Çelik and Arun K. Somani

Department of Electrical and Computer Engineering, Iowa State University, Ames, IA 50010, USA

Correspondence should be addressed to Koray Çelik; koray@iastate.edu

Received 14 October 2012; Accepted 23 January 2013

Academic Editor: Jorge Dias

Copyright © 2013 K. Çelik and A. K. Somani. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This paper presents a novel indoor navigation and ranging strategy via monocular camera. By exploiting the architectural orthogonality of indoor environments, we introduce a new method to estimate range and vehicle states from a monocular camera for vision-based SLAM. The navigation strategy assumes an indoor or indoor-like manmade environment whose layout is previously unknown, GPS-denied, and representable via energy-based feature points and straight architectural lines. We experimentally validate the proposed algorithms on a fully self-contained microaerial vehicle (MAV) with sophisticated on-board image processing and SLAM capabilities. Building and enabling such a small aerial vehicle to fly in tight corridors is a significant technological challenge, especially in the absence of GPS signals and with limited sensing options. Experimental results show that the system is only limited by the capabilities of the camera and environmental entropy.

1. Introduction

The critical advantage of vision over active proximity sensors, such as laser range finders, is the information-to-weight ratio. Nevertheless, as the surroundings are captured indirectly through photometric effects, extracting absolute depth information from a single monocular image alone is an ill-posed problem. In this paper, we aim to address this problem with as minimal use of additional information as possible for the specific case of a rotorcraft MAV, where size, weight, and power (SWaP) constraints are severe, and we investigate the feasibility of a low-weight, low-power monocular vision-based navigation solution. Although we emphasize MAV use in this paper, our approach has been tested and proved compatible with ground-based mobile robots, as well as wearable cameras such as helmet- or tactical-vest-mounted devices; further, it can be used to augment the reliability of several other types of sensors. Considering that the foreseeable future of intelligence, surveillance, and reconnaissance missions will involve GPS-denied environments, portable vision-SLAM capabilities can pave the way for GPS-free navigation systems.

Our approach is inspired by how intelligent animals such as cats and bats interpret depth via monocular visual cues such as relative height, texture gradient, and motion parallax [1], by subconsciously tracking dense elements such as foliage. We integrate this ranging technique with SLAM to achieve autonomous indoor navigation of an MAV.

1.1. Related Work on Vision-Based SLAM. Addressing the depth problem, the literature has resorted to various methods such as the Scheimpflug principle, structure from motion, optical flow, and stereo vision. The use of moving lenses for monocular depth extraction [2] is not practical for SLAM, since this method cannot focus at multiple depths at once. The dependence of stereo vision on ocular separation [3] limits its useful range. And image patches obtained via optical flow sensors [4, 5] are too ambiguous for the landmark association procedure of SLAM. In sensing, efforts to retrieve depth information from a still image by using machine learning, such as the Markov Random Field learning algorithm [6, 7], are shown to be effective. However, a priori information about the environment must be obtained from a training set of images, which disqualifies such methods for an online-SLAM algorithm in an unknown environment. Structure from Motion (SFM) [3, 8, 9] may be suitable for the offline-SLAM problem. However, an automatic analysis of the recorded footage from a completed mission cannot scale to a consistent localization over arbitrarily long sequences in real time.

Figure 1: A three-dimensional representation of the corridor with respect to the MAV. Note that the width of the hallway is not provided to the algorithm, and the MAV does not have any sensors that can detect walls.

Methods such as monoSLAM [10, 11], which depend on movement for depth estimation and offer a relative recovered scale, may not provide reliable object avoidance for an agile MAV in an indoor environment. A rotorcraft MAV needs to bank to move the camera sideways, a movement severely limited in a hallway by helicopter dynamics; it has to be able to perform depth measurement from a still, or nearly still, platform.

In SLAM, Extended Kalman Filter (EKF) based approaches with full covariance have a limitation on the size of a manageable map in real time, considering the quadratic nature of the algorithm versus the computational resources of an MAV. Global localization techniques such as Condensation SLAM [12] require a full map to be provided to the robot a priori. Azimuth-learning-based techniques such as Cognitive SLAM [13] are parametric, and locations are centered on the robot, which naturally becomes incompatible with ambiguous landmarks, such as the landmarks our MAV has to work with. Image registration based methods such as [14] propose a different formulation of the vision-based SLAM problem, based on motion, structure, and illumination parameters, without first having to find feature correspondences. For a real-time implementation, however, a local optimization procedure is required, and there is a possibility of getting trapped in a local minimum. Further, without merging regions with a similar structure, the method becomes computationally intensive for an MAV. Structure extraction methods [15] have some limitations, since an incorrect incorporation of points into higher-level features will have an adverse effect on consistency. Further, these systems depend on a successful selection of thresholds.

1.2. Comparison with Prior Work and Organization. This paper addresses the above shortcomings using an unmodified consumer-grade monocular web camera. By exploiting the architectural orthogonality of indoor and urban outdoor environments, we introduce a novel method for monocular vision-based SLAM by computing absolute range and bearing information without using active ranging sensors. More thorough algorithm formulations and newer experimental results with a unique indoor-flying helicopter are discussed in this paper than in our prior conference articles [16–19]. Section 2 explains the procedures for perception of world geometry as prerequisites for SLAM, such as range measurement methods, as well as performance evaluations of the proposed methods. While a visual turn-sensing algorithm is introduced in Section 3, SLAM formulations are provided in Section 4. Results of experimental validation, as well as a description of the MAV hardware platform, are presented in Section 5. Figure 2 can be used as a guide to the sections as well as to the process flow of our proposed method.

2. Problem and Algorithm Formulation

We propose a novel method to estimate the absolute depth of features using a monocular camera as a sole means of navigation. The camera is mounted on the platform with a slight downward tilt. Landmarks are assumed to be stationary. Moving targets are also detected; however, they are not considered as landmarks and are therefore ignored by the map. Altitude is measured in real time via the on-board ultrasonic altimeter on our MAV, or, in the case of a ground robot, it can be provided to the system via various methods depending on where the camera is installed. It is acceptable for the camera to translate or tilt with respect to the robot, such as when mounted on a robotic arm, as long as the mount is properly encoded to indicate altitude. We validate our results with a time-varying altitude. The ground is assumed to be relatively flat (no more than 5 degrees of inclination within a 10-meter perimeter). Our algorithm has the capability to adapt to inclines if the camera tilt can be controlled; we have equipped some of our test platforms with this capability.

2.1. Landmark Extraction, Step I: Feature Extraction. A landmark in the SLAM context is a conspicuous, distinguishing landscape feature marking a location. A minimal landmark can consist of two measurements with respect to robot position: range and bearing. Our landmark extraction strategy is a three-step automatic process. All three steps are performed on a frame $I_t$ before moving on to the next frame $I_{t+1}$. The first step involves finding prominent parts of $I_t$ that tend to be more attractive than other parts in terms of texture dissimilarity and convergence. These parts tend to be immune to rotation, scale, illumination, and image noise, and we refer to them as features, which have the form $f_n(u, v)$. We utilize two algorithms for this procedure.

Figure 2: Block diagram illustrating the operational steps of the monocular vision navigation and ranging at a high level, and its relations with the flight systems. The scheme is directly applicable to other mobile platforms.

For flying platforms, considering the limited computational resources available, we prefer the algorithm proposed by Shi and Tomasi [20], in which sections of $I$ with large eigenvalues are extracted into a set $\Psi$ such that $\Psi = \{f_1, f_2, \dots, f_n\}$. Although there is virtually no limit for $n$, it is impossible at this point in the procedure to make an educated distinction between a useless feature for the map (i.e., one that cannot be used for ranging and bearing) and a potential landmark (i.e., one that provides reliable range and bearing information and thus can be included in the map). For ground-based platforms, we prefer the SURF algorithm (Figure 3) due to the directionality its detected features offer [21]. Directional features are particularly useful where the platform dynamics are diverse, such as human-body or MAV applications in gusty environments; directional features are more robust in terms of associating them with architectural lines, where, instead of a single distance threshold, the direction of the feature itself also becomes a metric. They are also useful when ceilings are used, where lines are usually segmented and more difficult to detect. SURF being an expensive algorithm, we consider faster implementations such as ASURF.

In the following steps, we describe how to extract a sparse set of reliable landmarks from a populated set of questionable features.

2.2. Landmark Extraction, Step II: Line and Slope Extraction. Conceptually, landmarks exist in the 3D inertial frame, and they are distinctive, whereas features in $\Psi = \{f_1, f_2, \dots, f_n\}$ exist on a 2D image plane and contain ambiguity. In other words, our knowledge of their range and bearing information with respect to the camera is uniformly distributed across $I_t$. Considering the limited mobility of our platform in the particular environment, parallax among the features is very limited. Thus, we attempt to correlate the contents of $\Psi$ with the real world via their relationship with the perspective lines.

On a well-lit, well-contrasting, noncluttered hallway, perspective lines are obvious. Practical hallways have random objects that segment or even falsely mimic these lines. Moreover, on a monocular camera, objects are aliased with distance, making it more difficult to find consistent ends of perspective lines, as they tend to be considerably far from the camera. For these reasons, the construction of those lines should be an adaptive approach.

We begin the adaptive procedure by edge filtering the image $I$ through a discrete differentiation operator with more weight on the horizontal convolution, such as

$$I'_x = F_h * I, \qquad I'_y = F_v * I, \tag{1}$$


where $*$ denotes the convolution operator and $F$ is a $3 \times 3$ kernel for horizontal and vertical derivative approximations. $I'_x$ and $I'_y$ are combined with weights whose ratio determines the range of angles through which edges will be filtered. This, in effect, returns a binary image plane $I'$ with potential edges that are more horizontal than vertical. It is possible to reverse this effect to detect other edges of interest, such as ceiling lines or door frames. At this point, edges will disintegrate the more vertical they get (see Figure 3 for an illustration). Application of the Hough transform to $I'$ will return all possible lines, automatically excluding discrete point sets, out of which it is possible to sort out lines with a finite slope $\phi \neq 0$ and curvature $\kappa = 0$. This is a significantly expensive operation (i.e., considering the limited computational resources of an MAV) to perform on a real-time video feed, since the transform has to run over the entire frame, including the redundant parts.

To improve the overall performance in terms of efficiency, we have investigated replacing the Hough transform with an algorithm that only runs on parts of $I'$ that contain data. This approach begins by dividing $I'$ into square blocks, $B_{xy}$. The optimal block size is the smallest block that can still capture the texture elements in $I'$. Camera resolution and the filtering methods used to obtain $I'$ affect the resulting texture element structure. The blocks are sorted to bring those with the highest number of data points and the lowest entropy (2) first, as such a block is most likely to contain lines. Blocks that are empty, or have a few scattered points in them, are excluded from further analysis. Entropy is the characteristic of an image patch that makes it more ambiguous by means of disorder in a closed system. This assumes that disorder is more probable than order, and thereby lower disorder has a higher likelihood of containing an architectural feature such as a line. Entropy can be expressed as

$$-\sum_{x,y} B_{xy} \log B_{xy}. \tag{2}$$

The set of candidate blocks resulting at this point are to be searched for lines. Although a block $B_n$ is a binary matrix, it can be thought of as a coordinate system containing a set of points (i.e., pixels) with $(x, y)$ coordinates, such that positive $x$ is right and positive $y$ is down. Since we are more interested in lines that are more horizontal than vertical, it is safe to assume that the errors in the $y$ values outweigh those in the $x$ values. The equation for a ground line has the form $y = mx + b$, and the deviations of data points in the block from this line are $d_i = y_i - (mx_i + b)$. Therefore, the most likely line is the one composed of data points that minimize the deviation, such that $d_i^2 = (y_i - mx_i - b)^2$. Using determinants, the deviation can be obtained as in (3):

$$d_i = \begin{vmatrix} \sum x_i^2 & \sum x_i \\ \sum x_i & n \end{vmatrix}, \qquad m \times d_i = \begin{vmatrix} \sum x_i y_i & \sum x_i \\ \sum y_i & n \end{vmatrix}, \qquad b \times d_i = \begin{vmatrix} \sum x_i^2 & \sum x_i y_i \\ \sum x_i & \sum y_i \end{vmatrix}. \tag{3}$$

Since our range measurement methods depend on these lines, the overall line-slope accuracy is affected by the reliability in detecting and measuring the hallway lines (or road lines, sidewalk lines, depending on context). High measurement noise in the slopes has adverse effects on SLAM and should be minimized to prevent inflating the uncertainty in $L_1 = \tan\phi_1$ and $L_2 = \tan\phi_2$, or the infinity point $(P_x, P_y)$. To reduce this noise, lines are cross-validated for the longest collinearity via pixel-neighborhood-based line extraction, in which the results obtained rely only on a local analysis. Their coherence is further improved using a postprocessing step that exploits the texture gradient. With an assumption of the orthogonality of the environment, lines from the ground edges are extracted. Note that this is also applicable to ceiling lines. Although ground lines (and ceiling lines, if applicable) are virtually parallel in the real world, on the image plane they intersect. The horizontal coordinate of this intersection point is later used as a heading guide for the MAV, as illustrated in Figure 5. Features that happen to coincide with these lines are potential landmark candidates. When this step is complete, a set of features cross-validated with the perspective lines, $\Psi'$, which is a subset of $\Psi$ with the nonuseful features removed, is passed to the third step.

2.3. Landmark Extraction, Step III: Range Measurement by the Infinity-Point Method. This step accurately measures the absolute distance to features in $\Psi'$ by integrating local patches of the ground information into a global surface reference frame. This new method significantly differs from optical flow in that the depth measurement does not require a successive history of images.

Our strategy here assumes that the height of the camera from the ground, $H$, is known a priori (see Figure 1); the MAV provides real-time altitude information to the camera. We also assume that the camera is initially pointed in the general direction of the far end of the corridor. This latter assumption is not a requirement; if the camera is pointed at a wall, the system will switch to visual steering mode and attempt to recover the camera path without mapping until hallway structure becomes available.

The camera is tilted down (or up, depending on preference) with an angle $\beta$ to facilitate continuous capture of feature movement across the perspective lines. The infinity point $(P_x, P_y)$ is an imaginary concept where the projections of the two parallel perspective lines appear to intersect on the image plane. Since this intersection point is, in theory, infinitely far from the camera, it should present no parallax in response to the translations of the camera. It does, however, effectively represent the yaw and the pitch of the camera (note the crosshair in Figure 5). Assume that the end points of the perspective lines are $E_{H1} = (l, d, -H)^T$ and $E_{H2} = (l, d - w, -H)^T$, where $l$ is the length and $w$ is the width of the hallway, $d$ is the horizontal displacement of the camera from the left wall, and $H$ is the MAV altitude (see Figure 4 for a visual description).


Figure 3: Initial stages after filtering for line extraction, in which the line segments are being formed. Note that the horizontal lines across the image denote the artificial horizon for the MAV; these are not architectural detections but the on-screen display provided by the MAV. This procedure is robust to transient disturbances such as people walking by or trees occluding the architecture.

The Euler rotation matrix to convert from the camera frame to the hallway frame is given in (4):

$$A = \begin{bmatrix} c\psi c\beta & c\beta s\psi & -s\beta \\ c\psi s\phi s\beta - c\phi s\psi & c\phi c\psi + s\phi s\psi s\beta & c\beta s\phi \\ s\phi s\psi + c\phi c\psi s\beta & c\phi s\psi s\beta - c\psi s\phi & c\phi c\beta \end{bmatrix}, \tag{4}$$

where $c$ and $s$ are abbreviations for the cos and sin functions, respectively. The vehicle yaw angle is denoted by $\psi$, the pitch by $\beta$, and the roll by $\phi$. Since the roll angle is controlled by the onboard autopilot system, it can be set to be zero.

The points $E_{H1}$ and $E_{H2}$ are transformed into the camera frame via multiplication with the transpose of $A$ in (4):

$$E_{C1} = A^T \cdot (l, d, -H)^T, \qquad E_{C2} = A^T \cdot (l, d - w, -H)^T. \tag{5}$$

This 3D system is then transformed into the 2D image plane via

$$u = \frac{yf}{x}, \qquad v = \frac{zf}{x}, \tag{6}$$

where $u$ is the pixel horizontal position from the center (right is positive), $v$ is the pixel vertical position from the center (up is positive), and $f$ is the focal length (3.7 mm for the particular camera we used). The end points of the perspective lines have now transformed from $E_{H1}$ and $E_{H2}$ to $(P_{x1}, P_{y1})^T$ and $(P_{x2}, P_{y2})^T$, respectively. An infinitely long hallway can be represented by

$$\lim_{l\to\infty} P_{x1} = \lim_{l\to\infty} P_{x2} = f\tan\psi, \qquad \lim_{l\to\infty} P_{y1} = \lim_{l\to\infty} P_{y2} = -\frac{f\tan\beta}{\cos\psi}, \tag{7}$$

which is conceptually the same as extending the perspective lines to infinity. The fact that $P_{x1} = P_{x2}$ and $P_{y1} = P_{y2}$ indicates that the intersection of the lines in the image plane is the end of such an infinitely long hallway. Solving the resulting equations for $\psi$ and $\beta$ yields the camera yaw and pitch, respectively:

$$\psi = \tan^{-1}\left(\frac{P_x}{f}\right), \qquad \beta = -\tan^{-1}\left(\frac{P_y\cos\psi}{f}\right). \tag{8}$$

A generic form of the transformation from the pixel position $(u, v)$ to $(x, y, z)$ can be derived in a similar fashion [3]. The equations for $u$ and $v$ also provide general coordinates in the camera frame as $(z_c f/v,\; u z_c/v,\; z_c)$, where $z_c$ is the $z$ position of the object in the camera frame. Multiplying with (4) transforms the hallway frame coordinates $(x, y, z)$ into functions of $u$, $v$, and $z_c$. Solving the new $z$ equation for $z_c$ and substituting into the equations for $x$ and $y$ yields

$$x = \frac{a_{12}u + a_{13}v + a_{11}f}{a_{32}u + a_{33}v + a_{31}f}\,z, \qquad y = \frac{a_{22}u + a_{23}v + a_{21}f}{a_{32}u + a_{33}v + a_{31}f}\,z, \tag{9}$$

where $a_{ij}$ denotes the elements of the matrix in (4). See Figure 1 for the descriptions of $x$ and $y$.

For objects likely to be on the floor, the height of the camera above the ground is the $z$ position of the object. Also, if the platform roll can be measured or assumed negligible, then the combination of the infinity point with the height can be used to obtain the range to any object on the floor of the hallway. The same concept applies to objects which are likely to be on the same wall, or on the ceiling.

Figure 4: A visual description of the environment as perceived by the infinity-point method.

By exploiting the geometry of the corners present in the corridor, our method computes the absolute range and bearing of the features, effectively turning them into the landmarks needed for the SLAM formulation. See Figure 5, which illustrates the final appearance of the ranging algorithm.
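The sketch below chains (4), (8), and (9) for a feature assumed to lie on the floor: the known camera height fixes $z = -H$, and the pixel position then yields absolute $(x, y)$, hence range and bearing. Roll is taken as zero, per the autopilot assumption above, and $(u, v)$ is measured from the image center with right and up positive.

```python
# Sketch of the range computation of (9) for a feature on the floor:
# z = -H fixes the scale, and (x, y) follow from pixel (u, v) and the
# rotation matrix A of (4). Roll phi is assumed zero.
import numpy as np

def rotation_matrix(psi, beta, phi=0.0):
    c, s = np.cos, np.sin
    return np.array([
        [c(psi)*c(beta),                      c(beta)*s(psi),                      -s(beta)],
        [c(psi)*s(phi)*s(beta)-c(phi)*s(psi), c(phi)*c(psi)+s(phi)*s(psi)*s(beta),  c(beta)*s(phi)],
        [s(phi)*s(psi)+c(phi)*c(psi)*s(beta), c(phi)*s(psi)*s(beta)-c(psi)*s(phi),  c(phi)*c(beta)]])

def floor_feature_position(u, v, f, H, psi, beta):
    a = rotation_matrix(psi, beta)
    z = -H                                  # the feature sits on the floor
    den = a[2, 1]*u + a[2, 2]*v + a[2, 0]*f
    x = (a[0, 1]*u + a[0, 2]*v + a[0, 0]*f) / den * z
    y = (a[1, 1]*u + a[1, 2]*v + a[1, 0]*f) / den * z
    return x, y                             # range = hypot(x, y), cf. (16)
```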

The graph in Figure 6 illustrates the disagreement between the line-perspectives and the infinity-point method (Section 2.3) in an experiment in which both algorithms executed simultaneously on the same video feed. With the particular camera we used in the experiments (Logitech C905), the infinity-point method yielded 93% accuracy. These numbers are functions of camera resolution, camera noise, and the consequent line extraction noise. Therefore, disagreements not exceeding 0.5 meters are in its favor with respect to accuracy. Disagreements from the ground truth include all transient measurement errors, such as camera shake, the occasional introduction of moving objects that deceptively mimic the environment, and other anomalies. The divergence between the two ranges that is visible between samples 20 and 40 in Figure 6 is caused by a hallway line anomaly from the line extraction process, independent of ranging. In this particular case, both hallway lines have shifted, causing the infinity point to move left. Horizontal translations of the infinity point have a minimal effect on the measurement performance of the infinity-point method, this being one of its main advantages. Refer to Figure 7 for a demonstration of the performance of these algorithms in a wide variety of environments.

The bias between the two measurements shown in Figure 6 is due to shifts in camera calibration parameters between different experiments. Certain environmental factors have dramatic effects on lens precision, such as acceleration, corrosive atmosphere, acoustic noise, fluid contamination, low pressure, vibration, ballistic shock, electromagnetic radiation, temperature, and humidity. Most of those conditions readily occur on an MAV (and most other platforms, including the human body) due to parts rotating at high speeds, powerful air currents, static electricity, radio interference, and so on. The autocalibration concept is wide and beyond the scope of this paper. We present a novel mathematical procedure that addresses the issue of maintaining monocular camera calibration automatically in hostile environments in another paper of ours, and we encourage the reader to refer to it [22].

3. Helix Bearing Algorithm

When the MAV approaches a turn, an exit, a T-section, or a dead end, both ground lines tend to disappear simultaneously. Consequently, range and heading measurement methods cease to function. A set of features might still be detected, and the MAV can make a confident estimate of their spatial pose. However, in the absence of depth information, a one-dimensional probability density over the depth is represented by a two-dimensional particle distribution.

In this section, we propose a turn-sensing algorithm to estimate $\psi$ in the absence of orthogonality cues. This situation automatically triggers the turn-exploration mode in the MAV: a yaw rotation of the body frame is initiated until another passage is found. The challenge is to estimate $\psi$ accurately enough to update the SLAM map correctly. This procedure combines machine vision with the data matching and dynamic estimation problem. For instance, if the MAV approaches a left turn after exploring one leg of an "L"-shaped hallway, turns left 90 degrees, and continues through the next leg, the map is expected to display two hallways joined at a 90-degree angle. Similarly, a 180-degree turn before finding another hallway would indicate a dead end. This way, the MAV can also determine where turns are located the next time they are visited.

The new measurement problem at turns is to compute the instantaneous velocity $(u, v)$ of every helix (moving feature) that the MAV is able to detect, as shown in Figure 9. In other words, an attempt is made to recover $V(x, y, t) = (u(x, y, t), v(x, y, t)) = (dx/dt, dy/dt)$ using a variation of the pyramidal Lucas-Kanade method. This recovery leads to a 2D vector field obtained via perspective projection of the 3D velocity field onto the image plane. At discrete time steps, the next frame is defined as a function of a previous frame as $I_{t+1}(x, y, z, t) = I_t(x + dx, y + dy, z + dz, t + dt)$. Applying the Taylor series expansion,

$$I(x, y, z, t) + \frac{\partial I}{\partial x}\delta x + \frac{\partial I}{\partial y}\delta y + \frac{\partial I}{\partial z}\delta z + \frac{\partial I}{\partial t}\delta t, \tag{10}$$

and then differentiating with respect to time, the helix velocity is obtained in terms of pixel distance per time step $k$.
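As a sketch, OpenCV's pyramidal Lucas-Kanade tracker can stand in for the variation used here; the window size and pyramid depth are assumptions.

```python
# Sketch of the helix velocity measurement, assuming OpenCV's pyramidal
# Lucas-Kanade tracker as a stand-in for the variation used in the paper.
import cv2
import numpy as np

def helix_velocities(prev_gray, next_gray, features, dt):
    """features: Nx2 float32 array of (u, v) points from frame I_t."""
    pts = features.reshape(-1, 1, 2).astype(np.float32)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None,
                                              winSize=(21, 21), maxLevel=3)
    ok = status.ravel() == 1
    flow = (nxt[ok] - pts[ok]).reshape(-1, 2) / dt  # (du/dt, dv/dt)
    speed = np.linalg.norm(flow, axis=1)
    # Angular direction phi from the image "north" (up), clockwise, so
    # that pi/2 is east, pi is south, and 3*pi/2 is west:
    phi = np.mod(np.arctan2(flow[:, 0], -flow[:, 1]), 2 * np.pi)
    return features[ok], speed, phi
```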

At this point, each helix is assumed to be identically distributed and independently positioned on the image plane. Each helix is associated with a velocity vector $V_i = (v, \varphi)^T$, where $\varphi$ is the angular displacement of the velocity direction from the north of the image plane, such that $\pi/2$ is east, $\pi$ is south, and $3\pi/2$ is west. Although the associated depths of the helix set appearing at stochastic points on the image plane are unknown, assuming these depths remain locally constant, there is a relationship between the distance of a helix from the camera and its instantaneous velocity on the image plane.


(1) Start from level $L(0) = 0$ and sequence $m = 0$.
(2) Find $d = \min(h_a - h_b)$ in $M$, where $h_a \neq h_b$.
(3) $m = m + 1$; $\Psi'''(k) = \operatorname{merge}([h_a, h_b])$; $L(m) = d$.
(4) Delete from $M$ the rows and columns corresponding to $\Psi'''(k)$.
(5) Add to $M$ a row and a column representing $\Psi'''(k)$.
(6) If $\forall h_i \in \Psi'''(k)$, stop.
(7) Else, go to (2).

Algorithm 1: Disjoint cluster identification from heat map $M$.

Figure 5: On-the-fly range measurements. Note the crosshair indicating that the algorithm is currently using the infinity point for heading.

Figure 6: (a) The accuracy of the two range measurement methods (infinity-point method shown) with respect to ground truth (flat line); range in meters versus sample number. (b) Residuals for the top figure, in meters.

This suggests that a helix cluster, with respect to the closeness of individual instantaneous velocities, is likely to belong to the surface of one planar object, such as a door frame. Let a helix with a directional velocity be the triple $h_i = (V_i, u_i, v_i)^T$, where $(u_i, v_i)$ represents the position of this particle on the image plane. At any given time $(k)$, let $\Psi$ be a set containing all these features on the image plane, such that $\Psi(k) = \{h_1, h_2, \dots, h_n\}$. The $z$ component of velocity, as obtained in (10), is the determining factor for $\varphi$. Since we are most interested in the set of helices in which this component is minimized, $\Psi(k)$ is resampled such that

$$\Psi'(k) = \left\{\forall h_i : \varphi \approx \frac{\pi}{2} \;\cup\; \varphi \approx \frac{3\pi}{2}\right\}, \tag{11}$$

sorted in increasing velocity order. $\Psi'(k)$ is then processed through histogram sorting to reveal the modal helix set, such that

$$\Psi''(k) = \max \begin{cases} \displaystyle\sum_{i=0}^{n} i, & \text{if } h_i = h_{i+1} \\[6pt] 0, & \text{else.} \end{cases} \tag{12}$$

$\Psi''(k)$ is likely to contain clusters that tend to be distributed with respect to objects in the scene, whereas the rest of the initial helix set from $\Psi(k)$ may not fit this model. An agglomerative hierarchical tree $T$ is used to identify the clusters. To construct the tree, $\Psi''(k)$ is heat mapped, represented as a symmetric matrix $M$, with respect to the Manhattan distance between individual helices:

$$M = \begin{bmatrix} h_0 - h_0 & \cdots & h_0 - h_n \\ \vdots & \ddots & \vdots \\ h_n - h_0 & \cdots & h_n - h_n \end{bmatrix}. \tag{13}$$

The algorithm to construct the tree from $M$ is given in Algorithm 1.


Figure 7: While we emphasize hallway-like indoor environments, our range measurement strategy is compatible with a variety of other environments, including outdoors, office environments, ceilings, sidewalks, and building sides, where orthogonality in architecture is present. A minimum of one perspective line and one feature intersection is sufficient.

The tree should be cut at the sequence $m$ such that $m + 1$ does not provide a significant benefit in terms of modeling the clusters. After this step, the set of velocities in $\Psi'''(k)$ represents the largest planar object in the field of view with the most consistent rate of pixel displacement in time. The system is updated such that $\Psi(k + 1) = \Psi(k) + \mu(\Psi'''(k))$, as the best-effort estimate, as shown in Figure 8.

It is a future goal to improve the accuracy of this algorithm by exploiting known properties of typical objects. For instance, single doors are typically a meter wide. It is trivial to build an internal object database with templates for typical consistent objects found indoors. If such an object of interest could be identified by an arbitrary object detection algorithm, and that world object of known dimensions $\dim = (x, y)^T$ and a cluster $\Psi'''(k)$ sufficiently coincide, the cluster depth can be measured via $\dim(f/\dim')$, where $\dim$ is the actual object dimensions, $f$ is the focal length, and $\dim'$ represents the object dimensions on the image plane.

4. SLAM Formulation

Our previous experiments [16, 17] showed that, due to the highly nonlinear nature of the observation equations, traditional nonlinear observers such as the EKF do not scale to SLAM in larger environments containing a vast number of potential landmarks. Measurement updates in the EKF require quadratic time complexity due to the covariance matrix, rendering the data association increasingly difficult as the map grows.

Figure 8: This graph illustrates the accuracy of the Helix bearing algorithm, estimating 200 samples of perfect 95-degree turns (calibrated with a digital protractor) performed at various locations with increasing clutter, at random angular rates not exceeding 1 radian per second, in the absence of known objects.

An MAV with limited computational resources is particularly impacted by this complexity behavior. SLAM utilizing a Rao-Blackwellized particle filter, similar to [23], is a dynamic Bayesian approach to SLAM, exploiting the conditional independence of measurements. A random set of particles is generated using the noise model and dynamics of the vehicle, in which each particle is considered a potential location for the vehicle. A reduced Kalman filter per particle is then associated with each of the current measurements. Considering the limited computational resources of an MAV, maintaining a set of landmarks large enough to allow for accurate motion estimations, yet sparse enough so as not to produce a negative impact on the system performance, is imperative.


Figure 9: The helix bearing algorithm exploits the optical flow field resulting from the features not associated with architectural lines. A reduced helix association set is shown for clarity. Helix velocities that form statistically identifiable clusters indicate the presence of large objects, such as doors, that can provide an estimation of the angular rate of the MAV during the turn.

The noise model of the measurements, along with the new measurement and the old position of the feature, is used to generate a statistical weight. This weight, in essence, is a measure of how well the landmarks in the previous sensor position correlate with the measured position, taking noise into account. Since each of the particles has a different estimate of the vehicle position, resulting in a different perspective for the measurement, each particle is assigned a different weight. Particles are resampled every iteration, such that the lower-weight particles are removed and the higher-weight particles are replicated. This results in a cloud of random particles that tracks toward the best estimation results, which are the positions that yield the best correlation between the previous position of the features and the new measurement data.

The positions of landmarks are stored by the particles, such as $\mathrm{Par}_n = (X_L^T, P)$, where $X_L = (x_{ci}, y_{ci})$ and $P$ is the $2 \times 2$ covariance matrix for the particular Kalman filter contained by $\mathrm{Par}_n$. The 6DOF vehicle state vector $x_v$ can be updated in discrete time steps of $(k)$ as shown in (14), where $R = (x_r, y_r, H)^T$ is the position in the inertial frame, from which the velocity in the inertial frame can be derived as $\dot{R} = v_E$. The vector $v_B = (v_x, v_y, v_z)^T$ represents the linear velocity of the body frame, and $\omega = (p, q, r)^T$ represents the body angular rate. $\Gamma = (\phi, \theta, \psi)^T$ is the Euler angle vector, and $L_{EB}$ is the Euler angle transformation matrix for $(\phi, \theta, \psi)$. The $3 \times 3$ matrix $T$ converts $(p, q, r)^T$ to $(\dot\phi, \dot\theta, \dot\psi)$. At every step, the MAV is assumed to experience unknown linear and angular accelerations, $V_B = a_B\,\Delta t$ and $\Omega = \alpha_B\,\Delta t$, respectively:

$$x_v(k+1) = \begin{pmatrix} R(k) + L_{EB}(\phi, \theta, \psi)\,(v_B + V_B)\,\Delta t \\ \Gamma(k) + T(\phi, \theta, \psi)\,(\omega + \Omega)\,\Delta t \\ v_B(k) + V_B \\ \omega(k) + \Omega \end{pmatrix}. \tag{14}$$

There is only a limited set of orientations a helicopter is capable of sustaining in the air at any given time without partial or complete loss of control. For instance, no useful lift is generated when the rotor disc is oriented sideways with respect to gravity. Moreover, the on-board autopilot incorporates IMU and compass measurements in a best-effort scheme to keep the MAV at hover in the absence of external control inputs. Therefore, we can simplify the 6DOF system dynamics to 2D system dynamics with an autopilot. Accordingly, the particle filter then simultaneously locates the landmarks and updates the vehicle states $x_r$, $y_r$, $\theta_r$, described by

$$x_v(k+1) = \begin{pmatrix} \cos\theta_r(k)\,u_1(k) + x_r(k) \\ \sin\theta_r(k)\,u_1(k) + y_r(k) \\ u_2(k) + \theta_r(k) \end{pmatrix} + \gamma(k), \tag{15}$$

where $\gamma(k)$ is the linearized input signal noise, $u_1(k)$ is the forward speed, and $u_2(k)$ the angular velocity. Let us consider one instantaneous field of view of the camera, in which the center of two ground corners on opposite walls is shifted. From the distance measurements described earlier, we can derive the relative range and bearing of a corner of interest (index $i$) as follows:

$$y_i = h(x) = \left(\sqrt{x_i^2 + y_i^2},\; \tan^{-1}\!\left[\pm\frac{y_i}{x_i}\right],\; \psi\right)^T, \tag{16}$$

This measurement equation can be related with the statesof the vehicle and the 119894th landmark at each time stamp (119896)as shown in (17) where xV(119896) = (119909

119903(119896) 119910119903(119896) 120579119903(119896))119879 is the

vehicle state vector of the 2D vehicle kinematic model Themeasurement equation h

119894(x(119896)) can be related with the states

of the vehicle and the 119894th corner (landmark) at each timestamp (119896) as given in (17)

h119894(x (119896)) = (

radic(119909119903(119896) minus 119909

119888119894(119896))2

+ (119910119903(119896) minus 119910

119888119894(119896))2

tanminus1 (119910119903(119896) minus 119910

119888119894(119896)

119909119903(119896) minus 119909

119888119894(119896)) minus 120579119903(119896)

120579119903

)

(17)

where 119909119888119894and 119910

119888119894denote the position of the 119894th landmark

4.1. Data Association. Recently detected landmarks need to be associated with the existing landmarks in the map, such that each new measurement either corresponds to the correct existing landmark or else registers as a not-before-seen landmark. This is a requirement for any SLAM approach to function properly (i.e., Figure 11). Typically, the association metric depends on the measurement innovation vector. An exhaustive search algorithm that compares every measurement with every feature on the map associates landmarks if the newly measured landmark is sufficiently close to an existing one. This not only leads to landmark ambiguity but also is computationally intractable for large maps.


Moreover, since the measurement is relative, the error of the vehicle position is additive with the absolute location of the measurement.

We present a new, faster, and more accurate solution, which takes advantage of predicted landmark locations on the image plane. Figure 5 gives a reference for how landmarks appear on the image plane and move along the ground lines as the MAV moves. Assume that $p^k_{(x,y)}$, $k = 0, 1, 2, 3, \dots, n$, represents a pixel in time which happens to be contained by a landmark, and this pixel moves along a ground line at the velocity $v_p$. Although landmarks often contain a cluster of pixels, the size of which is inversely proportional to landmark distance, here the center pixel of a landmark is referred to. Given that the expected maximum velocity $V_{B\max}$ is known, a pixel is expected to appear at

$$p^{k+1}_{(x,y)} = f\!\left(p^k_{(x,y)} + (v_B + V_B)\,\Delta t\right), \tag{18}$$

where

$$\sqrt{\left(p^{k+1}_{(x)} - p^{k}_{(x)}\right)^2 + \left(p^{k+1}_{(y)} - p^{k}_{(y)}\right)^2} \tag{19}$$

cannot be larger than $V_{B\max}\,\Delta t$, while $f(\cdot)$ is a function that converts a landmark range to a position on the image plane.

A landmark appearing at time $k + 1$ is to be associated with a landmark that has appeared at time $k$ if and only if their pixel locations are within the association threshold. In other words, the association information from $k$ is used. Otherwise, if the maximum expected change in pixel location is exceeded, the landmark is considered new. We save computational resources by using the association data from $k$ when a match is found, instead of searching the large global map. In addition, since the pixel location of a landmark is independent of the noise in the MAV position, the association has improved accuracy. To further improve the accuracy, there is also a maximum range beyond which the MAV will not consider landmarks for data association. This range is determined by taking the camera resolution into consideration. The farther a landmark is, the fewer pixels it has in its cluster, and thus the more ambiguity and noise it may contain. Considering the physical camera parameters, resolution, shutter speed, and noise model of the Logitech C905 camera, the MAV is set to ignore landmarks farther than 8 meters. Note that this is a limitation of the camera, not of our proposed methods.

Although representing the map as a tree-based data structure would, in theory, yield an association time of $O(N \log N)$, our pixel-neighborhood-based approach already covers over 90% of the features at any time; therefore, a tree-based solution does not offer a significant benefit.

We also use a viewing-transformation-invariant scene matching algorithm based on spatial relationships among objects in the images and illumination parameters in the scene. This is to determine whether two frames acquired under different extrinsic camera parameters have indeed captured the same scene. Therefore, if the MAV visits a particular place more than once, it can distinguish whether it has been to that spot before.

Our approach maps the features (i.e., corners, lines) and illumination parameters from one view in the past to the other in the present via affine-invariant image descriptors. A descriptor $D_t$ consists of an image region in a scene that contains a high amount of disorder. This reduces the probability of finding multiple targets later. The system will pick a region on the image plane with the most crowded cluster of landmarks to look for a descriptor, which is likely to be the part of the image with the most clutter, hence creating a more unique signature. Descriptor generation is automatic and is triggered when turns are encountered (i.e., by the Helix Bearing Algorithm). A turn is a significant, repeatable event in the life of a map, which makes it interesting for data association purposes. The start of the algorithm is also a significant event, for which the first descriptor $D_0$ is collected, which helps the MAV recognize the starting location if it is revisited.

Every time a descriptor $D_t$ is recorded, it contains the current time $t$ in terms of frame number, the disorderly region $I_{xy}$ of size $x \times y$, and the estimate of the position and orientation of the MAV at frame $t$. Thus, every time a turn is encountered, the system can check whether it has happened before. For instance, if it indeed has happened at time $t = k$, where $t > k$, $D_k$ is compared with $D_t$ in terms of descriptor and landmarks, and the map positions of the MAV at times $t$ and $k$ are expected to match closely; otherwise, it means the map is diverging in a quantifiable manner.

The comparison formulation can be summarized as

$$R(x, y) = \frac{\sum_{x',y'}\left(T(x', y') - I(x + x', y + y')\right)^2}{\sqrt{\sum_{x',y'} T(x', y')^2 \cdot \sum_{x',y'} I(x + x', y + y')^2}}, \tag{20}$$

where a perfect match is 0 and poor matches are represented by larger values, up to 1. We use this to determine the degree to which two descriptors are related, as it represents the fraction of the variation in one descriptor that may be explained by the other. Figure 10 illustrates how this concept works.
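Equation (20) is the normalized squared-difference matching score, which OpenCV exposes directly as TM_SQDIFF_NORMED; the sketch below scores a stored descriptor region against the current frame.

```python
# Descriptor comparison per (20), assuming OpenCV: TM_SQDIFF_NORMED
# computes exactly the normalized squared-difference score, where 0 is
# a perfect match and values near 1 are poor matches.
import cv2

def match_descriptor(frame_gray, descriptor_patch):
    R = cv2.matchTemplate(frame_gray, descriptor_patch,
                          cv2.TM_SQDIFF_NORMED)
    min_val, _, min_loc, _ = cv2.minMaxLoc(R)
    return min_val, min_loc  # best score and its location in the frame
```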

5. Experimental Results

As illustrated in Figures 12, 13, and 14, our monocular vision SLAM correctly locates and associates landmarks to the real world. Figure 15 shows the results obtained in an outdoor experiment with urban roads. A 3D map is built by the addition of time-varying altitude and wall positions, as shown in Figure 16. The proposed methods prove robust to transient disturbances, since features inconsistent about their position are removed from the map.

The MAV assumes that it is positioned at Cartesian coordinates (0, 0, 0) at the start of a mission, with the camera pointed along the positive $x$-axis; therefore, the width of the corridor is represented by the $y$-axis. At any time during the mission, a partial map can be requested from the MAV via the Internet. The MAV also stores the map and important video frames (i.e., when a new landmark is discovered) on board for later retrieval. Video frames are time-linked to the map. It is therefore possible to obtain a still image of the surroundings of any landmark for surveillance and identification purposes.


Figure 10: Data association metric, where a descriptor is shown in the middle.

Figure 11: Map drift is one of the classic errors introduced by poor data association, or lack thereof, negatively impacting the loop-closing performance.

In Figure 12, the traveled distance is on the kilometer scale. When the system completes the mission and returns to the starting point, the belief is within one meter of where the mission had originally started.

5.1. The Microaerial Vehicle Hardware Configuration. Saint Vertigo, our autonomous MAV helicopter, serves as the primary robotic test platform for the development of this study (see Figure 17). In contrast with other prior works that predominantly used wireless video feeds and the Vicon vision tracking system for vehicle state estimation [24], Saint Vertigo performs all image processing and SLAM computations on board, with a 1 GHz CPU, 1 GB RAM, and 4 GB storage. The unit measures 50 cm, with a ready-to-fly weight of 0.9 kg and 0.9 kg of payload for adaptability to different missions.

Figure 12: Experimental results of the proposed ranging and SLAM algorithm, showing the landmarks added to the map, representing the structure of the environment. All measurements are in meters. The experiment was conducted under incandescent ambient lighting.

In essence, the MAV features two independent computers. The flight computer is responsible for flight stabilization, flight automation, and sensory management. The navigation computer is responsible for image processing, range measurement, SLAM computations, networking, mass storage, and, as a future goal, path planning. The pathway between them is a dedicated on-board link, through which the sensory feedback and supervisory control commands are shared. These commands are simple directives, which are converted to the appropriate helicopter flight surface responses by the flight computer. The aircraft is IEEE 802.11 enabled, and all of its features are accessible over the Internet or an ad hoc TCP-IP network.


Figure 13: (a) Experimental results of the proposed ranging and SLAM algorithm, with state observer odometer trail. The actual floor plan of the building is superimposed later on a mature map to illustrate the accuracy of our method. Note that the floor plan was not provided to the system a priori. (b) The same environment mapped by a ground robot with a different starting point, to illustrate that our algorithm is compatible with different platforms.

Figure 14: Results of the proposed ranging and SLAM algorithm from a different experiment, with state observer ground truth. All measurements are in meters. The experiment was conducted under fluorescent ambient lighting, and sunlight where applicable.

Figure 15: Results of the proposed ranging and SLAM algorithm from an outdoor experiment in an urban area. A small map of the area is provided for reference purposes (not provided to the algorithm), and it indicates the robot path. All measurements are in meters. The experiment was conducted under sunlight ambient conditions and dry weather.

Figure 16: Cartesian $(x, y, z)$ position of the MAV in a hallway, as reported by the proposed ranging and SLAM algorithm with time-varying altitude (axes: hallway length, hallway width, and helicopter altitude, all in meters). The altitude is represented by the $z$-axis and is initially at 25 cm, as this is the ground clearance of the ultrasonic altimeter when the aircraft has landed. The MAV altitude was intentionally varied by large amounts to demonstrate the robustness of our method to the climb and descent of the aircraft, whereas in a typical mission natural altitude changes are in the range of a few centimeters.

Figure 17: Saint Vertigo, the autonomous MAV helicopter, consists of four decks. The A deck contains the collective pitch rotor head mechanics. The B deck comprises the fuselage, which houses the power plant, transmission, main batteries, actuators, gyroscope, and the tail rotor. The C deck is the autopilot compartment, which contains the inertial measurement unit, all communication systems, and all sensors. The D deck carries the navigation computer, which is attached to a digital video camera, visible at the front.

its features are accessible over the Internet or an ad hoc TCP-IP network. Among the other platforms shown in Figure 18, Saint Vertigo has the most limited computational resources.

5.2. Processing Requirements. In order to effectively manage the computational resources on a lightweight MAV computer, we keep track of the CPU utilization for the algorithms proposed in this paper. Table 1 shows a typical breakdown of the average processor utilization per video frame. Each corresponding task elucidated in this paper is visualized in Figure 2.

The numbers in Table 1 are gathered after the map has matured. Methods highlighted with a dagger (†) are mutually exclusive; for example, the Helix bearing algorithm runs only when the MAV is performing turns, while the ranging task is on standby. Particle filtering has a roughly constant load on the system


Figure 18: Our algorithms have been tested on a diverse set of mobile platforms, shown here. Picture courtesy of the Space Systems and Controls Lab, Aerospace Robotics Lab, Digitalsmithy Lab, and Rockwell Collins Advanced Technology Center.

once the map is populated. We only consider a limited point cloud with landmarks in the front detection range of the MAV (see Section 4.1). The MAV typically operates in the 80-90% utilization range. It should be stressed that this numerical figure includes operating system kernel processes, which involve video-memory procedures, as the MAV is not equipped with a dedicated graphics processor. The MAV is programmed to construct the SLAM results and other miscellaneous on-screen display information inside the video memory in real time. This is used to monitor the system for our own debugging purposes, but it is not required for the MAV operation. Disabling this feature reduces the load and frees up processor time for other tasks that may be implemented, such as path planning and closed-loop position control.

6. Conclusion and Future Work

In this paper, we investigated the performance of monocular camera based vision SLAM with minimal assumptions, as well as minimal aid from other sensors (altimeter only), in a corridor-following-flight application which requires precise localization and absolute range measurement. This is true even for outdoor cases, because our MAV is capable of building high speeds and covering large distances very rapidly, and some of the ground robots we have tested were large enough to become a concern for traffic and pedestrians. While widely recognized SLAM methods have been mainly developed for use with laser range finders, this paper presented new algorithms for monocular vision-based depth perception and


Table 1: CPU utilization of the proposed algorithms.

Image acquisition and edge filtering: 10%
Line and slope extraction: 2%
Landmark extraction: 20%†
Helix bearing: 20%†
Ranging algorithms: below 1%
Rao-Blackwellized particle filter: 50%

bearing sensing to accurately mimic the operation of such an advanced device. We were able to integrate our design with popular SLAM algorithms originally meant for laser range finders, and we have experimentally validated its operation for autonomous indoor and outdoor flight and navigation with a small, fully self-contained MAV helicopter, as well as other robotic platforms. Our algorithms successfully adapt to various situations while performing the transitions in between (e.g., turns, presence of external objects, and time-varying altitude).

Since the proposed monocular camera vision SLAM method does not need initialization procedures, the mission can start at an arbitrary point. Therefore, our MAV can be deployed to infiltrate an unknown building. One future task is to add the capability to fly through doors and windows. Indeed, the system is only limited by the capabilities of the camera, such as resolution, shutter speed, and reaction time. All of those limitations can be overcome with the proper use of lenses and higher fidelity imaging sensors, even though we have used a consumer-grade USB camera. Since the ability to extract good landmarks is a function of the camera capabilities, a purpose-built camera is suggested for future work. Such a camera would also allow the development of efficient vision SLAM and data association algorithms that take advantage of the intermediate image processing data.

Our future vision-based SLAM and navigation strategy for an indoor MAV helicopter through the hallways of a building also includes the ability to recognize staircases, and thus traverse multiple floors to generate a comprehensive volumetric map of the building. This will also permit vision-based 3D path planning and closed-loop position control of the MAV based on SLAM. Considering that our MAV helicopter is capable of outdoor flight, we can extend our method to the outdoor perimeter of buildings and similar urban environments by exploiting the similarities between hallways and downtown city maps. Further, considering the reduction in weight and independence from GPS coverage, our work also permits the development of portable navigation devices for a wider array of applications, such as small-scale mobile robotics and helmet or vest mounted navigation systems.

Certain environments and environmental factors prove challenging to our proposed method: bright lights, reflective surfaces, haze, and shadows. These artifacts introduce two main problems: (1) they can alter chromatic clarity, local microcontrast, and exposure due to their unpredictable high-energy nature, and (2) they can appear as false objects, especially when there is bloom surrounding objects in front of a problem light source. Further reduction in contrast is possible

if scattering particles in the air are dense. We have come to observe that preventative and defensive approaches to such issues are promising. Antireflective treatment on lenses can reduce light bouncing off of the lens, and programming the aircraft to move a very small distance upon detection of glare can eliminate the unwanted effects. Innovative and adaptive application of servo-controlled filters before the lenses can minimize, or eliminate, most if not all reflections. The light that causes glare is elliptically polarized due to strong phase correlation. This is as opposed to essential light, which is circularly polarized. Filters can detect and block polarized light from entering the camera, thereby blocking unwanted effects. Application of purpose-designed digital imaging sensors that do not involve a Bayer filter can also help. Most of the glare occurs in the green light region, and traditional digital imaging sensors have twice as many green receptors as red and blue. The Bayer design was inspired by the human eye, which sees green better, as green is the most structurally descriptive light for edges and corners. This paper has supplementary material (see Supplementary Material available online at http://dx.doi.org/10.1155/2013/374165), available from the authors, which shows experimental results of the paper.

Acknowledgments

The research reported in this paper was in part supported by the National Science Foundation (Grant ECCS-0428040), Information Infrastructure Institute (I3), the Department of Aerospace Engineering and Virtual Reality Application Center at Iowa State University, Rockwell Collins, and the Air Force Office of Scientific Research.

References

[1] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106–154, 1962.

[2] N. Isoda, K. Terada, S. Oe, and K. Ikaida, "Improvement of accuracy for distance measurement method by using movable CCD," in Proceedings of the 36th SICE Annual Conference (SICE '97), pp. 29–31, Tokushima, Japan, July 1997.

[3] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2nd edition, 2003.

[4] F. Ruffier and N. Franceschini, "Visually guided micro-aerial vehicle: automatic take off, terrain following, landing and wind reaction," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2339–2346, New Orleans, La, USA, May 2004.

[5] F. Ruffier, S. Viollet, S. Amic, and N. Franceschini, "Bio-inspired optical flow circuits for the visual guidance of micro-air vehicles," in Proceedings of the International Symposium on Circuits and Systems (ISCAS '03), vol. 3, pp. 846–849, Bangkok, Thailand, May 2003.

[6] J. Michels, A. Saxena, and A. Y. Ng, "High speed obstacle avoidance using monocular vision and reinforcement learning," in Proceedings of the 22nd International Conference on Machine Learning (ICML '05), vol. 119, pp. 593–600, August 2005.

[7] A. Saxena, J. Schulte, and A. Y. Ng, "Depth estimation using monocular and stereo cues," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2197–2203, 2007.

[8] N. Snavely, S. M. Seitz, and R. Szeliski, "Photo tourism: exploring photo collections in 3D," ACM Transactions on Graphics, vol. 25, no. 3, 2006.

[9] A. W. Fitzgibbon and A. Zisserman, "Automatic camera recovery for closed or open image sequences," in Proceedings of the European Conference on Computer Vision, pp. 311–326, June 1998.

[10] A. Davison, M. Nicholas, and S. Olivier, "MonoSLAM: real-time single camera SLAM," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1052–1067, 2007.

[11] L. Clemente, A. Davison, I. Reid, J. Neira, and J. Tardos, "Mapping large loops with a single hand-held camera," in Proceedings of the Robotics: Science and Systems Conference, June 2007.

[12] F. Dellaert, W. Burgard, D. Fox, and S. Thrun, "Using the condensation algorithm for robust, vision-based mobile robot localization," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '99), pp. 588–594, June 1999.

[13] N. Cuperlier, M. Quoy, P. Gaussier, and C. Giovanangeli, "Navigation and planning in an unknown environment using vision and a cognitive map," in Proceedings of the IJCAI Workshop: Reasoning with Uncertainty in Robotics, 2005.

[14] G. Silveira, E. Malis, and P. Rives, "An efficient direct approach to visual SLAM," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 969–979, 2008.

[15] A. P. Gee, D. Chekhlov, A. Calway, and W. Mayol-Cuevas, "Discovering higher level structure in visual SLAM," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 980–990, 2008.

[16] K. Celik, S.-J. Chung, and A. K. Somani, "Mono-vision corner SLAM for indoor navigation," in Proceedings of the IEEE International Conference on Electro/Information Technology (EIT '08), pp. 343–348, Ames, Iowa, USA, May 2008.

[17] K. Celik, S.-J. Chung, and A. K. Somani, "MVCSLAM: mono-vision corner SLAM for autonomous micro-helicopters in GPS denied environments," in Proceedings of the AIAA Guidance, Navigation and Control Conference, Honolulu, Hawaii, USA, August 2008.

[18] K. Celik, S.-J. Chung, and A. K. Somani, "Biologically inspired monocular vision based navigation and mapping in GPS-denied environments," in Proceedings of the AIAA Infotech at Aerospace Conference and Exhibit and AIAA Unmanned Unlimited Conference, Seattle, Wash, USA, April 2009.

[19] K. Celik, S.-J. Chung, M. Clausman, and A. K. Somani, "Monocular vision SLAM for indoor aerial vehicles," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, Mo, USA, October 2009.

[20] J. Shi and C. Tomasi, "Good features to track," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 593–600, June 1994.

[21] H. Bay, A. Ess, T. Tuytelaars, and L. van Gool, "Speeded-Up Robust Features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.

[22] K. Celik and A. K. Somani, "Wandless realtime autocalibration of tactical monocular cameras," in Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV '12), Las Vegas, Nev, USA, 2012.

[23] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit, "FastSLAM: a factored solution to the simultaneous localization and mapping problem," in Proceedings of the AAAI National Conference on Artificial Intelligence, pp. 593–598, 2002.

[24] J. P. How, B. Bethke, A. Frank, D. Dale, and J. Vian, "Real-time indoor autonomous vehicle test environment," IEEE Control Systems Magazine, vol. 28, no. 2, pp. 51–64, 2008.



Figure 1: A three-dimensional representation of the corridor with respect to the MAV. Note that the width of the hallway is not provided to the algorithm, and the MAV does not have any sensors that can detect walls.

as monoSLAM [10, 11], which depend on movement for depth estimation and offer a relative recovered scale, may not provide reliable object avoidance for an agile MAV in an indoor environment. A rotorcraft MAV needs to bank to move the camera sideways, a movement severely limited in a hallway; for helicopter dynamics, it has to be able to perform depth measurement from a still, or nearly still, platform.

In SLAM, Extended Kalman Filter based approaches with full covariance have a limitation on the size of a manageable map in real time, considering the quadratic nature of the algorithm versus the computational resources of an MAV. Global localization techniques such as Condensation SLAM [12] require a full map to be provided to the robot a priori. Azimuth learning based techniques such as Cognitive SLAM [13] are parametric, and locations are centered on the robot, which naturally becomes incompatible with ambiguous landmarks, such as the landmarks our MAV has to work with. Image registration based methods such as [14] propose a different formulation of the vision-based SLAM problem based on motion, structure, and illumination parameters, without first having to find feature correspondences. For a real-time implementation, however, a local optimization procedure is required, and there is a possibility of getting trapped in a local minimum. Further, without merging regions with a similar structure, the method becomes computationally intensive for an MAV. Structure extraction methods [15] have some limitations, since an incorrect incorporation of points into higher level features will have an adverse effect on consistency. Further, these systems depend on a successful selection of thresholds.

1.2. Comparison with Prior Work and Organization. This paper addresses the above shortcomings using an unmodified consumer-grade monocular web camera. By exploiting the architectural orthogonality of indoor and urban outdoor environments, we introduce a novel method for monocular vision-based SLAM by computing absolute range and bearing information without using active ranging sensors. More thorough algorithm formulations and newer experimental results with a unique indoor-flying helicopter are discussed in this paper than in our prior conference articles [16-19]. Section 2 explains the procedures for perception of world geometry as prerequisites for SLAM, such as range measurement methods, as well as performance evaluations of the proposed methods. While a visual turn-sensing algorithm is introduced in Section 3, SLAM formulations are provided in Section 4. Results of experimental validation, as well as a description of the MAV hardware platform, are presented in Section 5. Figure 2 can be used as a guide to the sections, as well as to the process flow of our proposed method.

2. Problem and Algorithm Formulation

We propose a novel method to estimate the absolute depth of features using a monocular camera as a sole means of navigation. The camera is mounted on the platform with a slight downward tilt. Landmarks are assumed to be stationary; moving targets are also detected, however, they are not considered as landmarks and are therefore ignored by the map. Altitude is measured in real time via the on-board ultrasonic altimeter on our MAV, or, in the case of a ground robot, it can be provided to the system via various methods, depending on where the camera is installed. It is acceptable for the camera to translate or tilt with respect to the robot, such as when mounted on a robotic arm, as long as the mount is properly encoded to indicate altitude. We validate our results with a time-varying altitude. The ground is assumed to be relatively flat (no more than 5 degrees of inclination within a 10-meter perimeter). Our algorithm has the capability to adapt to inclines if the camera tilt can be controlled; we have equipped some of our test platforms with this capability.

2.1. Landmark Extraction Step I: Feature Extraction. A landmark in the SLAM context is a conspicuous, distinguishing landscape feature marking a location. A minimal landmark can consist of two measurements with respect to the robot position: range and bearing. Our landmark extraction strategy is a three-step automatic process; all three steps are performed on a frame $I_t$ before moving on to the next frame $I_{t+1}$.

The first step involves finding prominent parts of $I_t$ that tend to be more attractive than other parts in terms of texture, dissimilarity, and convergence. These parts tend to be immune to rotation, scale, illumination, and image noise, and we refer to them as features, which have the form $f_n(u, v)$. We utilize two algorithms for this procedure. For flying



Figure 2: Block diagram illustrating the operational steps of the monocular vision navigation and ranging at a high level, and its relations with the flight systems. The scheme is directly applicable to other mobile platforms.

platforms, considering the limited computational resources available, we prefer the algorithm proposed by Shi and Tomasi [20], in which sections of $I$ with large eigenvalues are extracted into a set $\Psi$ such that $\Psi = \{f_1, f_2, \ldots, f_n\}$. Although there is virtually no limit for $n$, it is impossible at this stage in the procedure to make an educated distinction between a useless feature for the map (i.e., one that cannot be used for ranging and bearing) and a potential landmark (i.e., one that provides reliable range and bearing information and thus can be included in the map). For ground based platforms, we prefer the SURF algorithm (Figure 3) due to the directionality its detected features offer [21]. Directional features are particularly useful where the platform dynamics are diverse, such as human body or MAV applications in gusty environments; directional features are more robust in terms of associating them with architectural lines, where, instead of a single distance threshold, the direction of the feature itself also becomes a metric. This is also useful when ceilings are used, where lines are usually segmented and more difficult to detect. This being an expensive algorithm, we consider faster implementations such as ASURF.

In the following steps, we describe how to extract a sparse set of reliable landmarks from a populated set of questionable features.

2.2. Landmark Extraction Step II: Line and Slope Extraction. Conceptually, landmarks exist in the 3D inertial frame and they are distinctive, whereas features in $\Psi = \{f_1, f_2, \ldots, f_n\}$ exist on a 2D image plane and they contain ambiguity. In other words, our knowledge of their range and bearing information with respect to the camera is uniformly distributed across $I_t$. Considering the limited mobility of our platform in the particular environment, parallax among the features is very limited. Thus, we attempt to correlate the contents of $\Psi$ with the real world via their relationship with the perspective lines.

On a well-lit, well-contrasting, noncluttered hallway, perspective lines are obvious. Practical hallways have random objects that segment, or even falsely mimic, these lines. Moreover, on a monocular camera, objects are aliased with distance, making it more difficult to find consistent ends of perspective lines, as they tend to be considerably far from the camera. For these reasons, the construction of those lines should be an adaptive approach.

We begin the adaptive procedure by edge filtering the image $I$ through a discrete differentiation operator with more weight on the horizontal convolution, such as

$I'_x = F_h \ast I, \quad I'_y = F_v \ast I$ (1)


where $\ast$ denotes the convolution operator and $F$ is a $3 \times 3$ kernel for horizontal and vertical derivative approximations. $I'_x$ and $I'_y$ are combined with weights whose ratio determines the range of angles through which edges will be filtered. This, in effect, returns a binary image plane $I'$ with potential edges that are more horizontal than vertical. It is possible to reverse this effect to detect other edges of interest, such as ceiling lines or door frames. At this point, edges will disintegrate the more vertical they get (see Figure 3 for an illustration). Application of the Hough Transform to $I'$ will return all possible lines, automatically excluding discrete point sets, out of which it is possible to sort out lines with a finite slope $\phi \neq 0$ and curvature $\kappa = 0$. This is a significantly expensive operation (i.e., considering the limited computational resources of an MAV) to perform on a real-time video feed, since the transform has to run over the entire frame, including the redundant parts.
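For illustration, a minimal Python sketch of this weighted edge filtering step follows; the Sobel-style kernels, the weight ratio, and the threshold are our own assumptions, not the exact values used on the MAV.

```python
import numpy as np
from scipy.signal import convolve2d

def horizontal_edge_map(image, w_h=3.0, w_v=1.0, threshold=50.0):
    """Weighted derivative filtering per (1): keep edges that are more
    horizontal than vertical. `image` is a 2D grayscale array; the ratio
    w_h/w_v sets the band of edge angles that survive (illustrative values)."""
    F_h = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], float)  # responds to horizontal edges
    F_v = F_h.T                                                  # responds to vertical edges
    I_x = convolve2d(image, F_h, mode="same", boundary="symm")
    I_y = convolve2d(image, F_v, mode="same", boundary="symm")
    response = w_h * np.abs(I_x) - w_v * np.abs(I_y)  # favor horizontal structure
    return (response > threshold).astype(np.uint8)    # binary edge plane I'
```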

To improve the overall performance in terms of efficiency, we have investigated replacing the Hough Transform with an algorithm that only runs on the parts of $I'$ that contain data. This approach begins by dividing $I'$ into square blocks, $B_{xy}$. The optimal block size is the smallest block that can still capture the texture elements in $I'$. The camera resolution and the filtering methods used to obtain $I'$ affect the resulting texture element structure. The blocks are sorted to bring the highest number of data points with the lowest entropy (2) first, as this is the block most likely to contain lines. Blocks that are empty, or that have only a few scattered points in them, are excluded from further analysis. Entropy is the characteristic of an image patch that makes it more ambiguous, by means of disorder in a closed system. This assumes that disorder is more probable than order, and thereby lower disorder has a higher likelihood of containing an architectural feature, such as a line. Entropy can be expressed as

$-\sum_{x,y} B_{xy} \log B_{xy}$ (2)
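A compact sketch of this block-sorting heuristic is given below; the block size, the minimum point count, and the treatment of block occupancy as a probability are illustrative assumptions on our part.

```python
import numpy as np

def rank_blocks(edge_map, block=16, min_points=8):
    """Split the binary edge plane I' into square blocks and rank them so
    that blocks with many data points and low entropy come first, as these
    are the blocks most likely to contain hallway lines (cf. (2))."""
    h, w = edge_map.shape
    scored = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            B = edge_map[y:y + block, x:x + block].astype(float)
            n = B.sum()
            if n < min_points:          # empty or scattered blocks are excluded
                continue
            p = n / B.size              # occupancy, treated as a probability
            H = 0.0 if p >= 1.0 else -(p * np.log(p) + (1 - p) * np.log(1 - p))
            scored.append((H, -n, (x, y)))
    scored.sort()                       # lowest entropy, then most points, first
    return [xy for _, _, xy in scored]
```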

The set of candidate blocks resulting at this point is to be searched for lines. Although a block $B_n$ is a binary matrix, it can be thought of as a coordinate system which contains a set of points (i.e., pixels) with $(x, y)$ coordinates, such that positive $x$ is right and positive $y$ is down. Since we are more interested in lines that are more horizontal than vertical, it is safe to assume that the errors in the $y$ values outweigh those in the $x$ values. The equation for a ground line is of the form $y = mx + b$, and the deviations of the data points in the block from this line are $d_i = y_i - (m x_i + b)$. Therefore, the most likely line is the one composed of data points that minimize the deviation, such that $d_i^2 = (y_i - m x_i - b)^2$. Using determinants, the deviation can be obtained as in (3):

$d_i = \begin{vmatrix} \sum x_i^2 & \sum x_i \\ \sum x_i & i \end{vmatrix}, \quad m \times d_i = \begin{vmatrix} \sum (x_i \cdot y_i) & \sum x_i \\ \sum y_i & i \end{vmatrix}, \quad b \times d_i = \begin{vmatrix} \sum x_i^2 & \sum (x_i \cdot y_i) \\ \sum x_i & \sum y_i \end{vmatrix}$ (3)
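The determinant form in (3) is Cramer's rule applied to the least-squares normal equations; a direct rendering (our own minimal sketch, not the flight code) is:

```python
import numpy as np

def fit_block_line(points):
    """Least-squares line y = m*x + b for the pixels of one candidate block,
    minimizing the vertical deviations d_i = y_i - (m*x_i + b), as in (3).
    `points` is an (N, 2) array of (x, y) pixel coordinates."""
    x, y = points[:, 0].astype(float), points[:, 1].astype(float)
    n = len(x)
    D = np.linalg.det([[np.sum(x * x), np.sum(x)], [np.sum(x), n]])
    m = np.linalg.det([[np.sum(x * y), np.sum(x)], [np.sum(y), n]]) / D
    b = np.linalg.det([[np.sum(x * x), np.sum(x * y)], [np.sum(x), np.sum(y)]]) / D
    return m, b
```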

Since our range measurement methods depend on these lines, the overall line-slope accuracy is affected by the reliability in detecting and measuring the hallway lines (or road lines, sidewalk lines, depending on context). High measurement noise in the slopes has adverse effects on SLAM and should be minimized, to prevent inflating the uncertainty in $L_1 = \tan\phi_1$ and $L_2 = \tan\phi_2$, or the infinity point $(P_x, P_y)$. To reduce this noise, lines are cross-validated for the longest collinearity via pixel-neighborhood based line extraction, in which the results obtained rely only on a local analysis. Their coherence is further improved using a postprocessing step that exploits the texture gradient. With an assumption of the orthogonality of the environment, lines from the ground edges are extracted. Note that this is also applicable to ceiling lines. Although ground lines (and ceiling lines, if applicable) are virtually parallel in the real world, on the image plane they intersect. The horizontal coordinate of this intersection point is later used as a heading guide for the MAV, as illustrated in Figure 5. Features that happen to coincide with these lines are potential landmark candidates. When this step is complete, a set of features cross-validated with the perspective lines, $\Psi'$, which is a subset of $\Psi$ with the nonuseful features removed, is passed to the third step.

2.3. Landmark Extraction Step III: Range Measurement by the Infinity-Point Method. This step accurately measures the absolute distance to features in $\Psi'$ by integrating local patches of the ground information into a global surface reference frame. This new method significantly differs from optical flow in that the depth measurement does not require a successive history of images.

Our strategy here assumes that the height of the camera from the ground, $H$, is known a priori (see Figure 1); the MAV provides real-time altitude information to the camera. We also assume that the camera is initially pointed in the general direction of the far end of the corridor. This latter assumption is not a requirement; if the camera is pointed at a wall, the system will switch to visual steering mode and attempt to recover the camera path without mapping, until hallway structure becomes available.

The camera is tilted down (or up, depending on preference) with an angle $\beta$ to facilitate continuous capture of feature movement across the perspective lines. The infinity point $(P_x, P_y)$ is an imaginary concept where the projections of the two parallel perspective lines appear to intersect on the image plane. Since this intersection point is, in theory, infinitely far from the camera, it should present no parallax in response to the translations of the camera. It does, however, effectively represent the yaw and the pitch of the camera (note the crosshair in Figure 5). Assume that the end points of the perspective lines are $E_{H1} = (l, d, -H)^T$ and $E_{H2} = (l, d - w, -H)^T$, where $l$ is the length and $w$ is the width of the hallway, $d$ is the horizontal displacement of the camera from the left wall, and $H$ is the MAV altitude (see Figure 4 for a visual description). The Euler rotation matrix to convert


Figure 3: Initial stages after filtering for line extraction, in which the line segments are being formed. Note that the horizontal lines across the image denote the artificial horizon for the MAV; these are not architectural detections, but the on-screen display provided by the MAV. This procedure is robust to transient disturbances, such as people walking by or trees occluding the architecture.

from the camera frame to the hallway frame is given in (4):

$A = \begin{bmatrix} c\psi c\beta & c\beta s\psi & -s\beta \\ c\psi s\phi s\beta - c\phi s\psi & c\phi c\psi + s\phi s\psi s\beta & c\beta s\phi \\ s\phi s\psi + c\phi c\psi s\beta & c\phi s\psi s\beta - c\psi s\phi & c\phi c\beta \end{bmatrix}$ (4)

where $c$ and $s$ are abbreviations for the cosine and sine functions, respectively. The vehicle yaw angle is denoted by $\psi$, the pitch by $\beta$, and the roll by $\phi$. Since the roll angle is controlled by the onboard autopilot system, it can be set to zero.

The points $E_{H1}$ and $E_{H2}$ are transformed into the camera frame via multiplication with the transpose of $A$ in (4):

$E_{C1} = A^T \cdot (l, d, -H)^T, \quad E_{C2} = A^T \cdot (l, d - w, -H)^T$ (5)

This 3D system is then transformed into the 2D image plane via

$u = \frac{y f}{x}, \quad v = \frac{z f}{x}$ (6)

where $u$ is the pixel horizontal position from the center (right is positive), $v$ is the pixel vertical position from the center (up is positive), and $f$ is the focal length (3.7 mm for the particular camera we have used). The end points of the perspective lines have now transformed from $E_{H1}$ and $E_{H2}$ to $(P_{x1}, P_{y1})^T$ and $(P_{x2}, P_{y2})^T$, respectively. An infinitely long hallway can be represented by

$\lim_{l \to \infty} P_{x1} = \lim_{l \to \infty} P_{x2} = f \tan\psi, \quad \lim_{l \to \infty} P_{y1} = \lim_{l \to \infty} P_{y2} = -\frac{f \tan\beta}{\cos\psi}$ (7)

which is conceptually the same as extending the perspective lines to infinity. The fact that $P_{x1} = P_{x2}$ and $P_{y1} = P_{y2}$ indicates that the intersection of the lines in the image plane is the end of such an infinitely long hallway. Solving the resulting equations for $\psi$ and $\beta$ yields the camera yaw and pitch, respectively:

$\psi = \tan^{-1}\left(\frac{P_x}{f}\right), \quad \beta = -\tan^{-1}\left(\frac{P_y \cos\psi}{f}\right)$ (8)
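A direct sketch of (8) follows; we assume the infinity point is given as an offset from the image center and the focal length is expressed in the same pixel units.

```python
import math

def camera_yaw_pitch(P_x, P_y, f):
    """Recover camera yaw and pitch from the infinity point per (8).
    (P_x, P_y): infinity-point offset from the image center, in pixels.
    f: focal length in the same pixel units."""
    psi = math.atan2(P_x, f)                    # yaw
    beta = -math.atan2(P_y * math.cos(psi), f)  # pitch
    return psi, beta
```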

A generic form of the transformation from the pixel position $(u, v)$ to $(x, y, z)$ can be derived in a similar fashion [3]. The equations for $u$ and $v$ also provide general coordinates in the camera frame as $(z_c f / v, u z_c / v, z_c)$, where $z_c$ is the $z$ position of the object in the camera frame. Multiplying with (4) transforms the hallway frame coordinates $(x, y, z)$ into functions of $u$, $v$, and $z_c$. Solving the new $z$ equation for $z_c$ and substituting into the equations for $x$ and $y$ yields

$x = \frac{a_{12} u + a_{13} v + a_{11} f}{a_{32} u + a_{33} v + a_{31} f}\, z, \quad y = \frac{a_{22} u + a_{23} v + a_{21} f}{a_{32} u + a_{33} v + a_{31} f}\, z$ (9)

where $a_{ij}$ denotes the elements of the matrix in (4). See Figure 1 for the descriptions of $x$ and $y$.

For objects likely to be on the floor, the height of the camera above the ground is the $z$ position of the object. Also, if the platform roll can be measured or assumed negligible, then the combination of the infinity point with the height can be used to obtain the range to any object on the floor of the hallway. The same concept applies to objects which are likely to be on the same wall, or on the ceiling. By exploiting the geometry of the corners present in the corridor, our



Figure 4: A visual description of the environment as perceived by the infinity-point method.

method computes the absolute range and bearing of the features, effectively turning them into the landmarks needed for the SLAM formulation. See Figure 5, which illustrates the final appearance of the ranging algorithm.
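Combining (4), (8), and (9), a minimal floor-feature ranging sketch might look as follows; the rotation-matrix construction and the sign conventions are our assumptions and would need verification against a specific camera mount.

```python
import numpy as np

def ground_range(u, v, f, psi, beta, H, phi=0.0):
    """Estimate hallway-frame (x, y) of a floor feature at pixel (u, v),
    using the rotation matrix A of (4) and z = -H (camera height above floor)."""
    c, s = np.cos, np.sin
    A = np.array([
        [c(psi)*c(beta), c(beta)*s(psi), -s(beta)],
        [c(psi)*s(phi)*s(beta) - c(phi)*s(psi), c(phi)*c(psi) + s(phi)*s(psi)*s(beta), c(beta)*s(phi)],
        [s(phi)*s(psi) + c(phi)*c(psi)*s(beta), c(phi)*s(psi)*s(beta) - c(psi)*s(phi), c(phi)*c(beta)],
    ])
    denom = A[2, 1]*u + A[2, 2]*v + A[2, 0]*f   # a32*u + a33*v + a31*f, per (9)
    z = -H                                      # floor plane in the hallway frame
    x = (A[0, 1]*u + A[0, 2]*v + A[0, 0]*f) / denom * z
    y = (A[1, 1]*u + A[1, 2]*v + A[1, 0]*f) / denom * z
    return x, y
```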

The graph in Figure 6 illustrates the disagreement between the line-perspectives and the infinity-point method (Section 2.3), in an experiment in which both algorithms executed simultaneously on the same video feed. With the particular camera we used in the experiments (Logitech C905), the infinity-point method yielded a 93% accuracy. These numbers are functions of camera resolution, camera noise, and the consequent line extraction noise. Therefore, disagreements not exceeding 0.5 meters are in its favor with respect to accuracy. Disagreements from the ground truth include all transient measurement errors, such as camera shake, the occasional introduction of moving objects that deceptively mimic the environment, and other anomalies. The divergence between the two ranges that is visible between samples 20 and 40 in Figure 6 is caused by a hallway line anomaly from the line extraction process, independent of ranging. In this particular case, both hallway lines have shifted, causing the infinity point to move left. Horizontal translations of the infinity point have a minimal effect on the measurement performance of the infinity-point method, this being one of its main advantages. Refer to Figure 7 for a demonstration of the performance of these algorithms in a wide variety of environments.

The bias between the two measurements shown in Figure 6 is due to shifts in camera calibration parameters between different experiments. Certain environmental factors have dramatic effects on lens precision, such as acceleration, corrosive atmosphere, acoustic noise, fluid contamination, low pressure, vibration, ballistic shock, electromagnetic radiation, temperature, and humidity. Most of those conditions readily occur on an MAV (and on most other platforms, including the human body) due to parts rotating at high speeds, powerful air currents, static electricity, radio interference, and so on. The autocalibration concept is wide and beyond

the scope of this paper. We present a novel mathematical procedure that addresses the issue of maintaining monocular camera calibration automatically in hostile environments in another paper of ours, and we encourage the reader to refer to it [22].

3. Helix Bearing Algorithm

When the MAV approaches a turn, an exit, a T-section, or a dead-end, both ground lines tend to disappear simultaneously. Consequently, range and heading measurement methods cease to function. A set of features might still be detected, and the MAV can make a confident estimate of their spatial pose. However, in the absence of depth information, a one-dimensional probability density over the depth is represented by a two-dimensional particle distribution.

In this section, we propose a turn-sensing algorithm to estimate $\psi$ in the absence of orthogonality cues. This situation automatically triggers the turn-exploration mode in the MAV: a yaw rotation of the body frame is initiated until another passage is found. The challenge is to estimate $\psi$ accurately enough to update the SLAM map correctly. This procedure combines machine vision with the data matching and dynamic estimation problem. For instance, if the MAV approaches a left-turn after exploring one leg of an "L" shaped hallway, turns left 90 degrees, and continues through the next leg, the map is expected to display two hallways joined at a 90-degree angle. Similarly, a 180-degree turn before finding another hallway would indicate a dead end. This way, the MAV can also determine where turns are located the next time they are visited.

The new measurement problem at turns is to compute the instantaneous velocity $(u, v)$ of every helix (moving feature) that the MAV is able to detect, as shown in Figure 9. In other words, an attempt is made to recover $V(x, y, t) = (u(x, y, t), v(x, y, t)) = (dx/dt, dy/dt)$ using a variation of the pyramidal Lucas-Kanade method. This recovery leads to a 2D vector field obtained via perspective projection of the 3D velocity field onto the image plane. At discrete time steps, the next frame is defined as a function of a previous frame as $I_{t+1}(x, y, z, t) = I_t(x + dx, y + dy, z + dz, t + dt)$. By applying the Taylor series expansion,

$I(x, y, z, t) + \frac{\partial I}{\partial x}\delta x + \frac{\partial I}{\partial y}\delta y + \frac{\partial I}{\partial z}\delta z + \frac{\partial I}{\partial t}\delta t$ (10)

then differentiating with respect to time, the helix velocity is obtained in terms of pixel distance per time step $k$.
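Since the text names the pyramidal Lucas-Kanade method, a minimal OpenCV-based sketch of how the helix positions and velocities could be recovered (our illustration, with arbitrarily chosen tracker parameters) is:

```python
import cv2
import numpy as np

def helix_velocities(prev_gray, next_gray, max_corners=200):
    """Track features between consecutive frames with pyramidal Lucas-Kanade
    and return each surviving feature's position and velocity (pixels/frame)."""
    p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_corners,
                                 qualityLevel=0.01, minDistance=7)
    if p0 is None:
        return np.empty((0, 2)), np.empty((0, 2))
    p1, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, p0, None,
                                                winSize=(21, 21), maxLevel=3)
    good = status.ravel() == 1
    pos = p0.reshape(-1, 2)[good]
    vel = p1.reshape(-1, 2)[good] - pos  # (du/dt, dv/dt) per helix, one frame apart
    return pos, vel
```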

At this point, each helix is assumed to be identically distributed and independently positioned on the image plane, and each helix is associated with a velocity vector $V_i = (v, \varphi)^T$, where $\varphi$ is the angular displacement of the velocity direction from the north of the image plane, where $\pi/2$ is east, $\pi$ is south, and $3\pi/2$ is west. Although the associated depths of the helix set appearing at stochastic points on the image plane are unknown, assuming a constant, there is a relationship between the distance of a helix from the camera and its instantaneous velocity on the image plane. This suggests that a helix cluster, with respect to closeness of individual


(1) Start from level $L(0) = 0$ and sequence $m = 0$.
(2) Find $d = \min(h_a - h_b)$ in $M$, where $h_a \neq h_b$.
(3) $m = m + 1$; $\Psi'''(k) = \mathrm{merge}([h_a, h_b])$; $L(m) = d$.
(4) Delete from $M$ the rows and columns corresponding to $\Psi'''(k)$.
(5) Add to $M$ a row and a column representing $\Psi'''(k)$.
(6) If $(\forall h_i \in \Psi'''(k))$, stop.
(7) Else, go to (2).

Algorithm 1: Disjoint cluster identification from the heat map $M$.

Figure 5: On-the-fly range measurements. Note the crosshair, indicating that the algorithm is currently using the infinity point for heading.


Figure 6: (a) Accuracy of the two range measurement methods with respect to ground truth (flat line). (b) Residuals for the top figure.

instantaneous velocities, is likely to belong on the surface of one planar object, such as a door frame. Let a helix with a directional velocity be the triple $h_i = (V_i, u_i, v_i)^T$, where $(u_i, v_i)$ represents the position of this particle on the image plane. At any given time $(k)$, let $\Psi$ be a set containing all these features on the image plane, such that $\Psi(k) = \{h_1, h_2, \ldots, h_n\}$. The $z$ component of velocity, as obtained in (10), is the determining factor for $\varphi$. Since we are most interested in the set of helixes in which this component is minimized, $\Psi(k)$ is resampled such that

$\Psi'(k) = \left\{\forall h_i : \varphi \approx \frac{\pi}{2} \,\cup\, \varphi \approx \frac{3\pi}{2}\right\}$ (11)

sorted in increasing velocity order. $\Psi'(k)$ is then processed through histogram sorting to reveal the modal helix set, such that

$\Psi''(k) = \max \sum_{i=0}^{n} \begin{cases} i & \text{if } h_i = h_{i+1} \\ 0 & \text{else} \end{cases}$ (12)

$\Psi''(k)$ is likely to contain clusters that tend to be distributed with respect to objects in the scene, whereas the rest of the initial helix set from $\Psi(k)$ may not fit this model. An agglomerative hierarchical tree $T$ is used to identify the clusters. To construct the tree, $\Psi''(k)$ is heat mapped, represented as a symmetric matrix $M$, with respect to the Manhattan distance between the individual helixes:

$M = \begin{bmatrix} h_0 - h_0 & \cdots & h_0 - h_n \\ \vdots & \ddots & \vdots \\ h_n - h_0 & \cdots & h_n - h_n \end{bmatrix}$ (13)

The algorithm to construct the tree from $M$ is given in Algorithm 1.

The tree should be cut at the sequence $m$ such that $m + 1$ does not provide a significant benefit in terms of modeling


Figure 7: While we emphasize hallway-like indoor environments, our range measurement strategy is compatible with a variety of other environments, including outdoors, office environments, ceilings, sidewalks, and building sides, where orthogonality in architecture is present. A minimum of one perspective line and one feature intersection is sufficient.

the clusters. After this step, the set of velocities in $\Psi'''(k)$ represents the largest planar object in the field of view with the most consistent rate of pixel displacement in time. The system is updated such that $\Psi(k + 1) = \Psi(k) + \mu(\Psi'''(k))$ as the best-effort estimate, as shown in Figure 8.
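Algorithm 1 is, in effect, single-linkage agglomerative clustering over the Manhattan-distance heat map (13); SciPy provides an equivalent, and a sketch under that assumption is:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def dominant_helix_cluster(helices, cut_distance):
    """Cluster helix triples (e.g., rows of (du, dv, u, v)) by Manhattan
    distance and return the members of the largest cluster, taken as the
    dominant planar object (e.g., a door frame) seen during the turn."""
    Z = linkage(helices, method="single", metric="cityblock")   # Algorithm 1 equivalent
    labels = fcluster(Z, t=cut_distance, criterion="distance")  # cut the tree at level d
    biggest = np.bincount(labels).argmax()
    return helices[labels == biggest]
```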

It is a future goal to improve the accuracy of this algorithm by exploiting known properties of typical objects. For instance, single doors are typically a meter wide. It is trivial to build an internal object database with templates for typical consistent objects found indoors. If such an object of interest could be identified by an arbitrary object detection algorithm, and that world object of known dimensions $\dim = (x, y)^T$ and a cluster $\Psi'''(k)$ sufficiently coincide, the cluster depth can be measured via $\dim(f/\dim')$, where $\dim$ is the actual object dimensions, $f$ is the focal length, and $\dim'$ represents the object dimensions on the image plane.

4. SLAM Formulation

Our previous experiments [16, 17] showed that, due to the highly nonlinear nature of the observation equations, traditional nonlinear observers such as the EKF do not scale to SLAM in larger environments containing a vast number of potential landmarks. Measurement updates in the EKF require quadratic time complexity due to the covariance matrix, rendering the data association increasingly difficult as the


Figure 8: This graph illustrates the accuracy of the Helix bearing algorithm, estimating 200 samples of perfect 95-degree turns (calibrated with a digital protractor) performed at various locations, with increasing clutter, at random angular rates not exceeding 1 radian per second, in the absence of known objects.

map grows. An MAV with limited computational resources is particularly impacted by this complexity behavior. SLAM utilizing a Rao-Blackwellized particle filter, similar to [23], is a dynamic Bayesian approach to SLAM, exploiting the conditional independence of measurements. A random set of particles is generated using the noise model and dynamics of the vehicle, in which each particle is considered a potential location for the vehicle. A reduced Kalman filter per particle is then associated with each of the current measurements. Considering the limited computational resources of an MAV, maintaining a set of landmarks large enough to allow for accurate motion estimations, yet sparse enough so as not to produce a negative impact on the system performance, is imperative. The noise model of the measurements, along with



Figure 9: The helix bearing algorithm exploits the optical flow field resulting from the features not associated with architectural lines. A reduced helix association set is shown for clarity. Helix velocities that form statistically identifiable clusters indicate the presence of large objects, such as doors, that can provide an estimation of the angular rate of the MAV during the turn.

the new measurement and the old position of the feature, is used to generate a statistical weight. This weight, in essence, is a measure of how well the landmarks in the previous sensor position correlate with the measured position, taking noise into account. Since each of the particles has a different estimate of the vehicle position, resulting in a different perspective for the measurement, each particle is assigned a different weight. Particles are resampled at every iteration, such that the lower weight particles are removed and the higher weight particles are replicated. This results in a cloud of random particles that tracks towards the best estimation results, which are the positions that yield the best correlation between the previous position of the features and the new measurement data.

The positions of landmarks are stored by the particles, such as $\mathrm{Par}_n = (X_L^T, P)$, where $X_L = (x_{ci}, y_{ci})$ and $P$ is the $2 \times 2$ covariance matrix for the particular Kalman filter contained by $\mathrm{Par}_n$. The 6DOF vehicle state vector $x_v$ can be updated in discrete time steps of $(k)$, as shown in (14), where $R = (x_r, y_r, H)^T$ is the position in the inertial frame, from which the velocity in the inertial frame can be derived as $\dot{R} = v_E$. The vector $v_B = (v_x, v_y, v_z)^T$ represents the linear velocity of the body frame, and $\omega = (p, q, r)^T$ represents the body angular rate. $\Gamma = (\phi, \theta, \psi)^T$ is the Euler angle vector, and $L_{EB}$ is the Euler angle transformation matrix for $(\phi, \theta, \psi)$. The $3 \times 3$ matrix $T$ converts $(p, q, r)^T$ to $(\dot{\phi}, \dot{\theta}, \dot{\psi})$. At every step, the MAV is assumed to experience unknown linear and angular accelerations, $V_B = a_B \Delta t$ and $\Omega = \alpha_B \Delta t$, respectively:

$x_v(k+1) = \begin{pmatrix} R(k) + L_{EB}(\phi, \theta, \psi)(v_B + V_B)\Delta t \\ \Gamma(k) + T(\phi, \theta, \psi)(\omega + \Omega)\Delta t \\ v_B(k) + V_B \\ \omega(k) + \Omega \end{pmatrix}$ (14)

There is only a limited set of orientations a helicopter is capable of sustaining in the air at any given time without partial or complete loss of control. For instance, no useful lift is generated when the rotor disc is oriented sideways with respect to gravity. Moreover, the on-board autopilot incorporates IMU and compass measurements in a best-effort scheme to keep the MAV at hover in the absence of external control inputs. Therefore, we can simplify the 6DOF system dynamics to 2D system dynamics with an autopilot. Accordingly, the particle filter then simultaneously locates the landmarks and updates the vehicle states $x_r$, $y_r$, $\theta_r$, described by

$x_v(k+1) = \begin{pmatrix} \cos\theta_r(k)\, u_1(k) + x_r(k) \\ \sin\theta_r(k)\, u_1(k) + y_r(k) \\ u_2(k) + \theta_r(k) \end{pmatrix} + \gamma(k)$ (15)

where $\gamma(k)$ is the linearized input signal noise, $u_1(k)$ is the forward speed, and $u_2(k)$ is the angular velocity.
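A sketch of this per-particle motion update (15), assuming a Gaussian model for the noise term $\gamma(k)$ (the standard deviations shown are placeholders):

```python
import numpy as np

def propagate_particles(particles, u1, u2, sigma=(0.05, 0.05, 0.02)):
    """Apply the 2D kinematic model (15) to an (N, 3) array of particle
    states (x_r, y_r, theta_r), adding linearized input noise gamma(k)."""
    x, y, th = particles[:, 0], particles[:, 1], particles[:, 2]
    noise = np.random.normal(0.0, sigma, size=particles.shape)
    return np.column_stack([
        x + u1 * np.cos(th),   # forward motion along the current heading
        y + u1 * np.sin(th),
        th + u2,               # heading change
    ]) + noise
```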

Let us consider one instantaneous field of view of the camera, in which the center of two ground corners on opposite walls is shifted. From the distance measurements described earlier, we can derive the relative range and bearing of a corner of interest (index $i$) as follows:

$y_i = h(x) = \left(\sqrt{x_i^2 + y_i^2},\ \tan^{-1}\left[\pm\frac{y_i}{x_i}\right],\ \psi\right)^T$ (16)

where the $\psi$ measurement is provided by the infinity-point method.

This measurement equation can be related to the states of the vehicle and the $i$th corner (landmark) at each time stamp $(k)$, as given in (17), where $x_v(k) = (x_r(k), y_r(k), \theta_r(k))^T$ is the vehicle state vector of the 2D vehicle kinematic model:

$h_i(x(k)) = \begin{pmatrix} \sqrt{(x_r(k) - x_{ci}(k))^2 + (y_r(k) - y_{ci}(k))^2} \\ \tan^{-1}\left(\dfrac{y_r(k) - y_{ci}(k)}{x_r(k) - x_{ci}(k)}\right) - \theta_r(k) \\ \theta_r \end{pmatrix}$ (17)

where $x_{ci}$ and $y_{ci}$ denote the position of the $i$th landmark.

4.1. Data Association. Recently detected landmarks need to be associated with the existing landmarks in the map, such that each new measurement either corresponds to the correct existent landmark, or else registers as a not-before-seen landmark. This is a requirement for any SLAM approach to function properly (i.e., Figure 11). Typically, the association metric depends on the measurement innovation vector. An exhaustive search algorithm that compares every measurement with every feature on the map associates landmarks if the newly measured landmark is sufficiently close to an existing one. This not only leads to landmark ambiguity, but also is


computationally intractable for large maps. Moreover, since the measurement is relative, the error of the vehicle position is additive with the absolute location of the measurement.

We present a new, faster, and more accurate solution, which takes advantage of predicted landmark locations on the image plane. Figure 5 gives a reference of how landmarks appear on the image plane, moving along the ground lines as the MAV moves. Assume that $p^k_{(x,y)}$, $k = 0, 1, 2, 3, \ldots, n$, represents a pixel in time which happens to be contained by a landmark, and this pixel moves along a ground line at the velocity $v_p$. Although landmarks often contain a cluster of pixels, the size of which is inversely proportional to the landmark distance, here the center pixel of a landmark is referred to. Given that the expected maximum velocity $V_{B\max}$ is known, a pixel is expected to appear at

$p^{k+1}_{(x,y)} = f\left(p^k_{(x,y)} + (v_B + V_B)\Delta t\right)$ (18)

where

$\sqrt{\left(p^{k+1}_{(x)} - p^k_{(x)}\right)^2 + \left(p^{k+1}_{(y)} - p^k_{(y)}\right)^2}$ (19)

cannot be larger than $V_{B\max}\Delta t$, while $f(\cdot)$ is a function that converts a landmark range to a position on the image plane.

A landmark appearing at time $k + 1$ is to be associated with a landmark that has appeared at time $k$ if, and only if, their pixel locations are within the association threshold. In other words, the association information from $k$ is used. Otherwise, if the maximum expected change in pixel location is exceeded, the landmark is considered new. We save computational resources by using the association data from $k$ when a match is found, instead of searching the large global map. In addition, since the pixel location of a landmark is independent of the noise in the MAV position, the association has improved accuracy. To further improve the accuracy, there is also a maximum range beyond which the MAV will not consider landmarks for data association. This range is determined by taking the camera resolution into consideration: the farther a landmark is, the fewer pixels it has in its cluster, and thus the more ambiguity and noise it may contain. Considering the physical camera parameters, resolution, shutter speed, and noise model of the Logitech C905 camera, the MAV is set to ignore landmarks farther than 8 meters. Note that this is a limitation of the camera, not of our proposed methods.
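A sketch of this image-plane gating test is given below; the caller is assumed to precompute the pixel-displacement threshold implied by (18) and (19):

```python
import numpy as np

def associate(landmark_px, candidates_px, max_motion_px):
    """Associate a newly detected landmark (pixel coords) with the nearest
    landmark tracked at the previous frame, provided the displacement does
    not exceed the V_Bmax * dt bound of (19). Returns the candidate index,
    or None for a not-before-seen landmark."""
    if len(candidates_px) == 0:
        return None
    d = np.linalg.norm(np.asarray(candidates_px) - np.asarray(landmark_px), axis=1)
    j = int(np.argmin(d))
    return j if d[j] <= max_motion_px else None
```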

Although representing the map as a tree-based data structure would, in theory, yield an association time of $O(N \log N)$, our pixel-neighborhood based approach already covers over 90% of the features at any time; therefore, a tree-based solution does not offer a significant benefit.

We also use a viewing transformation invariant scene matching algorithm, based on spatial relationships among objects in the images and illumination parameters in the scene. This is to determine if two frames acquired under different extrinsic camera parameters have indeed captured the same scene. Therefore, if the MAV visits a particular place more than once, it can distinguish whether it has been to that spot before.

Our approach maps the features (i.e., corners, lines) and illumination parameters from one view in the past to the other in the present via affine-invariant image descriptors. A descriptor $D_t$ consists of an image region in a scene that contains a high amount of disorder. This reduces the probability of finding multiple targets later. The system will pick a region on the image plane with the most crowded cluster of landmarks to look for a descriptor, which is likely to be the part of the image with the most clutter, hence creating a more unique signature. Descriptor generation is automatic and is triggered when turns are encountered (i.e., by the Helix Bearing Algorithm). A turn is a significant, repeatable event in the life of a map, which makes it interesting for data association purposes. The start of the algorithm is also a significant event, for which the first descriptor $D_0$ is collected; this helps the MAV in recognizing the starting location if it is revisited.

Every time a descriptor $D_t$ is recorded, it contains the current time $t$ in terms of frame number, the disorderly region $I_{xy}$ of size $x \times y$, and the estimate of the position and orientation of the MAV at frame $t$. Thus, every time a turn is encountered, the system can check whether it has happened before. For instance, if it indeed has happened at time $t = k$, where $t > k$, $D_k$ is compared with $D_t$ in terms of descriptor and landmarks, and the map positions of the MAV at times $t$ and $k$ are expected to match closely; otherwise, it means the map is diverging in a quantifiable manner.

The comparison formulation can be summarized as

$R(x, y) = \dfrac{\sum_{x',y'} \left(T(x', y') - I(x + x', y + y')\right)^2}{\sqrt{\sum_{x',y'} T(x', y')^2 \cdot \sum_{x',y'} I(x + x', y + y')^2}}$ (20)

where a perfect match is 0 and poor matches are represented by larger values, up to 1. We use this to determine the degree to which two descriptors are related, as it represents the fraction of the variation in one descriptor that may be explained by the other. Figure 10 illustrates how this concept works.
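Equation (20) is the normalized squared-difference template score, which OpenCV implements directly; a sketch of the descriptor comparison under that assumption is:

```python
import cv2

def descriptor_score(template, image):
    """Slide descriptor patch T over frame I and return the best (lowest)
    normalized squared-difference score per (20): 0 means a perfect match."""
    R = cv2.matchTemplate(image, template, cv2.TM_SQDIFF_NORMED)
    min_val, _max_val, min_loc, _max_loc = cv2.minMaxLoc(R)
    return min_val, min_loc
```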

5. Experimental Results

As illustrated in Figures 12, 13, and 14, our monocular vision SLAM correctly locates and associates landmarks to the real world. Figure 15 shows the results obtained in an outdoor experiment with urban roads. A 3D map is built by the addition of time-varying altitude and wall positions, as shown in Figure 16. The proposed methods prove robust to transient disturbances, since features that are inconsistent about their position are removed from the map.
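The removal rule is not spelled out in the paper; one plausible sketch, assuming each landmark's per-particle Kalman filter exposes its 2 × 2 covariance P and a count of consecutive failed association gates, is:

```python
def prune_inconsistent(landmarks, var_max=1.5, miss_max=5):
    """Drop landmarks whose position estimate never converges (large
    covariance trace) or that repeatedly fail the association gate;
    the thresholds here are illustrative, not from the paper."""
    return [lm for lm in landmarks
            if lm.P.trace() < var_max and lm.misses < miss_max]
```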

The MAV assumes that it is positioned at (0, 0, 0) Cartesian coordinates at the start of a mission, with the camera pointed along the positive x-axis; the width of the corridor is therefore represented by the y-axis. At any time during the mission, a partial map can be requested from the MAV via the Internet. The MAV also stores the map and important video frames (i.e., when a new landmark is discovered) on-board for later retrieval. Video frames are time-linked to the map. It is therefore possible to obtain a still image of the surroundings of any landmark for surveillance and identification purposes.


Figure 10: Data association metric, where a descriptor is shown in the middle.


Figure 11: Map drift is one of the classic errors introduced by poor data association, or lack thereof, negatively impacting the loop-closing performance.


In Figure 12, the traveled distance is on the kilometer scale. When the system completes the mission and returns to the starting point, the belief is within one meter of where the mission had originally started.

5.1. The Microaerial Vehicle Hardware Configuration. Saint Vertigo, our autonomous MAV helicopter, serves as the primary robotic test platform for the development of this study (see Figure 17). In contrast with other prior works that predominantly used wireless video feeds and a Vicon vision tracking system for vehicle state estimation [24], Saint Vertigo performs all image processing and SLAM computations on-board, with a 1 GHz CPU, 1 GB RAM, and 4 GB storage. The unit measures 50 cm with a ready-to-fly weight of 0.9 kg and 0.9 kg of payload for adaptability to different missions. In essence, the MAV features two independent computers. The flight computer is responsible for flight stabilization, flight automation, and sensory management. The navigation computer is responsible for image processing, range measurement, SLAM computations, networking, mass storage, and, as a future goal, path planning. The pathway between them is a dedicated on-board link, through which the sensory feedback and supervisory control commands are shared. These commands are simple directives, which are converted to the appropriate helicopter flight-surface responses by the flight computer. The aircraft is IEEE 802.11 enabled, and all its features are accessible over the Internet or an ad hoc TCP/IP network. Among the other platforms shown in Figure 18, Saint Vertigo has the most limited computational resources.


Figure 12: Experimental results of the proposed ranging and SLAM algorithm, showing the landmarks added to the map representing the structure of the environment. All measurements are in meters. The experiment was conducted under incandescent ambient lighting.




Figure 13: (a) Experimental results of the proposed ranging and SLAM algorithm with state-observer odometer trail. The actual floor plan of the building is superimposed later on a mature map to illustrate the accuracy of our method; note that the floor plan was not provided to the system a priori. (b) The same environment mapped by a ground robot with a different starting point, to illustrate that our algorithm is compatible with different platforms.


Figure 14: Results of the proposed ranging and SLAM algorithm from a different experiment, with state-observer ground truth. All measurements are in meters. The experiment was conducted under fluorescent ambient lighting, and sunlight where applicable.


Figure 15: Results of the proposed ranging and SLAM algorithm from an outdoor experiment in an urban area. A small map of the area is provided for reference purposes (not provided to the algorithm), and it indicates the robot path. All measurements are in meters. The experiment was conducted under sunlight ambient conditions and dry weather.

Figure 16: Cartesian (x, y, z) position of the MAV in a hallway, as reported by the proposed ranging and SLAM algorithm with time-varying altitude (plot axes: hallway length (m), hallway width (m), and altitude (m); the trace is labeled "Helicopter"). The altitude is represented by the z-axis and is initially at 25 cm, as this is the ground clearance of the ultrasonic altimeter when the aircraft has landed. The MAV altitude was intentionally varied by large amounts to demonstrate the robustness of our method to the climb and descent of the aircraft, whereas in a typical mission natural altitude changes are in the range of a few centimeters.


Figure 17: Saint Vertigo, the autonomous MAV helicopter, consists of four decks. The A deck contains the collective-pitch rotor head mechanics. The B deck comprises the fuselage, which houses the power plant, transmission, main batteries, actuators, gyroscope, and the tail rotor. The C deck is the autopilot compartment, which contains the inertial measurement unit, all communication systems, and all sensors. The D deck carries the navigation computer, which is attached to a digital video camera visible at the front.


5.2. Processing Requirements. In order to effectively manage the computational resources on a lightweight MAV computer, we keep track of the CPU utilization for the algorithms proposed in this paper. Table 1 shows a typical breakdown of the average processor utilization per video frame. Each corresponding task elucidated in this paper is visualized in Figure 2.

Table 1: CPU utilization of the proposed algorithms.

    Image acquisition and edge filtering    10%
    Line and slope extraction                2%
    Landmark extraction                     20%†
    Helix bearing                           20%†
    Ranging algorithms                  below 1%
    Rao-Blackwellized particle filter       50%

The numbers in Table 1 are gathered after the map has matured. Methods marked with a dagger (†) are mutually exclusive; for example, the Helix Bearing algorithm runs only when the MAV is performing turns, while the ranging task is on standby. Particle filtering has a roughly constant load on the system once the map is populated; we only consider a limited point cloud with landmarks in the front detection range of the MAV (see Section 4.1). The MAV typically operates in the 80–90% utilization range. It should be stressed that this figure includes operating-system kernel processes, which involve video-memory procedures, as the MAV is not equipped with a dedicated graphics processor. The MAV is programmed to construct the SLAM results and other miscellaneous on-screen display information inside the video memory in real time. This is used to monitor the system for our own debugging purposes but is not required for MAV operation. Disabling this feature reduces the load and frees up processor time for other tasks that may be implemented, such as path planning and closed-loop position control.
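A per-frame profiling harness in this spirit can be as simple as the following sketch; the task names and callables are placeholders, and process_time measures CPU time rather than wall-clock time:

```python
import time

def profile_frame(tasks, frame):
    """tasks: ordered mapping of task name -> callable(frame).
    Returns each task's share (%) of the CPU time spent on this frame,
    analogous to the breakdown reported in Table 1."""
    spent = {}
    for name, fn in tasks.items():
        t0 = time.process_time()           # CPU time, not wall clock
        fn(frame)
        spent[name] = time.process_time() - t0
    total = sum(spent.values()) or 1.0
    return {name: 100.0 * s / total for name, s in spent.items()}
```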


Figure 18: Our algorithms have been tested on the diverse set of mobile platforms shown here. Picture courtesy of the Space Systems and Controls Lab, Aerospace Robotics Lab, Digitalsmithy Lab, and Rockwell Collins Advanced Technology Center.


6. Conclusion and Future Work

In this paper, we investigated the performance of monocular-camera-based vision SLAM with minimal assumptions, as well as minimal aid from other sensors (altimeter only), in a corridor-following flight application that requires precise localization and absolute range measurement. This is true even for outdoor cases, because our MAV is capable of building up high speeds and covering large distances very rapidly, and some of the ground robots we have tested were large enough to become a concern for traffic and pedestrians. While widely recognized SLAM methods have mainly been developed for use with laser range finders, this paper presented new algorithms for monocular vision-based depth perception and bearing sensing to accurately mimic the operation of such an advanced device. We were able to integrate our design with popular SLAM algorithms originally meant for laser range finders, and we have experimentally validated its operation for autonomous indoor and outdoor flight and navigation with a small, fully self-contained MAV helicopter, as well as other robotic platforms. Our algorithms successfully adapt to various situations, performing the transitions between them (e.g., turns, presence of external objects, and time-varying altitude).




Since the proposed monocular-camera vision SLAM method does not need initialization procedures, the mission can start at an arbitrary point. Therefore, our MAV can be deployed to infiltrate an unknown building. One future task is to add the capability to fly through doors and windows. Indeed, the system is only limited by the capabilities of the camera, such as resolution, shutter speed, and reaction time. All of those limitations can be overcome with the proper use of lenses and higher-fidelity imaging sensors, even though we have used a consumer-grade USB camera. Since the ability to extract good landmarks is a function of the camera capabilities, a purpose-built camera is suggested for future work. Such a camera would also allow development of efficient vision SLAM and data association algorithms that take advantage of the intermediate image-processing data.

Our future vision-based SLAM and navigation strategy for an indoor MAV helicopter through the hallways of a building also includes the ability to recognize staircases and thus traverse multiple floors to generate a comprehensive volumetric map of the building. This will also permit vision-based 3D path planning and closed-loop position control of the MAV based on SLAM. Considering that our MAV helicopter is capable of outdoor flight, we can extend our method to the outdoor perimeter of buildings and similar urban environments by exploiting the similarities between hallways and downtown city maps. Further, considering the reduction in weight and the independence from GPS coverage, our work also permits the development of portable navigation devices for a wider array of applications, such as small-scale mobile robotics and helmet- or vest-mounted navigation systems.

Certain environments and environmental factors prove challenging to our proposed method: bright lights, reflective surfaces, haze, and shadows. These artifacts introduce two main problems: (1) they can alter chromatic clarity, local microcontrast, and exposure due to their unpredictable, high-energy nature, and (2) they can appear as false objects, especially when there is bloom surrounding objects in front of the problematic light source. Further reduction in contrast is possible

if scattering particles in the air are dense. We have come to observe that preventative and defensive approaches to such issues are promising. Antireflective treatment on lenses can reduce light bouncing off of the lens, and programming the aircraft to move a very small distance upon detection of glare can eliminate the unwanted effects. Innovative and adaptive application of servo-controlled filters in front of the lenses can minimize, or eliminate, most if not all reflections. The light that causes glare is elliptically polarized due to strong phase correlation, as opposed to essential light, which is circularly polarized; filters can detect and block the polarized light from entering the camera, thereby blocking the unwanted effects. Application of purpose-designed digital imaging sensors that do not involve a Bayer filter can also help: most of the glare occurs in the green region of the spectrum, and traditional digital imaging sensors have twice as many green receptors as red and blue, since the Bayer design was inspired by the human eye, which sees green best, green being the most structurally descriptive light for edges and corners. This paper has supplementary material (see Supplementary Material available online at http://dx.doi.org/10.1155/2013/374165), available from the authors, which shows experimental results of the paper.

Acknowledgments

The research reported in this paper was in part supported by the National Science Foundation (Grant ECCS-0428040), the Information Infrastructure Institute (I3), the Department of Aerospace Engineering and the Virtual Reality Application Center at Iowa State University, Rockwell Collins, and the Air Force Office of Scientific Research.

References

[1] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106–154, 1962.
[2] N. Isoda, K. Terada, S. Oe, and K. Ikaida, "Improvement of accuracy for distance measurement method by using movable CCD," in Proceedings of the 36th SICE Annual Conference (SICE '97), pp. 29–31, Tokushima, Japan, July 1997.
[3] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2nd edition, 2003.
[4] F. Ruffier and N. Franceschini, "Visually guided micro-aerial vehicle: automatic take off, terrain following, landing and wind reaction," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2339–2346, New Orleans, La, USA, May 2004.
[5] F. Ruffier, S. Viollet, S. Amic, and N. Franceschini, "Bio-inspired optical flow circuits for the visual guidance of micro-air vehicles," in Proceedings of the International Symposium on Circuits and Systems (ISCAS '03), vol. 3, pp. 846–849, Bangkok, Thailand, May 2003.
[6] J. Michels, A. Saxena, and A. Y. Ng, "High speed obstacle avoidance using monocular vision and reinforcement learning," in Proceedings of the 22nd International Conference on Machine Learning (ICML '05), vol. 119, pp. 593–600, August 2005.
[7] A. Saxena, J. Schulte, and A. Y. Ng, "Depth estimation using monocular and stereo cues," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2197–2203, 2007.
[8] N. Snavely, S. M. Seitz, and R. Szeliski, "Photo tourism: exploring photo collections in 3D," ACM Transactions on Graphics, vol. 25, no. 3, 2006.
[9] A. W. Fitzgibbon and A. Zisserman, "Automatic camera recovery for closed or open image sequences," in Proceedings of the European Conference on Computer Vision, pp. 311–326, June 1998.
[10] A. Davison, M. Nicholas, and S. Olivier, "MonoSLAM: real-time single camera SLAM," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1052–1067, 2007.
[11] L. Clemente, A. Davison, I. Reid, J. Neira, and J. Tardos, "Mapping large loops with a single hand-held camera," in Proceedings of the Robotics: Science and Systems Conference, June 2007.
[12] F. Dellaert, W. Burgard, D. Fox, and S. Thrun, "Using the condensation algorithm for robust, vision-based mobile robot localization," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '99), pp. 588–594, June 1999.
[13] N. Cuperlier, M. Quoy, P. Gaussier, and C. Giovanangeli, "Navigation and planning in an unknown environment using vision and a cognitive map," in Proceedings of the IJCAI Workshop: Reasoning with Uncertainty in Robotics, 2005.
[14] G. Silveira, E. Malis, and P. Rives, "An efficient direct approach to visual SLAM," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 969–979, 2008.
[15] A. P. Gee, D. Chekhlov, A. Calway, and W. Mayol-Cuevas, "Discovering higher level structure in visual SLAM," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 980–990, 2008.
[16] K. Celik, S.-J. Chung, and A. K. Somani, "Mono-vision corner SLAM for indoor navigation," in Proceedings of the IEEE International Conference on Electro/Information Technology (EIT '08), pp. 343–348, Ames, Iowa, USA, May 2008.
[17] K. Celik, S.-J. Chung, and A. K. Somani, "MVCSLAM: mono-vision corner SLAM for autonomous micro-helicopters in GPS denied environments," in Proceedings of the AIAA Guidance, Navigation and Control Conference, Honolulu, Hawaii, USA, August 2008.
[18] K. Celik, S.-J. Chung, and A. K. Somani, "Biologically inspired monocular vision based navigation and mapping in GPS-denied environments," in Proceedings of the AIAA Infotech at Aerospace Conference and Exhibit and AIAA Unmanned Unlimited Conference, Seattle, Wash, USA, April 2009.
[19] K. Celik, S.-J. Chung, M. Clausman, and A. K. Somani, "Monocular vision SLAM for indoor aerial vehicles," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, Mo, USA, October 2009.
[20] J. Shi and C. Tomasi, "Good features to track," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 593–600, June 1994.
[21] H. Bay, A. Ess, T. Tuytelaars, and L. van Gool, "Speeded-Up Robust Features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.
[22] K. Celik and A. K. Somani, "Wandless realtime autocalibration of tactical monocular cameras," in Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV '12), Las Vegas, Nev, USA, 2012.
[23] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit, "FastSLAM: a factored solution to the simultaneous localization and mapping problem," in Proceedings of the AAAI National Conference on Artificial Intelligence, pp. 593–598, 2002.
[24] J. P. How, B. Bethke, A. Frank, D. Dale, and J. Vian, "Real-time indoor autonomous vehicle test environment," IEEE Control Systems Magazine, vol. 28, no. 2, pp. 51–64, 2008.



with a landmark that has appeared at time 119896 if and onlyif their pixel locations are within the association thresholdIn other words the association information from 119896 is usedOtherwise if the maximum expected change in pixel loca-tion is exceeded the landmark is considered new We savecomputational resources by using the association data from 119896when a match is found instead of searching the large globalmap In addition since the pixel location of a landmark isindependent of the noise in theMAVposition the associationhas an improved accuracy To further improve the accuracythere is also a maximum range beyond which the MAV willnot consider for data association This range is determinedtaking the camera resolution into consideration The farthera landmark is the fewer pixels it has in its cluster thus themore ambiguity and noise it may contain Considering thephysical camera parameters resolution shutter speed andnoise model of the Logitech-C905 camera the MAV is set toignore landmarks farther than 8 meters Note that this is alimitation of the camera not our proposed methods

Although representing the map as a tree based datastructure which in theory yields an association time of119874(119873 log119873) our pixel-neighborhood based approach alreadycovers over 90 of the features at any time therefore a treebased solution does not offer a significant benefit

We also use a viewing transformation invariant scenematching algorithm based on spatial relationships amongobjects in the images and illumination parameters in thescene This is to determine if two frames acquired under dif-ferent extrinsic camera parameters have indeed captured thesame scene Therefore if the MAV visits a particular placemore than once it can distinguish whether it has been to thatspot before

Our approach maps the features (ie corners lines) andillumination parameters from one view in the past to theother in the present via affine-invariant image descriptorsA descriptor 119863

119905consists of an image region in a scene that

contains a high amount of disorder This reduces the proba-bility of finding multiple targets later The system will pick aregion on the image plane with the most crowded cluster oflandmarks to look for a descriptor which is likely to be thepart of the image where there is most clutters hence creatinga more unique signature Descriptor generation is automaticand triggered when turns are encountered (ie Helix BearingAlgorithm) A turn is a significant repeatable event in thelife of a map which makes it interesting for data associationpurposes The starting of the algorithm is also a significantevent for which the first descriptor 119863

0is collected which

helps the MAV in recognizing the starting location if it isrevisited

Every time a descriptor 119863119905is recorded it contains the

current time 119905 in terms of frame number the disorderlyregion 119868

119909119910of size 119909 times 119910 and the estimate of the position and

orientation of the MAV at frame 119905 Thus every time a turnis encountered the system can check if it happened beforeFor instance if it indeed has happened at time 119905 = 119896 where119905 gt 119896 119863

119896is compared with that of 119863

119905in terms of descriptor

and landmarks and the map positions of the MAV at times 119905and 119896 are expected to match closely else it means the map isdiverging in a quantifiable manner

The comparison formulation can be summarized as

$$ R(x, y) = \frac{\sum_{x', y'} \left( T(x', y') - I(x + x', y + y') \right)^2}{\sqrt{\sum_{x', y'} T(x', y')^2 \cdot \sum_{x', y'} I(x + x', y + y')^2}}, \tag{20} $$

where a perfect match is 0 and poor matches are represented by larger values, up to 1. We use this to determine the degree to which two descriptors are related, as it represents the fraction of the variation in one descriptor that may be explained by the other. Figure 10 illustrates how this concept works.
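A direct NumPy transcription of (20) is given below as a sketch (not the authors' implementation); it is the same normalized squared-difference measure found in common template-matching libraries.

    import numpy as np

    def match_score(T, I, x, y):
        # R(x, y) of (20): 0 for a perfect match, approaching 1 for poor ones.
        # T is the descriptor template; (x, y) is the top-left corner of the
        # candidate window in the image I (rows index y, columns index x).
        h, w = T.shape
        win = I[y:y + h, x:x + w].astype(float)
        Tf = T.astype(float)
        num = np.sum((Tf - win) ** 2)
        den = np.sqrt(np.sum(Tf ** 2) * np.sum(win ** 2))
        return num / den

Minimizing match_score over all $(x, y)$ locates the best candidate window for a stored descriptor.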

5. Experimental Results

As illustrated in Figures 12, 13, and 14, our monocular vision SLAM correctly locates and associates landmarks to the real world. Figure 15 shows the results obtained in an outdoor experiment with urban roads. A 3D map is built by the addition of time-varying altitude and wall positions, as shown in Figure 16. The proposed methods prove robust to transient disturbances, since features inconsistent about their position are removed from the map.

The MAV assumes that it is positioned at (0, 0, 0) Cartesian coordinates at the start of a mission, with the camera pointed at the positive $x$-axis; therefore, the width of the corridor is represented by the $y$-axis. At any time during the mission, a partial map can be requested from the MAV via the Internet. The MAV also stores the map and important video frames (i.e., when a new landmark is discovered) on-board for later retrieval. Video frames are time-linked to the map. It is therefore possible to obtain a still image of the surroundings of any landmark for surveillance and identification purposes.


Figure 10: Data association metric, where a descriptor is shown in the middle.


Figure 11: Map drift is one of the classic errors introduced by poor data association, or lack thereof, negatively impacting the loop-closing performance.


In Figure 12, the traveled distance is on the kilometer scale. When the system completes the mission and returns to the starting point, the belief is within one meter of where the mission had originally started.

5.1. The Microaerial Vehicle Hardware Configuration. Saint Vertigo, our autonomous MAV helicopter, serves as the primary robotic test platform for the development of this study (see Figure 17). In contrast with other prior works that predominantly used wireless video feeds and the Vicon vision tracking system for vehicle state estimation [24], Saint Vertigo performs all image processing and SLAM computations on-board, with a 1 GHz CPU, 1 GB RAM, and 4 GB storage. The unit measures 50 cm with a ready-to-fly weight of 0.9 kg and 0.9 kg of payload for adaptability to different missions.


Figure 12: Experimental results of the proposed ranging and SLAM algorithm, showing the landmarks added to the map representing the structure of the environment. All measurements are in meters. The experiment was conducted under incandescent ambient lighting.

In essence, the MAV features two independent computers. The flight computer is responsible for flight stabilization, flight automation, and sensory management. The navigation computer is responsible for image processing, range measurement, SLAM computations, networking, mass storage, and, as a future goal, path planning. The pathway between them is a dedicated on-board link through which the sensory feedback and supervisory control commands are shared. These commands are simple directives, which are converted to the appropriate helicopter flight surface responses by the flight computer. The aircraft is IEEE 802.11 enabled, and all its features are accessible over the Internet or an ad hoc TCP/IP network.



Figure 13: (a) Experimental results of the proposed ranging and SLAM algorithm with state observer odometer trail. The actual floor plan of the building is superimposed later on a mature map to illustrate the accuracy of our method. Note that the floor plan was not provided to the system a priori. (b) The same environment mapped by a ground robot with a different starting point, to illustrate that our algorithm is compatible with different platforms.


Figure 14: Results of the proposed ranging and SLAM algorithm from a different experiment with state observer ground truth. All measurements are in meters. The experiment was conducted under fluorescent ambient lighting and sunlight where applicable.


Figure 15: Results of the proposed ranging and SLAM algorithm from an outdoor experiment in an urban area. A small map of the area is provided for reference purposes (not provided to the algorithm), and it indicates the robot path. All measurements are in meters. The experiment was conducted under sunlight ambient conditions and dry weather.


Figure 16: Cartesian $(x, y, z)$ position of the MAV in a hallway, as reported by the proposed ranging and SLAM algorithm with time-varying altitude. The altitude is represented by the $z$-axis; it is initially at 25 cm, as this is the ground clearance of the ultrasonic altimeter when the aircraft has landed. MAV altitude was intentionally varied by large amounts to demonstrate the robustness of our method to the climb and descent of the aircraft, whereas in a typical mission natural altitude changes are in the range of a few centimeters.


Figure 17: Saint Vertigo, the autonomous MAV helicopter, consists of four decks. The A deck contains the collective pitch rotor head mechanics. The B deck comprises the fuselage, which houses the power plant, transmission, main batteries, actuators, gyroscope, and the tail rotor. The C deck is the autopilot compartment, which contains the inertial measurement unit, all communication systems, and all sensors. The D deck carries the navigation computer, which is attached to a digital video camera visible at the front.

Among the other platforms shown in Figure 18, Saint Vertigo has the most limited computational resources.

5.2. Processing Requirements. In order to effectively manage the computational resources on a lightweight MAV computer, we keep track of the CPU utilization of the algorithms proposed in this paper. Table 1 shows a typical breakdown of the average processor utilization per video frame. Each corresponding task elucidated in this paper is visualized in Figure 2.

The numbers in Table 1 are gathered after the map has matured. Methods highlighted with a dagger (†) are mutually exclusive; for example, the Helix Bearing algorithm runs only when the MAV is performing turns, while the ranging task is on standby. Particle filtering has a roughly constant load on the system once the map is populated.


Figure 18: Our algorithms have been tested on a diverse set of mobile platforms shown here. Picture courtesy of Space Systems and Controls Lab, Aerospace Robotics Lab, Digitalsmithy Lab, and Rockwell Collins Advanced Technology Center.

We only consider a limited point cloud with landmarks in the front detection range of the MAV (see Section 4.1). The MAV typically operates in the 80–90% utilization range. It should be stressed that this numerical figure includes operating system kernel processes, which involve video-memory procedures, as the MAV is not equipped with a dedicated graphics processor. The MAV is programmed to construct the SLAM results and other miscellaneous on-screen display information inside the video memory in real time. This is used to monitor the system for our own debugging purposes but is not required for MAV operation. Disabling this feature reduces the load and frees up processor time for other tasks that may be implemented, such as path planning and closed-loop position control.
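As an illustration of how such per-frame figures can be gathered, the sketch below (our own instrumentation example, not the authors' code) accumulates wall-clock time per task and reports it as a percentage of the frame budget; the frame rate is an assumed value.

    import time
    from collections import defaultdict

    class FrameProfiler:
        # Accumulates per-task wall-clock time and reports it as a
        # percentage of the total frame budget.
        def __init__(self, fps=30.0):          # assumed frame rate
            self.budget = 1.0 / fps            # seconds available per frame
            self.spent = defaultdict(float)

        def run(self, name, fn, *args):
            t0 = time.perf_counter()
            out = fn(*args)
            self.spent[name] += time.perf_counter() - t0
            return out

        def report(self, frames):
            for name, s in sorted(self.spent.items()):
                print(f"{name:35s} {100.0 * s / (frames * self.budget):5.1f}%")

Each pipeline stage is wrapped as prof.run("ranging", ranging_step, frame), and prof.report(n) prints a Table 1 style breakdown after n frames.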

6. Conclusion and Future Work

In this paper, we investigated the performance of monocular camera based vision SLAM with minimal assumptions, as well as minimal aid from other sensors (altimeter only), in a corridor-following flight application which requires precise localization and absolute range measurement. This is true even for outdoor cases, because our MAV is capable of building up high speed and covering large distances very rapidly, and some of the ground robots we have tested were large enough to become a concern for traffic and pedestrians. While widely recognized SLAM methods have been mainly developed for use with laser range finders, this paper presented new algorithms for monocular vision-based depth perception and bearing sensing to accurately mimic the operation of such an advanced device.


Table 1: CPU utilization of the proposed algorithms.

Image acquisition and edge filtering: 10%
Line and slope extraction: 2%
Landmark extraction: 20%†
Helix bearing: 20%†
Ranging algorithms: below 1%
Rao-Blackwellized particle filter: 50%

We were able to integrate our design with popular SLAM algorithms originally meant for laser range finders, and we have experimentally validated its operation for autonomous indoor and outdoor flight and navigation with a small, fully self-contained MAV helicopter, as well as other robotic platforms. Our algorithms successfully adapt to various situations and perform the transitions between them (e.g., turns, presence of external objects, and time-varying altitude).

Since the proposed monocular-camera vision SLAM method does not need initialization procedures, the mission can start at an arbitrary point. Therefore, our MAV can be deployed to infiltrate an unknown building. One future task is to add the capability to fly through doors and windows. Indeed, the system is only limited by the capabilities of the camera, such as resolution, shutter speed, and reaction time. All of those limitations can be overcome with the proper use of lenses and higher-fidelity imaging sensors, even though we have used a consumer-grade USB camera. Since the ability to extract good landmarks is a function of the camera capabilities, a purpose-built camera is suggested for future work. Such a camera would also allow development of efficient vision SLAM and data association algorithms that take advantage of the intermediate image processing data.

Our future vision-based SLAM and navigation strategy for an indoor MAV helicopter through the hallways of a building also includes the ability to recognize staircases and thus traverse multiple floors to generate a comprehensive volumetric map of the building. This will also permit vision-based 3D path planning and closed-loop position control of the MAV based on SLAM. Considering that our MAV helicopter is capable of outdoor flight, we can extend our method to the outdoor perimeter of buildings and similar urban environments by exploiting the similarities between hallways and downtown city maps. Further, considering the reduction in weight and independence from GPS coverage, our work also permits the development of portable navigation devices for a wider array of applications, such as small-scale mobile robotics and helmet- or vest-mounted navigation systems.

Certain environments and environmental factors prove challenging to our proposed method: bright lights, reflective surfaces, haze, and shadows. These artifacts introduce two main problems: (1) they can alter chromatic clarity, local microcontrast, and exposure, due to their unpredictable high-energy nature, and (2) they can appear as false objects, especially when there is bloom surrounding objects in front of a problematic light source. A further reduction in contrast is possible if scattering particles in the air are dense.

We have come to observe that preventative and defensive approaches to such issues are promising. Antireflective treatment on lenses can reduce light bouncing off of the lens, and programming the aircraft to move for a very small distance upon detection of glare can eliminate the unwanted effects. Innovative and adaptive application of servo-controlled filters in front of the lenses can minimize or eliminate most, if not all, reflections. The light that causes glare is elliptically polarized due to strong phase correlation; this is as opposed to essential light, which is circularly polarized. Filters can detect and block polarized light from entering the camera, thereby blocking the unwanted effects. Application of purpose-designed digital imaging sensors that do not involve a Bayer filter can also help. Most of the glare occurs in the green light region, and traditional digital imaging sensors have twice as many green receptors as red and blue; the Bayer design was inspired by the human eye, which sees green better, as green is the most structurally descriptive light for edges and corners. This paper has supplementary material (see Supplementary Material available online at http://dx.doi.org/10.1155/2013/374165), available from the authors, which shows experimental results of the paper.

Acknowledgments

The research reported in this paper was in part supported by the National Science Foundation (Grant ECCS-0428040), Information Infrastructure Institute (I3), the Department of Aerospace Engineering and Virtual Reality Application Center at Iowa State University, Rockwell Collins, and the Air Force Office of Scientific Research.

References

[1] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106–154, 1962.

[2] N. Isoda, K. Terada, S. Oe, and K. Ikaida, "Improvement of accuracy for distance measurement method by using movable CCD," in Proceedings of the 36th SICE Annual Conference (SICE '97), pp. 29–31, Tokushima, Japan, July 1997.

[3] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2nd edition, 2003.

[4] F. Ruffier and N. Franceschini, "Visually guided micro-aerial vehicle: automatic take off, terrain following, landing and wind reaction," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2339–2346, New Orleans, La, USA, May 2004.

[5] F. Ruffier, S. Viollet, S. Amic, and N. Franceschini, "Bio-inspired optical flow circuits for the visual guidance of micro-air vehicles," in Proceedings of the International Symposium on Circuits and Systems (ISCAS '03), vol. 3, pp. 846–849, Bangkok, Thailand, May 2003.

[6] J. Michels, A. Saxena, and A. Y. Ng, "High speed obstacle avoidance using monocular vision and reinforcement learning," in Proceedings of the 22nd International Conference on Machine Learning (ICML '05), vol. 119, pp. 593–600, August 2005.

[7] A. Saxena, J. Schulte, and A. Y. Ng, "Depth estimation using monocular and stereo cues," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2197–2203, 2007.

[8] N. Snavely, S. M. Seitz, and R. Szeliski, "Photo tourism: exploring photo collections in 3D," ACM Transactions on Graphics, vol. 25, no. 3, 2006.

[9] A. W. Fitzgibbon and A. Zisserman, "Automatic camera recovery for closed or open image sequences," in Proceedings of the European Conference on Computer Vision, pp. 311–326, June 1998.

[10] A. Davison, M. Nicholas, and S. Olivier, "MonoSLAM: real-time single camera SLAM," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1052–1067, 2007.

[11] L. Clemente, A. Davison, I. Reid, J. Neira, and J. Tardos, "Mapping large loops with a single hand-held camera," in Proceedings of the Robotics Science and Systems Conference, June 2007.

[12] F. Dellaert, W. Burgard, D. Fox, and S. Thrun, "Using the condensation algorithm for robust vision-based mobile robot localization," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '99), pp. 588–594, June 1999.

[13] N. Cuperlier, M. Quoy, P. Gaussier, and C. Giovanangeli, "Navigation and planning in an unknown environment using vision and a cognitive map," in Proceedings of the IJCAI Workshop: Reasoning with Uncertainty in Robotics, 2005.

[14] G. Silveira, E. Malis, and P. Rives, "An efficient direct approach to visual SLAM," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 969–979, 2008.

[15] A. P. Gee, D. Chekhlov, A. Calway, and W. Mayol-Cuevas, "Discovering higher level structure in visual SLAM," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 980–990, 2008.

[16] K. Celik, S.-J. Chung, and A. K. Somani, "Mono-vision corner SLAM for indoor navigation," in Proceedings of the IEEE International Conference on Electro/Information Technology (EIT '08), pp. 343–348, Ames, Iowa, USA, May 2008.

[17] K. Celik, S.-J. Chung, and A. K. Somani, "MVCSLAM: mono-vision corner SLAM for autonomous micro-helicopters in GPS denied environments," in Proceedings of the AIAA Guidance, Navigation and Control Conference, Honolulu, Hawaii, USA, August 2008.

[18] K. Celik, S.-J. Chung, and A. K. Somani, "Biologically inspired monocular vision based navigation and mapping in GPS-denied environments," in Proceedings of the AIAA Infotech at Aerospace Conference and Exhibit and AIAA Unmanned Unlimited Conference, Seattle, Wash, USA, April 2009.

[19] K. Celik, S.-J. Chung, M. Clausman, and A. K. Somani, "Monocular vision SLAM for indoor aerial vehicles," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, Mo, USA, October 2009.

[20] J. Shi and C. Tomasi, "Good features to track," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 593–600, June 1994.

[21] H. Bay, A. Ess, T. Tuytelaars, and L. van Gool, "Speeded-Up Robust Features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.

[22] K. Celik and A. K. Somani, "Wandless realtime autocalibration of tactical monocular cameras," in Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV '12), Las Vegas, Nev, USA, 2012.

[23] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit, "FastSLAM: a factored solution to the simultaneous localization and mapping problem," in Proceedings of the AAAI National Conference on Artificial Intelligence, pp. 593–598, 2002.

[24] J. P. How, B. Bethke, A. Frank, D. Dale, and J. Vian, "Real-time indoor autonomous vehicle test environment," IEEE Control Systems Magazine, vol. 28, no. 2, pp. 51–64, 2008.




converts a landmark range to a position on the image planeA landmark appearing at time 119896 + 1 is to be associated

with a landmark that has appeared at time 119896 if and onlyif their pixel locations are within the association thresholdIn other words the association information from 119896 is usedOtherwise if the maximum expected change in pixel loca-tion is exceeded the landmark is considered new We savecomputational resources by using the association data from 119896when a match is found instead of searching the large globalmap In addition since the pixel location of a landmark isindependent of the noise in theMAVposition the associationhas an improved accuracy To further improve the accuracythere is also a maximum range beyond which the MAV willnot consider for data association This range is determinedtaking the camera resolution into consideration The farthera landmark is the fewer pixels it has in its cluster thus themore ambiguity and noise it may contain Considering thephysical camera parameters resolution shutter speed andnoise model of the Logitech-C905 camera the MAV is set toignore landmarks farther than 8 meters Note that this is alimitation of the camera not our proposed methods

Although representing the map as a tree based datastructure which in theory yields an association time of119874(119873 log119873) our pixel-neighborhood based approach alreadycovers over 90 of the features at any time therefore a treebased solution does not offer a significant benefit

We also use a viewing transformation invariant scenematching algorithm based on spatial relationships amongobjects in the images and illumination parameters in thescene This is to determine if two frames acquired under dif-ferent extrinsic camera parameters have indeed captured thesame scene Therefore if the MAV visits a particular placemore than once it can distinguish whether it has been to thatspot before

Our approach maps the features (ie corners lines) andillumination parameters from one view in the past to theother in the present via affine-invariant image descriptorsA descriptor 119863

119905consists of an image region in a scene that

contains a high amount of disorder This reduces the proba-bility of finding multiple targets later The system will pick aregion on the image plane with the most crowded cluster oflandmarks to look for a descriptor which is likely to be thepart of the image where there is most clutters hence creatinga more unique signature Descriptor generation is automaticand triggered when turns are encountered (ie Helix BearingAlgorithm) A turn is a significant repeatable event in thelife of a map which makes it interesting for data associationpurposes The starting of the algorithm is also a significantevent for which the first descriptor 119863

0is collected which

helps the MAV in recognizing the starting location if it isrevisited

Every time a descriptor 119863119905is recorded it contains the

current time 119905 in terms of frame number the disorderlyregion 119868

Figure 3: Initial stages after filtering for line extraction, in which the line segments are being formed. Note that the horizontal lines across the image denote the artificial horizon for the MAV; these are not architectural detections but the on-screen display provided by the MAV. This procedure is robust to transient disturbances such as people walking by or trees occluding the architecture.

from the camera frame to the hallway frame is given in (4):

\[
A = \begin{bmatrix}
c\psi\, c\beta & c\beta\, s\psi & -s\beta \\
c\psi\, s\phi\, s\beta - c\phi\, s\psi & c\phi\, c\psi + s\phi\, s\psi\, s\beta & c\beta\, s\phi \\
s\phi\, s\psi + c\phi\, c\psi\, s\beta & c\phi\, s\psi\, s\beta - c\psi\, s\phi & c\phi\, c\beta
\end{bmatrix},
\tag{4}
\]

where $c$ and $s$ are abbreviations for the cosine and sine functions, respectively. The vehicle yaw angle is denoted by $\psi$, the pitch by $\beta$, and the roll by $\phi$. Since the roll angle is controlled by the onboard autopilot system, it can be set to be zero.

The points $E_{H1}$ and $E_{H2}$ are transformed into the camera frame via multiplication with the transpose of $A$ in (4):

\[
E_{C1} = A^{T} \cdot (l, d, -H)^{T}, \qquad
E_{C2} = A^{T} \cdot (l, d - w, -H)^{T}.
\tag{5}
\]

This 3D system is then transformed into the 2D image plane via

\[
u = \frac{yf}{x}, \qquad v = \frac{zf}{x},
\tag{6}
\]

where $u$ is the pixel horizontal position from center (right is positive), $v$ is the pixel vertical position from center (up is positive), and $f$ is the focal length (3.7 mm for the particular camera we have used). The end points of the perspective lines have now transformed from $E_{H1}$ and $E_{H2}$ to $(P_{x1}, P_{y1})^{T}$ and $(P_{x2}, P_{y2})^{T}$, respectively. An infinitely long hallway can be represented by

\[
\lim_{l \to \infty} P_{x1} = \lim_{l \to \infty} P_{x2} = f \tan\psi,
\qquad
\lim_{l \to \infty} P_{y1} = \lim_{l \to \infty} P_{y2} = -\frac{f \tan\beta}{\cos\psi},
\tag{7}
\]

which is conceptually the same as extending the perspective lines to infinity. The fact that $P_{x1} = P_{x2}$ and $P_{y1} = P_{y2}$ indicates that the intersection of the lines in the image plane is the end of such an infinitely long hallway. Solving the resulting equations for $\psi$ and $\beta$ yields the camera yaw and pitch, respectively:

\[
\psi = \tan^{-1}\left(\frac{P_x}{f}\right), \qquad
\beta = -\tan^{-1}\left(\frac{P_y \cos\psi}{f}\right).
\tag{8}
\]

A generic form of the transformation from the pixel position $(u, v)$ to $(x, y, z)$ can be derived in a similar fashion [3]. The equations for $u$ and $v$ also provide general coordinates in the camera frame as $(z_c f / v,\; u z_c / v,\; z_c)$, where $z_c$ is the $z$ position of the object in the camera frame. Multiplying with (4) transforms the hallway frame coordinates $(x, y, z)$ into functions of $u$, $v$, and $z_c$. Solving the new $z$ equation for $z_c$ and substituting into the equations for $x$ and $y$ yields

\[
x = \frac{a_{12} u + a_{13} v + a_{11} f}{a_{32} u + a_{33} v + a_{31} f}\, z,
\qquad
y = \frac{a_{22} u + a_{23} v + a_{21} f}{a_{32} u + a_{33} v + a_{31} f}\, z,
\tag{9}
\]

where $a_{ij}$ denotes the elements of the matrix in (4). See Figure 1 for the descriptions of $x$ and $y$.

For objects likely to be on the floor, the height of the camera above the ground is the $z$ position of the object. Also, if the platform roll can be measured or assumed negligible, then the combination of the infinity point with the height can be used to obtain the range to any object on the floor of the hallway. The same concept applies to objects which are likely to be on the same wall or the ceiling. By exploiting the geometry of the corners present in the corridor, our

Figure 4: A visual description of the environment as perceived by the infinity-point method. Labels shown: $w$, $l$, $d$, $H$, the angles $(\phi, \beta, \psi)$, the origin $(0, 0, 0)$, and the points $E_{H1} = [l, d, -H]$, $E_{H2} = [l, d - w, -H]$, $E_{C1} = A^{T} \cdot [l, d, -H]$, and $E_{C2} = A^{T} \cdot [l, d - w, -H]$.

method computes the absolute range and bearing of the features, effectively turning them into the landmarks needed for the SLAM formulation. See Figure 5, which illustrates the final appearance of the ranging algorithm.
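To make these computations concrete, the following is a minimal Python sketch of the infinity-point pipeline above: Eq. (8) for yaw and pitch, Eq. (4) for the rotation matrix, and Eq. (9) for a floor feature with $z = -H$. It assumes the infinity point $(P_x, P_y)$ and feature pixels $(u, v)$ are already in the centered image coordinates of (6) and that the focal length is expressed in pixels; the function names are ours, not part of the original implementation.

```python
import numpy as np

def rotation_matrix(psi, beta, phi=0.0):
    """Camera-to-hallway rotation A of Eq. (4); c and s are cos and sin."""
    c, s = np.cos, np.sin
    return np.array([
        [c(psi)*c(beta), c(beta)*s(psi), -s(beta)],
        [c(psi)*s(phi)*s(beta) - c(phi)*s(psi),
         c(phi)*c(psi) + s(phi)*s(psi)*s(beta), c(beta)*s(phi)],
        [s(phi)*s(psi) + c(phi)*c(psi)*s(beta),
         c(phi)*s(psi)*s(beta) - c(psi)*s(phi), c(phi)*c(beta)],
    ])

def pose_from_infinity_point(Px, Py, f):
    """Camera yaw and pitch from the infinity point, Eq. (8)."""
    psi = np.arctan2(Px, f)
    beta = -np.arctan2(Py * np.cos(psi), f)
    return psi, beta

def range_to_floor_feature(u, v, f, H, A):
    """Hallway-frame (x, y) of a feature at pixel (u, v), Eq. (9),
    using z = -H for objects assumed to lie on the floor."""
    denom = A[2, 1]*u + A[2, 2]*v + A[2, 0]*f
    x = (A[0, 1]*u + A[0, 2]*v + A[0, 0]*f) / denom * (-H)
    y = (A[1, 1]*u + A[1, 2]*v + A[1, 0]*f) / denom * (-H)
    return x, y

# Example: yaw/pitch from a detected infinity point, then range a feature.
psi, beta = pose_from_infinity_point(Px=42.0, Py=-10.0, f=700.0)
A = rotation_matrix(psi, beta)
x, y = range_to_floor_feature(u=55.0, v=-120.0, f=700.0, H=0.6, A=A)
```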

The graph in Figure 6 illustrates the disagreement between the line-perspectives and the infinity-point method (Section 2.3) in an experiment in which both algorithms executed simultaneously on the same video feed. With the particular camera we used in the experiments (Logitech C905), the infinity-point method yielded a 93% accuracy. These numbers are functions of camera resolution, camera noise, and the consequent line extraction noise; therefore, disagreements not exceeding 0.5 meters are in its favor with respect to accuracy. Disagreements from the ground truth include all transient measurement errors, such as camera shake or the occasional introduction of moving objects that deceptively mimic the environment, and other anomalies. The divergence between the two ranges that is visible between samples 20 and 40 in Figure 6 is caused by a hallway line anomaly from the line extraction process, independent of ranging. In this particular case, both hallway lines have shifted, causing the infinity point to move left. Horizontal translations of the infinity point have a minimal effect on the measurement performance of the infinity-point method, which is one of its main advantages. Refer to Figure 7 for a demonstration of the performance of these algorithms in a wide variety of environments.

The bias between the two measurements shown in Figure 6 is due to shifts in camera calibration parameters between different experiments. Certain environmental factors have dramatic effects on lens precision, such as acceleration, corrosive atmosphere, acoustic noise, fluid contamination, low pressure, vibration, ballistic shock, electromagnetic radiation, temperature, and humidity. Most of these conditions readily occur on an MAV (and on most other platforms, including the human body) due to parts rotating at high speeds, powerful air currents, static electricity, radio interference, and so on. The autocalibration concept is broad and beyond the scope of this paper. We present a novel mathematical procedure that addresses the issue of maintaining monocular camera calibration automatically in hostile environments in another paper of ours, and we encourage the reader to refer to it [22].

3. Helix Bearing Algorithm

When the MAV approaches a turn, an exit, a T-section, or a dead end, both ground lines tend to disappear simultaneously. Consequently, range and heading measurement methods cease to function. A set of features might still be detected, and the MAV can make a confident estimate of their spatial pose. However, in the absence of depth information, a one-dimensional probability density over the depth is represented by a two-dimensional particle distribution.

In this section, we propose a turn-sensing algorithm to estimate $\psi$ in the absence of orthogonality cues. This situation automatically triggers the turn-exploration mode in the MAV: a yaw rotation of the body frame is initiated until another passage is found. The challenge is to estimate $\psi$ accurately enough to update the SLAM map correctly. This procedure combines machine vision with a data matching and dynamic estimation problem. For instance, if the MAV approaches a left turn after exploring one leg of an "L"-shaped hallway, turns left 90 degrees, and continues through the next leg, the map is expected to display two hallways joined at a 90-degree angle. Similarly, a 180-degree turn before finding another hallway would indicate a dead end. This way, the MAV can also determine where turns are located the next time they are visited.

The new measurement problem at turns is to compute the instantaneous velocity $(u, v)$ of every helix (moving feature) that the MAV is able to detect, as shown in Figure 9. In other words, an attempt is made to recover $V(x, y, t) = (u(x, y, t), v(x, y, t)) = (dx/dt, dy/dt)$ using a variation of the pyramidal Lucas-Kanade method. This recovery leads to a 2D vector field obtained via perspective projection of the 3D velocity field onto the image plane. At discrete time steps, the next frame is defined as a function of a previous frame as $I_{t+1}(x, y, z, t) = I_t(x + dx, y + dy, z + dz, t + dt)$. Applying the Taylor series expansion,

\[
I(x, y, z, t) + \frac{\partial I}{\partial x}\,\delta x + \frac{\partial I}{\partial y}\,\delta y + \frac{\partial I}{\partial z}\,\delta z + \frac{\partial I}{\partial t}\,\delta t,
\tag{10}
\]

and then differentiating with respect to time, the helix velocity is obtained in terms of pixel distance per time step $k$.

At this point, each helix is assumed to be identically distributed and independently positioned on the image plane, and each helix is associated with a velocity vector $V_i = (v, \varphi)^{T}$, where $\varphi$ is the angular displacement of the velocity direction from the north of the image plane, where $\pi/2$ is east, $\pi$ is south, and $3\pi/2$ is west. Although the associated depths of the helix set appearing at stochastic points on the image plane are unknown, assuming a constant, there is a relationship between the distance of a helix from the camera and its instantaneous velocity on the image plane. This suggests that a helix cluster with respect to closeness of individual instantaneous velocities is likely to belong to the surface of one planar object, such as a door frame.
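As a concrete reference for this step, here is a brief sketch of how per-feature image-plane velocities can be recovered with OpenCV's pyramidal Lucas-Kanade tracker; the parameter values are illustrative assumptions rather than the settings used on the MAV.

```python
import cv2
import numpy as np

def helix_velocities(prev_gray, next_gray, dt):
    """Track corner features between two grayscale frames with pyramidal
    Lucas-Kanade and return surviving positions, velocities (pixels per
    time step), and direction phi measured from image north."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=7)
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, pts, None, winSize=(21, 21), maxLevel=3)
    ok = status.ravel() == 1
    p0 = pts[ok].reshape(-1, 2)
    p1 = nxt[ok].reshape(-1, 2)
    vel = (p1 - p0) / dt
    # Direction from image north (up): pi/2 = east, pi = south, 3*pi/2 = west.
    phi = np.mod(np.arctan2(vel[:, 0], -vel[:, 1]), 2.0 * np.pi)
    return p1, vel, phi
```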

(1) Start from level $L(0) = 0$ and sequence $m = 0$.
(2) Find $d = \min(h_a - h_b)$ in $M$, where $h_a \neq h_b$.
(3) $m = m + 1$; $\Psi'''(k) = \operatorname{merge}([h_a, h_b])$; $L(m) = d$.
(4) Delete from $M$ the rows and columns corresponding to $\Psi'''(k)$.
(5) Add to $M$ a row and a column representing $\Psi'''(k)$.
(6) If $(\forall h_i \in \Psi'''(k))$, stop.
(7) Else, go to (2).

Algorithm 1: Disjoint cluster identification from heat map $M$.
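A compact Python rendering of Algorithm 1 might look as follows; the single-linkage merge distance is our assumption, since the text does not specify a linkage rule, and the cut level is a free parameter.

```python
import numpy as np

def disjoint_clusters(M, cut_level):
    """Greedy agglomerative merging over a symmetric distance matrix M,
    following Algorithm 1: repeatedly merge the closest distinct pair,
    record the merge level L(m), and stop once the next merge distance
    exceeds cut_level (cutting the tree at sequence m)."""
    clusters = [[i] for i in range(len(M))]
    D = np.asarray(M, dtype=float)
    levels = []
    while len(clusters) > 1:
        iu = np.triu_indices(len(clusters), k=1)   # pairs with h_a != h_b
        flat = int(np.argmin(D[iu]))
        a, b = int(iu[0][flat]), int(iu[1][flat])
        d = D[a, b]
        if d > cut_level:
            break
        levels.append(d)                            # L(m) = d
        merged = clusters[a] + clusters[b]          # merge([h_a, h_b])
        keep = [i for i in range(len(clusters)) if i not in (a, b)]
        newD = np.zeros((len(keep) + 1, len(keep) + 1))
        newD[:-1, :-1] = D[np.ix_(keep, keep)]
        for j, i in enumerate(keep):                # single-linkage distance
            newD[-1, j] = newD[j, -1] = min(D[a, i], D[b, i])
        clusters = [clusters[i] for i in keep] + [merged]
        D = newD
    return clusters, levels
```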

Figure 5: On-the-fly range measurements. Note the crosshair, indicating that the algorithm is currently using the infinity point for heading.

Figure 6: (a) The accuracy of the two range measurement methods (including the infinity-point method) with respect to ground truth (flat line); range in meters versus sample number. (b) The residuals, in meters, for the top figure.

Let a helix with a directional velocity be the triple $h_i = (V_i, u_i, v_i)^{T}$, where $(u_i, v_i)$ represents the position of this particle on the image plane. At any given time $(k)$, let $\Psi$ be a set containing all these features on the image plane, such that $\Psi(k) = \{h_1, h_2, \ldots, h_n\}$. The $z$ component of velocity, as obtained in (10), is the determining factor for $\varphi$. Since we are most interested in the set of helices in which this component is minimized, $\Psi(k)$ is resampled such that

\[
\Psi'(k) = \left\{ \forall h_i : \varphi \approx \frac{\pi}{2} \,\cup\, \varphi \approx \frac{3\pi}{2} \right\},
\tag{11}
\]

sorted in increasing velocity order. $\Psi'(k)$ is then processed through histogram sorting to reveal the modal helix set, such that

\[
\Psi''(k) = \max \begin{cases} \displaystyle\sum_{i=0}^{n} i & \text{if } (h_i = h_{i+1}), \\[4pt] 0 & \text{else.} \end{cases}
\tag{12}
\]
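The resampling of (11) and the histogram step of (12) can be sketched as follows; the direction tolerance and bin count are illustrative assumptions, and interpreting the modal set as the most populated speed bin is our reading of the text.

```python
import numpy as np

def modal_helix_set(vel, phi, tol=0.35, bins=32):
    """Keep helices whose flow direction is near pi/2 (east) or 3*pi/2
    (west), per Eq. (11), then histogram their speeds and return the
    indices falling in the most populated (modal) bin, per Eq. (12)."""
    keep = (np.abs(phi - np.pi / 2) < tol) | (np.abs(phi - 3 * np.pi / 2) < tol)
    idx = np.where(keep)[0]
    speed = np.linalg.norm(vel[idx], axis=1)
    hist, edges = np.histogram(speed, bins=bins)
    m = int(np.argmax(hist))
    in_mode = (speed >= edges[m]) & (speed <= edges[m + 1])
    return idx[in_mode]
```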

$\Psi''(k)$ is likely to contain clusters that tend to be distributed with respect to objects in the scene, whereas the rest of the initial helix set from $\Psi(k)$ may not fit this model. An agglomerative hierarchical tree $T$ is used to identify the clusters. To construct the tree, $\Psi''(k)$ is heat mapped, represented as a symmetric matrix $M$, with respect to the Manhattan distance between each pair of individual helices:

\[
M = \begin{bmatrix}
h_0 - h_0 & \cdots & h_0 - h_n \\
\vdots & \ddots & \vdots \\
h_n - h_0 & \cdots & h_n - h_n
\end{bmatrix}.
\tag{13}
\]

The algorithm to construct the tree from $M$ is given in Algorithm 1.

The tree should be cut at the sequence $m$ such that $m + 1$ does not provide significant benefit in terms of modeling the clusters.

Figure 7: While we emphasize hallway-like indoor environments, our range measurement strategy is compatible with a variety of other environments, including outdoors, office environments, ceilings, sidewalks, and building sides, where orthogonality in architecture is present. A minimum of one perspective line and one feature intersection is sufficient.

After this step, the set of velocities in $\Psi'''(k)$ represents the largest planar object in the field of view with the most consistent rate of pixel displacement in time. The system is updated such that $\Psi(k + 1) = \Psi(k) + \mu(\Psi'''(k))$ as the best-effort estimate, as shown in Figure 8.

It is a future goal to improve the accuracy of this algorithm by exploiting known properties of typical objects. For instance, single doors are typically one meter wide. It is trivial to build an internal object database with templates for typical consistent objects found indoors. If such an object of interest could be identified by an arbitrary object detection algorithm, with known world dimensions $\dim = (x, y)^{T}$, and a cluster $\Psi'''(k)$ sufficiently coincides with it, the cluster depth can be measured via $f \cdot \dim / \dim'$, where $\dim$ is the actual object dimensions, $f$ is the focal length, and $\dim'$ represents the object dimensions on the image plane.
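Under that future extension, the depth recovery would reduce to the pinhole relation above; a one-line sketch, with the world dimension in meters and $f$ and $\dim'$ in pixels:

```python
def depth_from_known_object(dim_world, dim_pixels, f_pixels):
    """Cluster depth from an object of known size, z = f * dim / dim',
    e.g. a door assumed to be roughly one meter wide."""
    return f_pixels * dim_world / dim_pixels

# A 1 m door spanning 120 px with a 700 px focal length is ~5.8 m away.
z = depth_from_known_object(dim_world=1.0, dim_pixels=120.0, f_pixels=700.0)
```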

4. SLAM Formulation

Our previous experiments [16, 17] showed that, due to the highly nonlinear nature of the observation equations, traditional nonlinear observers such as the EKF do not scale to SLAM in larger environments containing a vast number of potential landmarks. Measurement updates in the EKF require quadratic time complexity due to the covariance matrix, rendering the data association increasingly difficult as the map grows.

Figure 8: This graph illustrates the accuracy of the Helix bearing algorithm, estimating 200 samples of perfect 95-degree turns (calibrated with a digital protractor), performed at various locations with increasing clutter, at random angular rates not exceeding 1 radian per second, in the absence of known objects. Vertical axis: accuracy, 80-100; horizontal axis: sample number, 0-200.

An MAV with limited computational resources is particularly impacted by this complexity behavior. SLAM utilizing a Rao-Blackwellized particle filter, similar to [23], is a dynamic Bayesian approach to SLAM, exploiting the conditional independence of measurements. A random set of particles is generated using the noise model and dynamics of the vehicle, in which each particle is considered a potential location for the vehicle. A reduced Kalman filter per particle is then associated with each of the current measurements. Considering the limited computational resources of an MAV, maintaining a set of landmarks large enough to allow for accurate motion estimation, yet sparse enough so as not to produce a negative impact on the system performance, is imperative.

Figure 9: The helix bearing algorithm exploits the optical flow field resulting from the features not associated with architectural lines. A reduced helix association set is shown for clarity. Helix velocities that form statistically identifiable clusters indicate the presence of large objects, such as doors, that can provide an estimate of the angular rate of the MAV during the turn. Labels shown: $\omega_n$, $V_n$, $\omega = (d/dt)\theta$, Hallway-1 line-L, Hallway-1 line-R, and Hallway-2 line-R.

The noise model of the measurements, along with the new measurement and the old position of the feature, is used to generate a statistical weight. This weight, in essence, is a measure of how well the landmarks in the previous sensor position correlate with the measured position, taking noise into account. Since each of the particles has a different estimate of the vehicle position, resulting in a different perspective for the measurement, each particle is assigned a different weight. Particles are resampled at every iteration, such that the lower-weight particles are removed and the higher-weight particles are replicated. This results in a cloud of random particles that tracks toward the best estimation results, which are the positions that yield the best correlation between the previous position of the features and the new measurement data.

The positions of landmarks are stored by the particles, such as $\mathrm{Par}_n = (X_L^{T}, P)$, where $X_L = (x_{ci}, y_{ci})$ and $P$ is the $2 \times 2$ covariance matrix for the particular Kalman filter contained by $\mathrm{Par}_n$. The 6DOF vehicle state vector $x_v$ can be updated in discrete time steps of $(k)$ as shown in (14), where $R = (x_r, y_r, H)^{T}$ is the position in the inertial frame, from which the velocity in the inertial frame can be derived as $\dot{R} = v_E$. The vector $v_B = (v_x, v_y, v_z)^{T}$ represents the linear velocity of the body frame, and $\omega = (p, q, r)^{T}$ represents the body angular rate. $\Gamma = (\phi, \theta, \psi)^{T}$ is the Euler angle vector, and $L_{EB}$ is the Euler angle transformation matrix for $(\phi, \theta, \psi)$. The $3 \times 3$ matrix $T$ converts $(p, q, r)^{T}$ to $(\dot{\phi}, \dot{\theta}, \dot{\psi})$. At every step, the MAV is assumed to experience unknown linear and angular accelerations, $V_B = a_B \Delta t$ and $\Omega = \alpha_B \Delta t$, respectively:

\[
x_v(k + 1) = \begin{pmatrix}
R(k) + L_{EB}(\phi, \theta, \psi)\,(v_B + V_B)\,\Delta t \\
\Gamma(k) + T(\phi, \theta, \psi)\,(\omega + \Omega)\,\Delta t \\
v_B(k) + V_B \\
\omega(k) + \Omega
\end{pmatrix}.
\tag{14}
\]

There is only a limited set of orientations a helicopter is capable of sustaining in the air at any given time without partial or complete loss of control. For instance, no useful lift is generated when the rotor disc is oriented sideways with respect to gravity. Moreover, the on-board autopilot incorporates IMU and compass measurements in a best-effort scheme to keep the MAV at hover in the absence of external control inputs. Therefore, we can simplify the 6DOF system dynamics to 2D system dynamics with an autopilot. Accordingly, the particle filter simultaneously locates the landmarks and updates the vehicle states $x_r$, $y_r$, $\theta_r$, described by

\[
x_v(k + 1) = \begin{pmatrix}
\cos\theta_r(k)\, u_1(k) + x_r(k) \\
\sin\theta_r(k)\, u_1(k) + y_r(k) \\
u_2(k) + \theta_r(k)
\end{pmatrix} + \gamma(k),
\tag{15}
\]
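For reference, a minimal sketch of the per-particle propagation of (15), plus the low-variance resampling described above, is given below; the noise scales are illustrative assumptions rather than tuned values.

```python
import numpy as np

def propagate(particles, u1, u2, sigma=(0.05, 0.05, 0.02)):
    """Propagate particle poses (x_r, y_r, theta_r) through the 2D
    kinematic model of Eq. (15), drawing gamma(k) per particle as
    zero-mean Gaussian noise."""
    noise = np.random.randn(len(particles), 3) * np.asarray(sigma)
    particles[:, 0] += np.cos(particles[:, 2]) * u1 + noise[:, 0]
    particles[:, 1] += np.sin(particles[:, 2]) * u1 + noise[:, 1]
    particles[:, 2] += u2 + noise[:, 2]
    return particles

def resample(particles, weights):
    """Low-variance resampling: replicate high-weight particles and
    drop low-weight ones, as in the weighting scheme described above."""
    n = len(particles)
    positions = (np.arange(n) + np.random.uniform()) / n
    idx = np.searchsorted(np.cumsum(weights / np.sum(weights)), positions)
    return particles[idx]
```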

where $\gamma(k)$ is the linearized input signal noise, $u_1(k)$ is the forward speed, and $u_2(k)$ the angular velocity. Let us consider one instantaneous field of view of the camera, in which the center of two ground corners on opposite walls is shifted. From the distance measurements described earlier, we can derive the relative range and bearing of a corner of interest (index $i$) as follows:

\[
y_i = h(x) = \left( \sqrt{x_i^2 + y_i^2},\;\; \tan^{-1}\!\left[\pm\frac{y_i}{x_i}\right],\;\; \psi \right)^{T},
\tag{16}
\]

where the $\psi$ measurement is provided by the infinity-point method.

This measurement equation can be related to the states of the vehicle and the $i$th corner (landmark) at each time stamp $(k)$, where $x_v(k) = (x_r(k), y_r(k), \theta_r(k))^{T}$ is the vehicle state vector of the 2D vehicle kinematic model:

\[
h_i(x(k)) = \begin{pmatrix}
\sqrt{\left(x_r(k) - x_{ci}(k)\right)^2 + \left(y_r(k) - y_{ci}(k)\right)^2} \\[6pt]
\tan^{-1}\!\left(\dfrac{y_r(k) - y_{ci}(k)}{x_r(k) - x_{ci}(k)}\right) - \theta_r(k) \\[6pt]
\theta_r
\end{pmatrix},
\tag{17}
\]

where $x_{ci}$ and $y_{ci}$ denote the position of the $i$th landmark.
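A direct transcription of (17) for a single particle and landmark might look as follows, with angle wrapping added as a practical detail not spelled out in the text:

```python
import numpy as np

def expected_measurement(xv, landmark):
    """Expected range and bearing of the i-th landmark from a particle
    pose xv = (x_r, y_r, theta_r), per Eq. (17)."""
    xr, yr, th = xv
    xc, yc = landmark
    rng = np.hypot(xr - xc, yr - yc)
    brg = np.arctan2(yr - yc, xr - xc) - th
    brg = (brg + np.pi) % (2.0 * np.pi) - np.pi   # wrap to (-pi, pi]
    return np.array([rng, brg, th])
```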

4.1. Data Association. Recently detected landmarks need to be associated with the existing landmarks in the map, such that each new measurement either corresponds to the correct existing landmark or else registers as a not-before-seen landmark. This is a requirement for any SLAM approach to function properly (i.e., Figure 11). Typically, the association metric depends on the measurement innovation vector. An exhaustive search algorithm that compares every measurement with every feature on the map associates landmarks if the newly measured landmark is sufficiently close to an existing one. This not only leads to landmark ambiguity but also is


computationally intractable for large maps. Moreover, since the measurement is relative, the error of the vehicle position is additive with the absolute location of the measurement.

We present a new, faster, and more accurate solution, which takes advantage of predicted landmark locations on the image plane. Figure 5 gives a reference of how landmarks appear on the image plane and move along the ground lines as the MAV moves. Assume that $p^{k}_{(x,y)}$, $k = 0, 1, 2, 3, \ldots, n$, represents a pixel in time which happens to be contained by a landmark, and this pixel moves along a ground line at the velocity $v_p$. Although landmarks often contain a cluster of pixels, the size of which is inversely proportional to landmark distance, here the center pixel of a landmark is referred to. Given that the expected maximum velocity $V_{B\max}$ is known, a pixel is expected to appear at

\[
p^{k+1}_{(x,y)} = f\!\left(p^{k}_{(x,y)} + (v_B + V_B)\,\Delta t\right),
\tag{18}
\]

where

\[
\sqrt{\left(p^{k+1}_{x} - p^{k}_{x}\right)^2 + \left(p^{k+1}_{y} - p^{k}_{y}\right)^2}
\tag{19}
\]

cannot be larger than $V_{B\max}\Delta t$, while $f(\cdot)$ is a function that converts a landmark range to a position on the image plane.

A landmark appearing at time $k + 1$ is to be associated with a landmark that appeared at time $k$ if and only if their pixel locations are within the association threshold; in other words, the association information from $k$ is used. Otherwise, if the maximum expected change in pixel location is exceeded, the landmark is considered new. We save computational resources by using the association data from $k$ when a match is found, instead of searching the large global map. In addition, since the pixel location of a landmark is independent of the noise in the MAV position, the association has improved accuracy. To further improve the accuracy, there is also a maximum range beyond which the MAV will not consider landmarks for data association. This range is determined by taking the camera resolution into consideration: the farther a landmark is, the fewer pixels it has in its cluster, and thus the more ambiguity and noise it may contain. Considering the physical camera parameters, resolution, shutter speed, and noise model of the Logitech C905 camera, the MAV is set to ignore landmarks farther than 8 meters. Note that this is a limitation of the camera, not of our proposed methods.
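The gate of (18)-(19) amounts to a nearest-neighbor search inside a reachable pixel disc. A sketch follows, assuming the predicted pixel location from (18) and the pixel-space gate radius derived from $V_{B\max}\Delta t$ have already been computed:

```python
import numpy as np

def associate(predicted_px, detections_px, gate_px):
    """Match a landmark's predicted pixel location (Eq. (18)) to the
    nearest detection at k+1, provided the displacement respects the
    bound of Eq. (19); return None to register a new landmark."""
    d = np.linalg.norm(np.asarray(detections_px) - np.asarray(predicted_px),
                       axis=1)
    j = int(np.argmin(d))
    return j if d[j] <= gate_px else None
```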

Although representing the map as a tree-based data structure would, in theory, yield an association time of $O(N \log N)$, our pixel-neighborhood-based approach already covers over 90% of the features at any time; therefore, a tree-based solution does not offer a significant benefit.

We also use a viewing-transformation-invariant scene matching algorithm based on spatial relationships among objects in the images and illumination parameters in the scene. This is to determine whether two frames acquired under different extrinsic camera parameters have indeed captured the same scene. Therefore, if the MAV visits a particular place more than once, it can distinguish whether it has been to that spot before.

Our approach maps the features (i.e., corners, lines) and illumination parameters from one view in the past to the other in the present via affine-invariant image descriptors. A descriptor $D_t$ consists of an image region in a scene that contains a high amount of disorder. This reduces the probability of finding multiple targets later. The system will pick a region on the image plane with the most crowded cluster of landmarks to look for a descriptor, which is likely to be the part of the image with the most clutter, hence creating a more unique signature. Descriptor generation is automatic and triggered when turns are encountered (i.e., by the Helix Bearing Algorithm). A turn is a significant, repeatable event in the life of a map, which makes it interesting for data association purposes. The start of the algorithm is also a significant event, for which the first descriptor $D_0$ is collected; this helps the MAV recognize the starting location if it is revisited.

Every time a descriptor $D_t$ is recorded, it contains the current time $t$ in terms of frame number, the disorderly region $I_{xy}$ of size $x \times y$, and the estimate of the position and orientation of the MAV at frame $t$. Thus, every time a turn is encountered, the system can check whether it happened before. For instance, if it indeed has happened at time $t = k$, where $t > k$, $D_k$ is compared with $D_t$ in terms of descriptor and landmarks, and the map positions of the MAV at times $t$ and $k$ are expected to match closely; otherwise, it means the map is diverging in a quantifiable manner.
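In code, the record kept for each $D_t$ could be as simple as the following container; the field names are hypothetical.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Descriptor:
    """A turn descriptor D_t: the frame number t, the disorderly image
    region I_xy of size x-by-y, and the MAV pose estimate at frame t."""
    frame: int
    region: np.ndarray                 # I_xy patch, shape (y, x)
    pose: tuple                        # estimated (x_r, y_r, theta_r)
```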

The comparison formulation can be summarized as

\[
R(x, y) = \frac{\sum_{x', y'} \left( T(x', y') - I(x + x', y + y') \right)^2}
{\sqrt{\sum_{x', y'} T(x', y')^2 \cdot \sum_{x', y'} I(x + x', y + y')^2}},
\tag{20}
\]

where a perfect match is 0 and poor matches are represented by larger values, up to 1. We use this to determine the degree to which two descriptors are related, as it represents the fraction of the variation in one descriptor that may be explained by the other. Figure 10 illustrates how this concept works.
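Equation (20) is a normalized squared-difference score; a NumPy sketch for scoring a stored descriptor patch $T$ against a candidate window of image $I$ follows. This is the same quantity OpenCV computes for template matching with the cv2.TM_SQDIFF_NORMED method.

```python
import numpy as np

def descriptor_score(T, I, x, y):
    """Normalized squared-difference of Eq. (20) between descriptor patch T
    and the window of I whose top-left corner is at column x, row y:
    0 is a perfect match; poor matches approach 1."""
    h, w = T.shape
    win = I[y:y + h, x:x + w].astype(np.float64)
    Tf = T.astype(np.float64)
    num = np.sum((Tf - win) ** 2)
    den = np.sqrt(np.sum(Tf ** 2) * np.sum(win ** 2))
    return num / den
```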

5. Experimental Results

As illustrated in Figures 12, 13, and 14, our monocular vision SLAM correctly locates and associates landmarks with the real world. Figure 15 shows the results obtained in an outdoor experiment with urban roads. A 3D map is built by the addition of time-varying altitude and wall positions, as shown in Figure 16. The proposed methods prove robust to transient disturbances, since features inconsistent about their position are removed from the map.

The MAV assumes that it is positioned at $(0, 0, 0)$ Cartesian coordinates at the start of a mission, with the camera pointed along the positive $x$-axis; therefore, the width of the corridor is represented by the $y$-axis. At any time during the mission, a partial map can be requested from the MAV via the Internet. The MAV also stores the map and important video frames (i.e., when a new landmark is discovered) on board for later retrieval. Video frames are time-linked to the map. It is therefore possible to obtain a still image of the surroundings


Figure 10: Data association metric, where a descriptor is shown in the middle.

Figure 11: Map drift is one of the classic errors introduced by poor data association, or lack thereof, negatively impacting the loop-closing performance. All measurements are in meters.

of any landmark for surveillance and identification purposes.

In Figure 12, the traveled distance is on the kilometer scale. When the system completes the mission and returns to the starting point, the belief is within one meter of where the mission had originally started.

5.1. The Microaerial Vehicle Hardware Configuration. Saint Vertigo, our autonomous MAV helicopter, serves as the primary robotic test platform for the development of this study (see Figure 17). In contrast with other prior works that predominantly used wireless video feeds and the Vicon vision tracking system for vehicle state estimation [24], Saint Vertigo performs all image processing and SLAM computations on board, with a 1 GHz CPU, 1 GB RAM, and 4 GB storage. The unit measures 50 cm, with a ready-to-fly weight of 0.9 kg

Figure 12: Experimental results of the proposed ranging and SLAM algorithm, showing the landmarks added to the map, representing the structure of the environment. All measurements are in meters. The experiment was conducted under incandescent ambient lighting.

and 0.9 kg of payload, for adaptability to different missions. In essence, the MAV features two independent computers. The flight computer is responsible for flight stabilization, flight automation, and sensory management. The navigation computer is responsible for image processing, range measurement, SLAM computations, networking, mass storage, and, as a future goal, path planning. The pathway between them is a dedicated on-board link, through which the sensory feedback and supervisory control commands are shared. These commands are simple directives, which are converted to the appropriate helicopter flight surface responses by the flight computer. The aircraft is IEEE 802.11 enabled, and all


Figure 13: (a) Experimental results of the proposed ranging and SLAM algorithm with state observer odometer trail. The actual floor plan of the building is superimposed later on a mature map to illustrate the accuracy of our method. Note that the floor plan was not provided to the system a priori. (b) The same environment mapped by a ground robot with a different starting point, to illustrate that our algorithm is compatible with different platforms. All measurements are in meters.

Figure 14: Results of the proposed ranging and SLAM algorithm from a different experiment, with state observer ground truth. All measurements are in meters. The experiment was conducted under fluorescent ambient lighting and sunlight where applicable.

Figure 15: Results of the proposed ranging and SLAM algorithm from an outdoor experiment in an urban area. A small map of the area is provided for reference purposes (not provided to the algorithm), and it indicates the robot path. All measurements are in meters. The experiment was conducted under sunlit ambient conditions and dry weather.

Figure 16: Cartesian $(x, y, z)$ position of the MAV in a hallway, as reported by the proposed ranging and SLAM algorithm with time-varying altitude (axes: hallway length, hallway width, and helicopter altitude, all in meters). The altitude is represented by the $z$-axis and is initially at 25 cm, as this is the ground clearance of the ultrasonic altimeter when the aircraft has landed. The MAV altitude was intentionally varied by large amounts to demonstrate the robustness of our method to the climb and descent of the aircraft, whereas in a typical mission natural altitude changes are in the range of a few centimeters.

Figure 17: Saint Vertigo, the autonomous MAV helicopter, consists of four decks (labeled A through D). The A deck contains the collective pitch rotor head mechanics. The B deck comprises the fuselage, which houses the power plant, transmission, main batteries, actuators, gyroscope, and the tail rotor. The C deck is the autopilot compartment, which contains the inertial measurement unit, all communication systems, and all sensors. The D deck carries the navigation computer, which is attached to a digital video camera visible at the front.

its features are accessible over the Internet or an ad hoc TCP/IP network. Among the other platforms shown in Figure 18, Saint Vertigo has the most limited computational resources.

5.2. Processing Requirements. In order to effectively manage the computational resources on a lightweight MAV computer, we keep track of the CPU utilization for the algorithms proposed in this paper. Table 1 shows a typical breakdown of the average processor utilization per video frame. Each corresponding task elucidated in this paper is visualized in Figure 2.

The numbers in Table 1 are gathered after the map has matured. Methods highlighted with a dagger (†) are mutually exclusive; for example, the Helix Bearing algorithm runs only when the MAV is performing turns, while the ranging task is on standby. Particle filtering has a roughly constant load on the system


Figure 18: Our algorithms have been tested on the diverse set of mobile platforms shown here. Picture courtesy of the Space Systems and Controls Lab, Aerospace Robotics Lab, Digitalsmithy Lab, and Rockwell Collins Advanced Technology Center.

once the map is populated. We only consider a limited point cloud with landmarks in the front detection range of the MAV (see Section 4.1). The MAV typically operates in the 80-90% utilization range. It should be stressed that this figure includes operating system kernel processes, which involve video-memory procedures, as the MAV is not equipped with a dedicated graphics processor. The MAV is programmed to construct the SLAM results and other miscellaneous on-screen display information inside the video memory in real time. This is used to monitor the system for our own debugging purposes but is not required for MAV operation. Disabling this feature reduces the load and frees up processor time for other tasks that may be implemented, such as path planning and closed-loop position control.

6. Conclusion and Future Work

In this paper, we investigated the performance of monocular camera based vision SLAM with minimal assumptions, as well as minimal aid from other sensors (altimeter only), in a corridor-following flight application which requires precise localization and absolute range measurement. This is true even for outdoor cases, because our MAV is capable of reaching high speeds and covering large distances very rapidly, and some of the ground robots we have tested were large enough to become a concern for traffic and pedestrians. While widely recognized SLAM methods have been mainly developed for use with laser range finders, this paper presented new algorithms for monocular vision-based depth perception and


Table 1: CPU utilization of the proposed algorithms.

Image acquisition and edge filtering: 10%
Line and slope extraction: 2%
Landmark extraction: 20%†
Helix bearing: 20%†
Ranging algorithms: below 1%
Rao-Blackwellized particle filter: 50%

bearing sensing to accurately mimic the operation of such an advanced device. We were able to integrate our design with popular SLAM algorithms originally meant for laser range finders, and we have experimentally validated its operation for autonomous indoor and outdoor flight and navigation with a small, fully self-contained MAV helicopter, as well as other robotic platforms. Our algorithms successfully adapt to various situations while performing the transitions between them (e.g., turns, presence of external objects, and time-varying altitude).

Since the proposed monocular camera vision SLAM method does not need initialization procedures, the mission can start at an arbitrary point. Therefore, our MAV can be deployed to infiltrate an unknown building. One future task is to add the capability to fly through doors and windows. Indeed, the system is only limited by the capabilities of the camera, such as resolution, shutter speed, and reaction time. All of those limitations can be overcome with the proper use of lenses and higher-fidelity imaging sensors, even though we have used a consumer-grade USB camera. Since the ability to extract good landmarks is a function of the camera capabilities, a purpose-built camera is suggested for future work. Such a camera would also allow development of efficient vision SLAM and data association algorithms that take advantage of the intermediate image processing data.

Our future vision-based SLAM and navigation strategy for an indoor MAV helicopter through the hallways of a building also includes the ability to recognize staircases and thus traverse multiple floors to generate a comprehensive volumetric map of the building. This will also permit vision-based 3D path planning and closed-loop position control of the MAV based on SLAM. Considering that our MAV helicopter is capable of outdoor flight, we can extend our method to the outdoor perimeter of buildings and similar urban environments by exploiting the similarities between hallways and downtown city maps. Further, considering the reduction in weight and independence from GPS coverage, our work also permits the development of portable navigation devices for a wider array of applications, such as small-scale mobile robotics and helmet- or vest-mounted navigation systems.

Certain environments and environmental factors prove challenging to our proposed method: bright lights, reflective surfaces, haze, and shadows. These artifacts introduce two main problems: (1) they can alter chromatic clarity, local microcontrast, and exposure due to their unpredictable high-energy nature, and (2) they can appear as false objects, especially when there is bloom surrounding objects in front of a problematic light source. Further reduction in contrast is possible if scattering particles in the air are dense. We have come to observe that preventative and defensive approaches to such issues are promising. Antireflective treatment on lenses can reduce light bouncing off of the lens, and programming the aircraft to move a very small distance upon detection of glare can eliminate the unwanted effects. Innovative and adaptive application of servo-controlled filters before the lenses can minimize or eliminate most, if not all, reflections. The light that causes glare is elliptically polarized due to strong phase correlation, as opposed to essential light, which is circularly polarized. Filters can detect and block polarized light from entering the camera, thereby blocking the unwanted effects. Application of purpose-designed digital imaging sensors that do not involve a Bayer filter can also help. Most of the glare occurs in the green light region, and traditional digital imaging sensors have twice as many green receptors as red and blue; the Bayer design was inspired by the human eye, which sees green better, as green is the most structurally descriptive light for edges and corners. This paper has supplementary material (see Supplementary Material available online at http://dx.doi.org/10.1155/2013/374165), available from the authors, which shows experimental results of the paper.

Acknowledgments

The research reported in this paper was in part supported by the National Science Foundation (Grant ECCS-0428040), the Information Infrastructure Institute (I3), the Department of Aerospace Engineering and the Virtual Reality Application Center at Iowa State University, Rockwell Collins, and the Air Force Office of Scientific Research.

References

[1] DHHubel and TNWiesel ldquoReceptive fields binocular inter-action and functional architecture in the catrsquos visual cortexrdquoTheJournal of Physiology vol 160 pp 106ndash154 1962

[2] N Isoda K Terada S Oe and K IKaida ldquoImprovement ofaccuracy for distance measurement method by using movableCCDrdquo in Proceedings of the 36th SICE Annual Conference (SICErsquo97) pp 29ndash31 Tokushima Japan July 1997

[3] R Hartley and A ZissermanMultiple View Geometry in Com-puter Vision Cambridge University Press 2nd edition 2003

[4] F Ruffier and N Franceschini ldquoVisually guided micro-aerialvehicle automatic take off terrain following landing and windreactionrdquo in Proceedings of the IEEE International Conferenceon Robotics and Automation pp 2339ndash2346 New Orleans LoUSA May 2004

[5] F Ruffier S Viollet S Amic and N Franceschini ldquoBio-inspired optical flow circuits for the visual guidance of micro-air vehiclesrdquo in Proceedings of the International Symposium onCircuits and Systems (ISCAS rsquo03) vol 3 pp 846ndash849 BangkokThailand May 2003

[6] J Michels A Saxena and A Y Ng ldquoHigh speed obstacle avoid-ance using monocular vision and reinforcement learningrdquo inProceedings of the 22nd International Conference on MachineLearning (ICML rsquo05) vol 119 pp 593ndash600 August 2005

Journal of Electrical and Computer Engineering 15

[7] A Saxena J Schulte and A Y Ng ldquoDepth estimation usingmonocular and stereo cuesrdquo in Proceedings of the 20th inter-national joint conference on Artifical intelligence (IJCAI rsquo07) pp2197ndash2203 2007

[8] N Snavely S M Seitz and R Szeliski ldquoPhoto tourism explor-ing photo collections in 3DrdquoACMTransactions onGraphics vol25 no 3 2006

[9] A W Fitzgibbon and A Zisserman ldquoAutomatic camera recov-ery for closed or open image sequencesrdquo in Proceedings of theEuropean Conference on Computer Vision pp 311ndash326 June1998

[10] ADavisonMNicholas and SOlivier ldquoMonoSLAM real-timesingle camera SLAMrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 29 no 6 pp 1052ndash1067 2007

[11] L Clemente A Davison I Reid J Neira and J Tardos ldquoMap-ping large loops with a single hand-held camerardquo in Proceedingsof the Robotics Science and Systems Conference June 2007

[12] F Dellaert W Burgard D Fox and S Thrun ldquoUsing thecondensation algorithm for robust vision-based mobile robotlocalizationrdquo in Proceedings of the IEEE Computer Society Con-ference onComputer Vision and Pattern Recognition (CVPR rsquo99)pp 588ndash594 June 1999

[13] N Cuperlier M Quoy P Gaussier and C Giovanangeli ldquoNav-igation and planning in an unknown environment using visionand a cognitive maprdquo in Proceedings of the IJCAI WorkshopReasoning with Uncertainty in Robotics 2005

[14] G Silveira E Malis and P Rives ldquoAn efficient direct approachto visual SLAMrdquo IEEE Transactions on Robotics vol 24 no 5pp 969ndash979 2008

[15] A P Gee D Chekhlov A Calway and W Mayol-CuevasldquoDiscovering higher level structure in visual SLAMrdquo IEEETransactions on Robotics vol 24 no 5 pp 980ndash990 2008

[16] K Celik S-J Chung and A K Somani ldquoMono-vision cornerSLAM for indoor navigationrdquo in Proceedings of the IEEE Inter-national Conference on ElectroInformation Technology (EITrsquo08) pp 343ndash348 Ames Iowa USA May 2008

[17] K Celik S-J Chung and A K Somani ldquoMVCSLAM mono-vision corner SLAM for autonomous micro-helicopters in GPSdenied environmentsrdquo in Proceedings of the AIAA GuidanceNavigation and Control Conference Honolulu Hawaii USAAugust 2008

[18] K Celik S J Chung and A K Somani ldquoBiologically inspiredmonocular vision based navigation and mapping in GPS-denied environmentsrdquo in Proceedings of the AIAA Infotech atAerospace Conference and Exhibit and AIAA UnmannedUnli-mited Conference Seattle Wash USA April 2009

[19] K Celik S-J ChungM Clausman andA K Somani ldquoMonoc-ular vision SLAM for indoor aerial vehiclesrdquo in Proceedings ofthe IEEERSJ International Conference on Intelligent Robots andSystems St Louis Mo USA October 2009

[20] J Shi and C Tomasi ldquoGood features to trackrdquo in Proceedings ofthe IEEE Computer Society Conference on Computer Vision andPattern Recognition pp 593ndash600 June 1994

[21] H Bay A Ess T Tuytelaars and L van Gool ldquoSpeeded-UpRobust Features (SURF)rdquo Computer Vision and Image Under-standing vol 110 no 3 pp 346ndash359 2008

[22] K Celik and A K Somani ldquoWandless realtime autocalibrationof tactical monocular camerasrdquo in Proceedings of the Interna-tional Conference on Image Processing Computer Vision andPattern Recognition (IPCV rsquo12) Las Vegas Nev USA 2012

[23] M Montemerlo S Thrun D Koller and B Wegbreit ldquoFast-SLAM a factored solution to the simultaneous localization andmapping problemrdquo in Proceedings of the AAAI National Con-ference on Artificial Intelligence pp 593ndash598 2002

[24] J P How B Bethke A Frank D Dale and J Vian ldquoReal-timeindoor autonnomous vehicle test environmentrdquo IEEE ControlSystems Magazine vol 28 no 2 pp 51ndash64 2008

Figure 4: A visual description of the environment as perceived by the infinity-point method. The figure annotates the hallway width $w$, the height $H$, the distances $l$ and $d$, the angles $\phi$, $\beta$, and $\psi$, the camera origin $(0, 0, 0)$, and the corner positions $E_{H1} = [l, d, -H]$, $E_{C1} = A^T \cdot [l, d, -H]$, $E_{H2} = [l, d - w, -H]$, and $E_{C2} = A^T \cdot [l, d - w, -H]$.

The infinity-point method computes the absolute range and bearing of the features, effectively turning them into landmarks needed for the SLAM formulation. See Figure 5, which illustrates the final appearance of the ranging algorithm.

The graph in Figure 6 illustrates the disagreement between the line-perspectives and the infinity-point method (Section 2.3) in an experiment in which both algorithms executed simultaneously on the same video feed. With the particular camera we used in the experiments (Logitech C905), the infinity-point method yielded a 93% accuracy. These numbers are functions of camera resolution, camera noise, and the consequent line extraction noise; therefore, disagreements not exceeding 0.5 meters are in its favor with respect to accuracy. Disagreements from the ground truth include all transient measurement errors, such as camera shake or the occasional introduction of moving objects that deceptively mimic the environment, and other anomalies. The divergence between the two ranges that is visible between samples 20 and 40 in Figure 6 is caused by a hallway line anomaly from the line extraction process, independent of ranging. In this particular case, both hallway lines have shifted, causing the infinity point to move left. Horizontal translations of the infinity point have a minimal effect on the measurement performance of the infinity-point method, this being one of its main advantages. Refer to Figure 7 for a demonstration of the performance of these algorithms in a wide variety of environments.

The bias between the two measurements shown in Figure 6 is due to shifts in camera calibration parameters in between different experiments. Certain environmental factors have dramatic effects on lens precision, such as acceleration, corrosive atmosphere, acoustic noise, fluid contamination, low pressure, vibration, ballistic shock, electromagnetic radiation, temperature, and humidity. Most of those conditions readily occur on an MAV (and most other platforms, including the human body) due to parts rotating at high speeds, powerful air currents, static electricity, radio interference, and so on. The autocalibration concept is wide and beyond the scope of this paper. We present a novel mathematical procedure that addresses the issue of maintaining monocular camera calibration automatically in hostile environments in another paper of ours, and we encourage the reader to refer to it [22].

3. Helix Bearing Algorithm

When the MAV approaches a turn, an exit, a T-section, or a dead-end, both ground lines tend to disappear simultaneously. Consequently, range and heading measurement methods cease to function. A set of features might still be detected, and the MAV can make a confident estimate of their spatial pose. However, in the absence of depth information, a one-dimensional probability density over the depth is represented by a two-dimensional particle distribution.

In this section, we propose a turn-sensing algorithm to estimate $\psi$ in the absence of orthogonality cues. This situation automatically triggers the turn-exploration mode in the MAV: a yaw rotation of the body frame is initiated until another passage is found. The challenge is to estimate $\psi$ accurately enough to update the SLAM map correctly. This procedure combines machine vision with the data matching and dynamic estimation problem. For instance, if the MAV approaches a left turn after exploring one leg of an "L" shaped hallway, turns left 90 degrees, and continues through the next leg, the map is expected to display two hallways joined at a 90-degree angle. Similarly, a 180-degree turn before finding another hallway would indicate a dead end. This way, the MAV can also determine where turns are located the next time they are visited.

The new measurement problem at turns is to compute the instantaneous velocity $(u, v)$ of every helix (moving feature) that the MAV is able to detect, as shown in Figure 9. In other words, an attempt is made to recover $V(x, y, t) = (u(x, y, t), v(x, y, t)) = (dx/dt, dy/dt)$ using a variation of the pyramidal Lucas-Kanade method. This recovery leads to a 2D vector field obtained via perspective projection of the 3D velocity field onto the image plane. At discrete time steps, the next frame is defined as a function of a previous frame as $I_{t+1}(x, y, z, t) = I_t(x + dx, y + dy, z + dz, t + dt)$. By applying the Taylor series expansion

$$I(x, y, z, t) + \frac{\partial I}{\partial x}\delta x + \frac{\partial I}{\partial y}\delta y + \frac{\partial I}{\partial z}\delta z + \frac{\partial I}{\partial t}\delta t \qquad (10)$$

and then differentiating with respect to time, the helix velocity is obtained in terms of pixel distance per time step $k$.
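To make the recovery step concrete, the following is a minimal sketch of the pyramidal Lucas-Kanade tracking described above, using OpenCV; the corner-detector settings, window size, and pyramid depth are our illustrative assumptions, not values from the paper.

```python
# Sketch: recovering helix velocities (u, v) between consecutive frames
# with pyramidal Lucas-Kanade, assuming OpenCV. Parameter values are
# illustrative assumptions, not the paper's settings.
import cv2
import numpy as np

def helix_velocities(prev_gray, curr_gray, dt):
    # Shi-Tomasi corners [20] serve as candidate helices (moving features).
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=7)
    if pts is None:
        return np.empty((0, 2)), np.empty((0, 2))
    # Pyramidal Lucas-Kanade tracks each point into the current frame.
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, pts, None,
        winSize=(21, 21), maxLevel=3)
    ok = status.ravel() == 1
    p0 = pts[ok].reshape(-1, 2)
    p1 = nxt[ok].reshape(-1, 2)
    velocities = (p1 - p0) / dt   # (u, v) in pixels per time step
    return p0, velocities
```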

At this point, each helix is assumed to be identically distributed and independently positioned on the image plane, and each helix is associated with a velocity vector $V_i = (v, \varphi)^T$, where $\varphi$ is the angular displacement of the velocity direction from the north of the image plane, such that $\pi/2$ is east, $\pi$ is south, and $3\pi/2$ is west. Although the associated depths of the helix set appearing at stochastic points on the image plane are unknown, assuming a constant, there is a relationship between the distance of a helix from the camera and its instantaneous velocity on the image plane. This suggests that a helix cluster with respect to closeness of individual instantaneous velocities is likely to belong on the surface of one planar object, such as a door frame.

(1) Start from level $L(0) = 0$ and sequence $m = 0$.
(2) Find $d = \min(h_a - h_b)$ in $M$, where $h_a \neq h_b$.
(3) $m = m + 1$; $\Psi'''(k) = \mathrm{merge}([h_a, h_b])$; $L(m) = d$.
(4) Delete from $M$ the rows and columns corresponding to $\Psi'''(k)$.
(5) Add to $M$ a row and a column representing $\Psi'''(k)$.
(6) If $(\forall h_i \in \Psi'''(k))$, stop;
(7) else, go to (2).

Algorithm 1: Disjoint cluster identification from heat map $M$.
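The merge loop of Algorithm 1 is, in effect, single-linkage agglomerative clustering over the Manhattan-distance heat map $M$. Below is a minimal sketch under that reading, with SciPy building the merge tree; the cut threshold is an assumed parameter standing in for the tree-cut criterion discussed below.

```python
# Sketch: agglomerative clustering of helix vectors, mirroring Algorithm 1.
# SciPy builds the merge tree from Manhattan distances; the cut threshold
# is an assumption, not the paper's tree-cut criterion.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def cluster_helices(helices, cut_distance):
    # helices: (n, d) array, each row a helix h_i = (V_i, u_i, v_i).
    dists = pdist(helices, metric='cityblock')  # heat map M, condensed form
    tree = linkage(dists, method='single')      # merge levels L(m)
    # Cut the tree where further merges stop improving the model.
    labels = fcluster(tree, t=cut_distance, criterion='distance')
    return labels
```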

Figure 5: On-the-fly range measurements. Note the crosshair, indicating that the algorithm is currently using the infinity point for heading.

Figure 6: (a) The accuracy of the two range measurement methods with respect to ground truth (flat line), plotted as range (m) versus sample number for the infinity-point method. (b) Residuals for the top figure, plotted as difference (m) versus sample number.

Let a helix with a directional velocity be the triple $h_i = (V_i, u_i, v_i)^T$, where $(u_i, v_i)$ represents the position of this particle on the image plane. At any given time $(k)$, let $\Psi$ be a set containing all these features on the image plane, such that $\Psi(k) = \{h_1, h_2, \ldots, h_n\}$. The $z$ component of velocity as obtained in (10) is the determining factor for $\varphi$. Since we are most interested in the set of helices in which this component is minimized, $\Psi(k)$ is resampled such that

$$\Psi'(k) = \left\{ \forall h_i : \varphi \approx \frac{\pi}{2} \cup \varphi \approx \frac{3\pi}{2} \right\} \qquad (11)$$

sorted in increasing velocity order. $\Psi'(k)$ is then processed through histogram sorting to reveal the modal helix set, such that

$$\Psi''(k) = \max \begin{cases} \sum_{i=0}^{n} i & \text{if } (h_i = h_{i+1}) \\ 0 & \text{else} \end{cases} \qquad (12)$$

$\Psi''(k)$ is likely to contain clusters that tend to be distributed with respect to objects in the scene, whereas the rest of the initial helix set from $\Psi(k)$ may not fit this model. An agglomerative hierarchical tree $T$ is used to identify the clusters. To construct the tree, $\Psi''(k)$ is heat mapped, represented as a symmetric matrix $M$, with respect to the Manhattan distance between each pair of individual helices:

$$M = \begin{bmatrix} \|h_0 - h_0\| & \cdots & \|h_0 - h_n\| \\ \vdots & \ddots & \vdots \\ \|h_n - h_0\| & \cdots & \|h_n - h_n\| \end{bmatrix} \qquad (13)$$

The algorithm to construct the tree from $M$ is given in Algorithm 1.

The tree should be cut at the sequence $m$ such that $m + 1$ does not provide significant benefit in terms of modeling the clusters. After this step, the set of velocities in $\Psi'''(k)$ represents the largest planar object in the field of view with the most consistent rate of pixel displacement in time. The system is updated such that $\Psi(k + 1) = \Psi(k) + \mu(\Psi'''(k))$ as the best-effort estimate, as shown in Figure 8.

Figure 7: While we emphasize hallway-like indoor environments, our range measurement strategy is compatible with a variety of other environments, including outdoors, office environments, ceilings, sidewalks, and building sides, where orthogonality in architecture is present. A minimum of one perspective line and one feature intersection is sufficient.

It is a future goal to improve the accuracy of this algorithm by exploiting known properties of typical objects. For instance, single doors are typically a meter wide. It is trivial to build an internal object database with templates for typical consistent objects found indoors. If such an object of interest could be identified by an arbitrary object detection algorithm, and that world object of known dimensions $\dim = (x, y)^T$ and a cluster $\Psi'''(k)$ sufficiently coincide, the cluster depth can be measured via $\dim(f/\dim')$, where $\dim$ is the actual object dimensions, $f$ is the focal length, and $\dim'$ represents the object dimensions on the image plane.
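Under the pinhole model, the depth recovery just described reduces to a one-line computation; the door width, focal length, and pixel span below are illustrative assumptions.

```python
# Sketch: depth from a known-size object under the pinhole camera model,
# as in dim * (f / dim'). All numbers are illustrative assumptions.
def cluster_depth(real_width_m, focal_px, image_width_px):
    return real_width_m * focal_px / image_width_px

# A 1 m wide door spanning 100 px with f = 700 px sits about 7 m away.
print(cluster_depth(1.0, 700.0, 100.0))  # -> 7.0
```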

4. SLAM Formulation

Our previous experiments [16, 17] showed that, due to the highly nonlinear nature of the observation equations, traditional nonlinear observers such as the EKF do not scale to SLAM in larger environments containing a vast number of potential landmarks. Measurement updates in the EKF require quadratic time complexity due to the covariance matrix, rendering the data association increasingly difficult as the map grows.

Figure 8: This graph illustrates the accuracy of the Helix bearing algorithm, estimating 200 samples of perfect 95-degree turns (calibrated with a digital protractor) performed at various locations, with increasing clutter, at random angular rates not exceeding 1 radian per second, in the absence of known objects.

An MAV with limited computational resources is particularly impacted by this complexity behavior. SLAM utilizing a Rao-Blackwellized particle filter, similar to [23], is a dynamic Bayesian approach to SLAM, exploiting the conditional independence of measurements. A random set of particles is generated using the noise model and dynamics of the vehicle, in which each particle is considered a potential location for the vehicle. A reduced Kalman filter per particle is then associated with each of the current measurements. Considering the limited computational resources of an MAV, maintaining a set of landmarks large enough to allow for accurate motion estimation, yet sparse enough so as not to produce a negative impact on the system performance, is imperative.

Figure 9: The helix bearing algorithm exploits the optical flow field resulting from the features not associated with architectural lines (the annotated hallway lines); a reduced helix association set, with velocities $V_n$ and angular rate $\omega = (d/dt)\theta$, is shown for clarity. Helix velocities that form statistically identifiable clusters indicate the presence of large objects, such as doors, that can provide an estimate of the angular rate of the MAV during the turn.

The noise model of the measurements, along with the new measurement and the old position of the feature, is used to generate a statistical weight. This weight, in essence, is a measure of how well the landmarks in the previous sensor position correlate with the measured position, taking noise into account. Since each of the particles has a different estimate of the vehicle position, resulting in a different perspective for the measurement, each particle is assigned a different weight. Particles are resampled every iteration, such that the lower-weight particles are removed and higher-weight particles are replicated. This results in a cloud of random particles that tracks toward the best estimates, which are the positions that yield the best correlation between the previous position of the features and the new measurement data.
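A minimal sketch of this weight-and-resample step follows; the Gaussian innovation weighting and the systematic resampler are standard particle-filter ingredients, assumed here since the paper does not spell out its exact variants.

```python
# Sketch: per-particle weighting and resampling in a Rao-Blackwellized
# particle filter. The Gaussian likelihood and systematic resampler are
# standard stand-ins; the paper does not specify its exact variants.
import numpy as np

def weight_particles(predicted_obs, measurement, obs_cov):
    # Innovation-based Gaussian weight for each particle's prediction.
    inv_cov = np.linalg.inv(obs_cov)
    innov = measurement - predicted_obs              # (n_particles, dim)
    mahal = np.einsum('ni,ij,nj->n', innov, inv_cov, innov)
    w = np.exp(-0.5 * mahal)
    return w / np.sum(w)

def systematic_resample(particles, weights, rng=np.random.default_rng()):
    # Low-weight particles die out; high-weight particles are replicated.
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    indexes = np.searchsorted(np.cumsum(weights), positions)
    return particles[np.minimum(indexes, n - 1)]
```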

The positions of landmarks are stored by the particles, such as $\mathrm{Par}_n = (X_L^T, P)$, where $X_L = (x_{ci}, y_{ci})$ and $P$ is the $2 \times 2$ covariance matrix for the particular Kalman filter contained by $\mathrm{Par}_n$. The 6DOF vehicle state vector $x_v$ can be updated in discrete time steps $(k)$ as shown in (14), where $R = (x_r, y_r, H)^T$ is the position in the inertial frame, from which the velocity in the inertial frame can be derived as $\dot{R} = v_E$. The vector $v_B = (v_x, v_y, v_z)^T$ represents the linear velocity of the body frame, and $\omega = (p, q, r)^T$ represents the body angular rate. $\Gamma = (\phi, \theta, \psi)^T$ is the Euler angle vector, and $L_{EB}$ is the Euler angle transformation matrix for $(\phi, \theta, \psi)$. The $3 \times 3$ matrix $T$ converts $(p, q, r)^T$ to $(\dot{\phi}, \dot{\theta}, \dot{\psi})$. At every step, the MAV is assumed to experience unknown linear and angular accelerations, $V_B = a_B \Delta t$ and $\Omega = \alpha_B \Delta t$, respectively:

$$x_v(k+1) = \begin{pmatrix} R(k) + L_{EB}(\phi, \theta, \psi)(v_B + V_B)\Delta t \\ \Gamma(k) + T(\phi, \theta, \psi)(\omega + \Omega)\Delta t \\ v_B(k) + V_B \\ \omega(k) + \Omega \end{pmatrix} \qquad (14)$$

There is only a limited set of orientations a helicopter is capable of sustaining in the air at any given time without partial or complete loss of control. For instance, no useful lift is generated when the rotor disc is oriented sideways with respect to gravity. Moreover, the on-board autopilot incorporates IMU and compass measurements in a best-effort scheme to keep the MAV at hover in the absence of external control inputs. Therefore, we can reduce the 6DOF system dynamics to simplified 2D system dynamics with an autopilot. Accordingly, the particle filter then simultaneously locates the landmarks and updates the vehicle states $x_r, y_r, \theta_r$, described by

$$x_v(k+1) = \begin{pmatrix} \cos\theta_r(k)\, u_1(k) + x_r(k) \\ \sin\theta_r(k)\, u_1(k) + y_r(k) \\ u_2(k) + \theta_r(k) \end{pmatrix} + \gamma(k) \qquad (15)$$

where $\gamma(k)$ is the linearized input signal noise, $u_1(k)$ is the forward speed, and $u_2(k)$ is the angular velocity.
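A sketch of propagating the particle set through the 2D motion model (15); the noise scales standing in for $\gamma(k)$ are assumed values.

```python
# Sketch: propagating particle poses (x_r, y_r, theta_r) through the 2D
# motion model of (15). Noise standard deviations are assumed values.
import numpy as np

def propagate(particles, u1, u2, rng=np.random.default_rng()):
    # particles: (n, 3) array of [x_r, y_r, theta_r] per particle.
    noise = rng.normal(scale=[0.05, 0.05, 0.01], size=particles.shape)
    x, y, th = particles[:, 0], particles[:, 1], particles[:, 2]
    return np.column_stack([
        x + np.cos(th) * u1,      # forward speed along heading
        y + np.sin(th) * u1,
        th + u2,                  # angular velocity
    ]) + noise                    # gamma(k), linearized input noise
```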

Let us consider one instantaneous field of view of the camera, in which the center of two ground corners on opposite walls is shifted. From the distance measurements described earlier, we can derive the relative range and bearing of a corner of interest (index $i$) as follows:

$$y_i = h(x) = \left( \sqrt{x_i^2 + y_i^2},\ \tan^{-1}\left[\pm\frac{y_i}{x_i}\right],\ \psi \right)^T \qquad (16)$$

where the $\psi$ measurement is provided by the infinity-point method.

This measurement equation can be related with the states of the vehicle and the $i$th corner (landmark) at each time stamp $(k)$, as given in (17), where $x_v(k) = (x_r(k), y_r(k), \theta_r(k))^T$ is the vehicle state vector of the 2D vehicle kinematic model:

$$h_i(x(k)) = \begin{pmatrix} \sqrt{(x_r(k) - x_{ci}(k))^2 + (y_r(k) - y_{ci}(k))^2} \\ \tan^{-1}\left(\dfrac{y_r(k) - y_{ci}(k)}{x_r(k) - x_{ci}(k)}\right) - \theta_r(k) \\ \theta_r \end{pmatrix} \qquad (17)$$

where $x_{ci}$ and $y_{ci}$ denote the position of the $i$th landmark.
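The per-landmark prediction in (17) is straightforward to express in code; a minimal sketch follows, using atan2 to resolve the quadrant that the plain arctangent ratio in (17) leaves to the $\pm$ sign.

```python
# Sketch: predicted range-bearing observation h_i(x(k)) of (17) for a
# particle pose [x_r, y_r, theta_r] and a landmark at (x_ci, y_ci).
import numpy as np

def predict_observation(pose, landmark):
    dx = landmark[0] - pose[0]
    dy = landmark[1] - pose[1]
    rng = np.hypot(dx, dy)                    # relative range
    bearing = np.arctan2(dy, dx) - pose[2]    # relative bearing;
    # atan2 fixes the quadrant left ambiguous by the ratio in (17)
    return np.array([rng, bearing])
```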

4.1. Data Association. Recently detected landmarks need to be associated with the existing landmarks in the map, such that each new measurement either corresponds to the correct existing landmark or else registers as a not-before-seen landmark. This is a requirement for any SLAM approach to function properly (i.e., Figure 11). Typically, the association metric depends on the measurement innovation vector. An exhaustive search algorithm that compares every measurement with every feature on the map associates landmarks if the newly measured landmark is sufficiently close to an existing one. This not only leads to landmark ambiguity but also is computationally intractable for large maps. Moreover, since the measurement is relative, the error of the vehicle position is additive with the absolute location of the measurement.

We present a new, faster, and more accurate solution, which takes advantage of predicted landmark locations on the image plane. Figure 5 gives a reference of how landmarks appear on the image plane and move along the ground lines as the MAV moves. Assume that $p^k_{(x,y)}$, $k = 0, 1, 2, 3, \ldots, n$ represents a pixel in time which happens to be contained by a landmark, and that this pixel moves along a ground line at the velocity $v_p$. Although landmarks often contain a cluster of pixels, the size of which is inversely proportional to the landmark distance, here the center pixel of a landmark is referred to. Given that the expected maximum velocity $V_{B\max}$ is known, a pixel is expected to appear at

$$p^{k+1}_{(x,y)} = f\left(p^k_{(x,y)} + (v_B + V_B)\Delta t\right) \qquad (18)$$

where

$$\sqrt{\left(p^{k+1}_{(x)} - p^k_{(x)}\right)^2 + \left(p^{k+1}_{(y)} - p^k_{(y)}\right)^2} \qquad (19)$$

cannot be larger than $V_{B\max}\Delta t$, while $f(\cdot)$ is a function that converts a landmark range to a position on the image plane.

A landmark appearing at time $k + 1$ is to be associated with a landmark that has appeared at time $k$ if and only if their pixel locations are within the association threshold; in other words, the association information from $k$ is used. Otherwise, if the maximum expected change in pixel location is exceeded, the landmark is considered new. We save computational resources by using the association data from $k$ when a match is found, instead of searching the large global map. In addition, since the pixel location of a landmark is independent of the noise in the MAV position, the association has an improved accuracy. To further improve the accuracy, there is also a maximum range beyond which the MAV will not consider landmarks for data association. This range is determined by taking the camera resolution into consideration. The farther a landmark is, the fewer pixels it has in its cluster, and thus the more ambiguity and noise it may contain. Considering the physical camera parameters, resolution, shutter speed, and noise model of the Logitech C905 camera, the MAV is set to ignore landmarks farther than 8 meters. Note that this is a limitation of the camera, not of our proposed methods.

Although representing the map as a tree-based data structure in theory yields an association time of $O(N \log N)$, our pixel-neighborhood-based approach already covers over 90% of the features at any time; therefore, a tree-based solution does not offer a significant benefit.

We also use a viewing-transformation-invariant scene matching algorithm based on spatial relationships among objects in the images and illumination parameters in the scene. This is to determine whether two frames acquired under different extrinsic camera parameters have indeed captured the same scene. Therefore, if the MAV visits a particular place more than once, it can distinguish whether it has been to that spot before.

Our approach maps the features (i.e., corners, lines) and illumination parameters from one view in the past to the other in the present via affine-invariant image descriptors. A descriptor $D_t$ consists of an image region in a scene that contains a high amount of disorder. This reduces the probability of finding multiple targets later. The system will pick a region on the image plane with the most crowded cluster of landmarks to look for a descriptor, which is likely to be the part of the image with the most clutter, hence creating a more unique signature. Descriptor generation is automatic and triggered when turns are encountered (i.e., Helix Bearing Algorithm). A turn is a significant, repeatable event in the life of a map, which makes it interesting for data association purposes. The starting of the algorithm is also a significant event, for which the first descriptor $D_0$ is collected, which helps the MAV in recognizing the starting location if it is revisited.

Every time a descriptor $D_t$ is recorded, it contains the current time $t$ in terms of frame number, the disorderly region $I_{xy}$ of size $x \times y$, and the estimate of the position and orientation of the MAV at frame $t$. Thus, every time a turn is encountered, the system can check whether it happened before. For instance, if it indeed has happened at time $t = k$, where $t > k$, $D_k$ is compared with $D_t$ in terms of descriptor and landmarks, and the map positions of the MAV at times $t$ and $k$ are expected to match closely; otherwise, it means the map is diverging in a quantifiable manner.

The comparison formulation can be summarized as

$$R(x, y) = \frac{\sum_{x', y'} \left(T(x', y') - I(x + x', y + y')\right)^2}{\sqrt{\sum_{x', y'} T(x', y')^2 \cdot \sum_{x', y'} I(x + x', y + y')^2}} \qquad (20)$$

where a perfect match is 0 and poor matches are represented by larger values, up to 1. We use this to determine the degree to which two descriptors are related, as it represents the fraction of the variation in one descriptor that may be explained by the other. Figure 10 illustrates how this concept works.
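Equation (20) is the normalized squared-difference score that OpenCV exposes as the TM_SQDIFF_NORMED mode of matchTemplate; a sketch of the descriptor comparison could therefore look as follows, with the acceptance threshold being an illustrative assumption.

```python
# Sketch: comparing a stored descriptor region against a current frame
# with the normalized squared-difference score of (20). 0 is a perfect
# match; the 0.2 acceptance threshold is an illustrative assumption.
import cv2

def descriptor_match(frame_gray, descriptor_region, threshold=0.2):
    scores = cv2.matchTemplate(frame_gray, descriptor_region,
                               cv2.TM_SQDIFF_NORMED)   # R(x, y) of (20)
    min_score, _max_score, min_loc, _max_loc = cv2.minMaxLoc(scores)
    return (min_score <= threshold), min_loc, min_score
```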

5. Experimental Results

As illustrated in Figures 12, 13, and 14, our monocular vision SLAM correctly locates and associates landmarks to the real world. Figure 15 shows the results obtained in an outdoor experiment with urban roads. A 3D map is built by the addition of time-varying altitude and wall positions, as shown in Figure 16. The proposed methods prove robust to transient disturbances, since features inconsistent about their position are removed from the map.

The MAV assumes that it is positioned at $(0, 0, 0)$ Cartesian coordinates at the start of a mission, with the camera pointed at the positive $x$-axis; therefore, the width of the corridor is represented by the $y$-axis. At any time during the mission, a partial map can be requested from the MAV via the Internet. The MAV also stores the map and important video frames (i.e., when a new landmark is discovered) on-board for later retrieval. Video frames are time-linked to the map. It is therefore possible to obtain a still image of the surroundings of any landmark for surveillance and identification purposes.


Figure 10: Data association metric, where a descriptor is shown in the middle.

Figure 11: Map drift is one of the classic errors introduced by poor data association, or lack thereof, negatively impacting the loop-closing performance. (Scale is in meters.)

In Figure 12, the traveled distance is on the kilometer scale. When the system completes the mission and returns to the starting point, the belief is within one meter of where the mission had originally started.

5.1. The Microaerial Vehicle Hardware Configuration. Saint Vertigo, our autonomous MAV helicopter, serves as the primary robotic test platform for the development of this study (see Figure 17). In contrast with other prior works that predominantly used wireless video feeds and the Vicon vision tracking system for vehicle state estimation [24], Saint Vertigo performs all image processing and SLAM computations on-board, with a 1 GHz CPU, 1 GB RAM, and 4 GB storage. The unit measures 50 cm, with a ready-to-fly weight of 0.9 kg and 0.9 kg of payload for adaptability to different missions.

Figure 12: Experimental results of the proposed ranging and SLAM algorithm, showing the landmarks added to the map, representing the structure of the environment. All measurements are in meters. The experiment was conducted under incandescent ambient lighting.

In essence, the MAV features two independent computers. The flight computer is responsible for flight stabilization, flight automation, and sensory management. The navigation computer is responsible for image processing, range measurement, SLAM computations, networking, mass storage, and, as a future goal, path planning. The pathway between them is a dedicated on-board link, through which the sensory feedback and supervisory control commands are shared. These commands are simple directives, which are converted to the appropriate helicopter flight surface responses by the flight computer. The aircraft is IEEE 802.11 enabled, and all its features are accessible over the Internet or an ad hoc TCP/IP network.


Figure 13: (a) Experimental results of the proposed ranging and SLAM algorithm, with state observer odometer trail. The actual floor plan of the building is superimposed later on a mature map to illustrate the accuracy of our method; note that the floor plan was not provided to the system a priori. (b) The same environment mapped by a ground robot with a different starting point, to illustrate that our algorithm is compatible with different platforms. (Scales are in meters.)

Figure 14: Results of the proposed ranging and SLAM algorithm from a different experiment, with state observer ground truth. All measurements are in meters. The experiment was conducted under fluorescent ambient lighting, and sunlight where applicable.

Figure 15: Results of the proposed ranging and SLAM algorithm from an outdoor experiment in an urban area. A small map of the area is provided for reference purposes (not provided to the algorithm), and it indicates the robot path. All measurements are in meters. The experiment was conducted under sunlight ambient conditions and dry weather.

Figure 16: Cartesian $(x, y, z)$ position of the MAV in a hallway, as reported by the proposed ranging and SLAM algorithm with time-varying altitude; the axes are hallway length (m), hallway width (m), and helicopter altitude (m). The altitude is represented by the $z$-axis and is initially at 25 cm, as this is the ground clearance of the ultrasonic altimeter when the aircraft has landed. MAV altitude was intentionally varied by large amounts to demonstrate the robustness of our method to the climb and descent of the aircraft, whereas in a typical mission natural altitude changes are in the range of a few centimeters.

Figure 17: Saint Vertigo, the autonomous MAV helicopter, consists of four decks, labeled A through D. The A deck contains collective pitch rotor head mechanics. The B deck comprises the fuselage, which houses the power plant, transmission, main batteries, actuators, gyroscope, and the tail rotor. The C deck is the autopilot compartment, which contains the inertial measurement unit, all communication systems, and all sensors. The D deck carries the navigation computer, which is attached to a digital video camera visible at the front.

Among the other platforms shown in Figure 18, Saint Vertigo has the most limited computational resources.

5.2. Processing Requirements. In order to effectively manage the computational resources on a lightweight MAV computer, we keep track of the CPU utilization for the algorithms proposed in this paper. Table 1 shows a typical breakdown of the average processor utilization per one video frame. Each corresponding task elucidated in this paper is visualized in Figure 2.

The numbers in Table 1 are gathered after the map has matured. Methods highlighted with † are mutually exclusive; for example, the Helix Bearing algorithm runs only when the MAV is performing turns, while the ranging task is on standby. Particle filtering has a roughly constant load on the system once the map is populated.


Figure 18: Our algorithms have been tested on a diverse set of mobile platforms, shown here. Picture courtesy of Space Systems and Controls Lab, Aerospace Robotics Lab, Digitalsmithy Lab, and Rockwell Collins Advanced Technology Center.

We only consider a limited point cloud, with landmarks in the front detection range of the MAV (see Section 4.1). The MAV typically operates in the 80-90% utilization range. It should be stressed that this numerical figure includes operating system kernel processes, which involve video-memory procedures, as the MAV is not equipped with a dedicated graphics processor. The MAV is programmed to construct the SLAM results and other miscellaneous on-screen display information inside the video memory in real time. This is used to monitor the system for our own debugging purposes, but it is not required for the MAV operation. Disabling this feature reduces the load and frees up processor time for other tasks that may be implemented, such as path planning and closed-loop position control.

6. Conclusion and Future Work

In this paper, we investigated the performance of monocular camera based vision SLAM with minimal assumptions, as well as minimal aid from other sensors (altimeter only), in a corridor-following-flight application which requires precise localization and absolute range measurement. This is true even for outdoor cases, because our MAV is capable of building up high speed and covering large distances very rapidly, and some of the ground robots we have tested were large enough to become a concern for traffic and pedestrians. While widely recognized SLAM methods have been mainly developed for use with laser range finders, this paper presented new algorithms for monocular vision-based depth perception and bearing sensing to accurately mimic the operation of such an advanced device.


Table 1: CPU utilization of the proposed algorithms.

Image acquisition and edge filtering: 10%
Line and slope extraction: 2%
Landmark extraction: 20%†
Helix bearing: 20%†
Ranging algorithms: below 1%
Rao-Blackwellized particle filter: 50%

We were able to integrate our design with popular SLAM algorithms originally meant for laser range finders, and we have experimentally validated its operation for autonomous indoor and outdoor flight and navigation with a small, fully self-contained MAV helicopter, as well as other robotic platforms. Our algorithms successfully adapt to various situations, while successfully performing the transitions between them (e.g., turns, presence of external objects, and time-varying altitude).

Since the proposed monocular camera vision SLAM method does not need initialization procedures, the mission can start at an arbitrary point. Therefore, our MAV can be deployed to infiltrate an unknown building. One future task is to add the capability to fly through doors and windows. Indeed, the system is only limited by the capabilities of the camera, such as resolution, shutter speed, and reaction time. All of those limitations can be overcome with the proper use of lenses and higher-fidelity imaging sensors, although we have used a consumer-grade USB camera. Since the ability to extract good landmarks is a function of the camera capabilities, a purpose-built camera is suggested for future work. Such a camera would also allow the development of efficient vision SLAM and data association algorithms that take advantage of the intermediate image processing data.

Our future vision-based SLAM and navigation strategy for an indoor MAV helicopter through the hallways of a building also includes the ability to recognize staircases and thus traverse multiple floors to generate a comprehensive volumetric map of the building. This will also permit vision-based 3D path planning and closed-loop position control of the MAV based on SLAM. Considering our MAV helicopter is capable of outdoor flight, we can extend our method to the outdoor perimeter of buildings and similar urban environments by exploiting the similarities between hallways and downtown city maps. Further, considering the reduction in weight and independence from GPS coverage, our work also permits the development of portable navigation devices for a wider array of applications, such as small-scale mobile robotics and helmet- or vest-mounted navigation systems.

Certain environments and environmental factors prove challenging to our proposed method: bright lights, reflective surfaces, haze, and shadows. These artifacts introduce two main problems: (1) they can alter chromatic clarity, local microcontrast, and exposure due to their unpredictable high-energy nature, and (2) they can appear as false objects, especially when there is bloom surrounding objects in front of a problem light source. Further reduction in contrast is possible if scattering particles in the air are dense. We have come to observe that preventative and defensive approaches to such issues are promising. Antireflective treatment on lenses can reduce light bouncing off of the lens, and programming the aircraft to move a very small distance upon detection of glare can eliminate the unwanted effects. Innovative and adaptive application of servo-controlled filters before the lenses can minimize or eliminate most, if not all, reflections. The light that causes glare is elliptically polarized due to strong phase correlation; this is as opposed to essential light, which is circularly polarized. Filters can detect and block polarized light from entering the camera, thereby blocking the unwanted effects. Application of purpose-designed digital imaging sensors that do not involve a Bayer filter can also help. Most of the glare occurs in the green light region, and traditional digital imaging sensors have twice as many green receptors as red and blue; the Bayer design has been inspired by the human eye, which sees green better, as green is the most structurally descriptive light for edges and corners.

This paper has supplementary material (see Supplementary Material available online at http://dx.doi.org/10.1155/2013/374165), available from the authors, which shows experimental results of the paper.

Acknowledgments

The research reported in this paper was in part supported by the National Science Foundation (Grant ECCS-0428040), Information Infrastructure Institute (I3), Department of Aerospace Engineering and Virtual Reality Application Center at Iowa State University, Rockwell Collins, and Air Force Office of Scientific Research.

References

[1] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106–154, 1962.

[2] N. Isoda, K. Terada, S. Oe, and K. IKaida, "Improvement of accuracy for distance measurement method by using movable CCD," in Proceedings of the 36th SICE Annual Conference (SICE '97), pp. 29–31, Tokushima, Japan, July 1997.

[3] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2nd edition, 2003.

[4] F. Ruffier and N. Franceschini, "Visually guided micro-aerial vehicle: automatic take off, terrain following, landing and wind reaction," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2339–2346, New Orleans, La, USA, May 2004.

[5] F. Ruffier, S. Viollet, S. Amic, and N. Franceschini, "Bio-inspired optical flow circuits for the visual guidance of micro-air vehicles," in Proceedings of the International Symposium on Circuits and Systems (ISCAS '03), vol. 3, pp. 846–849, Bangkok, Thailand, May 2003.

[6] J. Michels, A. Saxena, and A. Y. Ng, "High speed obstacle avoidance using monocular vision and reinforcement learning," in Proceedings of the 22nd International Conference on Machine Learning (ICML '05), vol. 119, pp. 593–600, August 2005.

[7] A. Saxena, J. Schulte, and A. Y. Ng, "Depth estimation using monocular and stereo cues," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2197–2203, 2007.

[8] N. Snavely, S. M. Seitz, and R. Szeliski, "Photo tourism: exploring photo collections in 3D," ACM Transactions on Graphics, vol. 25, no. 3, 2006.

[9] A. W. Fitzgibbon and A. Zisserman, "Automatic camera recovery for closed or open image sequences," in Proceedings of the European Conference on Computer Vision, pp. 311–326, June 1998.

[10] A. Davison, M. Nicholas, and S. Olivier, "MonoSLAM: real-time single camera SLAM," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1052–1067, 2007.

[11] L. Clemente, A. Davison, I. Reid, J. Neira, and J. Tardos, "Mapping large loops with a single hand-held camera," in Proceedings of the Robotics Science and Systems Conference, June 2007.

[12] F. Dellaert, W. Burgard, D. Fox, and S. Thrun, "Using the condensation algorithm for robust vision-based mobile robot localization," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '99), pp. 588–594, June 1999.

[13] N. Cuperlier, M. Quoy, P. Gaussier, and C. Giovanangeli, "Navigation and planning in an unknown environment using vision and a cognitive map," in Proceedings of the IJCAI Workshop on Reasoning with Uncertainty in Robotics, 2005.

[14] G. Silveira, E. Malis, and P. Rives, "An efficient direct approach to visual SLAM," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 969–979, 2008.

[15] A. P. Gee, D. Chekhlov, A. Calway, and W. Mayol-Cuevas, "Discovering higher level structure in visual SLAM," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 980–990, 2008.

[16] K. Celik, S.-J. Chung, and A. K. Somani, "Mono-vision corner SLAM for indoor navigation," in Proceedings of the IEEE International Conference on Electro/Information Technology (EIT '08), pp. 343–348, Ames, Iowa, USA, May 2008.

[17] K. Celik, S.-J. Chung, and A. K. Somani, "MVCSLAM: mono-vision corner SLAM for autonomous micro-helicopters in GPS denied environments," in Proceedings of the AIAA Guidance, Navigation and Control Conference, Honolulu, Hawaii, USA, August 2008.

[18] K. Celik, S. J. Chung, and A. K. Somani, "Biologically inspired monocular vision based navigation and mapping in GPS-denied environments," in Proceedings of the AIAA Infotech at Aerospace Conference and Exhibit and AIAA Unmanned Unlimited Conference, Seattle, Wash, USA, April 2009.

[19] K. Celik, S.-J. Chung, M. Clausman, and A. K. Somani, "Monocular vision SLAM for indoor aerial vehicles," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, Mo, USA, October 2009.

[20] J. Shi and C. Tomasi, "Good features to track," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 593–600, June 1994.

[21] H. Bay, A. Ess, T. Tuytelaars, and L. van Gool, "Speeded-Up Robust Features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.

[22] K. Celik and A. K. Somani, "Wandless realtime autocalibration of tactical monocular cameras," in Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV '12), Las Vegas, Nev, USA, 2012.

[23] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit, "FastSLAM: a factored solution to the simultaneous localization and mapping problem," in Proceedings of the AAAI National Conference on Artificial Intelligence, pp. 593–598, 2002.

[24] J. P. How, B. Bethke, A. Frank, D. Dale, and J. Vian, "Real-time indoor autonomous vehicle test environment," IEEE Control Systems Magazine, vol. 28, no. 2, pp. 51–64, 2008.


Journal of Electrical and Computer Engineering 7

(1) Start from level 119871(0) = 0 and sequence119898 = 0(2) Find 119889 = min(ℎ

119886minus ℎ119887) in119872 where ℎ

119886= ℎ119887

(3) 119898 = 119898 + 1 Ψ101584010158401015840(119896) = merge([ℎ119886 ℎ119887]) 119871(119898) = 119889

(4) Delete from 119872 rows and columns corresponding to Ψ101584010158401015840(119896)(5) Add to 119872 a row and a column representing Ψ101584010158401015840(119896)(6) if (forallℎ

119894isin Ψ101584010158401015840(119896)) stop

(7) else go to (2)

Algorithm 1 Disjoint cluster identification from heat MAP119872

Figure 5 On-the-fly range measurements Note the crosshair indicating the algorithm is currently using the infinity point for heading

Sample number

Rang

e (m

)

0 20 40 60 80 100 120 140

858

757

656

Infinity point method

(a)

minus05

minus1

minus15

Sample number

Diff

eren

ce (m

)

0 20 40 60 80 100 120 140

050

(b)

Figure 6 (a) Illustrates the accuracy of the two-rangemeasurementmethodswith respect to ground truth (flat line) (b) Residuals for thetop figure

instantaneous velocities is likely to belong on the surface ofone planar object such as a door frame Let a helix with adirectional velocity be the triple ℎ

119894= (119881119894 119906119894 V119894)119879where (119906

119894 V119894)

represents the position of this particle on the image plane Atany given time (119896) let Ψ be a set containing all these featureson the image plane such that Ψ(119896) = ℎ

1 ℎ2 ℎ

119899 The 119911

component of velocity as obtained in (10) is the determining

factor for 120593 Since we are most interested in the set of helix inwhich this component is minimized Ψ(119896) is resampled suchthat

Ψ1015840(119896) = forallℎ

119894 120593 asymp

120587

2 cup 120593 asymp

3120587

2 (11)

sorted in increasing velocity order Ψ1015840(119896) is then processedthrough histogram sorting to reveal the modal helix set suchthat

Ψ10158401015840(119896) = max

if (ℎ119894= ℎ119894+1)

119899

sum

119894=0

119894

else 0

(12)

Ψ10158401015840(119896) is likely to contain clusters that tend to be distributed

with respect to objects in the scene whereas the rest of theinitial helix set fromΨ(119896)may not fit this model An agglom-erative hierarchical tree 119879 is used to identify the clustersTo construct the tree Ψ10158401015840(119896) is heat mapped represented asa symmetric matrix 119872 with respect to Manhattan distancebetween each individual helixes

119872 =[[

[

ℎ0minus ℎ0sdot sdot sdot ℎ0minus ℎ119899

d

ℎ119899minus ℎ0sdot sdot sdot ℎ119899minus ℎ119899

]]

]

(13)

The algorithm to construct the tree from 119872 is given inAlgorithm 1

The tree should be cut at the sequence119898 such that119898 + 1does not provide significant benefit in terms of modeling

8 Journal of Electrical and Computer Engineering

Figure 7 While we emphasize hallway like indoor environments our range measurement strategy is compatible with a variety of otherenvironments including outdoors office environments ceilings sidewalks and building sides where orthogonality in architecture is presentA minimum of one perspective line and one feature intersection is sufficient

the clusters After this step the set of velocities in Ψ101584010158401015840(119896)represent the largest planar object in the field of view withthe most consistent rate of pixel displacement in time Thesystem is updated such that Ψ(119896 + 1) = Ψ(119896) + 120583(Ψ101584010158401015840(119896)) asthe best effort estimate as shown in Figure 8

It is a future goal to improve the accuracy of this algo-rithm by exploiting known properties of typical objects Forinstance single doors are typically a meter-wide It is trivialto build an internal object database with templates for typicalconsistent objects found indoors If such an object of interestcould be identified by an arbitrary object detection algorithmand that world object of known dimensions dim = (119909 119910)

119879and a cluster Ψ101584010158401015840(119896) may sufficiently coincide cluster depthcan be measured via dim(119891dim1015840) where dim is the actualobject dimensions 119891 is the focal length and dim1015840 representsobject dimensions on image plane

4 SLAM Formulation

Our previous experiments [16 17] showed that due to thehighly nonlinear nature of the observation equations tra-ditional nonlinear observers such as EKF do not scale toSLAM in larger environments containing a vast number ofpotential landmarks Measurement updates in EKF requirequadratic time complexity due to the covariance matrixrendering the data association increasingly difficult as the

0 20 40 60 80 100 120 140 160 180 20080859095

100

Figure 8 This graph illustrates the accuracy of the Helix bearingalgorithm estimating 200 samples of perfect 95 degree turns (cali-brated with a digital protractor) performed at various locations withincreasing clutter at random angular rates not exceeding 1 radian-per-second in the absence of known objects

map grows AnMAVwith limited computational resources isparticularly impacted from this complexity behavior SLAMutilizing Rao-Blackwellized particle filter similar to [23]is a dynamic Bayesian approach to SLAM exploiting theconditional independence of measurements A random set ofparticles is generated using the noise model and dynamics ofthe vehicle in which each particle is considered a potentiallocation for the vehicle A reduced Kalman filter per particleis then associated with each of the current measurementsConsidering the limited computational resources of anMAVmaintaining a set of landmarks large enough to allow foraccurate motion estimations yet sparse enough so as not toproduce a negative impact on the system performance isimperativeThe noise model of the measurements along with

Journal of Electrical and Computer Engineering 9

120596119899119881119899

120596 = (119889119889119905)120579Hallway-1 line-L

Hallway-1 line-R Hallway-2 line-R

Figure 9 The helix bearing algorithm exploits the optical flow fieldresulting from the features not associated with architectural lines Areduced helix association set is shown for clarityHelix velocities thatform statistically identifiable clusters indicate the presence of largeobjects such as doors that can provide estimation for the angularrate of the MAV during the turn

the new measurement and old position of the feature areused to generate a statistical weight This weight in essenceis ameasure of howwell the landmarks in the previous sensorposition correlate with the measured position taking noiseinto account Since each of the particles has a different esti-mate of the vehicle position resulting in a different perspec-tive for the measurement each particle is assigned differentweights Particles are resampled every iteration such thatthe lower weight particles are removed and higher weightparticles are replicated This results in a cloud of randomparticles of track towards the best estimation results whichare the positions that yield the best correlation between theprevious position of the features and the new measurementdata

The positions of landmarks are stored by the particles, such as $\mathrm{Par}_n = (X_L^T, P)$, where $X_L = (x_{ci}, y_{ci})$ and $P$ is the $2 \times 2$ covariance matrix for the particular Kalman filter contained by $\mathrm{Par}_n$. The 6DOF vehicle state vector $x_v$ can be updated in discrete time steps of $(k)$ as shown in (14), where $R = (x_r, y_r, H)^T$ is the position in the inertial frame, from which the velocity in the inertial frame can be derived as $\dot{R} = v_E$. The vector $v_B = (v_x, v_y, v_z)^T$ represents the linear velocity of the body frame, and $\omega = (p, q, r)^T$ represents the body angular rate. $\Gamma = (\phi, \theta, \psi)^T$ is the Euler angle vector, and $L_{EB}$ is the Euler angle transformation matrix for $(\phi, \theta, \psi)$. The $3 \times 3$ matrix $T$ converts $(p, q, r)^T$ to $(\dot{\phi}, \dot{\theta}, \dot{\psi})$. At every step, the MAV is assumed to experience unknown linear and angular accelerations, $V_B = a_B \Delta t$ and $\Omega = \alpha_B \Delta t$, respectively:

\[
x_v(k+1) =
\begin{pmatrix}
R(k) + L_{EB}(\phi, \theta, \psi)\,(v_B + V_B)\,\Delta t \\
\Gamma(k) + T(\phi, \theta, \psi)\,(\omega + \Omega)\,\Delta t \\
v_B(k) + V_B \\
\omega(k) + \Omega
\end{pmatrix}
\tag{14}
\]
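The propagation in (14) can be sketched in a few lines, assuming the standard ZYX (yaw-pitch-roll) Euler conventions for $L_{EB}$ and $T$; these conventions, and all function names, are our own assumptions for illustration, since the paper does not spell them out.

```python
# Illustrative discrete-time propagation of the 6DOF state in (14).
# Assumes ZYX Euler angles; not taken from the authors' code.
import numpy as np

def L_EB(phi, theta, psi):
    """Body-to-inertial rotation matrix (standard ZYX convention)."""
    c, s = np.cos, np.sin
    return np.array([
        [c(theta)*c(psi), s(phi)*s(theta)*c(psi) - c(phi)*s(psi), c(phi)*s(theta)*c(psi) + s(phi)*s(psi)],
        [c(theta)*s(psi), s(phi)*s(theta)*s(psi) + c(phi)*c(psi), c(phi)*s(theta)*s(psi) - s(phi)*c(psi)],
        [-s(theta),       s(phi)*c(theta),                        c(phi)*c(theta)],
    ])

def T_matrix(phi, theta):
    """Maps body rates (p, q, r) to Euler angle rates."""
    c, s, t = np.cos, np.sin, np.tan
    return np.array([
        [1, s(phi)*t(theta),  c(phi)*t(theta)],
        [0, c(phi),          -s(phi)],
        [0, s(phi)/c(theta),  c(phi)/c(theta)],
    ])

def propagate(R, Gamma, v_B, omega, a_B, alpha_B, dt):
    """One step of (14); unknown accelerations enter as V_B and Omega."""
    V_B, Omega = a_B * dt, alpha_B * dt
    R_next     = R + L_EB(*Gamma) @ (v_B + V_B) * dt
    Gamma_next = Gamma + T_matrix(Gamma[0], Gamma[1]) @ (omega + Omega) * dt
    return R_next, Gamma_next, v_B + V_B, omega + Omega
```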

There is only a limited set of orientations a helicopter is capable of sustaining in the air at any given time without partial or complete loss of control. For instance, no useful lift is generated when the rotor disc is oriented sideways with respect to gravity. Moreover, the on-board autopilot incorporates IMU and compass measurements in a best-effort scheme to keep the MAV at hover in the absence of external control inputs. Therefore, we can simplify the 6DOF system dynamics to 2D system dynamics with an autopilot. Accordingly, the particle filter then simultaneously locates the landmarks and updates the vehicle states $(x_r, y_r, \theta_r)$, described by

\[
x_v(k+1) =
\begin{pmatrix}
\cos\theta_r(k)\,u_1(k) + x_r(k) \\
\sin\theta_r(k)\,u_1(k) + y_r(k) \\
u_2(k) + \theta_r(k)
\end{pmatrix}
+ \gamma(k)
\tag{15}
\]

where $\gamma(k)$ is the linearized input signal noise, $u_1(k)$ is the forward speed, and $u_2(k)$ the angular velocity. Let us consider one instantaneous field of view of the camera, in which the center of two ground corners on opposite walls is shifted. From the distance measurements described earlier, we can derive the relative range and bearing of a corner of interest (index $i$) as follows:

\[
y_i = h(x) = \left( \sqrt{x_i^2 + y_i^2},\; \tan^{-1}\!\left[ \pm \frac{y_i}{x_i} \right],\; \psi \right)^T
\tag{16}
\]

where the $\psi$ measurement is provided by the infinity-point method.

This measurement equation can be related to the states of the vehicle and the $i$th corner (landmark) at each time stamp $(k)$, where $x_v(k) = (x_r(k), y_r(k), \theta_r(k))^T$ is the vehicle state vector of the 2D vehicle kinematic model. The resulting measurement equation $h_i(x(k))$ is given in (17):

\[
h_i(x(k)) =
\begin{pmatrix}
\sqrt{(x_r(k) - x_{ci}(k))^2 + (y_r(k) - y_{ci}(k))^2} \\[4pt]
\tan^{-1}\!\left( \dfrac{y_r(k) - y_{ci}(k)}{x_r(k) - x_{ci}(k)} \right) - \theta_r(k) \\[4pt]
\theta_r
\end{pmatrix}
\tag{17}
\]

where $x_{ci}$ and $y_{ci}$ denote the position of the $i$th landmark.
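For concreteness, a noise-free toy version of the 2D motion model (15) and the range-bearing measurement model (17) could be written as below; this is a sketch under our own naming, not the flight code.

```python
# Toy versions of the 2D motion model (15) and measurement model (17).
import numpy as np

def motion_model(x_r, y_r, theta_r, u1, u2):
    """One noise-free step of (15); u1 is forward speed and u2 angular
    velocity, both already scaled by the time step."""
    return (x_r + np.cos(theta_r) * u1,
            y_r + np.sin(theta_r) * u1,
            theta_r + u2)

def measurement_model(x_r, y_r, theta_r, x_ci, y_ci):
    """Expected range and bearing to the i-th landmark, following (17)."""
    rng = np.hypot(x_r - x_ci, y_r - y_ci)
    bearing = np.arctan2(y_r - y_ci, x_r - x_ci) - theta_r
    return rng, bearing
```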

4.1. Data Association. Recently detected landmarks need to be associated with the existing landmarks in the map, such that each new measurement either corresponds to the correct existing landmark or else registers as a not-before-seen landmark. This is a requirement for any SLAM approach to function properly (see Figure 11). Typically, the association metric depends on the measurement innovation vector. An exhaustive search algorithm that compares every measurement with every feature on the map associates landmarks if the newly measured landmark is sufficiently close to an existing one. This not only leads to landmark ambiguity but also is


computationally intractable for large maps. Moreover, since the measurement is relative, the error of the vehicle position is additive with the absolute location of the measurement.

We present a new, faster, and more accurate solution which takes advantage of predicted landmark locations on the image plane. Figure 5 gives a reference for how landmarks on the image plane appear to move along the ground lines as the MAV moves. Assume that $p^k_{(x,y)}$, $k = 0, 1, 2, 3, \ldots, n$, represents a pixel in time which happens to be contained by a landmark, and this pixel moves along a ground line at the velocity $v_p$. Although landmarks often contain a cluster of pixels, the size of which is inversely proportional to the landmark distance, here the center pixel of a landmark is referenced. Given that the expected maximum velocity $V_{B\max}$ is known, a pixel is expected to appear at

\[
p^{k+1}_{(x,y)} = f\!\left( p^{k}_{(x,y)} + (v_B + V_B)\,\Delta t \right)
\tag{18}
\]

where

\[
\sqrt{ \left( p^{k+1}_{(x)} - p^{k}_{(x)} \right)^2 + \left( p^{k+1}_{(y)} - p^{k}_{(y)} \right)^2 }
\tag{19}
\]

cannot be larger than $V_{B\max}\,\Delta t$, while $f(\cdot)$ is a function that

converts a landmark range to a position on the image plane. A landmark appearing at time $k+1$ is to be associated with a landmark that appeared at time $k$ if and only if their pixel locations are within the association threshold; in other words, the association information from $k$ is used. Otherwise, if the maximum expected change in pixel location is exceeded, the landmark is considered new. We save computational resources by using the association data from $k$ when a match is found, instead of searching the large global map. In addition, since the pixel location of a landmark is independent of the noise in the MAV position, the association has improved accuracy. To further improve the accuracy, there is also a maximum range beyond which the MAV will not consider landmarks for data association. This range is determined by taking the camera resolution into consideration: the farther a landmark is, the fewer pixels it has in its cluster, and thus the more ambiguity and noise it may contain. Considering the physical camera parameters, resolution, shutter speed, and noise model of the Logitech C905 camera, the MAV is set to ignore landmarks farther than 8 meters. Note that this is a limitation of the camera, not of our proposed methods.
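The gate implied by (18) and (19) reduces to a pixel-distance test; in the sketch below, a fixed pixels-per-meter factor stands in for the paper's $f(\cdot)$ and is purely an assumption for illustration.

```python
# Sketch of the pixel-neighborhood association gate of (18)-(19).
import math

def associate(old_px, new_px, v_b_max, dt, px_per_meter):
    """True if new_px can plausibly be the landmark last seen at old_px,
    i.e., its displacement is within V_Bmax * dt (converted to pixels)."""
    dist_px = math.hypot(new_px[0] - old_px[0], new_px[1] - old_px[1])
    return dist_px <= v_b_max * dt * px_per_meter

# Example: at 30 fps with V_Bmax = 2 m/s and 50 px per meter,
# the gate allows roughly 3.3 px of motion per frame.
print(associate((120, 80), (122, 81), 2.0, 1/30, 50.0))  # True
```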

Although representing the map as a tree-based data structure would, in theory, yield an association time of $O(N \log N)$, our pixel-neighborhood-based approach already covers over 90% of the features at any time; therefore, a tree-based solution does not offer a significant benefit.

We also use a viewing-transformation-invariant scene matching algorithm based on spatial relationships among objects in the images and illumination parameters in the scene. This is to determine whether two frames acquired under different extrinsic camera parameters have indeed captured the same scene. Therefore, if the MAV visits a particular place more than once, it can distinguish whether it has been to that spot before.

Our approach maps the features (i.e., corners, lines) and illumination parameters from one view in the past to the other in the present via affine-invariant image descriptors. A descriptor $D_t$ consists of an image region in a scene that contains a high amount of disorder. This reduces the probability of finding multiple targets later. The system will pick a region on the image plane with the most crowded cluster of landmarks to look for a descriptor, which is likely to be the part of the image with the most clutter, hence creating a more unique signature. Descriptor generation is automatic and triggered when turns are encountered (i.e., the Helix Bearing Algorithm). A turn is a significant, repeatable event in the life of a map, which makes it interesting for data association purposes. The start of the algorithm is also a significant event, for which the first descriptor $D_0$ is collected; this helps the MAV recognize the starting location if it is revisited.

Every time a descriptor $D_t$ is recorded, it contains the current time $t$ in terms of frame number, the disorderly region $I_{xy}$ of size $x \times y$, and the estimate of the position and orientation of the MAV at frame $t$. Thus, every time a turn is encountered, the system can check whether it happened before. For instance, if it indeed happened at time $t = k$, where $t > k$, $D_k$ is compared with $D_t$ in terms of descriptor and landmarks, and the map positions of the MAV at times $t$ and $k$ are expected to match closely; otherwise, the map is diverging in a quantifiable manner.
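A hypothetical record for such a descriptor might be structured as follows; the field names are our own and purely illustrative.

```python
# Hypothetical structure of a descriptor D_t as described above.
from dataclasses import dataclass
import numpy as np

@dataclass
class Descriptor:
    frame: int          # time t, expressed as a frame number
    region: np.ndarray  # the disorderly image region I_xy, shape (y, x)
    pose: tuple         # estimated (x_r, y_r, theta_r) of the MAV at frame t
```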

The comparison formulation can be summarized as

\[
R(x, y) = \frac{\sum_{x', y'} \left( T(x', y') - I(x + x', y + y') \right)^2}{\sqrt{\sum_{x', y'} T(x', y')^2 \cdot \sum_{x', y'} I(x + x', y + y')^2}}
\tag{20}
\]

where a perfect match is 0 and poor matches are represented by larger values, up to 1. We use this to determine the degree to which two descriptors are related, as it represents the fraction of the variation in one descriptor that may be explained by the other. Figure 10 illustrates how this concept works.
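Equation (20) has the same form as a normalized squared-difference template score (comparable to OpenCV's TM_SQDIFF_NORMED); a direct sketch follows, with names of our own choosing.

```python
# Normalized squared-difference comparison of (20): 0 is a perfect match,
# larger values indicate poorer matches. Illustrative only.
import numpy as np

def descriptor_distance(T, patch):
    """Compare a stored descriptor region T against an image patch of
    the same shape, both taken as float arrays."""
    T = np.asarray(T, dtype=float)
    patch = np.asarray(patch, dtype=float)
    num = np.sum((T - patch) ** 2)
    den = np.sqrt(np.sum(T ** 2) * np.sum(patch ** 2))
    return num / den if den > 0 else 1.0
```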

5. Experimental Results

As illustrated in Figures 12, 13, and 14, our monocular vision SLAM correctly locates and associates landmarks to the real world. Figure 15 shows the results obtained in an outdoor experiment with urban roads. A 3D map is built by the addition of time-varying altitude and wall positions, as shown in Figure 16. The proposed methods prove robust to transient disturbances, since features inconsistent about their position are removed from the map.

The MAV assumes that it is positioned at $(0, 0, 0)$ Cartesian coordinates at the start of a mission, with the camera pointed at the positive $x$-axis; therefore, the width of the corridor is represented by the $y$-axis. At any time during the mission, a partial map can be requested from the MAV via the Internet. The MAV also stores the map and important video frames (i.e., when a new landmark is discovered) on board for later retrieval. Video frames are time-linked to the map. It is therefore possible to obtain a still image of the surroundings


Figure 10: Data association metric, where a descriptor is shown in the middle.


Figure 11: Map drift is one of the classic errors introduced by poor data association, or lack thereof, negatively impacting the loop-closing performance.

of any landmark for surveillance and identification purposes.

In Figure 12, the traveled distance is on the kilometer scale. When the system completes the mission and returns to the starting point, the belief is within one meter of where the mission had originally started.

5.1. The Microaerial Vehicle Hardware Configuration. Saint Vertigo, our autonomous MAV helicopter, serves as the primary robotic test platform for the development of this study (see Figure 17). In contrast with other prior works that predominantly used wireless video feeds and the Vicon vision tracking system for vehicle state estimation [24], Saint Vertigo performs all image processing and SLAM computations on board, with a 1 GHz CPU, 1 GB RAM, and 4 GB storage. The unit measures 50 cm with a ready-to-fly weight of 0.9 kg


Figure 12: Experimental results of the proposed ranging and SLAM algorithm, showing the landmarks added to the map representing the structure of the environment. All measurements are in meters. The experiment was conducted under incandescent ambient lighting.

and 0.9 kg of payload for adaptability to different missions. In essence, the MAV features two independent computers. The flight computer is responsible for flight stabilization, flight automation, and sensory management. The navigation computer is responsible for image processing, range measurement, SLAM computations, networking, mass storage, and, as a future goal, path planning. The pathway between them is a dedicated on-board link, through which the sensory feedback and supervisory control commands are shared. These commands are simple directives, which are converted to the appropriate helicopter flight surface responses by the flight computer. The aircraft is IEEE 802.11 enabled, and all



Figure 13: (a) Experimental results of the proposed ranging and SLAM algorithm with state observer odometer trail. The actual floor plan of the building was superimposed later on a mature map to illustrate the accuracy of our method. Note that the floor plan was not provided to the system a priori. (b) The same environment mapped by a ground robot with a different starting point, to illustrate that our algorithm is compatible with different platforms.


Figure 14: Results of the proposed ranging and SLAM algorithm from a different experiment, with state observer ground truth. All measurements are in meters. The experiment was conducted under fluorescent ambient lighting and sunlight where applicable.


Figure 15: Results of the proposed ranging and SLAM algorithm from an outdoor experiment in an urban area. A small map of the area is provided for reference purposes (not provided to the algorithm), and it indicates the robot path. All measurements are in meters. The experiment was conducted under sunlight ambient conditions and dry weather.


Figure 16: Cartesian $(x, y, z)$ position of the MAV in a hallway, as reported by the proposed ranging and SLAM algorithm with time-varying altitude. The altitude is represented by the $z$-axis, and it is initially at 25 cm, as this is the ground clearance of the ultrasonic altimeter when the aircraft has landed. The MAV altitude was intentionally varied by large amounts to demonstrate the robustness of our method to the climb and descent of the aircraft, whereas in a typical mission, natural altitude changes are in the range of a few centimeters.


Figure 17: Saint Vertigo, the autonomous MAV helicopter, consists of four decks. The A deck contains collective pitch rotor head mechanics. The B deck comprises the fuselage, which houses the power plant, transmission, main batteries, actuators, gyroscope, and the tail rotor. The C deck is the autopilot compartment, which contains the inertial measurement unit, all communication systems, and all sensors. The D deck carries the navigation computer, which is attached to a digital video camera visible at the front.

its features are accessible over the Internet or an ad hoc TCP/IP network. Among the other platforms shown in Figure 18, Saint Vertigo has the most limited computational resources.

5.2. Processing Requirements. In order to effectively manage the computational resources on a lightweight MAV computer, we keep track of the CPU utilization for the algorithms proposed in this paper. Table 1 shows a typical breakdown of the average processor utilization per one video frame. Each corresponding task elucidated in this paper is visualized in Figure 2.

The numbers in Table 1 are gathered after the map has matured. Methods marked with a dagger (†) are mutually exclusive; for example, the Helix Bearing algorithm runs only when the MAV is performing turns, while the ranging task is on standby. Particle filtering has a roughly constant load on the system


Figure 18: Our algorithms have been tested on a diverse set of mobile platforms, shown here. Picture courtesy of Space Systems and Controls Lab, Aerospace Robotics Lab, Digitalsmithy Lab, and Rockwell Collins Advanced Technology Center.

once the map is populated. We only consider a limited point cloud with landmarks in the front detection range of the MAV (see Section 4.1). The MAV typically operates in the 80–90% utilization range. It should be stressed that this numerical figure includes operating system kernel processes, which involve video-memory procedures, as the MAV is not equipped with a dedicated graphics processor. The MAV is programmed to construct the SLAM results and other miscellaneous on-screen display information inside the video memory in real time. This is used to monitor the system for our own debugging purposes but is not required for the MAV operation. Disabling this feature reduces the load and frees up processor time for other tasks that may be implemented, such as path planning and closed-loop position control.

6. Conclusion and Future Work

In this paper, we investigated the performance of monocular-camera-based vision SLAM with minimal assumptions, as well as minimal aid from other sensors (altimeter only), in a corridor-following-flight application which requires precise localization and absolute range measurement. This is true even for outdoor cases, because our MAV is capable of building high speeds and covering large distances very rapidly, and some of the ground robots we have tested were large enough to become a concern for traffic and pedestrians. While widely recognized SLAM methods have been mainly developed for use with laser range finders, this paper presented new algorithms for monocular vision-based depth perception and


Table 1: CPU utilization of the proposed algorithms.

Image acquisition and edge filtering: 10%
Line and slope extraction: 2%
Landmark extraction: 20%†
Helix bearing: 20%†
Ranging algorithms: below 1%
Rao-Blackwellized particle filter: 50%

bearing sensing to accurately mimic the operation of such an advanced device. We were able to integrate our design with popular SLAM algorithms originally meant for laser range finders, and we have experimentally validated its operation for autonomous indoor and outdoor flight and navigation with a small, fully self-contained MAV helicopter, as well as other robotic platforms. Our algorithms successfully adapt to various situations while performing the transitions between them (e.g., turns, presence of external objects, and time-varying altitude).

Since the proposed monocular camera vision SLAM method does not need initialization procedures, the mission can start at an arbitrary point. Therefore, our MAV can be deployed to infiltrate an unknown building. One future task is to add the capability to fly through doors and windows. Indeed, the system is only limited by the capabilities of the camera, such as resolution, shutter speed, and reaction time. All of those limitations can be overcome with the proper use of lenses and higher-fidelity imaging sensors, even though we have used a consumer-grade USB camera. Since the ability to extract good landmarks is a function of the camera capabilities, a purpose-built camera is suggested for future work. Such a camera would also allow development of efficient vision SLAM and data association algorithms that take advantage of the intermediate image processing data.

Our future vision-based SLAM and navigation strategy for an indoor MAV helicopter through hallways of a building also includes the ability to recognize staircases and thus traverse multiple floors to generate a comprehensive volumetric map of the building. This will also permit vision-based 3D path planning and closed-loop position control of the MAV based on SLAM. Considering our MAV helicopter is capable of outdoor flight, we can extend our method to the outdoor perimeter of buildings and similar urban environments by exploiting the similarities between hallways and downtown city maps. Further, considering the reduction in weight and independence from GPS coverage, our work also permits the development of portable navigation devices for a wider array of applications, such as small-scale mobile robotics and helmet- or vest-mounted navigation systems.

Certain environments and environmental factors prove challenging to our proposed method: bright lights, reflective surfaces, haze, and shadows. These artifacts introduce two main problems: (1) they can alter chromatic clarity, local microcontrast, and exposure, due to their unpredictable high-energy nature, and (2) they can appear as false objects, especially when there is bloom surrounding objects in front of the problem light source. Further reduction in contrast is possible

if scattering particles in the air are dense. We have come to observe that preventative and defensive approaches to such issues are promising. Antireflective treatment on lenses can reduce light bouncing off of the lens, and programming the aircraft to move for a very small distance upon detection of glare can eliminate the unwanted effects. Innovative and adaptive application of servo-controlled filters before the lenses can minimize, or eliminate, most if not all reflections. The light that causes glare is elliptically polarized due to strong phase correlation, as opposed to essential light, which is circularly polarized. Filters can detect and block polarized light from entering the camera, thereby blocking the unwanted effects. Application of purpose-designed digital imaging sensors that do not involve a Bayer filter can also help. Most of the glare occurs in the green light region, and traditional digital imaging sensors have twice as many green receptors as red and blue; the Bayer design was inspired by the human eye, which sees green better, as green is the most structurally descriptive light for edges and corners. This paper has supplementary material (see Supplementary Material available online at http://dx.doi.org/10.1155/2013/374165), available from the authors, which shows experimental results of the paper.

Acknowledgments

The research reported in this paper was in part supported by the National Science Foundation (Grant ECCS-0428040), Information Infrastructure Institute (I3), Department of Aerospace Engineering and Virtual Reality Application Center at Iowa State University, Rockwell Collins, and Air Force Office of Scientific Research.

References

[1] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106–154, 1962.

[2] N. Isoda, K. Terada, S. Oe, and K. Ikaida, "Improvement of accuracy for distance measurement method by using movable CCD," in Proceedings of the 36th SICE Annual Conference (SICE '97), pp. 29–31, Tokushima, Japan, July 1997.

[3] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2nd edition, 2003.

[4] F. Ruffier and N. Franceschini, "Visually guided micro-aerial vehicle: automatic take off, terrain following, landing and wind reaction," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2339–2346, New Orleans, La, USA, May 2004.

[5] F. Ruffier, S. Viollet, S. Amic, and N. Franceschini, "Bio-inspired optical flow circuits for the visual guidance of micro-air vehicles," in Proceedings of the International Symposium on Circuits and Systems (ISCAS '03), vol. 3, pp. 846–849, Bangkok, Thailand, May 2003.

[6] J. Michels, A. Saxena, and A. Y. Ng, "High speed obstacle avoidance using monocular vision and reinforcement learning," in Proceedings of the 22nd International Conference on Machine Learning (ICML '05), vol. 119, pp. 593–600, August 2005.

[7] A. Saxena, J. Schulte, and A. Y. Ng, "Depth estimation using monocular and stereo cues," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2197–2203, 2007.

[8] N. Snavely, S. M. Seitz, and R. Szeliski, "Photo tourism: exploring photo collections in 3D," ACM Transactions on Graphics, vol. 25, no. 3, 2006.

[9] A. W. Fitzgibbon and A. Zisserman, "Automatic camera recovery for closed or open image sequences," in Proceedings of the European Conference on Computer Vision, pp. 311–326, June 1998.

[10] A. Davison, M. Nicholas, and S. Olivier, "MonoSLAM: real-time single camera SLAM," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1052–1067, 2007.

[11] L. Clemente, A. Davison, I. Reid, J. Neira, and J. Tardos, "Mapping large loops with a single hand-held camera," in Proceedings of the Robotics: Science and Systems Conference, June 2007.

[12] F. Dellaert, W. Burgard, D. Fox, and S. Thrun, "Using the condensation algorithm for robust vision-based mobile robot localization," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '99), pp. 588–594, June 1999.

[13] N. Cuperlier, M. Quoy, P. Gaussier, and C. Giovanangeli, "Navigation and planning in an unknown environment using vision and a cognitive map," in Proceedings of the IJCAI Workshop: Reasoning with Uncertainty in Robotics, 2005.

[14] G. Silveira, E. Malis, and P. Rives, "An efficient direct approach to visual SLAM," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 969–979, 2008.

[15] A. P. Gee, D. Chekhlov, A. Calway, and W. Mayol-Cuevas, "Discovering higher level structure in visual SLAM," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 980–990, 2008.

[16] K. Celik, S.-J. Chung, and A. K. Somani, "Mono-vision corner SLAM for indoor navigation," in Proceedings of the IEEE International Conference on Electro/Information Technology (EIT '08), pp. 343–348, Ames, Iowa, USA, May 2008.

[17] K. Celik, S.-J. Chung, and A. K. Somani, "MVCSLAM: mono-vision corner SLAM for autonomous micro-helicopters in GPS denied environments," in Proceedings of the AIAA Guidance, Navigation and Control Conference, Honolulu, Hawaii, USA, August 2008.

[18] K. Celik, S.-J. Chung, and A. K. Somani, "Biologically inspired monocular vision based navigation and mapping in GPS-denied environments," in Proceedings of the AIAA Infotech at Aerospace Conference and Exhibit and AIAA Unmanned Unlimited Conference, Seattle, Wash, USA, April 2009.

[19] K. Celik, S.-J. Chung, M. Clausman, and A. K. Somani, "Monocular vision SLAM for indoor aerial vehicles," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, Mo, USA, October 2009.

[20] J. Shi and C. Tomasi, "Good features to track," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 593–600, June 1994.

[21] H. Bay, A. Ess, T. Tuytelaars, and L. van Gool, "Speeded-Up Robust Features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.

[22] K. Celik and A. K. Somani, "Wandless realtime autocalibration of tactical monocular cameras," in Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV '12), Las Vegas, Nev, USA, 2012.

[23] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit, "FastSLAM: a factored solution to the simultaneous localization and mapping problem," in Proceedings of the AAAI National Conference on Artificial Intelligence, pp. 593–598, 2002.

[24] J. P. How, B. Bethke, A. Frank, D. Dale, and J. Vian, "Real-time indoor autonomous vehicle test environment," IEEE Control Systems Magazine, vol. 28, no. 2, pp. 51–64, 2008.



[21] H Bay A Ess T Tuytelaars and L van Gool ldquoSpeeded-UpRobust Features (SURF)rdquo Computer Vision and Image Under-standing vol 110 no 3 pp 346ndash359 2008

[22] K Celik and A K Somani ldquoWandless realtime autocalibrationof tactical monocular camerasrdquo in Proceedings of the Interna-tional Conference on Image Processing Computer Vision andPattern Recognition (IPCV rsquo12) Las Vegas Nev USA 2012

[23] M Montemerlo S Thrun D Koller and B Wegbreit ldquoFast-SLAM a factored solution to the simultaneous localization andmapping problemrdquo in Proceedings of the AAAI National Con-ference on Artificial Intelligence pp 593ndash598 2002

[24] J P How B Bethke A Frank D Dale and J Vian ldquoReal-timeindoor autonnomous vehicle test environmentrdquo IEEE ControlSystems Magazine vol 28 no 2 pp 51ndash64 2008

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 9: Research Article Monocular Vision SLAM for Indoor Aerial ... · Battery Manual override Yaw gyroscope Flight surfaces Power plant RAM Inertial measurement unit USB2.0 Orthogonality

Journal of Electrical and Computer Engineering 9

Figure 9: The helix bearing algorithm exploits the optical flow field resulting from the features not associated with architectural lines; a reduced helix association set is shown for clarity. Helix velocities that form statistically identifiable clusters indicate the presence of large objects, such as doors, that can provide an estimate of the angular rate of the MAV during the turn. (Figure annotations: $\omega_n$, $V_n$, $\omega = (d/dt)\theta$, Hallway-1 line-L, Hallway-1 line-R, Hallway-2 line-R.)

the new measurement and the old position of the feature are used to generate a statistical weight. This weight is, in essence, a measure of how well the landmarks in the previous sensor position correlate with the measured position, taking noise into account. Since each of the particles has a different estimate of the vehicle position, resulting in a different perspective for the measurement, each particle is assigned a different weight. Particles are resampled at every iteration, such that the lower-weight particles are removed and the higher-weight particles are replicated. This results in a cloud of random particles that tracks toward the best estimates, that is, the positions that yield the best correlation between the previous positions of the features and the new measurement data.
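To make the weighting and resampling step concrete, the following is a minimal sketch in Python, assuming a Gaussian innovation model; the function names, the measurement model h, and the innovation covariance R_cov are illustrative assumptions rather than a transcription of our on-board implementation.

```python
import numpy as np

# Minimal sketch of the per-particle weighting and resampling step,
# assuming a Gaussian innovation model (illustrative, not the paper's code).
def weight_particles(particles, z, h, R_cov):
    """z: measured landmark (range, bearing); h(p): the measurement
    predicted from particle p's pose and its landmark estimate."""
    w = np.empty(len(particles))
    for i, p in enumerate(particles):
        innov = z - h(p)                                  # measurement innovation
        w[i] = np.exp(-0.5 * innov @ np.linalg.solve(R_cov, innov))
    return w / w.sum()                                    # normalized weights

def resample(particles, weights, rng=np.random.default_rng()):
    # Systematic resampling: low-weight particles are removed and
    # high-weight particles are replicated.
    n = len(particles)
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(weights), positions)
    return [particles[i] for i in idx]
```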

The positions of landmarks are stored by the particles, such as $\mathrm{Par}_n = (X_L^T, P)$, where $X_L = (x_{ci}, y_{ci})$ and $P$ is the $2 \times 2$ covariance matrix for the particular Kalman filter contained by $\mathrm{Par}_n$. The 6DOF vehicle state vector $x_v$ can be updated in discrete time steps of $(k)$ as shown in (14), where $R = (x_r, y_r, H)^T$ is the position in the inertial frame, from which the velocity in the inertial frame can be derived as $\dot{R} = v_E$. The vector $v_B = (v_x, v_y, v_z)^T$ represents the linear velocity of the body frame, and $\omega = (p, q, r)^T$ represents the body angular rate. $\Gamma = (\phi, \theta, \psi)^T$ is the Euler angle vector, and $L_{EB}$ is the Euler angle transformation matrix for $(\phi, \theta, \psi)$. The $3 \times 3$ matrix $T$ converts $(p, q, r)^T$ to $(\dot\phi, \dot\theta, \dot\psi)$. At every step, the MAV is assumed to experience unknown linear and angular accelerations, $V_B = a_B\,\Delta t$ and $\Omega = \alpha_B\,\Delta t$, respectively:

$$
x_v(k+1) =
\begin{pmatrix}
R(k) + L_{EB}(\phi, \theta, \psi)\,(v_B + V_B)\,\Delta t \\
\Gamma(k) + T(\phi, \theta, \psi)\,(\omega + \Omega)\,\Delta t \\
v_B(k) + V_B \\
\omega(k) + \Omega
\end{pmatrix}.
\tag{14}
$$
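For illustration, the state update in (14) can be realized as in the sketch below; the ZYX Euler convention assumed for $L_{EB}$ and $T$ is a standard aerospace choice, since the paper does not spell out its exact rotation convention.

```python
import numpy as np

# Sketch of one discrete step of (14), assuming a ZYX Euler convention.
def L_EB(phi, th, psi):
    cf, sf = np.cos(phi), np.sin(phi)
    ct, st = np.cos(th), np.sin(th)
    cp, sp = np.cos(psi), np.sin(psi)
    return np.array([[cp*ct, cp*st*sf - sp*cf, cp*st*cf + sp*sf],
                     [sp*ct, sp*st*sf + cp*cf, sp*st*cf - cp*sf],
                     [-st,   ct*sf,            ct*cf]])

def T_euler(phi, th):
    cf, sf = np.cos(phi), np.sin(phi)
    return np.array([[1, sf*np.tan(th), cf*np.tan(th)],
                     [0, cf,            -sf],
                     [0, sf/np.cos(th), cf/np.cos(th)]])

def propagate(R, Gamma, vB, omega, aB, alphaB, dt):
    """R, Gamma, vB, omega: the 3-vectors of (14); aB, alphaB: the
    unknown accelerations sampled by the filter."""
    VB, Om = aB * dt, alphaB * dt
    phi, th, psi = Gamma
    return (R + L_EB(phi, th, psi) @ (vB + VB) * dt,       # inertial position
            Gamma + T_euler(phi, th) @ (omega + Om) * dt,  # Euler angles
            vB + VB,                                       # body velocity
            omega + Om)                                    # body rates
```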

There is only a limited set of orientations a helicopter is capable of sustaining in the air at any given time without partial or complete loss of control. For instance, no useful lift is generated when the rotor disc is oriented sideways with respect to gravity. Moreover, the on-board autopilot incorporates IMU and compass measurements in a best-effort scheme to keep the MAV at hover in the absence of external control inputs. Therefore, we can reduce the 6DOF system dynamics to simplified 2D system dynamics with an autopilot. Accordingly, the particle filter simultaneously locates the landmarks and updates the vehicle states $(x_r, y_r, \theta_r)$, described by

$$
x_v(k+1) =
\begin{pmatrix}
\cos\theta_r(k)\, u_1(k) + x_r(k) \\
\sin\theta_r(k)\, u_1(k) + y_r(k) \\
u_2(k) + \theta_r(k)
\end{pmatrix}
+ \gamma(k),
\tag{15}
$$

where $\gamma(k)$ is the linearized input signal noise, $u_1(k)$ is the forward speed, and $u_2(k)$ the angular velocity.

Let us consider one instantaneous field of view of the camera, in which the center of two ground corners on opposite walls is shifted. From the distance measurements described earlier, we can derive the relative range and bearing of a corner of interest (index $i$) as follows:

$$
y_i = h(x) = \left( \sqrt{x_i^2 + y_i^2},\; \tan^{-1}\!\left[\pm\frac{y_i}{x_i}\right],\; \psi \right)^T,
\tag{16}
$$

where the $\psi$ measurement is provided by the infinity-point method.

This measurement equation can be related to the states of the vehicle and the $i$th corner (landmark) at each time stamp $(k)$, as shown in (17), where $x_v(k) = (x_r(k), y_r(k), \theta_r(k))^T$ is the vehicle state vector of the 2D vehicle kinematic model:

$$
h_i(x(k)) =
\begin{pmatrix}
\sqrt{(x_r(k) - x_{ci}(k))^2 + (y_r(k) - y_{ci}(k))^2} \\[4pt]
\tan^{-1}\!\left(\dfrac{y_r(k) - y_{ci}(k)}{x_r(k) - x_{ci}(k)}\right) - \theta_r(k) \\[4pt]
\theta_r
\end{pmatrix},
\tag{17}
$$

where $x_{ci}$ and $y_{ci}$ denote the position of the $i$th landmark.
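A direct transcription of (17) into code is straightforward; the helper below is a sketch, with a hypothetical argument layout.

```python
import numpy as np

# Sketch of the range-bearing measurement model (17);
# state = (x_r, y_r, theta_r), landmark = (x_ci, y_ci).
def h_i(state, landmark):
    x_r, y_r, th_r = state
    x_c, y_c = landmark
    rng = np.hypot(x_r - x_c, y_r - y_c)              # relative range
    brg = np.arctan2(y_r - y_c, x_r - x_c) - th_r     # relative bearing
    return np.array([rng, brg, th_r])
```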

4.1. Data Association. Recently detected landmarks need to be associated with the existing landmarks in the map, such that each new measurement either corresponds to the correct existing landmark or else registers as a not-before-seen landmark. This is a requirement for any SLAM approach to function properly (see Figure 11). Typically, the association metric depends on the measurement innovation vector. An exhaustive search algorithm that compares every measurement with every feature on the map associates landmarks if the newly measured landmark is sufficiently close to an existing one. This not only leads to landmark ambiguity but also is computationally intractable for large maps. Moreover, since the measurement is relative, the error of the vehicle position is additive with the absolute location of the measurement.

We present a new, faster, and more accurate solution, which takes advantage of predicted landmark locations on the image plane. Figure 5 gives a reference of how landmarks appearing on the image plane move along the ground lines as the MAV moves. Assume that $p^k_{(x,y)}$, $k = 0, 1, 2, 3, \ldots, n$, represents a pixel in time which happens to be contained by a landmark, and that this pixel moves along a ground line at the velocity $v_p$. Although landmarks often contain a cluster of pixels, the size of which is inversely proportional to the landmark distance, here the center pixel of a landmark is referred to. Given that the expected maximum velocity $V_{B\max}$ is known, a pixel is expected to appear at

$$
p^{k+1}_{(x,y)} = f\!\left(p^{k}_{(x,y)} + (v_B + V_B)\,\Delta t\right),
\tag{18}
$$

where

$$
\sqrt{\left(p^{k+1}_{(x)} - p^{k}_{(x)}\right)^2 + \left(p^{k+1}_{(y)} - p^{k}_{(y)}\right)^2}
\tag{19}
$$

cannot be larger than $V_{B\max}\,\Delta t$, while $f(\cdot)$ is a function that converts a landmark range to a position on the image plane.

A landmark appearing at time $k+1$ is to be associated with a landmark that appeared at time $k$ if and only if their pixel locations are within the association threshold; in other words, the association information from $k$ is used. Otherwise, if the maximum expected change in pixel location is exceeded, the landmark is considered new. We save computational resources by using the association data from $k$ when a match is found, instead of searching the large global map. In addition, since the pixel location of a landmark is independent of the noise in the MAV position, the association has improved accuracy. To further improve the accuracy, there is also a maximum range beyond which the MAV does not consider landmarks for data association. This range is determined by taking the camera resolution into consideration: the farther a landmark is, the fewer pixels its cluster contains, and thus the more ambiguity and noise it may carry. Considering the physical camera parameters, resolution, shutter speed, and noise model of the Logitech C905 camera, the MAV is set to ignore landmarks farther than 8 meters. Note that this is a limitation of the camera, not of our proposed methods.
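The association test of (18) and (19) reduces to a thresholded nearest-neighbor search in pixel space; the sketch below illustrates the idea, with an assumed data layout for the tracked landmarks.

```python
import numpy as np

# Sketch of the pixel-neighborhood association test implied by (18)-(19);
# the data layout and helper name are illustrative assumptions.
def associate(pixel_now, tracked, v_max_px, dt, max_range_m=8.0):
    """tracked: list of (pixel_xy, range_m, landmark_id) from frame k.
    Returns the matched landmark id, or None for a new landmark."""
    best_id, best_d = None, v_max_px * dt          # association threshold
    for px_prev, rng, lm_id in tracked:
        if rng > max_range_m:                      # beyond the camera's useful range
            continue
        d = np.hypot(*(np.asarray(pixel_now) - np.asarray(px_prev)))
        if d < best_d:
            best_id, best_d = lm_id, d
    return best_id
```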

Although representing the map as a tree-based data structure would, in theory, yield an association time of $O(N \log N)$, our pixel-neighborhood-based approach already covers over 90% of the features at any given time; therefore, a tree-based solution does not offer a significant benefit.

We also use a viewing-transformation-invariant scene matching algorithm based on spatial relationships among objects in the images and illumination parameters in the scene. This is to determine whether two frames acquired under different extrinsic camera parameters have indeed captured the same scene. Therefore, if the MAV visits a particular place more than once, it can distinguish whether it has been to that spot before.

Our approach maps the features (i.e., corners, lines) and illumination parameters from one view in the past to the other in the present via affine-invariant image descriptors. A descriptor $D_t$ consists of an image region in a scene that contains a high amount of disorder. This reduces the probability of finding multiple targets later. The system picks the region on the image plane with the most crowded cluster of landmarks to look for a descriptor, which is likely to be the part of the image with the most clutter, hence creating a more unique signature. Descriptor generation is automatic and is triggered when turns are encountered (i.e., by the Helix Bearing Algorithm). A turn is a significant, repeatable event in the life of a map, which makes it interesting for data association purposes. The start of the algorithm is also a significant event, for which the first descriptor $D_0$ is collected; this helps the MAV recognize the starting location if it is revisited.

Every time a descriptor $D_t$ is recorded, it contains the current time $t$ in terms of frame number, the disorderly region $I_{x,y}$ of size $x \times y$, and the estimate of the position and orientation of the MAV at frame $t$. Thus, every time a turn is encountered, the system can check whether it happened before. For instance, if it indeed happened at time $t = k$, where $t > k$, $D_k$ is compared with $D_t$ in terms of descriptor and landmarks, and the map positions of the MAV at times $t$ and $k$ are expected to match closely; otherwise, the map is diverging in a quantifiable manner.

The comparison formulation can be summarized as

$$
R(x,y) = \frac{\sum_{x',y'}\left(T(x',y') - I(x+x',\,y+y')\right)^2}{\sqrt{\sum_{x',y'} T(x',y')^2 \cdot \sum_{x',y'} I(x+x',\,y+y')^2}},
\tag{20}
$$

where a perfect match is 0 and poor matches are represented by larger values up to 1. We use this to determine the degree to which two descriptors are related, as it represents the fraction of the variation in one descriptor that may be explained by the other. Figure 10 illustrates how this concept works.
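Equation (20) is the normalized squared-difference score familiar from template matching (swept over all offsets, it corresponds to OpenCV's cv2.TM_SQDIFF_NORMED); a direct evaluation at a single offset might look as follows.

```python
import numpy as np

# Direct evaluation of (20) at a single offset (x, y):
# T is the descriptor template, I the image being searched.
def match_score(T, I, x, y):
    h, w = T.shape
    patch = I[y:y + h, x:x + w].astype(np.float64)
    Tf = T.astype(np.float64)
    num = np.sum((Tf - patch) ** 2)
    den = np.sqrt(np.sum(Tf ** 2) * np.sum(patch ** 2))
    return num / den      # 0 = perfect match, values near 1 = poor match
```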

5. Experimental Results

As illustrated in Figures 12, 13, and 14, our monocular vision SLAM correctly locates and associates landmarks to the real world. Figure 15 shows the results obtained in an outdoor experiment with urban roads. A 3D map is built by the addition of time-varying altitude and wall positions, as shown in Figure 16. The proposed methods prove robust to transient disturbances, since features that are inconsistent about their position are removed from the map.

The MAV assumes that it is positioned at (0, 0, 0) in Cartesian coordinates at the start of a mission, with the camera pointed along the positive $x$-axis; therefore, the width of the corridor is represented by the $y$-axis. At any time during the mission, a partial map can be requested from the MAV via the Internet. The MAV also stores the map and important video frames (i.e., when a new landmark is discovered) on board for later retrieval. Video frames are time-linked to the map. It is therefore possible to obtain a still image of the surroundings of any landmark for surveillance and identification purposes.

In Figure 12, the traveled distance is on the kilometer scale. When the system completes the mission and returns to the starting point, the belief is within one meter of where the mission had originally started.

Figure 10: Data association metric, where a descriptor is shown in the middle.

Figure 11: Map drift is one of the classic errors introduced by poor data association, or lack thereof, negatively impacting the loop-closing performance. (Map scale: 0-30 m.)

5.1. The Microaerial Vehicle Hardware Configuration. Saint Vertigo, our autonomous MAV helicopter, serves as the primary robotic test platform for the development of this study (see Figure 17). In contrast with other prior works that predominantly used wireless video feeds and the Vicon vision tracking system for vehicle state estimation [24], Saint Vertigo performs all image processing and SLAM computations on board, with a 1 GHz CPU, 1 GB RAM, and 4 GB storage. The unit measures 50 cm, with a ready-to-fly weight of 0.9 kg and 0.9 kg of payload for adaptability to different missions. In essence, the MAV features two independent computers. The flight computer is responsible for flight stabilization, flight automation, and sensory management. The navigation computer is responsible for image processing, range measurement, SLAM computations, networking, mass storage, and, as a future goal, path planning. The pathway between them is a dedicated on-board link, through which the sensory feedback and supervisory control commands are shared. These commands are simple directives, which are converted to the appropriate helicopter flight-surface responses by the flight computer. The aircraft is IEEE 802.11 enabled.

Figure 12: Experimental results of the proposed ranging and SLAM algorithm, showing the landmarks added to the map representing the structure of the environment. All measurements are in meters. The experiment was conducted under incandescent ambient lighting.

Figure 13: (a) Experimental results of the proposed ranging and SLAM algorithm, with state observer odometer trail. The actual floor plan of the building is superimposed later on a mature map to illustrate the accuracy of our method. Note that the floor plan was not provided to the system a priori. (b) The same environment mapped by a ground robot, with a different starting point, to illustrate that our algorithm is compatible with different platforms. (Map scale: 0-30 m.)

Figure 14: Results of the proposed ranging and SLAM algorithm from a different experiment, with state observer ground truth. All measurements are in meters. The experiment was conducted under fluorescent ambient lighting, and sunlight where applicable.

Figure 15: Results of the proposed ranging and SLAM algorithm from an outdoor experiment in an urban area. A small map of the area is provided for reference purposes (not provided to the algorithm), and it indicates the robot path. All measurements are in meters. The experiment was conducted under sunlight ambient conditions and in dry weather. (Map scale: 0-100 m.)

Figure 16: Cartesian (x, y, z) position of the MAV in a hallway, as reported by the proposed ranging and SLAM algorithm with time-varying altitude. The altitude is represented by the z-axis and is initially at 25 cm, as this is the ground clearance of the ultrasonic altimeter when the aircraft has landed. The MAV altitude was intentionally varied by large amounts to demonstrate the robustness of our method to the climb and descent of the aircraft, whereas in a typical mission natural altitude changes are in the range of a few centimeters. (Axes: hallway length, hallway width, and helicopter altitude, all in meters.)

Figure 17: Saint Vertigo, the autonomous MAV helicopter, consists of four decks, labeled A through D. The A deck contains the collective pitch rotor head mechanics. The B deck comprises the fuselage, which houses the power plant, transmission, main batteries, actuators, gyroscope, and the tail rotor. The C deck is the autopilot compartment, which contains the inertial measurement unit, all communication systems, and all sensors. The D deck carries the navigation computer, which is attached to a digital video camera visible at the front.

All of its features are accessible over the Internet or an ad hoc TCP/IP network. Among the other platforms shown in Figure 18, Saint Vertigo has the most limited computational resources.

5.2. Processing Requirements. In order to effectively manage the computational resources on a lightweight MAV computer, we keep track of the CPU utilization for the algorithms proposed in this paper. Table 1 shows a typical breakdown of the average processor utilization per video frame. Each corresponding task elucidated in this paper is visualized in Figure 2.

The numbers in Table 1 were gathered after the map had matured. Methods highlighted with a dagger (†) are mutually exclusive; for example, the Helix Bearing algorithm runs only when the MAV is performing turns, while the ranging task is on standby. Particle filtering has a roughly constant load on the system

Figure 18: Our algorithms have been tested on a diverse set of mobile platforms, shown here. Picture courtesy of Space Systems and Controls Lab, Aerospace Robotics Lab, Digitalsmithy Lab, and Rockwell Collins Advanced Technology Center.

once the map is populated. We only consider a limited point cloud, with landmarks in the front detection range of the MAV (see Section 4.1). The MAV typically operates in the 80-90% utilization range. It should be stressed that this figure includes operating system kernel processes, which involve video-memory procedures, as the MAV is not equipped with a dedicated graphics processor. The MAV is programmed to construct the SLAM results and other miscellaneous on-screen display information inside the video memory in real time. This is used to monitor the system for our own debugging purposes but is not required for the MAV operation. Disabling this feature reduces the load and frees up processor time for other tasks that may be implemented, such as path planning and closed-loop position control.

6. Conclusion and Future Work

In this paper, we investigated the performance of monocular-camera-based vision SLAM with minimal assumptions, as well as minimal aid from other sensors (altimeter only), in a corridor-following-flight application which requires precise localization and absolute range measurement. This is true even for outdoor cases, because our MAV is capable of building up high speeds and covering large distances very rapidly, and some of the ground robots we have tested were large enough to become a concern for traffic and pedestrians. While widely recognized SLAM methods have been mainly developed for use with laser range finders, this paper presented new algorithms for monocular vision-based depth perception and bearing sensing, to accurately mimic the operation of such an advanced device. We were able to integrate our design with popular SLAM algorithms originally meant for laser range finders, and we have experimentally validated its operation for autonomous indoor and outdoor flight and navigation with a small, fully self-contained MAV helicopter, as well as other robotic platforms. Our algorithms successfully adapt to various situations, performing the transitions between them (e.g., turns, presence of external objects, and time-varying altitude).

Table 1: CPU utilization of the proposed algorithms.

    Image acquisition and edge filtering      10%
    Line and slope extraction                  2%
    Landmark extraction                       20%†
    Helix bearing                             20%†
    Ranging algorithms                   below 1%
    Rao-Blackwellized particle filter         50%

Since the proposed monocular-camera vision SLAM method does not need initialization procedures, the mission can start at an arbitrary point. Therefore, our MAV can be deployed to infiltrate an unknown building. One future task is to add the capability to fly through doors and windows. Indeed, the system is only limited by the capabilities of the camera, such as resolution, shutter speed, and reaction time. All of those limitations can be overcome with the proper use of lenses and higher-fidelity imaging sensors, even though we have used a consumer-grade USB camera. Since the ability to extract good landmarks is a function of the camera capabilities, a purpose-built camera is suggested for future work. Such a camera would also allow the development of efficient vision SLAM and data association algorithms that take advantage of the intermediate image processing data.

Our future vision-based SLAM and navigation strategy for an indoor MAV helicopter through the hallways of a building also includes the ability to recognize staircases and thus traverse multiple floors to generate a comprehensive volumetric map of the building. This will also permit vision-based 3D path planning and closed-loop position control of the MAV based on SLAM. Considering that our MAV helicopter is capable of outdoor flight, we can extend our method to the outdoor perimeter of buildings and similar urban environments by exploiting the similarities between hallways and downtown city maps. Further, considering the reduction in weight and the independence from GPS coverage, our work also permits the development of portable navigation devices for a wider array of applications, such as small-scale mobile robotics and helmet- or vest-mounted navigation systems.

Certain environments and environmental factors prove challenging to our proposed method: bright lights, reflective surfaces, haze, and shadows. These artifacts introduce two main problems: (1) they can alter chromatic clarity, local microcontrast, and exposure due to their unpredictable high-energy nature, and (2) they can appear as false objects, especially when there is bloom surrounding objects in front of the problematic light source. Further reduction in contrast is possible if scattering particles in the air are dense. We have come to observe that preventative and defensive approaches to such issues are promising. Antireflective treatment on lenses can reduce light bouncing off of the lens, and programming the aircraft to move a very small distance upon detection of glare can eliminate the unwanted effects. Innovative and adaptive application of servo-controlled filters in front of the lenses can minimize, or eliminate, most if not all reflections. The light that causes glare is elliptically polarized due to strong phase correlation, as opposed to essential light, which is circularly polarized. Filters can detect and block polarized light from entering the camera, thereby blocking the unwanted effects. The application of purpose-designed digital imaging sensors that do not involve a Bayer filter can also help. Most of the glare occurs in the green region of the spectrum, and traditional digital imaging sensors have twice as many green receptors as red and blue; the Bayer design was inspired by the human eye, which sees green better, as green is the most structurally descriptive light for edges and corners. This paper has supplementary material (see Supplementary Material available online at http://dx.doi.org/10.1155/2013/374165), available from the authors, which shows the experimental results of the paper.

Acknowledgments

The research reported in this paper was in part supported by the National Science Foundation (Grant ECCS-0428040), the Information Infrastructure Institute (I3), the Department of Aerospace Engineering and the Virtual Reality Application Center at Iowa State University, Rockwell Collins, and the Air Force Office of Scientific Research.

References

[1] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106–154, 1962.

[2] N. Isoda, K. Terada, S. Oe, and K. Ikaida, "Improvement of accuracy for distance measurement method by using movable CCD," in Proceedings of the 36th SICE Annual Conference (SICE '97), pp. 29–31, Tokushima, Japan, July 1997.

[3] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2nd edition, 2003.

[4] F. Ruffier and N. Franceschini, "Visually guided micro-aerial vehicle: automatic take off, terrain following, landing and wind reaction," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2339–2346, New Orleans, La, USA, May 2004.

[5] F. Ruffier, S. Viollet, S. Amic, and N. Franceschini, "Bio-inspired optical flow circuits for the visual guidance of micro-air vehicles," in Proceedings of the International Symposium on Circuits and Systems (ISCAS '03), vol. 3, pp. 846–849, Bangkok, Thailand, May 2003.

[6] J. Michels, A. Saxena, and A. Y. Ng, "High speed obstacle avoidance using monocular vision and reinforcement learning," in Proceedings of the 22nd International Conference on Machine Learning (ICML '05), vol. 119, pp. 593–600, August 2005.

[7] A. Saxena, J. Schulte, and A. Y. Ng, "Depth estimation using monocular and stereo cues," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2197–2203, 2007.

[8] N. Snavely, S. M. Seitz, and R. Szeliski, "Photo tourism: exploring photo collections in 3D," ACM Transactions on Graphics, vol. 25, no. 3, 2006.

[9] A. W. Fitzgibbon and A. Zisserman, "Automatic camera recovery for closed or open image sequences," in Proceedings of the European Conference on Computer Vision, pp. 311–326, June 1998.

[10] A. Davison, M. Nicholas, and S. Olivier, "MonoSLAM: real-time single camera SLAM," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1052–1067, 2007.

[11] L. Clemente, A. Davison, I. Reid, J. Neira, and J. Tardos, "Mapping large loops with a single hand-held camera," in Proceedings of the Robotics Science and Systems Conference, June 2007.

[12] F. Dellaert, W. Burgard, D. Fox, and S. Thrun, "Using the condensation algorithm for robust, vision-based mobile robot localization," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '99), pp. 588–594, June 1999.

[13] N. Cuperlier, M. Quoy, P. Gaussier, and C. Giovanangeli, "Navigation and planning in an unknown environment using vision and a cognitive map," in Proceedings of the IJCAI Workshop: Reasoning with Uncertainty in Robotics, 2005.

[14] G. Silveira, E. Malis, and P. Rives, "An efficient direct approach to visual SLAM," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 969–979, 2008.

[15] A. P. Gee, D. Chekhlov, A. Calway, and W. Mayol-Cuevas, "Discovering higher level structure in visual SLAM," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 980–990, 2008.

[16] K. Celik, S.-J. Chung, and A. K. Somani, "Mono-vision corner SLAM for indoor navigation," in Proceedings of the IEEE International Conference on Electro/Information Technology (EIT '08), pp. 343–348, Ames, Iowa, USA, May 2008.

[17] K. Celik, S.-J. Chung, and A. K. Somani, "MVCSLAM: mono-vision corner SLAM for autonomous micro-helicopters in GPS denied environments," in Proceedings of the AIAA Guidance, Navigation and Control Conference, Honolulu, Hawaii, USA, August 2008.

[18] K. Celik, S.-J. Chung, and A. K. Somani, "Biologically inspired monocular vision based navigation and mapping in GPS-denied environments," in Proceedings of the AIAA Infotech at Aerospace Conference and Exhibit and AIAA Unmanned Unlimited Conference, Seattle, Wash, USA, April 2009.

[19] K. Celik, S.-J. Chung, M. Clausman, and A. K. Somani, "Monocular vision SLAM for indoor aerial vehicles," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, Mo, USA, October 2009.

[20] J. Shi and C. Tomasi, "Good features to track," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 593–600, June 1994.

[21] H. Bay, A. Ess, T. Tuytelaars, and L. van Gool, "Speeded-Up Robust Features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.

[22] K. Celik and A. K. Somani, "Wandless realtime autocalibration of tactical monocular cameras," in Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV '12), Las Vegas, Nev, USA, 2012.

[23] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit, "FastSLAM: a factored solution to the simultaneous localization and mapping problem," in Proceedings of the AAAI National Conference on Artificial Intelligence, pp. 593–598, 2002.

[24] J. P. How, B. Bethke, A. Frank, D. Dale, and J. Vian, "Real-time indoor autonomous vehicle test environment," IEEE Control Systems Magazine, vol. 28, no. 2, pp. 51–64, 2008.

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 10: Research Article Monocular Vision SLAM for Indoor Aerial ... · Battery Manual override Yaw gyroscope Flight surfaces Power plant RAM Inertial measurement unit USB2.0 Orthogonality

10 Journal of Electrical and Computer Engineering

computationally intractable for large maps Moreover sincethe measurement is relative the error of the vehicle positionis additive with the absolute location of the measurement

We present a new faster and more accurate solutionwhich takes advantage of predicted landmark locations onthe image plane Figure 5 gives a reference of how landmarksappear on the image plane to move along the ground linesas the MAV moves Assume that 119901119896

(119909119910) 119896 = 0 1 2 3 119899

represents a pixel in time which happens to be contained bya landmark and this pixel moves along a ground line at thevelocity V

119901 Although landmarks often contain a cluster of

pixels size of which is inversely proportional with landmarkdistance here the center pixel of a landmark is referred Giventhat the expectedmaximum velocity119881

119861max is known a pixelis expected to appear at

119901119896+1

(119909119910)= 119891((119901

119896

(119909119910)+ (V119861+ 119881119861) Δ119905)) (18)

where

radic(119901119896+1

(119909)minus 119901119896

(119909))2

+ (119901119896+1

(119910)minus 119901119896

(119910))

2

(19)

cannot be larger than 119881119861maxΔ119905 while 119891(sdot) is a function that

converts a landmark range to a position on the image planeA landmark appearing at time 119896 + 1 is to be associated

with a landmark that has appeared at time 119896 if and onlyif their pixel locations are within the association thresholdIn other words the association information from 119896 is usedOtherwise if the maximum expected change in pixel loca-tion is exceeded the landmark is considered new We savecomputational resources by using the association data from 119896when a match is found instead of searching the large globalmap In addition since the pixel location of a landmark isindependent of the noise in theMAVposition the associationhas an improved accuracy To further improve the accuracythere is also a maximum range beyond which the MAV willnot consider for data association This range is determinedtaking the camera resolution into consideration The farthera landmark is the fewer pixels it has in its cluster thus themore ambiguity and noise it may contain Considering thephysical camera parameters resolution shutter speed andnoise model of the Logitech-C905 camera the MAV is set toignore landmarks farther than 8 meters Note that this is alimitation of the camera not our proposed methods

Although representing the map as a tree based datastructure which in theory yields an association time of119874(119873 log119873) our pixel-neighborhood based approach alreadycovers over 90 of the features at any time therefore a treebased solution does not offer a significant benefit

We also use a viewing transformation invariant scenematching algorithm based on spatial relationships amongobjects in the images and illumination parameters in thescene This is to determine if two frames acquired under dif-ferent extrinsic camera parameters have indeed captured thesame scene Therefore if the MAV visits a particular placemore than once it can distinguish whether it has been to thatspot before

Our approach maps the features (ie corners lines) andillumination parameters from one view in the past to theother in the present via affine-invariant image descriptorsA descriptor 119863

119905consists of an image region in a scene that

contains a high amount of disorder This reduces the proba-bility of finding multiple targets later The system will pick aregion on the image plane with the most crowded cluster oflandmarks to look for a descriptor which is likely to be thepart of the image where there is most clutters hence creatinga more unique signature Descriptor generation is automaticand triggered when turns are encountered (ie Helix BearingAlgorithm) A turn is a significant repeatable event in thelife of a map which makes it interesting for data associationpurposes The starting of the algorithm is also a significantevent for which the first descriptor 119863

0is collected which

helps the MAV in recognizing the starting location if it isrevisited

Every time a descriptor 119863119905is recorded it contains the

current time 119905 in terms of frame number the disorderlyregion 119868

119909119910of size 119909 times 119910 and the estimate of the position and

orientation of the MAV at frame 119905 Thus every time a turnis encountered the system can check if it happened beforeFor instance if it indeed has happened at time 119905 = 119896 where119905 gt 119896 119863

119896is compared with that of 119863

119905in terms of descriptor

and landmarks and the map positions of the MAV at times 119905and 119896 are expected to match closely else it means the map isdiverging in a quantifiable manner

The comparison formulation can be summarized as

119877 (119909 119910) =

sum11990910158401199101015840 (119879 (119909

1015840 1199101015840) minus 119868 (119909 + 119909

1015840 119910 + 119910

1015840))2

radicsum11990910158401199101015840 119879(1199091015840 1199101015840)2

sdot sum11990910158401199101015840 119868(119909 + 119909

1015840 119910 + 1199101015840)2

(20)

where a perfect match is 0 and poor matches are representedby larger values up to 1We use this to determine the degree towhich two descriptors are related as it represents the fractionof the variation in one descriptor that may be explained bythe other Figure 10 illustrates how this concept works

5 Experimental Results

As illustrated in Figures 12 13 and 14 our monocular visionSLAM correctly locates and associates landmarks to the realworld Figure 15 shows the results obtained in an outdoorexperiment with urban roads A 3D map is built by the addi-tion of time-varying altitude and wall positions as shown inFigure 16 The proposed methods prove robust to transientdisturbances since features inconsistent about their positionare removed from the map

The MAV assumes that it is positioned at (0 0 0) Carte-sian coordinates at the start of a mission with the camerapointed at the positive 119909-axis therefore the width of thecorridor is represented by the 119910-axis At anytime during themission a partial map can be requested from the MAV viaInternet The MAV also stores the map and important videoframes (ie when a new landmark is discovered) on-boardfor a later retrieval Video frames are time linked to themap Itis therefore possible to obtain a still image of the surroundings

Journal of Electrical and Computer Engineering 11

Figure 10 Data association metric where a descriptor is shown on the middle

0 10 20 30(m)

Figure 11 Map drift is one of the classic errors introduced by poordata association or lack thereof negatively impacting the loop-closing performance

of any landmark for the surveillance and identification pur-poses

In Figure 12 the traveled distance is on the kilometerscale When the system completes the mission and returns tothe starting point the belief is within one meter of where themission had originally started

51 The Microaerial Vehicle Hardware Configuration SaintVertigo our autonomous MAV helicopter serves as theprimary robotic test platform for the development of thisstudy (see Figure 17) In contrast with other prior works thatpredominantly used wireless video feeds and Vicon visiontracking system for vehicle state estimation [24] SaintVertigoperforms all image processing and SLAM computations on-board with a 1 GHz CPU 1 GB RAM and 4GB storageThe unit measures 50 cm with a ready-to-fly weight of 09 kg

0 10 20(m)

300 10 20(m)

30

Figure 12 Experimental results of the proposed ranging and SLAMalgorithm showing the landmarks added to the map representingthe structure of the environment All measurements are in metersThe experiment was conducted under incandescent ambient light-ning

and 09 kg of payload for adaptability to different missionsIn essence the MAV features two independent computersThe flight computer is responsible for flight stabilizationflight automation and sensory management The navigationcomputer is responsible for image processing range mea-surement SLAM computations networking mass storageand as a future goal path planning The pathway betweenthem is a dedicated on-board link throughwhich the sensoryfeedback and supervisory control commands are sharedThese commands are simple directives which are convertedto the appropriate helicopter flight surface responses by theflight computer The aircraft is IEEE 80211 enabled and all

12 Journal of Electrical and Computer Engineering

0 10 20 30(m)

0 10 20 30(m)

(a)

(b)

Figure 13 (a) Experimental results of the proposed ranging andSLAM algorithm with state observer odometer trail Actual floor-plan of the building is superimposed later on a mature map toillustrate the accuracy of our method Note that the floor plan wasnot provided to the system a priori (b) The same environmentmapped by a ground robotwith a different starting point to illustratethat our algorithm is compatible with different platforms

0 10 20 30(m)

0 10 20 30(m)

Figure 14 Results of the proposed ranging and SLAM algorithmfrom a different experiment with state observer ground truth Allmeasurements are in meters The experiment was conducted underfluorescent ambient lightning and sunlight where applicable

0(m)50 1000

(m)50 100

Figure 15 Results of the proposed ranging and SLAM algorithmfrom an outdoor experiment in an urban area A small map ofthe area is provided for reference purposes (not provided to thealgorithm) and it indicates the robot path All measurements arein meters The experiment was conducted under sunlight ambientconditions and dry weather

Hallway length (m)

4035 30

25

25

2020

1515

05

10 10

0

5 50 0

Hallway width (m

)

151

minus5

altit

ude (

m)

Heli

copt

er

Figure 16 Cartesian (119909 119910 119911) position of the MAV in a hallwayas reported by proposed ranging and SLAM algorithm with time-varying altitude The altitude is represented by the 119911-axis andit is initially at 25 cm as this is the ground clearance of theultrasonic altimeter when the aircraft has landed MAV altitude wasintentionally varied by large amounts to demonstrate the robustnessof our method to the climb and descent of the aircraft whereas ina typical mission natural altitude changes are in the range of a fewcentimeters

A

B

C

D

Figure 17 Saint Vertigo the autonomous MAV helicopter consistsof four decksTheAdeck contains collective pitch rotor headmecha-nics The B deck comprises the fuselage which houses the powerplant transmission main batteries actuators gyroscope and thetail rotor The C deck is the autopilot compartment which containsthe inertial measurement unit all communication systems andall sensors The D deck carries the navigation computer which isattached to a digital video camera visible at the front

its features are accessible over the internet or an ad hoc TCP-IP network Among the other platforms shown in Figure 18Saint Vertigo has the most limited computational resources

52 Processing Requirements In order to effectively managethe computational resources on a light weight MAV com-puter we keep track of the CPU utilization for the algorithmsproposed in this paper Table 1 shows a typical breakdown ofthe average processor utilization per one video frame Eachcorresponding task elucidated in this paper is visualized inFigure 2

The numbers in Table 1 are gathered after the map hasmatured Methods highlighted with dagger are mutually exclusivefor example the Helix Bearing algorithm runs only when theMAV is performing turns while ranging task is on standbyParticle filtering has a roughly constant load on the system

Journal of Electrical and Computer Engineering 13

Figure 18 Our algorithms have been tested on a diverse set of mobile platforms shown here Picture courtesy of Space Systems and ControlsLab Aerospace Robotics Lab Digitalsmithy Lab and Rockwell Collins Advanced technology Center

once the map is populated We only consider a limitedpoint cloud with landmarks in the front detection range ofthe MAV (see Section 41) The MAV typically operates at80ndash90 utilization range It should be stressed that thisnumerical figure includes operating system kernel processeswhich involve video-memory procedures as the MAV is notequipped with a dedicated graphics processor The MAVis programmed to construct the SLAM results and othermiscellaneous on-screen display information inside the videomemory in real time This is used to monitor the system forour own debugging purposes but not required for the MAVoperation Disabling this feature reduces the load and freesup processor time for other tasks that may be implementedsuch as path planning and closed-loop position control

6 Conclusion and Future Work

In this paper we investigated the performance of monocularcamera based vision SLAM with minimal assumptions aswell as minimal aid from other sensors (altimeter only) in acorridor-following-flight application which requires preciselocalization and absolute range measurement This is trueeven for outdoor cases because our MAV is capable of build-ing high speeds and covering large distances very rapidly andsome of the ground robots we have tested were large enoughto become a concern for traffic and pedestriansWhile widelyrecognized SLAM methods have been mainly developedfor use with laser range finders this paper presented newalgorithms formonocular vision-based depth perception and

14 Journal of Electrical and Computer Engineering

Table 1 CPU utilization of the proposed algorithms

Image acquisition and edge filtering 10Line and slope extraction 2Landmark extraction 20dagger

Helix bearing 20dagger

Ranging algorithms Below 1Rao-Blackwellized particle filter 50

bearing sensing to accurately mimic the operation of such anadvanced device We were able to integrate our design withpopular SLAM algorithms originally meant for laser rangefinders and we have experimentally validated its operationfor autonomous indoor and outdoor flight and navigationwith a small fully self-contained MAV helicopter as well asother robotic platforms Our algorithms successfully adapt tovarious situations while successfully performing the transi-tion between (eg turns presence of external objects andtime-varying altitude)

Since the proposed monocular camera vision SLAMmethod does not need initialization procedures the missioncan start at an arbitrary point Therefore our MAV can bedeployed to infiltrate an unknown building One future taskis to add the capability to fly through doors and windowsIndeed the system is only limited by the capabilities of thecamera such as resolution shutter speed and reaction timeAll of those limitations can be overcome with the properuse of lenses and higher fidelity imaging sensors despite wehave used a consumer-grade USB camera Since the ability toextract good landmarks is a function of the camera capabili-ties a purpose-built camera is suggested for futurework Sucha camera would also allow development of efficient visionSLAM and data association algorithms that take advantageof the intermediate image processing data

Our future vision-based SLAM and navigation strategyfor an indoorMAV helicopter through hallways of a buildingalso includes the ability to recognize staircases and thustraversemultiple floors to generate a comprehensive volumet-ric map of the building This will also permit vision-based3D path planning and closed-loop position control of MAVbased on SLAM Considering our MAV helicopter is capableof outdoor flight we can extend our method to the outdoorperimeter of buildings and similar urban environments byexploiting the similarities between hallways and downtowncity maps Further considering the reduction in weight andindependence from GPS coverage our work also permitsthe development of portable navigation devices for a widerarray of applications such as small-scale mobile robotics andhelmet or vest mounted navigation systems

Certain environments and environmental factors provechallenging to our proposed method bright lights reflectivesurfaces haze and shadows These artifacts introduce twomain problems (1) they can alter chromatic clarity localmicrocontrast and exposure due to their unpredictable high-energy nature and (2) they can appear as false objectsespeciallywhen there is bloom surrounding objects in front ofproblem light source Further reduction in contrast is possible

if scattering particles in the air are dense We have come toobserve that preventative and defensive approaches to suchissues are promising Antireflective treatment on lenses canreduce light bouncing off of the lens and programming theaircraft to move for a very small distance upon detection ofglare can eliminate the unwanted effects Innovative andadaptive application of servo-controlled filters before thelenses can minimize or eliminate most if not all reflectionsThe light that causes glare is elliptically polarized due tostrong phase correlation This is as opposed to essential lightwhich is circularly polarized Filters can detect and blockpolarized light from entering the camera thereby blockingunwanted effects Application of purpose designed digitalimaging sensors that do not involve a Bayes filter can alsohelp Most of the glare occurs in green light region andtraditional digital imaging sensors have twice as many greenreceptors as red and blue Bayes design has been inspiredfrom human eye which sees green better as green is themost structurally descriptive light for edges and cornersThispaper has supplementary material (see Supplementary Mate-rial available online at httpdxdoiorg1011552013374165)available from the authors which show experimental resultsof the paper

Acknowledgments

The research reported in this paper was in part supportedby the National Science Foundation (Grant ECCS-0428040)Information Infrastructure Institute (1198683) Department ofAerospace Engineering and Virtual Reality Application Cen-ter at Iowa State University Rockwell Collins and Air ForceOffice of Scientific Research

References

[1] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106–154, 1962.

[2] N. Isoda, K. Terada, S. Oe, and K. Ikaida, "Improvement of accuracy for distance measurement method by using movable CCD," in Proceedings of the 36th SICE Annual Conference (SICE '97), pp. 29–31, Tokushima, Japan, July 1997.

[3] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2nd edition, 2003.

[4] F. Ruffier and N. Franceschini, "Visually guided micro-aerial vehicle: automatic take off, terrain following, landing and wind reaction," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2339–2346, New Orleans, La, USA, May 2004.

[5] F. Ruffier, S. Viollet, S. Amic, and N. Franceschini, "Bio-inspired optical flow circuits for the visual guidance of micro-air vehicles," in Proceedings of the International Symposium on Circuits and Systems (ISCAS '03), vol. 3, pp. 846–849, Bangkok, Thailand, May 2003.

[6] J. Michels, A. Saxena, and A. Y. Ng, "High speed obstacle avoidance using monocular vision and reinforcement learning," in Proceedings of the 22nd International Conference on Machine Learning (ICML '05), vol. 119, pp. 593–600, August 2005.

[7] A. Saxena, J. Schulte, and A. Y. Ng, "Depth estimation using monocular and stereo cues," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2197–2203, 2007.

[8] N. Snavely, S. M. Seitz, and R. Szeliski, "Photo tourism: exploring photo collections in 3D," ACM Transactions on Graphics, vol. 25, no. 3, 2006.

[9] A. W. Fitzgibbon and A. Zisserman, "Automatic camera recovery for closed or open image sequences," in Proceedings of the European Conference on Computer Vision, pp. 311–326, June 1998.

[10] A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse, "MonoSLAM: real-time single camera SLAM," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1052–1067, 2007.

[11] L. Clemente, A. Davison, I. Reid, J. Neira, and J. Tardós, "Mapping large loops with a single hand-held camera," in Proceedings of the Robotics: Science and Systems Conference, June 2007.

[12] F. Dellaert, W. Burgard, D. Fox, and S. Thrun, "Using the condensation algorithm for robust, vision-based mobile robot localization," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '99), pp. 588–594, June 1999.

[13] N. Cuperlier, M. Quoy, P. Gaussier, and C. Giovanangeli, "Navigation and planning in an unknown environment using vision and a cognitive map," in Proceedings of the IJCAI Workshop: Reasoning with Uncertainty in Robotics, 2005.

[14] G. Silveira, E. Malis, and P. Rives, "An efficient direct approach to visual SLAM," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 969–979, 2008.

[15] A. P. Gee, D. Chekhlov, A. Calway, and W. Mayol-Cuevas, "Discovering higher level structure in visual SLAM," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 980–990, 2008.

[16] K. Celik, S.-J. Chung, and A. K. Somani, "Mono-vision corner SLAM for indoor navigation," in Proceedings of the IEEE International Conference on Electro/Information Technology (EIT '08), pp. 343–348, Ames, Iowa, USA, May 2008.

[17] K. Celik, S.-J. Chung, and A. K. Somani, "MVCSLAM: mono-vision corner SLAM for autonomous micro-helicopters in GPS denied environments," in Proceedings of the AIAA Guidance, Navigation, and Control Conference, Honolulu, Hawaii, USA, August 2008.

[18] K. Celik, S.-J. Chung, and A. K. Somani, "Biologically inspired monocular vision based navigation and mapping in GPS-denied environments," in Proceedings of the AIAA Infotech at Aerospace Conference and Exhibit and AIAA Unmanned Unlimited Conference, Seattle, Wash, USA, April 2009.

[19] K. Celik, S.-J. Chung, M. Clausman, and A. K. Somani, "Monocular vision SLAM for indoor aerial vehicles," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, Mo, USA, October 2009.

[20] J. Shi and C. Tomasi, "Good features to track," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 593–600, June 1994.

[21] H. Bay, A. Ess, T. Tuytelaars, and L. van Gool, "Speeded-Up Robust Features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.

[22] K. Celik and A. K. Somani, "Wandless realtime autocalibration of tactical monocular cameras," in Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV '12), Las Vegas, Nev, USA, 2012.

[23] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit, "FastSLAM: a factored solution to the simultaneous localization and mapping problem," in Proceedings of the AAAI National Conference on Artificial Intelligence, pp. 593–598, 2002.

[24] J. P. How, B. Bethke, A. Frank, D. Dale, and J. Vian, "Real-time indoor autonomous vehicle test environment," IEEE Control Systems Magazine, vol. 28, no. 2, pp. 51–64, 2008.

Our future vision-based SLAM and navigation strategyfor an indoorMAV helicopter through hallways of a buildingalso includes the ability to recognize staircases and thustraversemultiple floors to generate a comprehensive volumet-ric map of the building This will also permit vision-based3D path planning and closed-loop position control of MAVbased on SLAM Considering our MAV helicopter is capableof outdoor flight we can extend our method to the outdoorperimeter of buildings and similar urban environments byexploiting the similarities between hallways and downtowncity maps Further considering the reduction in weight andindependence from GPS coverage our work also permitsthe development of portable navigation devices for a widerarray of applications such as small-scale mobile robotics andhelmet or vest mounted navigation systems

Certain environments and environmental factors provechallenging to our proposed method bright lights reflectivesurfaces haze and shadows These artifacts introduce twomain problems (1) they can alter chromatic clarity localmicrocontrast and exposure due to their unpredictable high-energy nature and (2) they can appear as false objectsespeciallywhen there is bloom surrounding objects in front ofproblem light source Further reduction in contrast is possible

if scattering particles in the air are dense We have come toobserve that preventative and defensive approaches to suchissues are promising Antireflective treatment on lenses canreduce light bouncing off of the lens and programming theaircraft to move for a very small distance upon detection ofglare can eliminate the unwanted effects Innovative andadaptive application of servo-controlled filters before thelenses can minimize or eliminate most if not all reflectionsThe light that causes glare is elliptically polarized due tostrong phase correlation This is as opposed to essential lightwhich is circularly polarized Filters can detect and blockpolarized light from entering the camera thereby blockingunwanted effects Application of purpose designed digitalimaging sensors that do not involve a Bayes filter can alsohelp Most of the glare occurs in green light region andtraditional digital imaging sensors have twice as many greenreceptors as red and blue Bayes design has been inspiredfrom human eye which sees green better as green is themost structurally descriptive light for edges and cornersThispaper has supplementary material (see Supplementary Mate-rial available online at httpdxdoiorg1011552013374165)available from the authors which show experimental resultsof the paper

Acknowledgments

The research reported in this paper was in part supportedby the National Science Foundation (Grant ECCS-0428040)Information Infrastructure Institute (1198683) Department ofAerospace Engineering and Virtual Reality Application Cen-ter at Iowa State University Rockwell Collins and Air ForceOffice of Scientific Research

References

[1] DHHubel and TNWiesel ldquoReceptive fields binocular inter-action and functional architecture in the catrsquos visual cortexrdquoTheJournal of Physiology vol 160 pp 106ndash154 1962

[2] N Isoda K Terada S Oe and K IKaida ldquoImprovement ofaccuracy for distance measurement method by using movableCCDrdquo in Proceedings of the 36th SICE Annual Conference (SICErsquo97) pp 29ndash31 Tokushima Japan July 1997

[3] R Hartley and A ZissermanMultiple View Geometry in Com-puter Vision Cambridge University Press 2nd edition 2003

[4] F Ruffier and N Franceschini ldquoVisually guided micro-aerialvehicle automatic take off terrain following landing and windreactionrdquo in Proceedings of the IEEE International Conferenceon Robotics and Automation pp 2339ndash2346 New Orleans LoUSA May 2004

[5] F Ruffier S Viollet S Amic and N Franceschini ldquoBio-inspired optical flow circuits for the visual guidance of micro-air vehiclesrdquo in Proceedings of the International Symposium onCircuits and Systems (ISCAS rsquo03) vol 3 pp 846ndash849 BangkokThailand May 2003

[6] J Michels A Saxena and A Y Ng ldquoHigh speed obstacle avoid-ance using monocular vision and reinforcement learningrdquo inProceedings of the 22nd International Conference on MachineLearning (ICML rsquo05) vol 119 pp 593ndash600 August 2005

Journal of Electrical and Computer Engineering 15

[7] A Saxena J Schulte and A Y Ng ldquoDepth estimation usingmonocular and stereo cuesrdquo in Proceedings of the 20th inter-national joint conference on Artifical intelligence (IJCAI rsquo07) pp2197ndash2203 2007

[8] N Snavely S M Seitz and R Szeliski ldquoPhoto tourism explor-ing photo collections in 3DrdquoACMTransactions onGraphics vol25 no 3 2006

[9] A W Fitzgibbon and A Zisserman ldquoAutomatic camera recov-ery for closed or open image sequencesrdquo in Proceedings of theEuropean Conference on Computer Vision pp 311ndash326 June1998

[10] ADavisonMNicholas and SOlivier ldquoMonoSLAM real-timesingle camera SLAMrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 29 no 6 pp 1052ndash1067 2007

[11] L Clemente A Davison I Reid J Neira and J Tardos ldquoMap-ping large loops with a single hand-held camerardquo in Proceedingsof the Robotics Science and Systems Conference June 2007

[12] F Dellaert W Burgard D Fox and S Thrun ldquoUsing thecondensation algorithm for robust vision-based mobile robotlocalizationrdquo in Proceedings of the IEEE Computer Society Con-ference onComputer Vision and Pattern Recognition (CVPR rsquo99)pp 588ndash594 June 1999

[13] N Cuperlier M Quoy P Gaussier and C Giovanangeli ldquoNav-igation and planning in an unknown environment using visionand a cognitive maprdquo in Proceedings of the IJCAI WorkshopReasoning with Uncertainty in Robotics 2005

[14] G Silveira E Malis and P Rives ldquoAn efficient direct approachto visual SLAMrdquo IEEE Transactions on Robotics vol 24 no 5pp 969ndash979 2008

[15] A P Gee D Chekhlov A Calway and W Mayol-CuevasldquoDiscovering higher level structure in visual SLAMrdquo IEEETransactions on Robotics vol 24 no 5 pp 980ndash990 2008

[16] K Celik S-J Chung and A K Somani ldquoMono-vision cornerSLAM for indoor navigationrdquo in Proceedings of the IEEE Inter-national Conference on ElectroInformation Technology (EITrsquo08) pp 343ndash348 Ames Iowa USA May 2008

[17] K Celik S-J Chung and A K Somani ldquoMVCSLAM mono-vision corner SLAM for autonomous micro-helicopters in GPSdenied environmentsrdquo in Proceedings of the AIAA GuidanceNavigation and Control Conference Honolulu Hawaii USAAugust 2008

[18] K Celik S J Chung and A K Somani ldquoBiologically inspiredmonocular vision based navigation and mapping in GPS-denied environmentsrdquo in Proceedings of the AIAA Infotech atAerospace Conference and Exhibit and AIAA UnmannedUnli-mited Conference Seattle Wash USA April 2009

[19] K Celik S-J ChungM Clausman andA K Somani ldquoMonoc-ular vision SLAM for indoor aerial vehiclesrdquo in Proceedings ofthe IEEERSJ International Conference on Intelligent Robots andSystems St Louis Mo USA October 2009

[20] J Shi and C Tomasi ldquoGood features to trackrdquo in Proceedings ofthe IEEE Computer Society Conference on Computer Vision andPattern Recognition pp 593ndash600 June 1994

[21] H Bay A Ess T Tuytelaars and L van Gool ldquoSpeeded-UpRobust Features (SURF)rdquo Computer Vision and Image Under-standing vol 110 no 3 pp 346ndash359 2008

[22] K Celik and A K Somani ldquoWandless realtime autocalibrationof tactical monocular camerasrdquo in Proceedings of the Interna-tional Conference on Image Processing Computer Vision andPattern Recognition (IPCV rsquo12) Las Vegas Nev USA 2012

[23] M Montemerlo S Thrun D Koller and B Wegbreit ldquoFast-SLAM a factored solution to the simultaneous localization andmapping problemrdquo in Proceedings of the AAAI National Con-ference on Artificial Intelligence pp 593ndash598 2002

[24] J P How B Bethke A Frank D Dale and J Vian ldquoReal-timeindoor autonnomous vehicle test environmentrdquo IEEE ControlSystems Magazine vol 28 no 2 pp 51ndash64 2008

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 12: Research Article Monocular Vision SLAM for Indoor Aerial ... · Battery Manual override Yaw gyroscope Flight surfaces Power plant RAM Inertial measurement unit USB2.0 Orthogonality

12 Journal of Electrical and Computer Engineering

0 10 20 30(m)

0 10 20 30(m)

(a)

(b)

Figure 13 (a) Experimental results of the proposed ranging andSLAM algorithm with state observer odometer trail Actual floor-plan of the building is superimposed later on a mature map toillustrate the accuracy of our method Note that the floor plan wasnot provided to the system a priori (b) The same environmentmapped by a ground robotwith a different starting point to illustratethat our algorithm is compatible with different platforms

0 10 20 30(m)

0 10 20 30(m)

Figure 14 Results of the proposed ranging and SLAM algorithmfrom a different experiment with state observer ground truth Allmeasurements are in meters The experiment was conducted underfluorescent ambient lightning and sunlight where applicable

0(m)50 1000

(m)50 100

Figure 15 Results of the proposed ranging and SLAM algorithmfrom an outdoor experiment in an urban area A small map ofthe area is provided for reference purposes (not provided to thealgorithm) and it indicates the robot path All measurements arein meters The experiment was conducted under sunlight ambientconditions and dry weather

Hallway length (m)

4035 30

25

25

2020

1515

05

10 10

0

5 50 0

Hallway width (m

)

151

minus5

altit

ude (

m)

Heli

copt

er

Figure 16 Cartesian (119909 119910 119911) position of the MAV in a hallwayas reported by proposed ranging and SLAM algorithm with time-varying altitude The altitude is represented by the 119911-axis andit is initially at 25 cm as this is the ground clearance of theultrasonic altimeter when the aircraft has landed MAV altitude wasintentionally varied by large amounts to demonstrate the robustnessof our method to the climb and descent of the aircraft whereas ina typical mission natural altitude changes are in the range of a fewcentimeters

A

B

C

D

Figure 17 Saint Vertigo the autonomous MAV helicopter consistsof four decksTheAdeck contains collective pitch rotor headmecha-nics The B deck comprises the fuselage which houses the powerplant transmission main batteries actuators gyroscope and thetail rotor The C deck is the autopilot compartment which containsthe inertial measurement unit all communication systems andall sensors The D deck carries the navigation computer which isattached to a digital video camera visible at the front

its features are accessible over the internet or an ad hoc TCP-IP network Among the other platforms shown in Figure 18Saint Vertigo has the most limited computational resources

52 Processing Requirements In order to effectively managethe computational resources on a light weight MAV com-puter we keep track of the CPU utilization for the algorithmsproposed in this paper Table 1 shows a typical breakdown ofthe average processor utilization per one video frame Eachcorresponding task elucidated in this paper is visualized inFigure 2

The numbers in Table 1 are gathered after the map hasmatured Methods highlighted with dagger are mutually exclusivefor example the Helix Bearing algorithm runs only when theMAV is performing turns while ranging task is on standbyParticle filtering has a roughly constant load on the system

Journal of Electrical and Computer Engineering 13

Figure 18 Our algorithms have been tested on a diverse set of mobile platforms shown here Picture courtesy of Space Systems and ControlsLab Aerospace Robotics Lab Digitalsmithy Lab and Rockwell Collins Advanced technology Center

once the map is populated We only consider a limitedpoint cloud with landmarks in the front detection range ofthe MAV (see Section 41) The MAV typically operates at80ndash90 utilization range It should be stressed that thisnumerical figure includes operating system kernel processeswhich involve video-memory procedures as the MAV is notequipped with a dedicated graphics processor The MAVis programmed to construct the SLAM results and othermiscellaneous on-screen display information inside the videomemory in real time This is used to monitor the system forour own debugging purposes but not required for the MAVoperation Disabling this feature reduces the load and freesup processor time for other tasks that may be implementedsuch as path planning and closed-loop position control

6 Conclusion and Future Work

In this paper we investigated the performance of monocularcamera based vision SLAM with minimal assumptions aswell as minimal aid from other sensors (altimeter only) in acorridor-following-flight application which requires preciselocalization and absolute range measurement This is trueeven for outdoor cases because our MAV is capable of build-ing high speeds and covering large distances very rapidly andsome of the ground robots we have tested were large enoughto become a concern for traffic and pedestriansWhile widelyrecognized SLAM methods have been mainly developedfor use with laser range finders this paper presented newalgorithms formonocular vision-based depth perception and

14 Journal of Electrical and Computer Engineering

Table 1 CPU utilization of the proposed algorithms

Image acquisition and edge filtering 10Line and slope extraction 2Landmark extraction 20dagger

Helix bearing 20dagger

Ranging algorithms Below 1Rao-Blackwellized particle filter 50

bearing sensing to accurately mimic the operation of such anadvanced device We were able to integrate our design withpopular SLAM algorithms originally meant for laser rangefinders and we have experimentally validated its operationfor autonomous indoor and outdoor flight and navigationwith a small fully self-contained MAV helicopter as well asother robotic platforms Our algorithms successfully adapt tovarious situations while successfully performing the transi-tion between (eg turns presence of external objects andtime-varying altitude)

Since the proposed monocular camera vision SLAMmethod does not need initialization procedures the missioncan start at an arbitrary point Therefore our MAV can bedeployed to infiltrate an unknown building One future taskis to add the capability to fly through doors and windowsIndeed the system is only limited by the capabilities of thecamera such as resolution shutter speed and reaction timeAll of those limitations can be overcome with the properuse of lenses and higher fidelity imaging sensors despite wehave used a consumer-grade USB camera Since the ability toextract good landmarks is a function of the camera capabili-ties a purpose-built camera is suggested for futurework Sucha camera would also allow development of efficient visionSLAM and data association algorithms that take advantageof the intermediate image processing data

Our future vision-based SLAM and navigation strategyfor an indoorMAV helicopter through hallways of a buildingalso includes the ability to recognize staircases and thustraversemultiple floors to generate a comprehensive volumet-ric map of the building This will also permit vision-based3D path planning and closed-loop position control of MAVbased on SLAM Considering our MAV helicopter is capableof outdoor flight we can extend our method to the outdoorperimeter of buildings and similar urban environments byexploiting the similarities between hallways and downtowncity maps Further considering the reduction in weight andindependence from GPS coverage our work also permitsthe development of portable navigation devices for a widerarray of applications such as small-scale mobile robotics andhelmet or vest mounted navigation systems

Certain environments and environmental factors provechallenging to our proposed method bright lights reflectivesurfaces haze and shadows These artifacts introduce twomain problems (1) they can alter chromatic clarity localmicrocontrast and exposure due to their unpredictable high-energy nature and (2) they can appear as false objectsespeciallywhen there is bloom surrounding objects in front ofproblem light source Further reduction in contrast is possible

if scattering particles in the air are dense We have come toobserve that preventative and defensive approaches to suchissues are promising Antireflective treatment on lenses canreduce light bouncing off of the lens and programming theaircraft to move for a very small distance upon detection ofglare can eliminate the unwanted effects Innovative andadaptive application of servo-controlled filters before thelenses can minimize or eliminate most if not all reflectionsThe light that causes glare is elliptically polarized due tostrong phase correlation This is as opposed to essential lightwhich is circularly polarized Filters can detect and blockpolarized light from entering the camera thereby blockingunwanted effects Application of purpose designed digitalimaging sensors that do not involve a Bayes filter can alsohelp Most of the glare occurs in green light region andtraditional digital imaging sensors have twice as many greenreceptors as red and blue Bayes design has been inspiredfrom human eye which sees green better as green is themost structurally descriptive light for edges and cornersThispaper has supplementary material (see Supplementary Mate-rial available online at httpdxdoiorg1011552013374165)available from the authors which show experimental resultsof the paper

Acknowledgments

The research reported in this paper was in part supportedby the National Science Foundation (Grant ECCS-0428040)Information Infrastructure Institute (1198683) Department ofAerospace Engineering and Virtual Reality Application Cen-ter at Iowa State University Rockwell Collins and Air ForceOffice of Scientific Research

References

[1] DHHubel and TNWiesel ldquoReceptive fields binocular inter-action and functional architecture in the catrsquos visual cortexrdquoTheJournal of Physiology vol 160 pp 106ndash154 1962

[2] N Isoda K Terada S Oe and K IKaida ldquoImprovement ofaccuracy for distance measurement method by using movableCCDrdquo in Proceedings of the 36th SICE Annual Conference (SICErsquo97) pp 29ndash31 Tokushima Japan July 1997

[3] R Hartley and A ZissermanMultiple View Geometry in Com-puter Vision Cambridge University Press 2nd edition 2003

[4] F Ruffier and N Franceschini ldquoVisually guided micro-aerialvehicle automatic take off terrain following landing and windreactionrdquo in Proceedings of the IEEE International Conferenceon Robotics and Automation pp 2339ndash2346 New Orleans LoUSA May 2004

[5] F Ruffier S Viollet S Amic and N Franceschini ldquoBio-inspired optical flow circuits for the visual guidance of micro-air vehiclesrdquo in Proceedings of the International Symposium onCircuits and Systems (ISCAS rsquo03) vol 3 pp 846ndash849 BangkokThailand May 2003

[6] J Michels A Saxena and A Y Ng ldquoHigh speed obstacle avoid-ance using monocular vision and reinforcement learningrdquo inProceedings of the 22nd International Conference on MachineLearning (ICML rsquo05) vol 119 pp 593ndash600 August 2005

Journal of Electrical and Computer Engineering 15

[7] A Saxena J Schulte and A Y Ng ldquoDepth estimation usingmonocular and stereo cuesrdquo in Proceedings of the 20th inter-national joint conference on Artifical intelligence (IJCAI rsquo07) pp2197ndash2203 2007

[8] N Snavely S M Seitz and R Szeliski ldquoPhoto tourism explor-ing photo collections in 3DrdquoACMTransactions onGraphics vol25 no 3 2006

[9] A W Fitzgibbon and A Zisserman ldquoAutomatic camera recov-ery for closed or open image sequencesrdquo in Proceedings of theEuropean Conference on Computer Vision pp 311ndash326 June1998

[10] ADavisonMNicholas and SOlivier ldquoMonoSLAM real-timesingle camera SLAMrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 29 no 6 pp 1052ndash1067 2007

[11] L Clemente A Davison I Reid J Neira and J Tardos ldquoMap-ping large loops with a single hand-held camerardquo in Proceedingsof the Robotics Science and Systems Conference June 2007

[12] F Dellaert W Burgard D Fox and S Thrun ldquoUsing thecondensation algorithm for robust vision-based mobile robotlocalizationrdquo in Proceedings of the IEEE Computer Society Con-ference onComputer Vision and Pattern Recognition (CVPR rsquo99)pp 588ndash594 June 1999

[13] N Cuperlier M Quoy P Gaussier and C Giovanangeli ldquoNav-igation and planning in an unknown environment using visionand a cognitive maprdquo in Proceedings of the IJCAI WorkshopReasoning with Uncertainty in Robotics 2005

[14] G Silveira E Malis and P Rives ldquoAn efficient direct approachto visual SLAMrdquo IEEE Transactions on Robotics vol 24 no 5pp 969ndash979 2008

[15] A P Gee D Chekhlov A Calway and W Mayol-CuevasldquoDiscovering higher level structure in visual SLAMrdquo IEEETransactions on Robotics vol 24 no 5 pp 980ndash990 2008

[16] K Celik S-J Chung and A K Somani ldquoMono-vision cornerSLAM for indoor navigationrdquo in Proceedings of the IEEE Inter-national Conference on ElectroInformation Technology (EITrsquo08) pp 343ndash348 Ames Iowa USA May 2008

[17] K Celik S-J Chung and A K Somani ldquoMVCSLAM mono-vision corner SLAM for autonomous micro-helicopters in GPSdenied environmentsrdquo in Proceedings of the AIAA GuidanceNavigation and Control Conference Honolulu Hawaii USAAugust 2008

[18] K Celik S J Chung and A K Somani ldquoBiologically inspiredmonocular vision based navigation and mapping in GPS-denied environmentsrdquo in Proceedings of the AIAA Infotech atAerospace Conference and Exhibit and AIAA UnmannedUnli-mited Conference Seattle Wash USA April 2009

[19] K Celik S-J ChungM Clausman andA K Somani ldquoMonoc-ular vision SLAM for indoor aerial vehiclesrdquo in Proceedings ofthe IEEERSJ International Conference on Intelligent Robots andSystems St Louis Mo USA October 2009

[20] J Shi and C Tomasi ldquoGood features to trackrdquo in Proceedings ofthe IEEE Computer Society Conference on Computer Vision andPattern Recognition pp 593ndash600 June 1994

[21] H Bay A Ess T Tuytelaars and L van Gool ldquoSpeeded-UpRobust Features (SURF)rdquo Computer Vision and Image Under-standing vol 110 no 3 pp 346ndash359 2008

[22] K Celik and A K Somani ldquoWandless realtime autocalibrationof tactical monocular camerasrdquo in Proceedings of the Interna-tional Conference on Image Processing Computer Vision andPattern Recognition (IPCV rsquo12) Las Vegas Nev USA 2012

[23] M Montemerlo S Thrun D Koller and B Wegbreit ldquoFast-SLAM a factored solution to the simultaneous localization andmapping problemrdquo in Proceedings of the AAAI National Con-ference on Artificial Intelligence pp 593ndash598 2002

[24] J P How B Bethke A Frank D Dale and J Vian ldquoReal-timeindoor autonnomous vehicle test environmentrdquo IEEE ControlSystems Magazine vol 28 no 2 pp 51ndash64 2008

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 13: Research Article Monocular Vision SLAM for Indoor Aerial ... · Battery Manual override Yaw gyroscope Flight surfaces Power plant RAM Inertial measurement unit USB2.0 Orthogonality

Journal of Electrical and Computer Engineering 13

Figure 18 Our algorithms have been tested on a diverse set of mobile platforms shown here Picture courtesy of Space Systems and ControlsLab Aerospace Robotics Lab Digitalsmithy Lab and Rockwell Collins Advanced technology Center

once the map is populated We only consider a limitedpoint cloud with landmarks in the front detection range ofthe MAV (see Section 41) The MAV typically operates at80ndash90 utilization range It should be stressed that thisnumerical figure includes operating system kernel processeswhich involve video-memory procedures as the MAV is notequipped with a dedicated graphics processor The MAVis programmed to construct the SLAM results and othermiscellaneous on-screen display information inside the videomemory in real time This is used to monitor the system forour own debugging purposes but not required for the MAVoperation Disabling this feature reduces the load and freesup processor time for other tasks that may be implementedsuch as path planning and closed-loop position control

6 Conclusion and Future Work

In this paper we investigated the performance of monocularcamera based vision SLAM with minimal assumptions aswell as minimal aid from other sensors (altimeter only) in acorridor-following-flight application which requires preciselocalization and absolute range measurement This is trueeven for outdoor cases because our MAV is capable of build-ing high speeds and covering large distances very rapidly andsome of the ground robots we have tested were large enoughto become a concern for traffic and pedestriansWhile widelyrecognized SLAM methods have been mainly developedfor use with laser range finders this paper presented newalgorithms formonocular vision-based depth perception and

14 Journal of Electrical and Computer Engineering

Table 1 CPU utilization of the proposed algorithms

Image acquisition and edge filtering 10Line and slope extraction 2Landmark extraction 20dagger

Helix bearing 20dagger

Ranging algorithms Below 1Rao-Blackwellized particle filter 50

bearing sensing to accurately mimic the operation of such anadvanced device We were able to integrate our design withpopular SLAM algorithms originally meant for laser rangefinders and we have experimentally validated its operationfor autonomous indoor and outdoor flight and navigationwith a small fully self-contained MAV helicopter as well asother robotic platforms Our algorithms successfully adapt tovarious situations while successfully performing the transi-tion between (eg turns presence of external objects andtime-varying altitude)

Since the proposed monocular camera vision SLAMmethod does not need initialization procedures the missioncan start at an arbitrary point Therefore our MAV can bedeployed to infiltrate an unknown building One future taskis to add the capability to fly through doors and windowsIndeed the system is only limited by the capabilities of thecamera such as resolution shutter speed and reaction timeAll of those limitations can be overcome with the properuse of lenses and higher fidelity imaging sensors despite wehave used a consumer-grade USB camera Since the ability toextract good landmarks is a function of the camera capabili-ties a purpose-built camera is suggested for futurework Sucha camera would also allow development of efficient visionSLAM and data association algorithms that take advantageof the intermediate image processing data

Our future vision-based SLAM and navigation strategyfor an indoorMAV helicopter through hallways of a buildingalso includes the ability to recognize staircases and thustraversemultiple floors to generate a comprehensive volumet-ric map of the building This will also permit vision-based3D path planning and closed-loop position control of MAVbased on SLAM Considering our MAV helicopter is capableof outdoor flight we can extend our method to the outdoorperimeter of buildings and similar urban environments byexploiting the similarities between hallways and downtowncity maps Further considering the reduction in weight andindependence from GPS coverage our work also permitsthe development of portable navigation devices for a widerarray of applications such as small-scale mobile robotics andhelmet or vest mounted navigation systems

Certain environments and environmental factors provechallenging to our proposed method bright lights reflectivesurfaces haze and shadows These artifacts introduce twomain problems (1) they can alter chromatic clarity localmicrocontrast and exposure due to their unpredictable high-energy nature and (2) they can appear as false objectsespeciallywhen there is bloom surrounding objects in front ofproblem light source Further reduction in contrast is possible

if scattering particles in the air are dense We have come toobserve that preventative and defensive approaches to suchissues are promising Antireflective treatment on lenses canreduce light bouncing off of the lens and programming theaircraft to move for a very small distance upon detection ofglare can eliminate the unwanted effects Innovative andadaptive application of servo-controlled filters before thelenses can minimize or eliminate most if not all reflectionsThe light that causes glare is elliptically polarized due tostrong phase correlation This is as opposed to essential lightwhich is circularly polarized Filters can detect and blockpolarized light from entering the camera thereby blockingunwanted effects Application of purpose designed digitalimaging sensors that do not involve a Bayes filter can alsohelp Most of the glare occurs in green light region andtraditional digital imaging sensors have twice as many greenreceptors as red and blue Bayes design has been inspiredfrom human eye which sees green better as green is themost structurally descriptive light for edges and cornersThispaper has supplementary material (see Supplementary Mate-rial available online at httpdxdoiorg1011552013374165)available from the authors which show experimental resultsof the paper

Acknowledgments

The research reported in this paper was in part supportedby the National Science Foundation (Grant ECCS-0428040)Information Infrastructure Institute (1198683) Department ofAerospace Engineering and Virtual Reality Application Cen-ter at Iowa State University Rockwell Collins and Air ForceOffice of Scientific Research

References

[1] DHHubel and TNWiesel ldquoReceptive fields binocular inter-action and functional architecture in the catrsquos visual cortexrdquoTheJournal of Physiology vol 160 pp 106ndash154 1962

[2] N Isoda K Terada S Oe and K IKaida ldquoImprovement ofaccuracy for distance measurement method by using movableCCDrdquo in Proceedings of the 36th SICE Annual Conference (SICErsquo97) pp 29ndash31 Tokushima Japan July 1997

[3] R Hartley and A ZissermanMultiple View Geometry in Com-puter Vision Cambridge University Press 2nd edition 2003

[4] F Ruffier and N Franceschini ldquoVisually guided micro-aerialvehicle automatic take off terrain following landing and windreactionrdquo in Proceedings of the IEEE International Conferenceon Robotics and Automation pp 2339ndash2346 New Orleans LoUSA May 2004

[5] F Ruffier S Viollet S Amic and N Franceschini ldquoBio-inspired optical flow circuits for the visual guidance of micro-air vehiclesrdquo in Proceedings of the International Symposium onCircuits and Systems (ISCAS rsquo03) vol 3 pp 846ndash849 BangkokThailand May 2003

[6] J Michels A Saxena and A Y Ng ldquoHigh speed obstacle avoid-ance using monocular vision and reinforcement learningrdquo inProceedings of the 22nd International Conference on MachineLearning (ICML rsquo05) vol 119 pp 593ndash600 August 2005

Journal of Electrical and Computer Engineering 15

[7] A Saxena J Schulte and A Y Ng ldquoDepth estimation usingmonocular and stereo cuesrdquo in Proceedings of the 20th inter-national joint conference on Artifical intelligence (IJCAI rsquo07) pp2197ndash2203 2007

[8] N Snavely S M Seitz and R Szeliski ldquoPhoto tourism explor-ing photo collections in 3DrdquoACMTransactions onGraphics vol25 no 3 2006

[9] A W Fitzgibbon and A Zisserman ldquoAutomatic camera recov-ery for closed or open image sequencesrdquo in Proceedings of theEuropean Conference on Computer Vision pp 311ndash326 June1998

[10] ADavisonMNicholas and SOlivier ldquoMonoSLAM real-timesingle camera SLAMrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 29 no 6 pp 1052ndash1067 2007

[11] L Clemente A Davison I Reid J Neira and J Tardos ldquoMap-ping large loops with a single hand-held camerardquo in Proceedingsof the Robotics Science and Systems Conference June 2007

[12] F Dellaert W Burgard D Fox and S Thrun ldquoUsing thecondensation algorithm for robust vision-based mobile robotlocalizationrdquo in Proceedings of the IEEE Computer Society Con-ference onComputer Vision and Pattern Recognition (CVPR rsquo99)pp 588ndash594 June 1999

[13] N Cuperlier M Quoy P Gaussier and C Giovanangeli ldquoNav-igation and planning in an unknown environment using visionand a cognitive maprdquo in Proceedings of the IJCAI WorkshopReasoning with Uncertainty in Robotics 2005

[14] G Silveira E Malis and P Rives ldquoAn efficient direct approachto visual SLAMrdquo IEEE Transactions on Robotics vol 24 no 5pp 969ndash979 2008

[15] A P Gee D Chekhlov A Calway and W Mayol-CuevasldquoDiscovering higher level structure in visual SLAMrdquo IEEETransactions on Robotics vol 24 no 5 pp 980ndash990 2008

[16] K Celik S-J Chung and A K Somani ldquoMono-vision cornerSLAM for indoor navigationrdquo in Proceedings of the IEEE Inter-national Conference on ElectroInformation Technology (EITrsquo08) pp 343ndash348 Ames Iowa USA May 2008

[17] K Celik S-J Chung and A K Somani ldquoMVCSLAM mono-vision corner SLAM for autonomous micro-helicopters in GPSdenied environmentsrdquo in Proceedings of the AIAA GuidanceNavigation and Control Conference Honolulu Hawaii USAAugust 2008

[18] K Celik S J Chung and A K Somani ldquoBiologically inspiredmonocular vision based navigation and mapping in GPS-denied environmentsrdquo in Proceedings of the AIAA Infotech atAerospace Conference and Exhibit and AIAA UnmannedUnli-mited Conference Seattle Wash USA April 2009

[19] K Celik S-J ChungM Clausman andA K Somani ldquoMonoc-ular vision SLAM for indoor aerial vehiclesrdquo in Proceedings ofthe IEEERSJ International Conference on Intelligent Robots andSystems St Louis Mo USA October 2009

[20] J Shi and C Tomasi ldquoGood features to trackrdquo in Proceedings ofthe IEEE Computer Society Conference on Computer Vision andPattern Recognition pp 593ndash600 June 1994

[21] H Bay A Ess T Tuytelaars and L van Gool ldquoSpeeded-UpRobust Features (SURF)rdquo Computer Vision and Image Under-standing vol 110 no 3 pp 346ndash359 2008

[22] K Celik and A K Somani ldquoWandless realtime autocalibrationof tactical monocular camerasrdquo in Proceedings of the Interna-tional Conference on Image Processing Computer Vision andPattern Recognition (IPCV rsquo12) Las Vegas Nev USA 2012

[23] M Montemerlo S Thrun D Koller and B Wegbreit ldquoFast-SLAM a factored solution to the simultaneous localization andmapping problemrdquo in Proceedings of the AAAI National Con-ference on Artificial Intelligence pp 593ndash598 2002

[24] J P How B Bethke A Frank D Dale and J Vian ldquoReal-timeindoor autonnomous vehicle test environmentrdquo IEEE ControlSystems Magazine vol 28 no 2 pp 51ndash64 2008

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 14: Research Article Monocular Vision SLAM for Indoor Aerial ... · Battery Manual override Yaw gyroscope Flight surfaces Power plant RAM Inertial measurement unit USB2.0 Orthogonality

14 Journal of Electrical and Computer Engineering

Table 1 CPU utilization of the proposed algorithms

Image acquisition and edge filtering 10Line and slope extraction 2Landmark extraction 20dagger

Helix bearing 20dagger

Ranging algorithms Below 1Rao-Blackwellized particle filter 50

bearing sensing to accurately mimic the operation of such anadvanced device We were able to integrate our design withpopular SLAM algorithms originally meant for laser rangefinders and we have experimentally validated its operationfor autonomous indoor and outdoor flight and navigationwith a small fully self-contained MAV helicopter as well asother robotic platforms Our algorithms successfully adapt tovarious situations while successfully performing the transi-tion between (eg turns presence of external objects andtime-varying altitude)

Since the proposed monocular camera vision SLAMmethod does not need initialization procedures the missioncan start at an arbitrary point Therefore our MAV can bedeployed to infiltrate an unknown building One future taskis to add the capability to fly through doors and windowsIndeed the system is only limited by the capabilities of thecamera such as resolution shutter speed and reaction timeAll of those limitations can be overcome with the properuse of lenses and higher fidelity imaging sensors despite wehave used a consumer-grade USB camera Since the ability toextract good landmarks is a function of the camera capabili-ties a purpose-built camera is suggested for futurework Sucha camera would also allow development of efficient visionSLAM and data association algorithms that take advantageof the intermediate image processing data

Our future vision-based SLAM and navigation strategyfor an indoorMAV helicopter through hallways of a buildingalso includes the ability to recognize staircases and thustraversemultiple floors to generate a comprehensive volumet-ric map of the building This will also permit vision-based3D path planning and closed-loop position control of MAVbased on SLAM Considering our MAV helicopter is capableof outdoor flight we can extend our method to the outdoorperimeter of buildings and similar urban environments byexploiting the similarities between hallways and downtowncity maps Further considering the reduction in weight andindependence from GPS coverage our work also permitsthe development of portable navigation devices for a widerarray of applications such as small-scale mobile robotics andhelmet or vest mounted navigation systems

Certain environments and environmental factors prove challenging to our proposed method: bright lights, reflective surfaces, haze, and shadows. These artifacts introduce two main problems: (1) they can alter chromatic clarity, local microcontrast, and exposure due to their unpredictable, high-energy nature, and (2) they can appear as false objects, especially when there is bloom surrounding objects in front of the problematic light source. A further reduction in contrast is possible

if scattering particles in the air are dense. We have come to observe that preventative and defensive approaches to such issues are promising. Antireflective treatment on lenses can reduce light bouncing off of the lens, and programming the aircraft to move a very small distance upon detection of glare can eliminate the unwanted effects. Innovative and adaptive application of servo-controlled filters in front of the lenses can minimize or eliminate most, if not all, reflections. The light that causes glare is elliptically polarized due to strong phase correlation, as opposed to essential light, which is circularly polarized. Filters can detect and block polarized light from entering the camera, thereby blocking the unwanted effects. Application of purpose-designed digital imaging sensors that do not involve a Bayer filter can also help. Most of the glare occurs in the green-light region, and traditional digital imaging sensors have twice as many green receptors as red and blue; the Bayer design was inspired by the human eye, which sees green best, as green is the most structurally descriptive light for edges and corners. This paper has supplementary material (see Supplementary Material available online at http://dx.doi.org/10.1155/2013/374165), available from the authors, which shows experimental results of the paper.
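As an illustration of the defensive behavior described above, the sketch below flags a frame as glare-contaminated when the fraction of saturated pixels exceeds a threshold and, in that case, returns a small sidestep command. The 8-bit saturation level, the 2% threshold, and the 10 cm step are illustrative assumptions, not parameters from this paper.

import numpy as np

def glare_fraction(gray, sat_level=250):
    """Fraction of pixels at or above the saturation level (8-bit frame)."""
    return np.count_nonzero(gray >= sat_level) / gray.size

def sidestep_on_glare(gray, threshold=0.02, step_m=0.10):
    """Return a small lateral displacement command (meters) when glare is
    detected, otherwise zero; a stand-in for the vehicle's motion command."""
    return step_m if glare_fraction(gray) > threshold else 0.0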

Acknowledgments

The research reported in this paper was in part supported by the National Science Foundation (Grant ECCS-0428040), the Information Infrastructure Institute (1198683), the Department of Aerospace Engineering and the Virtual Reality Application Center at Iowa State University, Rockwell Collins, and the Air Force Office of Scientific Research.

References

[1] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106-154, 1962.

[2] N. Isoda, K. Terada, S. Oe, and K. Ikaida, "Improvement of accuracy for distance measurement method by using movable CCD," in Proceedings of the 36th SICE Annual Conference (SICE '97), pp. 29-31, Tokushima, Japan, July 1997.

[3] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2nd edition, 2003.

[4] F. Ruffier and N. Franceschini, "Visually guided micro-aerial vehicle: automatic take off, terrain following, landing and wind reaction," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2339-2346, New Orleans, LA, USA, May 2004.

[5] F. Ruffier, S. Viollet, S. Amic, and N. Franceschini, "Bio-inspired optical flow circuits for the visual guidance of micro-air vehicles," in Proceedings of the International Symposium on Circuits and Systems (ISCAS '03), vol. 3, pp. 846-849, Bangkok, Thailand, May 2003.

[6] J. Michels, A. Saxena, and A. Y. Ng, "High speed obstacle avoidance using monocular vision and reinforcement learning," in Proceedings of the 22nd International Conference on Machine Learning (ICML '05), vol. 119, pp. 593-600, August 2005.

[7] A. Saxena, J. Schulte, and A. Y. Ng, "Depth estimation using monocular and stereo cues," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 2197-2203, 2007.

[8] N. Snavely, S. M. Seitz, and R. Szeliski, "Photo tourism: exploring photo collections in 3D," ACM Transactions on Graphics, vol. 25, no. 3, 2006.

[9] A. W. Fitzgibbon and A. Zisserman, "Automatic camera recovery for closed or open image sequences," in Proceedings of the European Conference on Computer Vision, pp. 311-326, June 1998.

[10] A. Davison, M. Nicholas, and S. Olivier, "MonoSLAM: real-time single camera SLAM," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1052-1067, 2007.

[11] L. Clemente, A. Davison, I. Reid, J. Neira, and J. Tardos, "Mapping large loops with a single hand-held camera," in Proceedings of the Robotics: Science and Systems Conference, June 2007.

[12] F. Dellaert, W. Burgard, D. Fox, and S. Thrun, "Using the condensation algorithm for robust vision-based mobile robot localization," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '99), pp. 588-594, June 1999.

[13] N. Cuperlier, M. Quoy, P. Gaussier, and C. Giovanangeli, "Navigation and planning in an unknown environment using vision and a cognitive map," in Proceedings of the IJCAI Workshop on Reasoning with Uncertainty in Robotics, 2005.

[14] G. Silveira, E. Malis, and P. Rives, "An efficient direct approach to visual SLAM," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 969-979, 2008.

[15] A. P. Gee, D. Chekhlov, A. Calway, and W. Mayol-Cuevas, "Discovering higher level structure in visual SLAM," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 980-990, 2008.

[16] K. Celik, S.-J. Chung, and A. K. Somani, "Mono-vision corner SLAM for indoor navigation," in Proceedings of the IEEE International Conference on Electro/Information Technology (EIT '08), pp. 343-348, Ames, Iowa, USA, May 2008.

[17] K. Celik, S.-J. Chung, and A. K. Somani, "MVCSLAM: mono-vision corner SLAM for autonomous micro-helicopters in GPS denied environments," in Proceedings of the AIAA Guidance, Navigation and Control Conference, Honolulu, Hawaii, USA, August 2008.

[18] K. Celik, S.-J. Chung, and A. K. Somani, "Biologically inspired monocular vision based navigation and mapping in GPS-denied environments," in Proceedings of the AIAA Infotech at Aerospace Conference and Exhibit and AIAA Unmanned Unlimited Conference, Seattle, Wash, USA, April 2009.

[19] K. Celik, S.-J. Chung, M. Clausman, and A. K. Somani, "Monocular vision SLAM for indoor aerial vehicles," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, Mo, USA, October 2009.

[20] J. Shi and C. Tomasi, "Good features to track," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 593-600, June 1994.

[21] H. Bay, A. Ess, T. Tuytelaars, and L. van Gool, "Speeded-Up Robust Features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346-359, 2008.

[22] K. Celik and A. K. Somani, "Wandless realtime autocalibration of tactical monocular cameras," in Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV '12), Las Vegas, Nev, USA, 2012.

[23] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit, "FastSLAM: a factored solution to the simultaneous localization and mapping problem," in Proceedings of the AAAI National Conference on Artificial Intelligence, pp. 593-598, 2002.

[24] J. P. How, B. Bethke, A. Frank, D. Dale, and J. Vian, "Real-time indoor autonomous vehicle test environment," IEEE Control Systems Magazine, vol. 28, no. 2, pp. 51-64, 2008.
