
Stereo Vision and Navigation within Buildings

Ernst Triendl and David J. Kriegman

Artificial Intelligence Laboratory
Stanford University
Stanford, CA 94305

Abstract: Soft modeling, stereo vision, motion planning, uncertainty reduction, image processing, and locomotion enable the Mobile Autonomous Robot Stanford to explore a benign indoor environment without human intervention. The modeling system describes rooms in terms of floor, walls, and hinged doors, and allows for unspecified obstacles. Image processing basically extracts vertical edges along the horizon using an edge appearance model. Stereo vision matches those edges using edge and greylevel similarity, constraint propagation and a preference for epipolar ordering. The motion planner tries to move in a way that is likely to increase knowledge about obstacle-free space. Results presented are from an autonomous run that included difficult passages such as navigation around a pillar without a priori knowledge.

    1. Introduction

Our goal is to develop a stereo vision system that allows a robot to explore the interior of a typical building; a benign environment that is neither rigged for the purpose, nor filled with tricky obstacles.

By the end of 1986 the Mobile Autonomous Robot Stanford, or Mobi, was able to move relatively freely under its own guidance inside our laboratory building. It is

• fast enough for slow walking speed,

• able to understand enough about the indoors to move about and recognize the more important elements of a building,

• able to move autonomously, exploring rooms without human intervention.

These goals are achieved by a combination of modeling and vision. Modeling tells us that buildings have walls, doors and floors that obey certain relations to each other and to the vehicle. The vision system determines the regions of free space and the locations of obstacles. The motion planning system generates moves that are likely to increase the knowledge about free space so that further moves will be possible.

Examples presented are from the first convincing run on October 18, 1986. Left to its own devices, Mobi moved down the hallway, made a tour of the lobby and came back into the hallway, traveling a total distance of 35 meters.

In this paper we discuss the vision and motion planning algorithms. More about the robot, its other sensors, and odometry correction by vision is contained in [Kriegman 87] and [Triendl 87].

Support for this work was provided by the Air Force Office of Scientific Research under contract F33615-85-C-5106, Image Understanding contract N00039-84-C-0211 and Autonomous Land Vehicle contract AIDS-1085s-1. David Kriegman was supported by a fellowship from the Fannie and John Hertz Foundation.

2. Modeling

We expect the robot to explore inside buildings. The base model used to represent knowledge about this environment consists of a flat floor that carries vertical, straight walls with hinged doors. A room is the space between walls. A hallway is a long and narrow room.

The actual implementation of the model allows for surface marks on the walls and a class Other Object to cope with things that the stereo vision system sees, that modeling cannot explain, but that the motion planner should not push over.
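To make the model concrete, the sketch below shows one possible data layout for it, written in C since the paper's vision code is in C; every type and field name here is our own illustrative assumption, not the authors' implementation (part of which ran in Lisp).

    /* One reading of the soft building model as data: vertical wall
     * slabs and doors attached to unbounded spines, surface marks,
     * and a catch-all Other Object class for things stereo vision
     * sees but the model cannot explain (and must not be pushed
     * over).  All names are illustrative assumptions. */
    typedef enum { WALL_SLAB, DOOR, CLOSED_DOOR, OPEN_DOOR,
                   SURFACE_MARK, OTHER_OBJECT } ObjectKind;

    typedef struct {          /* unbounded vertical plane in 3D,    */
        double angle_deg;     /* seen as a line on the floor plane  */
        double offset_cm;     /* signed distance from the origin    */
    } WallSpine;

    typedef struct {
        ObjectKind kind;
        int spine;            /* index of associated spine, or -1   */
        double start, end;    /* extent along the spine (cm)        */
    } ModelObject;

    typedef struct {
        WallSpine spines[8];  /* e.g. leftwall and rightwall        */
        int n_spines;
        ModelObject objects[256];
        int n_objects;
    } BuildingModel;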

Figure 1 is a typical instantiation of our model that contains all possible objects and their joining edges. Compare it to the example of a stereo pair of images (figure 2) seen by the robot during its excursion.

The choice of this model has implications for the vision system: it needs only to look at vertical edges to generate the model. In fact, looking for edges at the horizon suffices, since all important vertical edges cross a horizontal plane at camera height (otherwise Mobi could not fit through doors).



Figure 1. Possible instantiation of the model.

Figure 2. Stereo pair of images seen by Mobi while roaming through our lab.

Figure 3. Model of Mobi with cameras and field of view. The stereo matches are from the image in figure 2.

One ambiguity remains though: we may confuse a gap in the wall with the wall itself unless some other object is seen through the gap. A look at the floor edge might resolve this situation.

Whether one should look at the floor at all is also a question of good usage of processing resources: when all information about the model can be gained by looking horizontally, looking down will slow the process. On the other hand, looking down occasionally is needed to avoid obstacles on the floor. Our robot does not do so now and consequently rams into chairs, flower pots (if small) and couches.

A model of the robot (figure 3) is used for vision and motion planning. Mobi is an omnidirectional cylindrical vehicle, 170 cm high and 65 cm in diameter. It has 12 touch-sensitive bumpers, and carries two cameras (17 cm apart) which have a 36 degree field of view.
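For reference in the sketches that follow, the stated geometry can be collected in one place; the struct itself is merely our own convenient container, with values taken from the text.

    /* Mobi's geometry as given in the text (units: cm, degrees). */
    typedef struct {
        double height_cm, diameter_cm;  /* cylindrical body         */
        double baseline_cm;             /* camera separation        */
        double fov_deg;                 /* per-camera field of view */
        int    n_bumpers;               /* touch-sensitive bumpers  */
    } RobotModel;

    static const RobotModel MOBI = { 170.0, 65.0, 17.0, 36.0, 12 };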

3. Edge Detection

A vertical edge detector with an aperture of 5 columns by 10 rows proceeds in two stages.

First, a 1 by 10 vertical averaging filter is applied to both images at the horizon, which is known from a calibration program that measures camera orientations. This vertical smearing has the following effects:

1. Vertical edges retain their acuity.
2. Slanted edges get blurred; horizontal edges vanish.
3. Tilt and roll angle misalignment have less effect.
4. Image noise is reduced.
5. Blobs become vertical edges.
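A minimal sketch of this first stage in C follows; the 8-bit image format, image width, and all names are our own illustrative assumptions.

    /* Stage 1 sketch: average a 10-row band centred on the horizon
     * into a single line, keeping vertical edges sharp while
     * blurring slanted ones and suppressing noise.  The caller must
     * guarantee 5 <= horizon_row <= image_rows - 5. */
    #define COLS 512

    void smooth_horizon(const unsigned char *image, int image_cols,
                        int horizon_row, float line[COLS])
    {
        for (int c = 0; c < COLS; c++) {
            float sum = 0.0f;
            for (int r = horizon_row - 5; r < horizon_row + 5; r++)
                sum += image[r * image_cols + c];
            line[c] = sum / 10.0f;       /* 1 x 10 vertical average */
        }
    }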

Second, a 5 by 1 version of the edge appearance model is applied to the filtered image line. The edge appearance model [Triendl 1978] compares a local patch of image to the image that would have been created if the camera were looking at an ideal step edge. It uses the spatial filter created by the lens-camera-digitizer-preprocessor pipeline for this purpose. The operator returns quality, position, direction, left and right greylevels and an estimate of the localization error (1/8 pixel for good edges).
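The 1978 operator itself is not reproduced here; the sketch below only conveys the flavour of this second stage: a 5-sample window is compared against an ideal step seen through the blur of the imaging pipeline. The blurred step profile, the subpixel estimate, and all names are our assumptions.

    /* Stage 2 sketch: fit an ideal step, blurred by the pipeline's
     * spatial filter, to a 5-sample window of the smoothed line.
     * STEP[] is a stand-in for the true blurred step profile; the
     * caller guarantees 2 <= c <= COLS - 3. */
    #include <math.h>

    static const float STEP[5] = { 0.0f, 0.1f, 0.5f, 0.9f, 1.0f };

    typedef struct {
        float position;      /* subpixel column of the edge         */
        float left, right;   /* greylevels left and right of it     */
        float quality;       /* 0..1 agreement with the step model  */
    } Edge;

    Edge fit_step_edge(const float *line, int c)
    {
        Edge e;
        e.left  = 0.5f * (line[c - 2] + line[c - 1]);
        e.right = 0.5f * (line[c + 1] + line[c + 2]);

        float contrast = e.right - e.left, residual = 0.0f;
        for (int i = -2; i <= 2; i++) {
            float model = e.left + contrast * STEP[i + 2];
            residual += fabsf(line[c + i] - model);
        }
        /* strong, well-modelled steps score near 1 */
        e.quality = fabsf(contrast)
                  / (fabsf(contrast) + residual + 1e-6f);

        /* crude subpixel estimate: interpolate where the profile
         * crosses mid-grey between columns c and c + 1 */
        float mid = 0.5f * (e.left + e.right);
        float d   = line[c + 1] - line[c];
        e.position = (fabsf(d) > 1e-6f)
                   ? c + (mid - line[c]) / d : (float)c;
        return e;
    }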

Alternative Edge Extractions

Before arriving at the above solution we explored several alternatives. Some of them are to be applied later for more specific purposes such as looking for a floor edge.

Detecting all edges in the image first and then linking them into lines took several minutes per stereo pair. Combining edge detection and linkage improved the speed to about 30 seconds, mostly spent on the search for new edge links. Reducing this search risks losing short and marginal chains of edges.

The resulting lines were labeled straight or bent and connected with other lines forming corners, T-junctions, Y-junctions and arrows. These were intended to label the resulting line-graph according to the model and combine monocular and binocular stereo. Again processing time was too long, and in addition lines were easily broken by door knobs, labels on the wall and the like, so that stereo pairs were difficult to determine. Many real objects did not give sufficiently good and consistent edges for a stereo match.

Non-vertical lines that fit the model, i.e. floor-wall and floor-door edges, are often found outside the field of view of the camera, too low in contrast, ill defined, or obstructed by furniture. When extracting vertical lines the image can be enhanced by application of a vertical low pass filter, a moving average over a few lines. The line follower won't get thrown off by a tack in a door frame. All important information, except edge length, is provided by the first edge of a vertical. So finally we drop the angular sensitivity of the edge detector and line following and arrive at our present solution.

4. Stereo Algorithm

Certainty about a stereo match is not possible, since we can find a possible, if far-fetched, physical explanation for any match. Even in the series of pictures from our test run discussed below, we find matching edges that appear very different. We do tap all available sources of information and cues to make the chance of correct matches in the real world as high as possible. See [Baker 1982] and [Tsuji 1986] for other solutions.

The stereo mechanism we finally settled on uses edges, grey levels, correlation of intensities, constellations of edges and constraint propagation. It tries to preserve left-to-right ordering of edge matches but allows violations, e.g. by a pillar in the middle of a room. Multiple choices of matches are kept when needed.

The first stage of the matcher proposes all possible matches of pairs of edges that are similar on the left or right side of the edge or have a similar grey level curve. Note that this includes matches of occluding edges with differing backgrounds. Edges and their potential matches, superimposed on the grey level curves of the left and right epipolar lines of the image, are shown in figure 4.
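In outline, this proposal stage could look like the sketch below. The similarity threshold, the pairing loop, and the repeated Edge record (each sketch in this paper is self-contained) are our own assumptions; the full system additionally compares grey level curves, for which see the next paragraph.

    /* Proposal sketch: pair every left-image edge with every
     * right-image edge that is similar on at least one flank; this
     * deliberately keeps matches of occluding edges whose other
     * flank (the background) differs between the two views. */
    #include <math.h>

    #define MAX_PROPOSALS 4096

    typedef struct { float position, left, right, quality; } Edge;
    typedef struct { int l, r; float quality; } Match;

    static int similar(float a, float b)
    {
        return fabsf(a - b) < 20.0f;   /* greylevel tolerance (assumed) */
    }

    int propose_matches(const Edge *le, int nl,
                        const Edge *re, int nr, Match *out)
    {
        int n = 0;
        for (int i = 0; i < nl; i++)
            for (int j = 0; j < nr && n < MAX_PROPOSALS; j++)
                if (similar(le[i].left,  re[j].left) ||
                    similar(le[i].right, re[j].right)) {
                    out[n].l = i;
                    out[n].r = j;
                    out[n].quality = le[i].quality * re[j].quality;
                    n++;
                }
        return n;
    }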

The grey level comparison function that we use is similar to the normalized cross correlation but takes into account differences in standard deviations, mean grey levels and interval lengths.
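The paper does not give the exact formula; one plausible reading wraps penalty terms for mismatched means, standard deviations and interval lengths around a standard normalized cross correlation, as sketched below. The penalty shapes and constants are assumptions.

    /* Greylevel comparison sketch: both intervals are assumed
     * already resampled to a common length N; compute normalized
     * cross correlation, then discount for differences in mean,
     * standard deviation and original interval length. */
    #include <math.h>

    #define N 32                        /* common resampled length */

    static void stats(const float *v, float *mean, float *sd)
    {
        float m = 0.0f, s = 0.0f;
        for (int i = 0; i < N; i++) m += v[i];
        m /= N;
        for (int i = 0; i < N; i++) s += (v[i] - m) * (v[i] - m);
        *mean = m;
        *sd = sqrtf(s / N);
    }

    /* a, b: resampled greylevel curves; la, lb: their original
     * pixel lengths.  Returns a score in [0, 1]. */
    float compare_intervals(const float *a, const float *b,
                            float la, float lb)
    {
        float ma, sa, mb, sb, ncc = 0.0f;
        stats(a, &ma, &sa);
        stats(b, &mb, &sb);
        for (int i = 0; i < N; i++)
            ncc += (a[i] - ma) * (b[i] - mb);
        ncc /= (N * sa * sb + 1e-6f);

        float p_mean = 1.0f / (1.0f + fabsf(ma - mb) / 32.0f);
        float p_sd   = (sa < sb ? sa : sb) / ((sa > sb ? sa : sb) + 1e-6f);
        float p_len  = (la < lb ? la : lb) / ((la > lb ? la : lb) + 1e-6f);

        return 0.5f * (ncc + 1.0f) * p_mean * p_sd * p_len;
    }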

Figure 4. Stereo match proposals and grey level curves.

Next we deal with local consistency. First the grey level comparison function is applied to the intervals between pairs of matching edges and their respective first and second neighbors to the left. By comparing these 4 combinations per match, marginal edges present in only one image can be pinpointed and eliminated. As a side effect, high correlation makes it likely that the ribbon between edges results from a solid object, in our model a wall or a closed door (see below).

Local consistency links neighbors and determines which neighboring match is more consistent. These neighbor links form paths of maximal consistency that link groups of likely match choices. Simply summing up match qualities in a group represents an effective means of constraint propagation. A group of equal bars or a checkerboard that is entirely visible will be matched correctly. In the case of an occluding edge, the constraint will propagate up to the edge, and the part visible only to one camera will be left unmatched.
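A schematic of this propagation, under the assumption that candidate matches are kept sorted by left-image position: chains of mutually consistent neighbors are scored by their summed qualities, so a coherent run of matches outweighs an isolated lucky one. The consistency callback stands in for the interval comparison of the previous paragraphs; the cut-off value is an assumption.

    /* Constraint propagation sketch: walk forward from each match,
     * extending the chain while neighboring matches stay
     * consistent, and score each chain by its summed qualities. */
    typedef struct { int l, r; float quality; } Match;

    float best_group_score(const Match *m, int n,
                           float (*consistency)(const Match *,
                                                const Match *))
    {
        float best = 0.0f;
        for (int start = 0; start < n; start++) {
            float sum = m[start].quality;
            for (int i = start + 1; i < n; i++) {
                if (consistency(&m[i - 1], &m[i]) < 0.5f)
                    break;             /* chain of consistency ends */
                sum += m[i].quality;   /* propagate support forward */
            }
            if (sum > best) best = sum;
        }
        return best;
    }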

The edge detector, stereo proposer and greylevel comparison function are implemented in C and run on a VAX. Processing time is about one second per stereo pair. Data are then sent to a Symbolics 3600 lisp machine which makes the final matching decisions, shown as little circles in figure 5. All matches in the example happen to be unique, but motion matches, if available, would resolve any remaining ambiguity.

5. Modelmaking: Spines, Walls, Doors

Looking at the ribbons formed by pairs of neighboring edges, we tentatively call wall a ribbon whose appearance is similar in the left and right image (see figure 5). The direction angles of these prospective walls are clustered with weights proportional to their lengths. The most prominent angle will be the angle of the main walls in the scene.
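A weighted angle histogram is one simple way to realize this clustering; the histogram approach itself and the 5-degree bin width are our assumptions.

    /* Wall-angle clustering sketch: accumulate ribbon direction
     * angles (mod 180 degrees) into a histogram weighted by ribbon
     * length, and take the fullest bin as the dominant direction. */
    #define BINS 36                     /* 180 deg / 5 deg per bin */

    double dominant_wall_angle(const double *angle_deg,
                               const double *length_cm, int n)
    {
        double hist[BINS] = { 0.0 };
        for (int i = 0; i < n; i++) {
            double a = angle_deg[i];
            while (a < 0.0)    a += 180.0;    /* fold to [0, 180) */
            while (a >= 180.0) a -= 180.0;
            hist[(int)(a / 5.0)] += length_cm[i]; /* length weight */
        }
        int best = 0;
        for (int b = 1; b < BINS; b++)
            if (hist[b] > hist[best]) best = b;
        return best * 5.0 + 2.5;        /* centre of winning bin */
    }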

After creating an empty model, the wall spines are added. A wall spine is an unbounded straight line (i.e. a vertical plane in 3D) through the center of walls. In the present example 2 spines have been found along the x-axis, in positions 50 cm to the left and 93 cm to the right. They might be called leftwall and rightwall.

Potential walls and doors are added to the model if they are close to a spine. The model created so far from the sample image has two wall spines with 3 doors and several wall slabs (figure 6).

A door is the ribbon between two not necessarily neighboring matches that (1) are 60 to 140 cm apart, (2) have edges with a big brightness difference, (3) are dark on the inside and bright on the outside, or vice versa, and (4) can be associated with a wall spine. If the ribbons look similar in both images they are called closed-door; if they look very different they are labelled open-door; otherwise just door.
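Read as code, the four tests and the final labelling might look like the sketch below; the 60 to 140 cm width test comes from the text, while the brightness and similarity cut-offs are our own illustrative assumptions.

    /* Door labelling sketch following the four tests in the text.
     * left_right_similarity would come from the interval comparison
     * function; all numeric thresholds except the width bounds are
     * assumptions. */
    typedef enum { NOT_A_DOOR, DOOR, OPEN_DOOR, CLOSED_DOOR } DoorLabel;

    DoorLabel classify_door(double width_cm,
                            double brightness_difference,
                            int dark_bright_contrast,  /* test (3) */
                            int near_wall_spine,       /* test (4) */
                            double left_right_similarity) /* 0..1 */
    {
        if (width_cm < 60.0 || width_cm > 140.0)
            return NOT_A_DOOR;                         /* test (1) */
        if (brightness_difference < 30.0)
            return NOT_A_DOOR;                         /* test (2) */
        if (!dark_bright_contrast || !near_wall_spine)
            return NOT_A_DOOR;

        if (left_right_similarity > 0.8)
            return CLOSED_DOOR;     /* looks the same in both images */
        if (left_right_similarity < 0.3)
            return OPEN_DOOR;       /* looks very different          */
        return DOOR;
    }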


Figure 5. Mobi's footprint and matches bound free space. The labels wall, door and closed-door are put by the model proposer. Derived from the image in figure 2 and partially shown in figure 3.

Figure 6. Model derived from the image in figure 2. The narrow rectangles are doors.

Six one-meter steps of Mobi later, the model contains 5 doors and several pieces of wall on two spines, as shown in figure 7.

Figure 8. Model seen from the intermediate position. Most of it lies outside the retina of Mobi's real camera.

Figure 9. Picture seen from the intermediate position.

Figure 7. Model after 6 steps. The Mobi icons represent the first and last step. The gap in the left spine is mostly real (it did not see the pillar there yet). The gap in the right series of walls is where a glass-covered display case hides the wall.


6. Free Space for Motion Planning

Free space is the volume, or in our case the floor area, that is known to be free from obstacles. Initially it is the area occupied by the vehicle's footprint. A safe move will keep the vehicle entirely within free space.

The rays between both cameras and the object go through free space if the match was correct and no window panes or mirrors are involved. For modeling we have assumed that certain ribbons between neighboring matches represent solid objects. For motion planning we now assume that all those ribbons are solid, to be on the safe side. The hull of lines connecting neighboring matches and lines connecting matches with each camera represents our visual free space. Looking at this closed polygon in figure 5 (part of it is cut off by the page layout) we notice that no safe move is possible from a single view, since there is a bottleneck between footprint and visual free space.

If the robot has already moved, the new free space consists of the superposition of the new visual free space, the previous visual free space, and the space swept out during the motion. Old free space has to be shrunk according to motion uncertainty.
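The paper works with polygons; the grid-based sketch below is a deliberate simplification of ours that states the same update rule compactly: erode the old free space by the motion uncertainty, then union in the swept area and the new visual free space. Grid size and the cell-based erosion are assumptions.

    /* Free-space update sketch on a coarse boolean floor grid. */
    #define GRID 64

    typedef unsigned char Grid[GRID][GRID];    /* 1 = known free */

    void update_free_space(Grid old_free, const Grid swept,
                           const Grid visual, int uncertainty_cells)
    {
        Grid eroded;
        /* shrink old free space: a cell stays free only if every
         * cell within the uncertainty radius was free */
        for (int y = 0; y < GRID; y++)
            for (int x = 0; x < GRID; x++) {
                unsigned char keep = 1;
                for (int dy = -uncertainty_cells; dy <= uncertainty_cells; dy++)
                    for (int dx = -uncertainty_cells; dx <= uncertainty_cells; dx++) {
                        int yy = y + dy, xx = x + dx;
                        if (yy < 0 || yy >= GRID ||
                            xx < 0 || xx >= GRID || !old_free[yy][xx])
                            keep = 0;
                    }
                eroded[y][x] = keep;
            }
        /* union with the swept area and the new visual free space */
        for (int y = 0; y < GRID; y++)
            for (int x = 0; x < GRID; x++)
                old_free[y][x] = eroded[y][x] | swept[y][x] | visual[y][x];
    }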

The strategy of the motion planner implemented is to generate enough free space to allow motion and to make the vehicle cover ground without too many deviations and without getting cornered or caught in repetitive moves. It moves the vehicle into the middle of free space, looking towards the middle of the free space ahead. Smaller steps result when obstacles are close by. If free space bifurcates, i.e. there are several middles, a random choice is made. If there is no free space, or no space to look at, Mobi rotates until it finds space to move.
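Schematically, one planning step could be written as below; the middles interface, the clearance input and the step-size rule are all assumptions about an implementation the paper describes only in prose.

    /* One planner step, schematically.  Given the middle(s) of free
     * space ahead and the clearance to the nearest obstacle, choose
     * a rotation (no free space to look at), a random branch (free
     * space bifurcates), or a forward step that shrinks as
     * obstacles get close. */
    #include <stdlib.h>

    typedef struct { double x, y; } Point;
    typedef enum { ROTATE, MOVE } ActionKind;
    typedef struct { ActionKind kind; Point goal; double step_cm; } Action;

    Action plan_step(const Point *middles, int n_middles,
                     double clearance_cm)
    {
        Action a;
        if (n_middles == 0) {         /* nothing to look at: rotate */
            a.kind = ROTATE;
            a.goal.x = a.goal.y = 0.0;
            a.step_cm = 0.0;
            return a;
        }
        /* several middles: free space bifurcates, pick at random */
        const Point *target =
            &middles[n_middles > 1 ? rand() % n_middles : 0];

        a.kind = MOVE;
        a.goal = *target;             /* head for the middle of free space */
        a.step_cm = clearance_cm < 100.0
                        ? clearance_cm / 2.0  /* smaller steps near obstacles */
                        : 100.0;              /* nominal one-meter step */
        return a;
    }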

7. Autonomous Exploration

Now, let us take a look at what a real mobile robot run is like. To make the start interesting, we pointed Mobi at a blank wall and commanded it to go. It did not see any edges while staring at a white wall, and performed the only safe move, rotation. After this move it still has not seen enough to move forward, and rotates further until it is looking down the hallway (figure 2) and sees enough correspondence points to build a large enough region of free space to begin translating. Mobi travels towards the end of the hallway without any further incidents, building up the model shown previously (figures 5 to 7).

Figure 11. Motion series: Mobi passes the pillar and turns from the hallway into the lobby.


The first challenge is the white pillar visible in the first image of the sequence in figure 11. This poses a particularly interesting problem because the epipolar ordering constraint may be violated. Furthermore, the building corner causes a major occlusion of the window. Note that the white signs on the window are visible in the right image and are occluded in the left image. The occluding edge changes from light grey to black in the left image while switching from grey to bright white in the right image. Also note the black wire dangling close to the corner. This is the real world after all. The robot still sees a large region of free space and heads towards the water fountain. Finally, Mobi passes the pillar, gets close to the end of the hallway, sees little and, seeing the reflections of the lobby, heads straight toward the window. Fortunately for us, Mobi sees the edges of the paper signs. Mobi rotates again to avoid bumping the window.

The excursion continued, making similar turns at the next window. Sadly, the run ended because of a communication failure between the lisp machine and the robot, permitting the researchers and onlookers to retire for the evening after celebrating over champagne.

8. Conclusion

So, Mobi can successfully navigate through the inside of a building under automatic visual control, while creating its own symbolic model of the building structure. By choosing a model that can be simply instantiated with the detection of vertical edges in the world, processing time has been greatly reduced.

Acknowledgements

We'd like to thank Tom Binford, SoonYao Kong, Ron Fearing, Giora Gorali, Shad Fishman, Leonie Dreschler-Fischer, Rami Rise and Rami Rubenstein for all their help throughout this work.

References

1. Triendl, Ernst; Kriegman, David J.; Binford, Tom, A Mobile Robot: Sensing, Planning and Locomotion, Proc. IEEE Int. Conf. Robotics & Automation, 1987.

2. Triendl, Ernst; Kriegman, David J., Vision and Visual Exploration for the Stanford Mobile Robot, Proc. Image Understanding Workshop, 1987.

3. A.R. de Saint Vincent, A 3D Perception System for the Mobile Robot Hilare, Proc. IEEE Int. Conf. Robotics & Automation, 1986.

4. S. Tsuji et al., Stereo Vision for a Mobile Robot: World Constraints for Image Matching and Interpretation, Proc. IEEE Int. Conf. Robotics & Automation, 1986.

5. A.M. Waxman et al., A Visual Navigation System, Proc. IEEE Int. Conf. Robotics & Automation, 1986.

6. Brooks, Rodney A., Visual Map Making for a Mobile Robot, Proc. IEEE Int. Conf. Robotics & Automation, 1985.

7. Chatila, Raja; Laumond, Jean-Paul, Position Referencing and Consistent World Modeling for Mobile Robots, Proc. IEEE Int. Conf. Robotics & Automation, 1985.

8. H.P. Moravec, The Stanford Cart and the CMU Rover, Proc. IEEE, vol. 71, no. 7, July 1983.

9. H. Baker, Depth from Edge and Intensity Based Stereo, AIM-347, Stanford University, 1982.

10. R.A. Brooks, Symbolic Reasoning among 3-D Models and 2-D Images, Ph.D. dissertation, Stanford University, 1981.

11. E. Triendl, Modellierung von Kanten bei regelmäßiger Rasterung, in Bildverarbeitung und Mustererkennung, E. Triendl (ed.), Springer, Berlin, 1978.

12. E. Triendl, How to get the Edge into the Map, Proc. 4th International Conference on Pattern Recognition, Kyoto, 1978.
