
Application of Visual Pattern Recognition to Robotics and Automation


BY MARIO E. MUNICH, PAOLO PIRJANIAN, ENRICO DI BERNARDO, LUIS GONCALVES, NIKLAS KARLSSON, AND DAVID LOWE

1070-9932/06/$20.00©2006 IEEE
IEEE Robotics & Automation Magazine, SEPTEMBER 2006, p. 72

The first IEEE-IFR joint forum on Innovation and Entrepreneurship in Robotics and Automation took place on 10 April 2005 in Barcelona, Spain. The following article was presented at the forum as one of the three nominees selected for the Innovation and Entrepreneurship in Robotics and Automation Award. For further information, please visit http://teamster.usc.edu/~iera05/index.htm.

Hadi Moradi Associate Vice President

RAS Industrial Activity Board

Visual recognition of patterns is a basic capability of most advanced species in nature. A large percentage of the human brain is devoted to visual processing, with a substantial portion of this area used for pattern recognition (see references [1] and [2] for an in-depth treatment of the subject). Visual pattern recognition enables a variety of tasks, such as object and target recognition, navigation, and grasping and manipulation, among others. Advances in camera technology have dramatically reduced the cost of cameras, making them the sensor of choice for robotics and automation. Vision provides a variety of cues about the environment (motion, color, shape, etc.) with a single sensor. Visual pattern recognition solves some fundamental problems in computer vision: correspondence, pose estimation, and structure from motion. Therefore, we consider visual


pattern recognition to be an important, cost-efficient primitive for robotics and automation systems.

Recent advances in computer vision have given rise to a robust and invariant visual pattern recognition technology that is based on extracting a set of characteristic features from an image. Such features are obtained with the scale invariant feature transform (SIFT) [5], [6], which represents the variations in brightness of the image around the point of interest. Recognition performed with these features has been shown to be quite robust in realistic settings. This article describes the application of this particular visual pattern recognition technology to a variety of robotics applications: object recognition, navigation, manipulation, and human-machine interaction. The following sections describe the technology in more detail and present a business case for visual pattern recognition in the field of robotics and automation.

Visual Pattern Recognition (ViPR)

The visual pattern recognition system developed by Evolution Robotics is versatile and works robustly with low-cost cameras. The underlying algorithm addresses an important challenge of all visual pattern recognition systems: performing reliable and efficient recognition in realistic settings with inexpensive hardware and limited computing power.

ViPR can be used in applications such as manipulation, human-robot interaction, and security. It can be used in mobile robots to support navigation, localization, mapping, and visual servoing. It can also be used in machine vision applications for object identification and hand-eye coordination. Other applications include entertainment and education, since ViPR enables the automatic identification and retrieval of information about a painting or sculpture in a museum.

Based on the work of David Lowe [5], [6], ViPR represents an object as a set of SIFT features extracted from one or more images of that object. SIFT features are highly localized visual templates, which are invariant to changes in scale and rotation and partially invariant to changes in illumination and viewpoint. Each SIFT feature is described by its location in the image (at subpixel accuracy), its orientation, its scale, and a keypoint descriptor that has the above-mentioned invariance properties. The key components of ViPR are both the particular choice of features (SIFT features) and a very efficient way of organizing and searching through a database of hundreds of thousands of SIFT features.
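The per-feature record described above can be sketched as a small data structure. This is an illustrative sketch, not Evolution Robotics' actual implementation; the 128-element descriptor length follows Lowe's standard SIFT formulation [6], and the example coordinates are invented.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class SIFTFeature:
    """One keypoint, as described in the text: subpixel location,
    orientation, scale, and an invariant descriptor vector."""
    x: float            # image location (subpixel accuracy)
    y: float
    orientation: float  # dominant gradient direction, in radians
    scale: float        # scale at which the keypoint was detected
    descriptor: np.ndarray = field(
        default_factory=lambda: np.zeros(128, dtype=np.float32))

# A trained object model is then just a labeled set of such features
# (the label is the object name assigned during training).
model = {"cereal_box": [SIFTFeature(120.5, 64.2, 0.3, 2.1)]}
```

In a full database, the descriptors of all models would be pooled into one search structure, with each entry pointing back to its object label.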

On a 640 × 480 image, the detector would typically find about 2,000 such features. In the training phase, the features are added to the model database and labeled with the object name associated with the training image. In the matching phase, a new image is acquired, and the algorithm searches through the database for all objects matching subsets of the features extracted from the new image, using a bottom-up approach. Euclidean distance in the SIFT descriptor space is used to find similar features. A greedy version of a k-d tree allows for efficient search in this very high-dimensional space. A voting technique is then used to

consolidate information coming from individual feature matches into a global match hypothesis. Each hypothesis contains features that match the same object and the same change in viewpoint for that particular object. The last step of the process refines the possible matches by computing an affine transformation between the set of features and the matched object(s), so that the relative position of the features is preserved through the transformation. The residual error after the transformation is used to decide whether to accept or reject the match hypothesis.
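The match-then-verify pipeline described above can be sketched as follows. This is a simplified illustration, not the productized ViPR code: it uses an exact k-d tree (SciPy's cKDTree) in place of the greedy best-bin-first variant, synthetic descriptors in place of real SIFT output, and a plain least-squares affine fit; the distance and residual thresholds are arbitrary values chosen for the toy data.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(42)

# Model database: 500 features, each a 128-D descriptor with an (x, y) location.
db_desc = rng.random((500, 128))
db_xy = rng.uniform(0, 480, (500, 2))

# Query image: the first 100 model features, seen under an affine warp
# (rotation + scale + translation), with slightly perturbed descriptors.
A_true = 1.3 * np.array([[np.cos(0.4), -np.sin(0.4)],
                         [np.sin(0.4),  np.cos(0.4)]])
t_true = np.array([50.0, -20.0])
q_desc = db_desc[:100] + rng.normal(0, 0.01, (100, 128))
q_xy = db_xy[:100] @ A_true.T + t_true

# Step 1: nearest-neighbor matching by Euclidean distance in descriptor space.
tree = cKDTree(db_desc)
dist, idx = tree.query(q_desc, k=1)
matches = [(q, i) for q, (d, i) in enumerate(zip(dist, idx)) if d < 0.5]

# Step 2: fit an affine transform db_xy -> q_xy from the matched locations.
src = np.hstack([db_xy[[i for _, i in matches]], np.ones((len(matches), 1))])
dst = q_xy[[q for q, _ in matches]]
params, *_ = np.linalg.lstsq(src, dst, rcond=None)  # 3x2 matrix: [A.T; t]

# Step 3: accept the hypothesis only if the residual error is small.
residual = np.sqrt(np.mean((src @ params - dst) ** 2))
accept = residual < 1.0
```

In the real system, the voting step would first group matches by object and coarse pose before the affine verification; here a single synthetic object makes that grouping unnecessary.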

Figure 1 shows the main characteristics of the recognition algorithm. The first row displays the two objects to be recognized, and the other rows present recognition results under different conditions. The main characteristics of the algorithm are summarized in the following.

Invariant to Rotation and Affine Transformations

ViPR recognizes objects even if they are rotated upside down (rotation invariance) or placed at an angle with respect to the optical axis (affine invariance). See the second and third rows of Figure 1.

Invariant to Changes in Scale

Objects can be recognized at different distances from the camera, depending on the size of the objects and the camera resolution. Recognition works reliably from distances of several meters. See the second and third rows of Figure 1.

Invariant to Changes in Lighting

ViPR handles changes in illumination ranging from natural to artificial indoor lighting. The system is insensitive to artifacts caused by reflections or backlighting. See the fourth row of Figure 1.

Invariant to Occlusions

ViPR reliably recognizes objects that are partially blocked by other objects and objects that are partially in the camera's view. The amount of allowed occlusion is typically between 50–90%, depending on the object and the camera quality. See the fifth row of Figure 1.

Reliable Recognition

ViPR has an 80–95% success rate in uncontrolled settings. A 95–100% recognition rate is achieved in controlled settings.



The recognition speed is a logarithmic function of the number of objects in the database; i.e., the recognition speed is proportional to log(N), where N is the number of objects in the database. The object library can store hundreds or even thousands of objects without a significant increase in computational requirements. The recognition frame rate is proportional to CPU power and image resolution. For example, the recognition algorithm runs at 14–18 frames per second (fps) at an image resolution of 208 × 160 on a 1,400-MHz Pentium IV processor, 5 fps at 208 × 160 on a 600-MHz MIPS-based 64-b RISC processor, and 7 fps at 320 × 240 on a 400-MHz processor.
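The practical consequence of the log(N) scaling can be made concrete with a back-of-the-envelope model. The constants below are illustrative assumptions, not measured ViPR numbers; the point is only that when per-frame time is t0 + c·log(N), each doubling of the database adds the same fixed increment, so growing the library by a factor of 100 barely changes the frame rate.

```python
import math

# Hypothetical timing model (assumed constants, for illustration only):
T0 = 0.050  # seconds of fixed per-frame work (e.g., feature extraction)
C = 0.002   # seconds added per doubling of the database size

def frame_rate(n_objects):
    """Frames per second under the assumed t = T0 + C*log2(N) model."""
    return 1.0 / (T0 + C * math.log2(n_objects))

fps_small = frame_rate(100)     # a library of 100 objects
fps_large = frame_rate(10_000)  # 100x more objects
drop = 1 - fps_large / fps_small  # relative frame-rate loss
```

Under these assumed constants, multiplying the library size by 100 costs well under a quarter of the frame rate, which is the qualitative behavior the text describes.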

Reducing the image resolution decreases the image quality and, ultimately, the recognition rate. However, the object recognition system allows for graceful degradation with decreasing image quality. Each object model requires about 40 kB of memory.

The ViPR algorithm was initially developed at the laboratory of Prof. David Lowe at the University of British Columbia. Evolution Robotics faced the challenge of productizing and commercializing ViPR. The first stage in the process was the implementation of ViPR as a reliable piece of code. A first level of quality assurance (QA) was performed at this stage. QA was designed to ensure that the algorithm performed robustly and reliably in completely unknown and unstructured environments. The next stage was documentation and usability. A powerful yet simple-to-use set of APIs was designed for ViPR, sample code examples and documentation were written, and a second level of QA was performed. At this point, a graphical user interface (GUI) application was developed to simplify customer evaluation of ViPR. Optimization of the code was also started at this point; we achieved about a 15-times improvement in computation while keeping the same recognition performance. This optimization of ViPR involved substantial code reorganization, assembly implementation of the most time-consuming portions of the code, and modifications of the original algorithm. The final stage of the process has been the delivery and integration of the code in actual robotics and automation products.

A Business Case for ViPR in Robotics and Automation

In this section, we analyze the different aspects of ViPR that make it a significant innovation for the robotics industry.


Figure 1. Examples of ViPR working under a variety of conditions. The first row presents the two objects to be recognized. The other rows display recognition results [the bounding box of the matched object is shown in red (dark gray)].



Novelty

The ViPR algorithm presented in the previous section is the first algorithm to date that has shown recognition rates above 90% in real-world conditions (undefined and unknown environments, variable lighting, etc.). ViPR also provides the basis for the visual simultaneous localization and mapping (vSLAM) system [3], [4], the first simultaneous localization and mapping (SLAM) system that enables low-cost and robust navigation in cluttered and populated environments. SLAM is one of the most fundamental, yet most challenging, problems in mobile robotics. To achieve full autonomy, a robot must possess the ability to explore its environment without user intervention, build a reliable map, and localize itself in the map. In particular, if Global Positioning System (GPS) data and external beacons are unavailable, the robot must autonomously determine appropriate reference points (or landmarks) on which to build a map. ViPR provides the capability for robust and reliable recognition of visual landmarks.

Figure 2 shows the result of vSLAM after the robot has traveled in a typical two-bedroom apartment. The robot was driven along a reference path (this path is unknown to the SLAM algorithm). The vSLAM algorithm builds a map consisting of landmarks, marked with blue circles in the figure. The corrected robot path, which uses a combination of visual features and odometry, provides a robust and accurate position determination for the robot, as seen by the red path in the figure.

The green path (odometry only) is obviously incorrect, since, according to this path, the robot is traversing through walls and furniture. The red path (the vSLAM-corrected path), on the other hand, consistently follows the reference path.
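Why landmark recognition beats raw odometry can be illustrated with a toy simulation. This is not the vSLAM algorithm [3], [4], which maintains a full probabilistic estimate over the robot pose and landmark positions; it only demonstrates the core idea that re-recognizing a known landmark bounds the error that dead reckoning accumulates. All quantities below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

true_pos = np.zeros(2)      # actual robot position
odom_pos = np.zeros(2)      # dead-reckoned estimate (accumulates noise)
landmark = true_pos.copy()  # map position of a landmark created at the start

# Drive around: every odometry reading carries a small error, so the
# dead-reckoned estimate drifts further from the truth over time.
for _ in range(500):
    step = rng.normal(0.0, 1.0, 2)
    true_pos = true_pos + step
    odom_pos = odom_pos + step + rng.normal(0.0, 0.1, 2)

error_before = np.linalg.norm(odom_pos - true_pos)

# The robot now recognizes the landmark again (e.g., via ViPR) and measures
# the landmark's position relative to itself, with small sensor noise.
relative_obs = (landmark - true_pos) + rng.normal(0.0, 0.01, 2)

# Correction: the stored map position of the landmark anchors the pose
# estimate, replacing the drifted dead-reckoned position.
corrected_pos = landmark - relative_obs
error_after = np.linalg.norm(corrected_pos - true_pos)
```

In the toy run, the corrected error is bounded by the observation noise, while the odometry-only error grows with the length of the path, mirroring the green-versus-red contrast in Figure 2.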

Based on experiments in various typical home environments, on different floor surfaces (carpet and hardwood floor), and using different image resolutions (low: 320 × 280; high: 640 × 480), robot localization accuracy was achieved as shown in Table 1.

Potential for Commercialization

ViPR has already been integrated into a series of commercial products and prototypes. ViPR is also part of the Evolution Robotics Software Platform (ERSP) and is currently being evaluated by a number of companies. Some commercial products that currently use ViPR are presented in Figure 3 and highlighted in the following.

Sony’s AIBO ERS-7

AIBO is an entertainment robot that has incorporated ViPR for supporting two critical features: reliable human-robot interaction and robust self-charging. AIBO recognizes a set of

commands issued with a set of predefined, feature-rich cards that are reliably matched with ViPR. AIBO’s charging station has a characteristic pattern that is recognized with ViPR, supporting localization of the station and robust self-docking.

Yaskawa’s SmartPal

SmartPal [7] is a prototype robot that is capable of recognizing a set of objects placed on a table, understanding a user request


Figure 2. Example result of SLAM using vSLAM. Red path: vSLAM estimate of robot trajectory. Green path: odometry estimate of robot trajectory. Blue circles: vSLAM landmarks created during operations.


Floor Surface   Image Resolution   Median Error (cm)   Std Dev Error (cm)   Median Error (deg)   Std Dev Error (deg)
Carpet          Low                13.6                9.4                  4.79                 7.99
Carpet          High               12.9                10                   5.97                 7.03
Hardwood        Low                12.2                7.5                  5.28                 6.62
Hardwood        High               11.5                8.3                  3.47                 5.41

Table 1. Robot localization accuracy.



for one of the objects, and then grabbing it. SmartPal uses ViPR for object recognition and distance calculation in order to perform manipulation and grasping of the object.

Philips’ iCat

The iCat [8], [9] is an experimentation platform for studying human-robot interaction. The iCat uses ViPR to recognize cards with a particular picture/symbol that represents a particular genre of music (e.g., rock, pop, romantics, or children’s music). Once the type of music is recognized, the iCat selects music of the desired genre

from a content server (PC) and sends it to a music renderer (Philips Streamium device).

Evolution Robotics’ LaneHawk

LaneHawk is a ViPR-based system developed for the retail market that helps in detecting and recognizing items placed in the bottom of the basket of shopping carts. A camera flush-mounted in the checkout lane continuously checks for the presence of the cart and for the existence of items in the bottom of the basket. When ViPR recognizes an item, its UPC information is passed to the point of sale (POS). The cashier verifies the quantity of items that were found under the basket and continues to close the transaction. LaneHawk provides a solution for the problem of bottom-of-basket (BOB) losses (items that are overlooked or forgotten at checkout) and also improves productivity by increasing the checkout speed of the lanes.

Economic Viability

ViPR requires two pieces of hardware to work: a camera and a CPU. ViPR is able to perform at recognition rates better than 90% using a US$40–80 Web camera or a US$4–9 sensor (640 × 480 pixel resolution). We have implemented ViPR on a variety of computing platforms, ranging from Pentium-based systems to embedded processors, like the PMC-Sierra family of processors, to DSP chips, like the 200-MHz TI TMS320C6205. A complete ViPR system composed of a custom camera, a TI DSP, and an FPGA to interface the camera and the DSP would cost about US$21 (US$9 + US$5


Figure 3. Robots using ViPR: (a) Sony’s AIBO ERS-7, (b) Yaskawa’s SmartPal, and (c) Philips’ iCat.




+ US$7) in large volumes. We have also implemented vSLAM on two different platforms: a Pentium-based system and a custom embedded board composed of a 400-MHz PMC-Sierra processor and a TI DSP that handled the image-filtering blocks of ViPR. The embedded implementation of vSLAM was developed for the robotic vacuum cleaner market, targeting vacuum devices that would cost about US$500. The embedded board also includes a module for motor control of the robot, for about US$100 in large volumes. The embedded implementation of vSLAM provides all the computing power needed for the vacuum cleaner at a reasonable percentage of the estimated retail price of the product. Therefore, we have a suite of implementations of ViPR that would enable robotic applications, ranging from ones that already have a sizable computing platform, such as AIBO, to ones that have a minimal computing platform (or none), such as simple robotic toys or robotic vacuum cleaners.

On the industrial automation side, we have fully productized LaneHawk. The system consists of a camera, a lighting system, and a complete computer, and it sells for about US$3,000. Losing about US$20 per lane per day in a typical store (10–15 lanes) represents US$50,000 of annual lost revenue. A LaneHawk installation would pay for itself in less than 12 months, assuming a conservative BOB detection rate of 50%.
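The payback claim follows from simple arithmetic on the figures above. The sketch assumes one LaneHawk installation per lane and that detected BOB items translate directly into recovered revenue; both are our reading of the text, not stated there explicitly.

```python
# Figures from the text.
loss_per_lane_per_day = 20.00  # US$ lost to bottom-of-basket items, per lane
detection_rate = 0.50          # conservative BOB detection rate
system_cost = 3000.00          # US$ per LaneHawk installation (one per lane)

# Revenue recovered per lane per day, and the resulting payback period.
recovered_per_day = loss_per_lane_per_day * detection_rate  # US$10 per day
payback_days = system_cost / recovered_per_day              # 300 days
```

At US$10 recovered per day, the US$3,000 system pays for itself in 300 days, consistent with the under-12-months claim.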

Conclusions

ViPR is a basic primitive for robotics systems. It has been used for human-robot interaction, localization and mapping, navigation and self-charging, and manipulation. ViPR is the first algorithm of its kind that performs robustly and reliably using low-cost hardware. ViPR could be integrated as an add-on hardware component in its DSP implementation or as a pure software component, giving robotics developers a variety of possibilities and great flexibility for designing their applications. Therefore, we postulate that ViPR could be a very valuable component for the robotics industry.

Keywords

Visual pattern recognition, SIFT, vSLAM, SLAM, vision-based robotics, vision-based automation.

References

[1] F. Fang and S. He, “Cortical responses to invisible objects in the human dorsal and ventral pathways,” Nature Neurosci., vol. 8, no. 10, pp. 1380–1385, 2005.

[2] D.J. Felleman and D.C. Van Essen, “Distributed hierarchical processing in the primate cerebral cortex,” Cerebral Cortex, vol. 1, no. 1, pp. 1–47, 1991.

[3] L. Goncalves, E. Di Bernardo, D. Benson, M. Svedman, J. Ostrowski, N. Karlsson, and P. Pirjanian, “A visual front end for simultaneous localization and mapping,” in Proc. Int. Conf. Robotics Automation (ICRA), 2005, pp. 44–49.

[4] N. Karlsson, E. Di Bernardo, J. Ostrowski, L. Goncalves, P. Pirjanian, and M. Munich, “The vSLAM algorithm for robust localization and mapping,” in Proc. Int. Conf. Robotics Automation (ICRA), 2005, pp. 24–29.

[5] D. Lowe, “Local feature view clustering for 3-D object recognition,” in Proc. 2001 IEEE Conf. Computer Vision Pattern Recognition, vol. 1, 2001, p. 682.

[6] D. Lowe, “Distinctive image features from scale-invariant keypoints,”Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004.

[7] K. Matsukuma, H. Handa, and K. Yokoyama, “Vision-based manipulation system for autonomous mobile robot ‘SmartPal’,” in Proc. Japan Robot Association Conf., Yaskawa Electric Corporation, Sept. 2004.

[8] A.J.N. van Breemen, iCat community Web site [Online]. Available: http://www.hitech-projects.com/icat

[9] A.J.N. van Breemen, “Animation engine for believable interactive user-interface robots,” in Proc. Int. Conf. Intelligent Robots Systems (IROS), vol. 3, 2004, pp. 2873–2878.

Mario E. Munich is vice president of engineering and principal scientist at Evolution Robotics. He holds a Ph.D. in electrical engineering from the California Institute of Technology.

Paolo Pirjanian is president and chief technology officer at Evolution Robotics. He holds a Ph.D. from the University of Aalborg, Denmark.

Enrico Di Bernardo is a senior research scientist at Evolution Robotics. He holds a Ph.D. in electrical engineering from the University of Padova, Italy.

Luis Goncalves is vice president of research and development at Evolution Robotics Retail. He holds a Ph.D. in computational and neural systems from the California Institute of Technology.

Niklas Karlsson is a senior research scientist at Evolution Robotics. He holds a Ph.D. in control systems from the University of California, Santa Barbara.

David Lowe is a professor in the Computer Science Department of the University of British Columbia. He is a Fellow of the Canadian Institute for Advanced Research and a member of the Scientific Advisory Board of Evolution Robotics.

Address for Correspondence: Mario E. Munich, 130 W. Union St., Pasadena, CA 91103 USA. Phone: +1 626 535 2871. E-mail: [email protected].
