On-Road Vehicle Detection: A Review

Zehang Sun, Member, IEEE, George Bebis, Member, IEEE, and Ronald Miller

Z. Sun is with eTreppid Technologies, LLC, 755 Trademark Drive, Reno, NV 89521. E-mail: [email protected].
G. Bebis is with the Computer Vision Laboratory, Department of Computer Science and Engineering, University of Nevada, 1664 North Virginia Street, Reno, NV 89557. E-mail: [email protected].
R. Miller is with the Vehicle Design R & A Department, Ford Motor Company, 1 American Road, Dearborn, MI 48126-2798. E-mail: [email protected].

Manuscript received 11 Aug. 2004; revised 15 Aug. 2005; accepted 22 Aug. 2005; published online 13 Mar. 2006. Recommended for acceptance by H. Sawhney. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPAMI-0422-0804.

0162-8828/06/$20.00 © 2006 IEEE. Published by the IEEE Computer Society.

Abstract—Developing on-board automotive driver assistance systems aiming to alert drivers about driving environments and possible collisions with other vehicles has attracted a lot of attention lately. In these systems, robust and reliable vehicle detection is a critical step. This paper presents a review of recent vision-based on-road vehicle detection systems. Our focus is on systems where the camera is mounted on the vehicle rather than being fixed such as in traffic/driveway monitoring systems. First, we discuss the problem of on-road vehicle detection using optical sensors, followed by a brief review of intelligent vehicle research worldwide. Then, we discuss active and passive sensors to set the stage for vision-based vehicle detection. Methods aiming to quickly hypothesize the location of vehicles in an image, as well as to verify the hypothesized locations, are reviewed next. Integrating detection with tracking is also reviewed to illustrate the benefits of exploiting temporal continuity for vehicle detection. Finally, we present a critical overview of the methods discussed, assess their potential for future deployment, and present directions for future research.

Index Terms—Vehicle detection, computer vision, intelligent vehicles.

1 INTRODUCTION

Every minute, on average, at least one person dies in a vehicle crash. Auto accidents also injure at least 10 million people each year, two or three million of them seriously. It is predicted that the hospital bill, damaged property, and other costs will add up to 1-3 percent of the world's gross domestic product [1], [2]. With the aim of reducing injury and accident severity, precrash sensing is becoming an area of active research among automotive manufacturers, suppliers, and universities. Several national and international projects have been launched over the past several years to investigate new technologies for improving safety and accident prevention (see Section 2).

Vehicle accident statistics disclose that the main threats a driver faces are from other vehicles. Consequently, developing on-board automotive driver assistance systems aiming to alert a driver about the driving environment and possible collisions with other vehicles has attracted a lot of attention. In these systems, robust and reliable vehicle detection is the first step. Vehicle detection—and tracking—has many applications, including platooning (i.e., vehicles traveling at high speed and close distance on highways), stop-and-go (vehicles traveling at low speed and close distance in cities), and autonomous driving.

This paper presents a review of recent vision-based on-road vehicle detection systems where the camera is mounted on the vehicle rather than being fixed such as in traffic/driveway monitoring systems. Vehicle detection using optical sensors is very challenging due to huge within-class variabilities in vehicle appearance. Vehicles may vary in shape (Fig. 1a), size, and color. The appearance of a vehicle depends on its pose (Fig. 1b) and is affected by nearby objects. Complex outdoor environments (e.g., illumination conditions (Fig. 1c), unpredictable interactions between traffic participants, cluttered backgrounds (Fig. 1d)) are difficult to control. On-road vehicle detection also requires faster processing than many other applications since the vehicle speed is bounded by the processing rate. Another key issue is robustness to the vehicle's movements and drifts.

More general overviews on various aspects of intelligent transportation systems (e.g., infrastructure-based approaches such as sensors detecting the field emitted by permanent magnetic markers or electric wires buried in the road) as well as vision-based intelligent transportation systems (e.g., driver monitoring, pedestrian detection, sign recognition, etc.) can be found in [2], [3], [4], [5], [6], [7]. Several special issues have also focused on computer vision applications in intelligent transportation systems [8], [9], [10], [11].

This paper is organized as follows: In Section 2, we present a brief introduction to vision-based intelligent vehicle research worldwide. A brief review of active and passive sensors is presented in Section 3. Section 4 outlines the two-step (hypothesis generation/verification) detection strategy followed by most systems. Detailed reviews of Hypothesis Generation (HG) and Hypothesis Verification (HV) methods are presented in Sections 5 and 6, while exploiting temporal continuity by integrating detection with tracking is discussed in Section 7. In Section 8, we provide a critical overview of the HG and HV methods reviewed. Challenges and future research directions are presented in Section 9. Finally, our conclusions are given in Section 10.

2 VISION-BASED INTELLIGENT VEHICLE RESEARCH WORLDWIDE

Vision-based vehicle detection for driver assistance has received considerable attention over the last 15 years. There are at least three reasons for the blooming research in this field: 1) the startling losses, both in human lives and in finance, caused by vehicle accidents, 2) the availability of feasible technologies accumulated within the last 30 years of computer vision research, and 3) the exponential growth in processor speeds, which has paved the way for running computation-intensive video-processing algorithms even on a low-end PC in real time.

With the ultimate goal of building autonomous vehicles, many government institutions, automotive manufacturers and suppliers, and R&D companies have launched various projects worldwide, involving a large number of research units working cooperatively. These efforts have produced several prototypes and solutions, based on rather different approaches [5], [12], [13], [6]. Looking at research on intelligent vehicles worldwide, Europe pioneered the research, followed by Japan and the United States.

In Europe, the PROMETHEUS project (Program for European Traffic with Highest Efficiency and Unprecedented Safety) started this exploration in 1986. More than 13 vehicle manufacturers and research institutes from 19 European countries were involved. Several prototype vehicles and systems were designed and demonstrated as a result of PROMETHEUS. In 1987, the UBM (Universitaet der Bundeswehr Munich) test vehicle VaMoRs demonstrated the capability of fully autonomous longitudinal and lateral vehicle guidance by computer vision on a 20 km free section of highway at speeds up to 96 km/h. Vision was used to provide input for both lateral and longitudinal control. This was considered the first milestone.

Further development of this work has been in collaboration with von Seelen's group [14] and the Daimler-Benz VITA project (VIsion Technology Application) [15]. Long-range autonomous driving was demonstrated by the VaMP of UBM in 1995. The trip was from Munich to Odense, Denmark, more than 1,600 km, and about 95 percent of the distance was driven without intervention of the safety driver [3]. Another experimental vehicle, the mobile laboratory (MOB-LAB), was also part of the PROMETHEUS project [16]. It was equipped with four cameras, several computers, monitors, and a control panel to give visual feedback and warnings to the driver. One of the most promising subsystems in the MOB-LAB was the Generic Obstacle and Lane Detection (GOLD) system. The GOLD system, utilizing a stereo rig in the MOB-LAB, addressed both lane and obstacle detection at the same time. The lane detection was based on a pattern matching technique, while the obstacle detection was reduced to the determination of the free space in front of the vehicle using the stereo image pairs without 3D reconstruction. The GOLD system has been ported to ARGO, a Lancia Thema passenger car with automatic steering capabilities [17].

Although the first research efforts on developing intelligent vehicles were seen in Japan in the 1970s, significant research activities were triggered by the prototype vehicles built in Europe in the late 1980s and early 1990s. MITI, Nissan, and Fujitsu pioneered the research in this area by joining forces in the project "Personal Vehicle System" [18], a project with deep influence in Japan. In 1996, the Advanced Cruise-Assist Highway System Research Association (AHSRA) was established among automobile industries and a large number of research centers in Japan [5]. The Japanese Smartway concept car will implement some driver aid features, such as lane keeping, intersection collision avoidance, and pedestrian detection. A model deployment project was planned to be operational by 2003, with national deployment in 2015 [6].

In the United States, a number of initiatives have been launched to address this problem. In 1995, the US government established the National Automated Highway System Consortium (NAHSC) [19] and launched the Intelligent Vehicle Initiative (IVI) in 1997. Several promising prototype vehicles/systems have been investigated and demonstrated within the last 15 years [20]. The Navlab group at Carnegie Mellon University has a long history of development of automated vehicles and intelligent systems for driver assistance. The group has produced a series of 11 vehicles, Navlab 1 through Navlab 11. Their applications have included off-road scouting, automated highways, run-off-road collision prevention, and driver assistance for maneuvering in crowded city environments. In 1995, Navlab 5 demonstrated long-range, partially autonomous driving (i.e., automatic lateral control) on highways from the east coast to the west coast. On this trip of more than 5,000 km, 98 percent of the distance was driven without intervention of the human safety driver [21]. The latest model in the Navlab family is Navlab 11, a robot Jeep Wrangler equipped with a wide variety of sensors for short-range and midrange obstacle detection [22], [23], [20].

Major motor companies, including Ford and GM, have poured great effort into this research and have already demonstrated several promising concept vehicles. US government agencies are very supportive of intelligent vehicle research. Recently, the US Department of Transportation (USDOT) launched a five-year, 35 million dollar project with GM to develop and test a preproduction rear-end collision avoidance system [6]. In March 2004, the whole world was stimulated by the "grand challenge" organized by the US Defense Advanced Research Projects Agency (DARPA) [24]. In this competition, 15 fully autonomous vehicles attempted to independently navigate a 250-mile (400 km) desert course within a fixed time period, all with no human intervention whatsoever—no driver, no remote control, just pure computer processing and navigation horsepower—competing for a 1 million dollar cash prize. Although even the best vehicle (i.e., "Red Team" from Carnegie Mellon) covered only seven miles, it was a big step towards building autonomous vehicles in the future.


Fig. 1. The variety of vehicle appearances poses a big challenge for vehicle detection.


3 ACTIVE VERSUS PASSIVE SENSORS

The most common approach to vehicle detection is using active sensors [25], such as radar-based (i.e., millimeter-wave) [26], laser-based (i.e., lidar) [27], [28], and acoustic-based [29]. In radar, radio waves are transmitted into the atmosphere, which scatters some of the power back to the radar's receiver. A Lidar (i.e., "Light Detection and Ranging") also transmits and receives electromagnetic radiation, but at a higher frequency; it operates in the ultraviolet, visible, and infrared regions of the electromagnetic spectrum.

These sensors are called active because they detect the distance of objects by measuring the travel time of a signal emitted by the sensors and reflected by the objects. Their main advantage is that they can measure certain quantities (e.g., distance) directly without requiring powerful computing resources. Radar-based systems can "see" at least 150 meters ahead in fog or rain, where average drivers can see only 10 meters or less. Lidar is less expensive to produce and easier to package than radar; however, with the exception of more recent systems, lidar does not perform as well as radar in rain and snow. Laser-based systems are more accurate than radars; however, their applications are limited by their relatively higher costs. Prototype vehicles employing active sensors have shown promising results. However, when a large number of vehicles move simultaneously in the same direction, interference among sensors of the same type poses a big problem. Moreover, active sensors have, in general, several drawbacks, such as low spatial resolution and slow scanning speed. This is not the case with more recent laser scanners, such as SICK [27], which can gather high spatial resolution data at high scanning speeds.

Optical sensors, such as normal cameras, are usually referred to as passive sensors [25] because they acquire data in a nonintrusive way. One advantage of passive sensors over active sensors is cost. With the introduction of inexpensive cameras, we could have both forward and rearward facing cameras on a vehicle, enabling a nearly 360 degree field of view. Optical sensors can be used to track more effectively cars entering a curve or moving from one side of the road to another. Also, visual information can be very important in a number of related applications, such as lane detection, traffic sign recognition, or object identification (e.g., pedestrians and obstacles), without requiring any modifications to road infrastructures. Several systems presented in [5] demonstrate the principal feasibility of vision-based driver assistance systems.

4 THE TWO STEPS OF VEHICLE DETECTION

On-board vehicle detection systems have high computational requirements as they need to process the acquired images in real time, or close to it, to leave enough time for driver reaction. Searching the whole image to locate potential vehicle locations is prohibitive for real-time applications. The majority of methods reported in the literature follow two basic steps: 1) HG, where the locations of possible vehicles in an image are hypothesized, and 2) HV, where tests are performed to verify the presence of vehicles in an image (see Fig. 2). Although there is some overlap in the methods employed for each step, this taxonomy provides a good framework for discussion throughout this survey.

Fig. 2. Illustration of the two-step vehicle detection strategy.

5 HG METHODS

Various HG approaches have been proposed in the literature, which can be classified into one of the following three categories: 1) knowledge-based, 2) stereo-based, and 3) motion-based. The objective of the HG step is to find candidate vehicle locations in an image quickly for further exploration. Knowledge-based methods employ a priori knowledge to hypothesize vehicle locations in an image. Stereo-based approaches take advantage of the Inverse Perspective Mapping (IPM) [30] to estimate the locations of vehicles and obstacles in images. Motion-based methods detect vehicles and obstacles using optical flow. The hypothesized locations from the HG step form the input to the HV step, where tests are performed to verify the correctness of the hypotheses.

5.1 Knowledge-Based Methods

Knowledge-based methods employ a priori knowledge to hypothesize vehicle locations in an image. We review below some representative approaches using information about symmetry, color, shadow, geometrical features (e.g., corners, horizontal/vertical edges), texture, and vehicle lights.

5.1.1 Symmetry

As one of the main signatures of man-made objects, symmetry has been used often for object detection and recognition in computer vision [31]. Images of vehicles observed from rear or frontal views are in general symmetrical in the horizontal and vertical directions. This observation has been used as a cue for vehicle detection in several studies [32], [33]. An important issue that arises when computing symmetry from intensity, however, is the presence of homogeneous areas. In these areas, symmetry estimations are sensitive to noise. In [4], information about edges was included in the symmetry estimation to filter out homogeneous areas (see Fig. 3). In a different study, Seelen et al. [34] formulated symmetry detection as an optimization problem which was solved using Neural Networks (NNs).
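
As a rough sketch of this cue, the snippet below scores the left-right symmetry of a candidate grayscale window and weights the score by its edge energy so that homogeneous areas contribute little. The window width, the saturation constant, and the sliding-window search are illustrative assumptions, not parameters taken from the cited systems.

```python
import numpy as np

def symmetry_score(patch):
    """Score left-right symmetry of a grayscale patch (higher means more symmetric).

    Intensity symmetry alone is unreliable in homogeneous regions, so the score
    is weighted by the patch's edge energy, loosely mirroring the idea of
    combining gray-level and edge symmetry."""
    patch = patch.astype(np.float32)
    mirrored = patch[:, ::-1]
    intensity_sym = 1.0 - np.abs(patch - mirrored).mean() / 255.0
    gx = np.abs(np.diff(patch, axis=1))              # horizontal gradient magnitude
    edge_weight = min(1.0, float(gx.mean()) / 20.0)  # hypothetical saturation constant
    return intensity_sym * edge_weight

def best_symmetry_axis(strip, width=64):
    """Slide a window across an image strip and return the column where the
    symmetry response is strongest, a candidate vehicle axis."""
    scores = [symmetry_score(strip[:, x:x + width])
              for x in range(strip.shape[1] - width)]
    return int(np.argmax(scores)), float(max(scores))
```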

5.1.2 Color

Although few existing systems use color information to its full extent for HG, it is a very useful cue for obstacle detection, lane/road following, etc. Several prototype systems have investigated the use of color information as a cue to follow lanes/roads [35] or to segment vehicles from the background [36], [37]. Crisman et al. [35] used two closely positioned cameras to extend the dynamic range of a single camera. One camera was set to capture the shadowed area by opening its iris, and the other the sunny area by closing its iris. Combining the color information (i.e., red, green, and blue) from the two images, they formed a six-dimensional color space. A Gaussian distribution was fit to this color space and each pixel was classified as either a road or a nonroad pixel.
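
A minimal sketch of the underlying idea (fitting a single Gaussian to road-pixel colors and labeling pixels by their distance to that model) is given below; the six-dimensional input assumes the two-camera setup of [35], and the Mahalanobis threshold is an illustrative value.

```python
import numpy as np

def fit_road_model(road_pixels):
    """Fit a multivariate Gaussian to sampled road-pixel colors.
    road_pixels: (N, D) array; D = 6 when the RGB values of the two exposures
    are stacked, as in the two-camera setup described above."""
    mean = road_pixels.mean(axis=0)
    cov = np.cov(road_pixels, rowvar=False) + 1e-6 * np.eye(road_pixels.shape[1])
    return mean, np.linalg.inv(cov)

def classify_road(pixels, mean, inv_cov, threshold=9.0):
    """Label pixels as road (True) when their squared Mahalanobis distance to
    the road model is small; the threshold is illustrative, not from [35]."""
    diff = pixels - mean
    maha = np.einsum('nd,de,ne->n', diff, inv_cov, diff)
    return maha < threshold
```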

Buluswar and Draper [36] used a nonparametric learning-based approach for object segmentation and recognition. A multivariate decision tree was utilized to model the object in the RGB color space from a number of training examples. Among various color spaces, the RGB color space ensures that there is no distortion in the initial color information; however, its color features are highly correlated, making it difficult to evaluate the difference between two colors from their distance in RGB space. In [37], Guo et al. chose the L*a*b color space instead. The L*a*b color space has the property that it maps equally distinct color differences into equal Euclidean distances. An incremental region fitting method was investigated in the L*a*b color space for road segmentation [37].

5.1.3 Shadow

Using shadow information as a sign pattern for vehicle detection was initially discussed in [38]. By investigating image intensity, it was found that the area underneath a vehicle is distinctly darker than any other area on an asphalt-paved road. A first attempt to deploy this observation can be found in [39], although there was no systematic way to choose appropriate threshold values. The intensity of the shadow depends on the illumination of the image, which in turn depends on weather conditions. Therefore, the thresholds cannot be, by any means, fixed. To segment the shadow area, a low and a high threshold are required. However, it is hard to find a low threshold for a shadow area. The high threshold can be estimated by analyzing the gray level of the "free driving space"—the road right in front of the prototype vehicle.

Tzomakas and Seelen [40] followed the same idea and proposed a method to determine the threshold values. Specifically, a normal distribution was assumed for the intensity of the free driving space. The mean and variance of the distribution were estimated using Maximum Likelihood (ML) estimation. The high threshold of the shadow area was defined as the limit where the distribution of the road gray values declined to zero on the left of the mean, which was approximated by m − 3σ, where m is the mean and σ is the standard deviation. This algorithm is depicted in Fig. 4. It should be noted that the assumption about the distribution of the road pixels might not always hold true.
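
The sketch below illustrates this thresholding idea under the stated normality assumption; the free-space region is assumed to have been located already (e.g., as the road directly in front of the host vehicle), and the choice of a lower bound is left open, as discussed above.

```python
import numpy as np

def shadow_threshold(free_space_gray):
    """Upper gray-level threshold for under-vehicle shadows, following the idea
    in [40]: estimate the mean m and standard deviation s of the free driving
    space by maximum likelihood (the sample mean and standard deviation under
    the normality assumption) and take m - 3s as the point where the road
    distribution has effectively declined to zero on its dark side."""
    m = float(free_space_gray.mean())
    s = float(free_space_gray.std())
    return m - 3.0 * s

def shadow_candidates(gray_image, free_space_region):
    """Mark pixels darker than the estimated threshold as candidate shadow
    areas underneath vehicles."""
    return gray_image < shadow_threshold(free_space_region)
```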

5.1.4 Corners

Exploiting the fact that vehicles in general have a rectangular shape with four corners (upper-left, upper-right, lower-left, and lower-right), Bertozzi et al. proposed a corner-based method to hypothesize vehicle locations [41]. Four templates, each of them corresponding to one of the four corners, were used to detect all the corners in an image, followed by a search method to find the matching corners (i.e., a valid upper-left corner should have a matched lower-right corner).

5.1.5 Vertical/Horizontal Edges

Different views of a vehicle, especially rear/frontal views, contain many horizontal and vertical structures, such as the rear window, bumper, etc. Using constellations of vertical and horizontal edges has been shown to be a strong cue for hypothesizing vehicle presence. In an effort to find pronounced vertical structures in an image, Matthews et al. [42] used edge detection to find strong vertical edges. To localize the left and right positions of a vehicle, they computed the vertical profile of the edge image (i.e., by summing the pixels in each column), followed by smoothing using a triangular filter. By finding the local maximum peaks of the vertical profile, they claimed that they could find the left and right positions of a vehicle. A shadow method, similar to that in [40], was used to find the bottom of the vehicle. Because there were no consistent cues associated with the top of a vehicle, they detected it by assuming that the aspect ratio of any vehicle was one.
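
A simplified sketch of this profile-based localization follows; the Sobel operator, the smoothing width, and the naive peak test are illustrative choices rather than the exact filters used in [42].

```python
import numpy as np
import cv2

def vertical_profile_peaks(gray, smooth_width=9):
    """Candidate left/right vehicle boundaries from vertical edge evidence:
    sum a vertical-edge map over each column, smooth the resulting profile
    with a triangular filter, and keep the local maxima."""
    edges = np.abs(cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3))   # vertical edges
    profile = edges.sum(axis=0)
    half = smooth_width // 2
    tri = np.concatenate([np.arange(1, half + 2), np.arange(half, 0, -1)]).astype(np.float32)
    tri /= tri.sum()                                             # triangular kernel
    smoothed = np.convolve(profile, tri, mode='same')
    peaks = [x for x in range(1, len(smoothed) - 1)
             if smoothed[x] > smoothed[x - 1] and smoothed[x] >= smoothed[x + 1]]
    return peaks, smoothed
```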

Goerick et al. [43] proposed a method called Local Orientation Coding (LOC) to extract edge information. An image obtained by this method consists of strings of binary code representing the directional gray-level variation in each pixel's neighborhood. These codes carry essentially edge information. Handmann et al. [44] also used LOC, together with shadow information, for vehicle detection. Parodi and Piccioli [45] proposed to extract the general structure of a traffic scene by first segmenting an image into four regions using edge grouping: pavement, sky, and two lateral regions.


Fig. 3. Computing the symmetry: (a) gray-level symmetry, (b) edge symmetry, (c) horizontal edges symmetry, (d) vertical edges symmetry, and (e) total symmetry (from [70]).

Fig. 4. Free driving spaces, the corresponding gray-value histograms, and the thresholded images (from [40]).


Groups of horizontal edges on the detected pavement were then considered for hypothesizing the presence of vehicles.

Betke et al. [46] utilized edge information to detect distant cars. They proposed a coarse-to-fine search method looking for rectangular objects. The coarse search checked the whole image to see if a refined search was necessary, and a refined search was activated only for small regions of the image suggested by the coarse search. The coarse search looked through the whole edge map for prominent edges, such as long uninterrupted edges. Whenever such edges were found, the refined search process was started in that region.

In [47], vertical and horizontal edges were extracted separately using the Sobel operator. Then, two edge-based constraint filters (i.e., a rank filter and an attached line edge filter) were applied to those edges to segment vehicles from the background. The edge-based constraint filters were derived from prior knowledge about vehicles. Assuming that lanes have been successfully detected, Bucher et al. [48] hypothesized vehicle presence by scanning each lane from the bottom up to a certain vertical position, corresponding to a predefined maximum distance in the real world. Potential candidates were obtained if a strong horizontal segment delimited by the lane borders was found. A multiscale approach, which combines subsampling with smoothing to hypothesize possible vehicle locations more robustly, was proposed in [49] to address the above problems.

Three levels of detail were used: 360 × 248, 180 × 124, and 90 × 62. At each level, the image was processed by applying the following steps:

1. low pass filtering (e.g., first column of Fig. 5);
2. vertical edge detection (e.g., second column of Fig. 5), vertical profile computation of the edge image (e.g., last column of Fig. 5), and profile filtering using a low pass filter;
3. horizontal edge detection (e.g., third column of Fig. 5), horizontal profile computation of the edge image (e.g., last column of Fig. 5), and profile filtering using a low pass filter; and
4. local maxima and minima detection (e.g., peaks and valleys) of the two profiles.

The peaks and valleys of the profiles provide strong information about the presence of a vehicle in the image. Starting from the coarsest level of detail, all the local maxima at that level are found first. Although the resulting low resolution images have lost fine details, important vertical and horizontal structures are mostly preserved (e.g., first row of Fig. 5). Once the maxima at the coarsest level have been found, they are traced down to the next finer level. The results from this level are finally traced down to the finest level where the final hypotheses are generated.

The proposed multiscale approach improves system robustness by making the hypothesis generation step less sensitive to the choice of parameters. Forming the first hypotheses at the lowest level of detail is very useful since this level contains only the most salient structural features. Besides improving robustness, the multiscale scheme speeds up the whole process since the low resolution images have a much simpler structure, as illustrated in Fig. 5 (i.e., candidate vehicle locations can be found faster and easier). Several examples are provided in Fig. 5 (left column).
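
The sketch below mirrors the structure of steps 1-4 above: edge profiles are computed at three levels of detail, starting from the coarsest. The Gaussian and box filters, the peak test, and the downscaling factors are illustrative assumptions; a full implementation would also trace the coarse maxima down to the finer levels rather than recomputing everything.

```python
import numpy as np
import cv2

def edge_profiles(gray):
    """Steps 1-3 at one level of detail: low pass filter the image, compute
    vertical and horizontal edge maps, and collapse each into a smoothed 1D
    profile whose peaks and valleys suggest vehicle boundaries."""
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    v_edges = np.abs(cv2.Sobel(blurred, cv2.CV_32F, 1, 0, ksize=3))
    h_edges = np.abs(cv2.Sobel(blurred, cv2.CV_32F, 0, 1, ksize=3))
    kern = np.ones(9, dtype=np.float32) / 9.0        # simple low pass profile filter
    v_profile = np.convolve(v_edges.sum(axis=0), kern, mode='same')
    h_profile = np.convolve(h_edges.sum(axis=1), kern, mode='same')
    return v_profile, h_profile

def local_extrema(profile):
    """Step 4: indices of local maxima (peaks) and minima (valleys)."""
    peaks = [i for i in range(1, len(profile) - 1)
             if profile[i - 1] < profile[i] >= profile[i + 1]]
    valleys = [i for i in range(1, len(profile) - 1)
               if profile[i - 1] > profile[i] <= profile[i + 1]]
    return peaks, valleys

def coarse_to_fine_profiles(gray, factors=(4, 2, 1)):
    """Process the coarsest level of detail first, then successively finer levels."""
    levels = []
    for f in factors:
        level = cv2.resize(gray, (gray.shape[1] // f, gray.shape[0] // f))
        v, h = edge_profiles(level)
        levels.append((f, local_extrema(v), local_extrema(h)))
    return levels
```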

5.1.6 Texture

The presence of vehicles in an image causes local intensity changes. Due to general similarities among all vehicles, the intensity changes follow a certain texture pattern [50]. This texture information can be used as a cue to narrow down the search area for vehicle detection. Entropy was first used as a measure for texture detection. For each image pixel, a small window was chosen around it, and the entropy of that window was considered as the entropy of the pixel. Only regions with high entropy were considered for further processing.

Another texture-based segmentation method suggested in [50] uses co-occurrence matrices, introduced in [51]. The co-occurrence matrix contains estimates of the probabilities of co-occurrences of pixel pairs under predefined geometrical and intensity constraints. Fourteen statistical features were computed from the co-occurrence matrices [51]. For typical textures of geometrical structures, like trucks and cars, four of the 14 measurements were found to be critical for object detection (i.e., energy, contrast, entropy, and correlation) [50]. Using co-occurrence matrices for texture detection is in general more accurate than using the entropy method mentioned earlier, since co-occurrence matrices employ second-order statistics as opposed to the histogram information employed by the entropy method (see Fig. 6). However, computing the co-occurrence matrices is expensive.
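
A compact sketch of this second-order texture measure is given below: it builds a gray-level co-occurrence matrix for a single pixel-pair geometry and computes the four measures named above. The quantization to 16 gray levels and the (1, 0) offset are illustrative; [50], [51] consider several offsets and all fourteen Haralick features.

```python
import numpy as np

def cooccurrence_features(patch, levels=16, dx=1, dy=0):
    """Gray-level co-occurrence matrix for one offset (dx, dy) and the four
    measures found most useful in [50]: energy, contrast, entropy, and
    correlation."""
    q = (patch.astype(np.int32) * levels // 256).clip(0, levels - 1)
    glcm = np.zeros((levels, levels), dtype=np.float64)
    h, w = q.shape
    for y in range(max(0, -dy), h - max(0, dy)):
        for x in range(max(0, -dx), w - max(0, dx)):
            glcm[q[y, x], q[y + dy, x + dx]] += 1.0
    glcm /= max(glcm.sum(), 1.0)
    i, j = np.indices(glcm.shape)
    energy = float((glcm ** 2).sum())
    contrast = float(((i - j) ** 2 * glcm).sum())
    nz = glcm[glcm > 0]
    entropy = float(-(nz * np.log(nz)).sum())
    mi, mj = (i * glcm).sum(), (j * glcm).sum()
    si = np.sqrt((((i - mi) ** 2) * glcm).sum())
    sj = np.sqrt((((j - mj) ** 2) * glcm).sum())
    correlation = float((((i - mi) * (j - mj)) * glcm).sum() / (si * sj + 1e-9))
    return energy, contrast, entropy, correlation
```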


Fig. 5. Multiscale hypothesis generation—size of the images: 90 × 62 (first row), 180 × 124 (second row), and 360 × 248 (third row). The images in the first column have been obtained by applying low pass filtering at different scales; second column: vertical edge maps; third column: horizontal edge maps; fourth column: vertical and horizontal profiles. All images have been scaled back to 360 × 248 for illustration purposes (from [49]).

Fig. 6. Image, free driving space, and image segmentation based on local image entropy and co-occurrence-based image segmentation (from [50]).


5.1.7 Vehicle Lights

Most of the cues discussed above are not helpful for nighttime vehicle detection—it would be difficult or impossible to detect shadows, horizontal/vertical edges, or corners in images obtained at night. A salient visual feature during nighttime is the vehicle lights. Cucchiara and Piccardi [52] have used morphological analysis to detect vehicle light pairs in a narrow inspection area. The morphological operator also considered the shape, size, and minimal distance between vehicles to provide hypotheses.

5.2 Stereo-Vision-Based Methods

There are two types of methods that use stereo information for vehicle detection. One uses the disparity map, while the other uses an antiperspective transformation—Inverse Perspective Mapping (IPM). We assume that the camera parameters have already been computed through calibration.

5.2.1 Disparity Map

The difference in position between corresponding pixels in the left and right images is called disparity. The disparities of all the image points form the disparity map. If the parameters of the stereo rig are known, the disparity map can be converted into a 3D map of the viewed scene. Computing the disparity map is very time consuming due to the requirement of solving the correspondence problem for every pixel; however, it is possible to do it in real time using a Pentium-class processor or embedded hardware [53]. Once the disparity map is available, all the pixels within a depth of interest according to a disparity interval are determined and accumulated in a disparity histogram. If an obstacle is present within the depth of interest, then a peak will occur at the corresponding histogram bin (i.e., a similar idea to the Hough transform).
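
A minimal sketch of this histogram test follows; the disparity range, bin width, and vote threshold are illustrative assumptions.

```python
import numpy as np

def obstacle_disparities(disparity_map, d_min=1, d_max=64, min_votes=500):
    """Accumulate pixels of a disparity map into a histogram over the depth of
    interest and return the disparities whose bins form peaks; each peak
    suggests an obstacle at the corresponding depth (the Hough-like idea
    described above)."""
    valid = disparity_map[(disparity_map >= d_min) & (disparity_map <= d_max)]
    hist, edges = np.histogram(valid, bins=d_max - d_min + 1, range=(d_min, d_max + 1))
    peaks = [float(edges[i]) for i in range(1, len(hist) - 1)
             if hist[i] > hist[i - 1] and hist[i] >= hist[i + 1] and hist[i] >= min_votes]
    return peaks, hist
```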

In [54], it was argued that, to solve the correspondence problem, area-based approaches were too computationally expensive, and disparity maps from feature-based methods were not dense enough. A local feature extractor (i.e., "structure classification") was proposed to solve the correspondence problem faster. According to this approach, each pixel was classified into various categories (e.g., vertical edge pixels, horizontal edge pixels, corner edge pixels, etc.) based on the intensity differences between the pixel and its four direct neighbors. To simplify finding pixel correspondences, the optical axes of the stereo rig were aligned in parallel (i.e., corresponding points were on the same row in each image). Accordingly, their search for corresponding pixels was reduced to a simple test (i.e., whether two pixels belong to the same category or not). Obviously, there are cases where this approach does not yield unique correspondences. To address this problem, they further classified the pixels by their associated disparities into several bins by constructing a disparity histogram. The number of significant peaks in the histogram indicated how many possible objects were present in the images.

5.2.2 Inverse Perspective Mapping

The term "Inverse Perspective Mapping" does not correspond to an actual inversion of perspective mapping [30], which is mathematically impossible. Rather, it denotes an inversion under the additional constraint that inversely mapped points lie on the horizontal plane. If we consider a point p in 3D space, perspective mapping implies a line passing through this point and the center of projection N (see Fig. 7). To find the image of the point, we intersect the line with the image plane. IPM is defined by the following procedure: for a point p_I in the image, we trace the associated ray through N towards the horizontal plane. The intersection of the ray with the horizontal plane is the result of the inverse perspective mapping applied to the image point p_I. If we compose both perspective and inverse perspective mapping, the horizontal plane is mapped onto itself, while elevated parts of the scene appear distorted.
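
The procedure can be made concrete as below: an image point is back-projected along its viewing ray and intersected with the road plane z = 0. The camera model (intrinsics K, rotation R, translation t, with a world point X imaging at K(RX + t)) is a common convention assumed here for illustration, not necessarily the formulation of [30].

```python
import numpy as np

def inverse_perspective_point(u, v, K, R, t):
    """Inverse perspective mapping of image point (u, v): intersect its viewing
    ray through the center of projection with the horizontal plane z = 0."""
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])   # ray direction, camera frame
    ray_world = R.T @ ray_cam                            # rotate into world frame
    center = -R.T @ t                                    # camera center in world frame
    lam = -center[2] / ray_world[2]                      # step length reaching z = 0
    return center + lam * ray_world                      # 3D point on the road plane
```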

Assuming a flat road, Zhao and Yuta [55] used stereo vision to predict the image seen from the right camera, given the left image, using IPM. Specifically, they used IPM to transform every point in the left image to world coordinates and reprojected them back onto the right image, which was then compared against the actual right image. In this way, they were able to find contours of objects above the ground plane. Instead of warping the right image onto the left image, Bertozzi and Broggi [16], [56] computed the IPM of both the right and left images. Then, they took the difference between the two remapped left and right images. Due to the flat-road assumption, anything rising above the road was detected by looking for large clusters of nonzero pixels in the difference image. In the ideal case, the difference image contains two triangles for each obstacle that correspond to the left and right boundaries of the obstacle (see Fig. 8e). This is because, except for the pixels on the left and right boundaries of the obstacle, all other pixels are the same in the left and right remapped images. Locating those triangles, however, was very difficult due to texture, irregular shape, and nonhomogeneous brightness of obstacles. To deal with these issues, they used a polar histogram to detect the triangles. Given a point on the road plane, the polar histogram was computed by scanning the difference image and counting the number of over-threshold pixels for every straight line originating from that point. Knoeppel et al. [57] clustered the elevated 3D points based on their distance from the ground plane to generate hypotheses. Each hypothesis was tracked over time and further verified using Kalman filters. This system assumed that the dynamic behavior of the host vehicle was known, and the path information was stored in a dynamic map. The system was able to detect vehicles up to 150 m away under normal daytime weather conditions.


Fig. 7. Geometry of perspective mapping.


Although only two cameras are required to find the range and elevated pixels in an image, there are several advantages to using more than two cameras [58]: 1) repeating texture can confuse a two-camera system by causing matching ambiguities, which can be eliminated when additional cameras are present, and 2) shorter baseline systems are less prone to matching errors, while longer baseline systems are more accurate; the combination is better than either one alone. Williamson and Thorpe [59] investigated a trinocular system. The trinocular rig was mounted on top of a vehicle with the longest baseline being 1.2 meters. The third camera was displaced 50 cm horizontally and 30 cm vertically to provide a short baseline. The system reported a capability of detecting objects as small as 14 cm at ranges in excess of 100 m. Due to the additional computational costs, however, binocular systems are generally preferred in driver assistance systems.

5.3 Motion-Based Methods

All the cues discussed so far use spatial features to distinguish between vehicles and background. Another cue that can be employed is relative motion obtained via the calculation of optical flow. Let us represent image intensity at location (x, y) at time t by E(x, y, t). Pixels on the images appear to be moving due to the relative motion between the sensor and the scene. The vector field o(x, y) of this motion is referred to as optical flow.

Optical flow can provide strong information for HG. Vehicles approaching from the opposite direction produce a diverging flow, which can be quantitatively distinguished from the flow caused by the ego-motion of the host vehicle [60]. On the other hand, departing or overtaking vehicles produce a converging flow. To take advantage of these observations in obstacle detection, the image is first subdivided into small subimages and an average speed is estimated in every subimage. Subimages with a large speed difference from the global speed estimate are labeled as possible obstacles.

The performance of several methods for recovering the optical flow o(x, y) from the intensity E(x, y, t) has been compared in [61] using selected image sequences from (mostly fixed) cameras (see Fig. 9). Most of these methods compute temporal and spatial derivatives of the intensity profiles and, therefore, are referred to as differential techniques. Obtaining a reliable dense optical flow estimate under a moving-camera scenario is not an easy task. Giachetti et al. [60] developed some of the best first-order and second-order differential methods in the literature and applied them to a typical image sequence taken from a moving vehicle along a flat and straight road. In particular, they remapped the corresponding points between two consecutive frames by minimizing the following distance measure:


Fig. 8. Obstacle detection: (a) left and (b) right stereo images, (c) and (d) the remapped images, (e) the difference image, and (f) the corresponding polar histogram (from [56]).

Fig. 9. Comparison of optical flows computed with different algorithms: (a) a frame of the image sequence, (b) the theoretical optical flow expected from a pure translation over a flat surface, (c) optical flow from a first-order derivative method, (d) optical flow using second-order derivatives, (e) optical flow using a multiscale differential technique, and (f) optical flow computed with a correlation technique (from [60]).


\[
\sum_{i=-n}^{n} \sum_{j=-n}^{n} \bigl[ E(i+x_0,\, j+y_0,\, t_0) - E(i+x,\, j+y,\, t) \bigr]^2 , \qquad (1)
\]

where (x_0, y_0) and (x, y) are two corresponding points at times t_0 and t. The size of the search window was n × n. Since finding the corresponding pair for each point was quite expensive, they employed a less dense grid to reduce the computational cost.
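
A direct, if slow, sketch of minimizing (1) for one point is shown below; the window half-size n, the search radius, and the brute-force search are illustrative, and the cited work evaluates the measure only on a sparse grid of points.

```python
import numpy as np

def match_point(E0, E1, x0, y0, n=7, search=10):
    """Displacement of the point (x0, y0) from frame E0 (time t0) to frame E1
    (time t), found by minimizing the squared-difference measure of (1) over a
    (2n+1) x (2n+1) window within a small search neighborhood. Assumes the
    point lies at least n + search pixels away from the image border."""
    ref = E0[y0 - n:y0 + n + 1, x0 - n:x0 + n + 1].astype(np.float32)
    best, best_cost = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = x0 + dx, y0 + dy
            cand = E1[y - n:y + n + 1, x - n:x + n + 1].astype(np.float32)
            if cand.shape != ref.shape:                  # window fell off the image
                continue
            cost = float(((ref - cand) ** 2).sum())
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost
```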

Kruger et al. [62] estimated optical flow from spatiotemporal derivatives of the gray-value images using a local approach. They further clustered the estimated optical flow to eliminate outliers. Assuming a calibrated camera and known ego-motion, they detected both moving and stationary objects. Generating a displacement vector for each pixel (i.e., dense optical flow) is time consuming and also impractical for a real-time system. In contrast to dense optical flow, "sparse optical flow" is less time consuming because it utilizes image features, such as corners [63], [64], local minima and maxima [65], or "color blobs" [66]. Although they produce only a sparse flow, feature-based methods can provide sufficient information for HG. Moreover, in contrast to pixel-based optical flow estimation methods, where pixels are processed independently, feature-based methods utilize higher-level information. Consequently, they are less sensitive to noise.

6 HV METHODS

The input to the HV step is the set of hypothesized locations from the HG step. During HV, tests are performed to verify the correctness of a hypothesis. Approaches to HV can be classified mainly into two categories: 1) template-based and 2) appearance-based. Template-based methods use predefined patterns from the vehicle class and perform correlation. Appearance-based methods, on the other hand, learn the characteristics of the vehicle class from a set of training images which should capture the variability in vehicle appearance. Usually, the variability of the nonvehicle class is also modeled to improve the performance. Each training image is represented by a set of local or global features. Then, the decision boundary between the vehicle and nonvehicle classes is learned either by training a classifier (e.g., NNs, Support Vector Machines (SVMs)) or by modeling the probability distribution of the features in each class (e.g., using the Bayes rule assuming a Gaussian distribution).

6.1 Template-Based Methods

Template-based methods use predefined patterns of the vehicle class and perform correlation between the image and the template. Some of the templates reported in the literature represent the vehicle class "loosely," while others are more detailed. It should be mentioned that, due to the nature of template matching methods, most papers in the literature do not report quantitative results and demonstrate performance through examples.

Parodi and Piccioli [45] proposed a hypothesis verification scheme based on the presence of license plates and rear windows. This can be considered a loose template of the vehicle class. No quantitative performance was included in the paper. Handmann et al. [44] proposed a template based on the observation that the rear/frontal view of a vehicle has a "U" shape (i.e., one horizontal edge, two vertical edges, and two corners connecting the horizontal and vertical edges). During verification, they considered a vehicle to be present in the image if they could find the "U" shape.

Ito et al. [67] used a very loose template to recognize vehicles. Using active sensors for HG, they checked whether or not pronounced vertical/horizontal edges and symmetry existed. Due to the simplicity of the template, they did not expect very accurate results, which was the main reason for employing active sensors for HG. Regensburger et al. [68] utilized a template similar to [67]. They argued that the visual appearance of an object depends on its distance from the camera. Consequently, they used two slightly different generic object (vehicle) models, one for nearby objects and another for distant objects. This method, however, raises the question of which model to use at a specific location. Instead of working with different generic models, distance-dependent subsampling was performed before the verification step in [69].

A template called "moving edge closure" was used in [52], which was fit to groups of moving points. To get the moving edge closure, they performed edge detection on the area covered by the detected moving points, followed by external edge connection. If the size of the moving edge closure was within a predefined range, a vehicle was declared detected. Nighttime vehicle detection was also addressed in this work [52]; basically, pairs of headlights were considered as templates for vehicle detection.

A rather loose template was also used in [70], where hypotheses were generated on the basis of road position and perspective constraints. The template contained a priori knowledge about vehicles: "A vehicle is generally symmetric, characterized by a rectangular bounding box which satisfies specific aspect ratio constraints." The model matching worked as follows: initially, the hypothesized region was checked for the presence of two corners representing the bottom of the bounding box, similar to the "U" shape idea in [44]. The presence of corners was validated using perspective and size constraints. Then, they detected the top part of the bounding box in a specific region determined, once again, by perspective and size constraints. Once the bounding box was detected successfully, they claimed vehicle presence in that region. This template matching can be very fast; however, it introduces some uncertainty, given that there might be other objects on the road satisfying those constraints (e.g., distant buildings).

6.2 Appearance-Based Methods

HV using appearance models is treated as a two-class pattern classification problem: vehicle versus nonvehicle. Building a robust pattern classification system involves searching for an optimal decision boundary between the classes to be categorized. Given the huge within-class variability of the vehicle class, we can imagine that this is not an easy task. One feasible approach is to learn the decision boundary by training a classifier using feature sets extracted from a training set.

Appearance-based methods learn the characteristics of vehicle appearance from a set of training images which capture the variability in the vehicle class. Usually, the variability of the nonvehicle class is also modeled to improve performance. First, a large number of training images is collected and each training image is represented by a set of local or global features. Then, the decision boundary between the vehicle and nonvehicle classes is learned either by training a classifier or by modeling the probability distribution of the features in each class.

Various feature extraction methods have been investigated in the context of vehicle detection. Based on the method used, the features extracted can be classified as either local or global. Global features are obtained by considering all the pixels in an image. Wu and Zhang [71] used standard Principal Components Analysis (PCA) for feature extraction, together with a nearest-neighbor classifier, reporting an 89 percent accuracy. However, their training database was quite small (93 vehicle images and 134 nonvehicle images), which makes it difficult to draw any useful conclusions.
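
A small sketch of this global-feature pipeline (PCA projection followed by nearest-neighbor labeling) is given below; the number of components and the flattened-window representation are illustrative assumptions.

```python
import numpy as np

class PCANearestNeighbor:
    """Verify hypotheses with global features in the spirit of [71]: project
    flattened candidate windows onto the leading principal components of the
    training set and return the label of the nearest training example."""
    def __init__(self, n_components=30):
        self.n_components = n_components

    def fit(self, X, y):
        # X: (N, D) flattened training windows; y: labels (1 = vehicle, 0 = nonvehicle).
        self.mean = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.basis = vt[:self.n_components]              # principal directions
        self.train_feats = (X - self.mean) @ self.basis.T
        self.labels = np.asarray(y)
        return self

    def predict(self, x):
        # x: flattened candidate window of the same size as the training windows.
        feat = (x - self.mean) @ self.basis.T
        nearest = np.argmin(((self.train_feats - feat) ** 2).sum(axis=1))
        return self.labels[nearest]
```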

An inherent problem with global feature extraction approaches is that they are sensitive to local or global image variations (e.g., pose changes, illumination changes, and partial occlusion). Local feature extraction methods, on the other hand, are less sensitive to these effects. Moreover, geometric information and constraints in the configuration of different local features can be utilized either explicitly or implicitly.

In contrast to [71], in [42] PCA was used for feature extraction and Neural Networks (NNs) for classification. First, each subimage containing vehicle candidates was scaled to 20 × 20 and then subdivided into 25 subwindows of size 4 × 4. PCA was applied to every subwindow (i.e., "local PCA") and the output was provided to a NN to verify the hypothesis.

Goerick et al. [43] and Noli et al. [72] used the LOC method (see Section 5.1.5) to extract edge information. The histogram of the LOC within the area of interest was then provided to a NN classifier, a Bayes classifier, and a combination of both for classification. For the NN, the number of nodes in the first layer was between 350 and 450, while the number of hidden nodes was between 10 and 40. They used 2,000 examples for training, and the whole system ran in real time. The accuracy of the neural net classifier was 94.7 percent, slightly better than the Bayes classifier (94.4 percent) and very close to the combined classifier (95.7 percent).

Kalinke et al. [50] designed two models for vehicle detection: one for sedans and the other for trucks. Two different model generation methods were used. The first one was designed manually, while the second one was based on a statistical algorithm using about 50 typical trucks and sedans. Classification was performed using NNs. The input to the NNs was the Hausdorff distances between the hypothesized vehicles and the models, both represented in terms of the LOC. The NN classified every input into three classes: sedans, trucks, or background. Similar to [43], Handmann et al. [44] utilized the histogram of the LOC, together with a NN, for vehicle detection. The Hausdorff distance was used for the classification of trucks and cars, as in [50]. No quantitative performance was reported in [44] or [50].

A statistical model of vehicle appearance was investigated by Schneiderman and Kanade [73]. A view-based approach employing multiple detectors was used to cope with viewpoint variations. The statistics of both object and "nonobject" appearance were represented using the product of two histograms, with each histogram representing the joint statistics of a subset of Haar wavelet features and their position on the object. A three-level wavelet transform was used to capture the space, frequency, and orientation information. This three-level decomposition produced 10 subbands, and 17 subsets of quantized wavelet coefficients were used. Bootstrapping was used to gather the statistics of the nonvehicle class. The best performance reported in [73] was 92 percent. A different statistical model was investigated by Weber et al. [74]. They represented each vehicle image as a constellation of local features and used the Expectation-Maximization (EM) algorithm to learn the parameters of the probability distribution of the constellations. They used 200 images for training and reported an 87 percent accuracy.

An overcomplete dictionary of Haar wavelet features was utilized in [75] for vehicle detection. They argued that this representation provided a richer model and finer spatial resolution and that it was more suitable for capturing complex patterns. The overcomplete Haar wavelet features were derived from a set of redundant functions, where the shift between wavelets at level n was 1/4 · 2^n instead of 2^n; they referred to this as a quadruple-density dictionary. A total of 1,032 positive training patterns and 5,166 negative training patterns were used for training, and the ROC curve showed that the false positive rate was close to 1 percent when the detection rate approached 100 percent.

Sun et al. [76], [49] went one step further by arguing that the actual values of the wavelet coefficients are not very important for vehicle detection. In fact, coefficient magnitudes indicate local oriented intensity differences, information that could be very different even for the same vehicle under different lighting conditions. Following this observation, they proposed using quantized coefficients to improve detection performance. The quantized wavelet features yielded a detection rate of 93.94 percent, compared to 91.49 percent using the original wavelet features.

Using Gabor filters for vehicle feature extraction was investigated in [77]. Gabor filters provide a mechanism for obtaining orientation- and scale-tunable edge and line detectors. Vehicles contain strong edges and lines at different orientations and scales; thus, this type of feature is very effective for vehicle detection. The hypothesized vehicle subimages were subdivided into nine overlapping subwindows, and Gabor filters were then applied on each subwindow separately. The magnitudes of the Gabor filter responses were collected from each subwindow and represented by three moments: the mean, the standard deviation, and the skewness. Classification was performed using Support Vector Machines (SVMs), yielding an accuracy of 94.81 percent.
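
A minimal sketch of this kind of feature extraction is given below: the window is split into a 3x3 grid of overlapping subwindows, each is filtered with a small bank of Gabor filters, and the mean, standard deviation, and skewness of the response magnitudes are collected. The filter parameters, the amount of overlap, and the number of orientations are illustrative assumptions rather than the settings of [77].

import numpy as np
from scipy.signal import convolve2d
from scipy.stats import skew

def gabor_kernel(size=11, wavelength=4.0, theta=0.0, sigma=3.0):
    # Real part of a Gabor filter: a Gaussian-windowed cosine grating.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * xr / wavelength)

def gabor_moment_features(window, n_thetas=4):
    h, w = window.shape
    step_y, step_x = h // 4, w // 4              # 50 percent subwindow overlap
    feats = []
    for t in range(n_thetas):
        kern = gabor_kernel(theta=t * np.pi / n_thetas)
        resp = np.abs(convolve2d(window.astype(np.float64), kern, mode='same'))
        for i in range(3):
            for j in range(3):
                sub = resp[i * step_y:(i + 2) * step_y, j * step_x:(j + 2) * step_x]
                feats.extend([sub.mean(), sub.std(), skew(sub.ravel())])
    return np.array(feats)

The feature vector (here 4 orientations x 9 subwindows x 3 moments = 108 values) would then be classified with an SVM.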

A “vocabulary” of information-rich vehicle parts was constructed automatically in [78] by applying the Forstner interest operator, together with a clustering method, to a set of representative images. Each image was represented in terms of parts from this vocabulary to form a feature vector, which was used to train a classifier to verify hypotheses. Successful detections were reported under a high degree of clutter and occlusion, and an overall 90.5 percent accuracy was achieved. Following the same idea (i.e., detection using components), Leung [79] investigated a different vehicle detection method. Instead of using the Forstner interest operator, differences of Gaussians were applied to images in scale space, and maxima and minima were selected as the key points. At each of the key points, the Scale Invariant Feature Transform (SIFT) [80] was used to form a feature vector, which was used to train an SVM classifier. Leung tested his algorithm on the UIUC data [78], showing slightly better performance.

7 INTEGRATING DETECTION WITH TRACKING

Vehicle detection can be improved considerably, both in terms of accuracy and time, by taking advantage of the temporal continuity present in the data. This can be achieved by employing a tracking mechanism to hypothesize the location of vehicles in future frames. Tracking takes advantage of the fact that it is very unlikely for a vehicle to show up in only one frame. Therefore, vehicle location can be hypothesized using past history and a prediction mechanism. When tracking performance drops, common hypothesis generation techniques can be deployed to maintain performance levels.
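
The prediction step can be as simple as a constant-velocity filter on the detected bounding-box center. The sketch below is a generic alpha-beta style predictor, not a component of any particular system reviewed here; the gains are illustrative.

import numpy as np

class ConstantVelocityPredictor:
    def __init__(self, cx, cy):
        # State: position and velocity of the bounding-box center.
        self.state = np.array([cx, cy, 0.0, 0.0])

    def predict(self, dt=1.0):
        # Hypothesized location in the next frame: x' = x + v * dt.
        F = np.array([[1, 0, dt, 0],
                      [0, 1, 0, dt],
                      [0, 0, 1, 0],
                      [0, 0, 0, 1]], dtype=float)
        self.state = F @ self.state
        return self.state[:2]

    def update(self, cx, cy, alpha=0.85, beta=0.005, dt=1.0):
        # Blend the detector's measurement into the predicted state.
        residual = np.array([cx, cy], dtype=float) - self.state[:2]
        self.state[:2] += alpha * residual
        self.state[2:] += (beta / dt) * residual

The predicted center defines a search window for the next frame; when the prediction and the detector disagree for several frames, full hypothesis generation can be rerun.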

By examining the reported vehicle detection and tracking algorithms/systems at the structural level, many similarities can be found. Specifically, the majority of existing on-road vehicle detection and tracking systems use a detect-then-track approach (i.e., vehicles are first detected and then turned over to the tracker). This approach aims to resolve detection and tracking sequentially and separately. There are many examples in the literature following this strategy. In Ferryman et al. [81], vehicle detection is based on template matching [82] while tracking uses dynamic filtering. In that work, high-order statistics were used for detection and a Euclidean-distance-based correlation was employed for tracking. In [47], vehicles were tracked using multiple cues such as intensity and edge data. To increase the sensor range for vehicle tracking, Clady et al. employed an additional P/T/Z camera [83]. In [84], close to real-time performance was reported (i.e., 14 frames per second) by integrating detection with tracking based on deformable models. The detect-then-track approach has several drawbacks. First, false detections are passed to the tracker without a chance of rectification. Second, tracking templates obtained from imperfect detections jeopardizes the reliability of the trackers. Most importantly, this type of approach does not exploit temporal information in detection.

There exist several exceptions where temporal information has been incorporated into detection. Betke et al. [85], [46] realized that reliable detection from one or two images is very difficult and only works robustly under cooperative conditions. Therefore, they used a refined search within the tracking window to reinforce the detections (i.e., a car template was created online every 10th frame and was correlated with the object in the tracking windows). Similar to [46], temporal tracking was used to suppress false detections in [86], where only two successive frames were employed. Similar observations were made by Hoffman et al. [87] (i.e., detection quality was improved by accumulating feature information over time).

Temporal information has not been fully exploited yet in the literature. Several efforts have been reported in [88] and more recently in [89]. We envision a different strategy (i.e., detect-and-track), where detection and tracking are addressed simultaneously in a unified framework (i.e., detection results trigger tracking, and tracking reinforces detection by accumulating temporal information through some probabilistic models). Approaches following this framework would have better chances to filter out false detections in subsequent frames. In addition, tracking template updates would be achieved through repeated detection verifications.

8 DISCUSSION

On-road vehicle detection is so challenging that none of the methods reviewed can solve it completely on its own. Different methods need to be selected and applied based on the prevailing conditions faced by the system [44], [90]. Complementary sensors and algorithms should be used to improve overall robustness and reliability. In general, surrounding vehicles can be classified into three categories according to their relative position to the host vehicle: 1) overtaking vehicles, 2) midrange/distant vehicles, and 3) close-by vehicles (see Fig. 10). In close-by regions (A1), we may only see part of the vehicle. In this case, there is no free space in the captured images, which makes the shadow/symmetry/edge-based methods inappropriate. In the overtaking regions (A2), only the side view of the vehicle is visible while appearance changes fast. Methods detecting vehicles in these regions are better off employing motion information or dramatic intensity changes [85], [46]. Detecting vehicles in the midrange/distant region (A3) is relatively easier since the full view of a vehicle is available and appearance is more stable.

Next, we provide a critique of the HG and HV methods reviewed in the previous sections. Our purpose is to emphasize their main strengths and weaknesses as well as to present potential solutions reported in the literature for enhancing their performance for deployment in real settings. The emphasis is on making these methods more reliable and robust to deal with the challenging conditions encountered in traffic scenes. Additional issues are discussed in Section 9.

8.1 Critique of Knowledge-Based HG Methods

Systems employing local symmetry, corners, or texture information for HG are most effective in relatively simple environments with no or little clutter. Employing these cues in complex environments (e.g., when driving downtown, where the background contains many buildings and different textures) would introduce many false positives. In the case of symmetry, it is also imperative to have a rough estimate of the vehicle's location in the image for fast and accurate symmetry computations. Even when utilizing both intensity and edge information, symmetry is quite prone to false detections, such as symmetrical background objects or partly occluded vehicles.
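
For concreteness, a very simple intensity-symmetry score for a candidate region can be written as below; this is only a toy mirrored-difference measure, not the symmetry detectors of [31], [32], [33].

import numpy as np

def symmetry_score(region):
    # 1.0 means the region equals its horizontal mirror image.
    region = region.astype(np.float64)
    diff = np.abs(region - region[:, ::-1]).mean()
    span = region.max() - region.min() + 1e-6    # normalize by dynamic range
    return 1.0 - diff / span

Note that a building facade or a partly occluded vehicle can score as high as a fully visible vehicle rear, which is exactly the false-detection problem discussed above.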

Color information has not been deployed extensively for HG due to the inherent difficulties of color-based object detection in outdoor settings. In general, the color of an object depends on illumination, the reflectance properties of the object, viewing geometry, and sensor parameters. Consequently, the apparent color of an object can be quite different during different times of the day, under different weather conditions, and under different poses.

Employing shadow information and vehicle lights for HG has been explored in a limited number of studies. Under perfect weather conditions, HG using shadow information can be very successful. However, bad weather conditions (i.e., rain, snow, etc.) or bad illumination conditions make road pixels quite dark, causing this method to fail. Vehicle lights are a ubiquitous vehicle feature at night; however, they can be confused with traffic lights and background lights. We believe that these cues have limited employability.

Fig. 10. Detecting vehicles in different regions requires different methods. A1: Close-by regions. A2: Overtaking regions. A3: Midrange/distant regions.

Utilizing horizontal and vertical edges for HG is probably the most promising knowledge-based approach reported in the literature. Our experience with using edge information in realistic experiments has been very positive [49]. Although this method can suffer from false positives, many of them can be rejected quickly using simple tests (e.g., aspect ratio). From a practical point of view, there are fast implementations of edge detection in hardware, making this approach even more attractive. The main problem with this approach is that it depends on a number of parameters that could affect system performance and robustness. For example, we need to decide the thresholds for the edge detection step, the thresholds for choosing the most important vertical and horizontal edges, and the thresholds for choosing the best maxima (i.e., peaks) in the profile images. A set of parameter values might work well under certain conditions but fail in other situations. We have described in Section 5 a multiresolution scheme addressing these issues.
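
The sketch below illustrates the kind of parameters involved: an edge threshold, profile normalization, and peak selection. The specific values are illustrative assumptions that would need tuning per camera; this is not the multiresolution scheme of Section 5.

import cv2
import numpy as np
from scipy.signal import find_peaks

def edge_profiles(gray, edge_thresh=40, peak_prominence=0.2):
    # Directional gradients: vertical structure shows up in dx, horizontal in dy.
    dx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    dy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    vertical_edges = (np.abs(dx) > edge_thresh).astype(np.float32)
    horizontal_edges = (np.abs(dy) > edge_thresh).astype(np.float32)
    # Column profile of vertical edges, row profile of horizontal edges.
    col_profile = vertical_edges.sum(axis=0)
    row_profile = horizontal_edges.sum(axis=1)
    col_profile /= col_profile.max() + 1e-6
    row_profile /= row_profile.max() + 1e-6
    # Peaks hypothesize the left/right sides and the bottom of a vehicle.
    cols, _ = find_peaks(col_profile, prominence=peak_prominence)
    rows, _ = find_peaks(row_profile, prominence=peak_prominence)
    return cols, rows

Candidate bounding boxes formed from pairs of column peaks and a row peak can then be pruned with simple tests such as aspect ratio.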

8.2 Critique of Stereo-Based HG Methods

Stereo-based methods have been employed extensively for HG; however, traditional implementations are time consuming and work well only if the camera parameters have been estimated accurately. Otherwise, their performance is significantly impaired. When using stereo vision to hypothesize vehicle locations, dense disparity maps are necessary to guarantee that all regions are searched for potential vehicles. Naive approaches to stereo computation are not suitable for dynamic object detection at reasonable vehicle speeds due to their high complexity (i.e., O(d m^2 n^2), where d is the number of shifts over which the correspondence search is performed, m is the size of the support window, and n is the size of the images). There have been several approaches to overcome this problem, such as computing sparse disparity maps [90], [13], employing multiresolution schemes [90], [13], or using prior knowledge about the environment to limit the search for correspondences [53].
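
To make the complexity explicit, the brute-force SSD block-matching loop is sketched below; it is written purely to show where the factors d, m^2, and n^2 come from, not as a usable implementation.

import numpy as np

def naive_disparity(left, right, max_disp=32, half_win=3):
    # For an n x n image, a (2*half_win+1)^2 support window (m^2), and
    # max_disp candidate shifts (d), the cost is O(d * m^2 * n^2).
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    L, R = left.astype(np.float64), right.astype(np.float64)
    for y in range(half_win, h - half_win):                  # ~n rows
        for x in range(half_win, w - half_win):              # ~n columns
            patch = L[y - half_win:y + half_win + 1, x - half_win:x + half_win + 1]
            best, best_d = np.inf, 0
            for d in range(min(max_disp, x - half_win) + 1): # d shifts
                cand = R[y - half_win:y + half_win + 1,
                         x - d - half_win:x - d + half_win + 1]
                ssd = np.sum((patch - cand) ** 2)            # m^2 window
                if ssd < best:
                    best, best_d = ssd, d
            disp[y, x] = best_d
    return disp

Sparse disparity maps, multiresolution search, and scene constraints each attack one of these nested loops.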

Estimating the stereo parameters accurately is also hard to guarantee in an on-road scenario. Since the stereo rig is on a moving vehicle, vibrations from the vehicle's motion and windy conditions might shift the cameras, while the height of the cameras keeps changing due to the vehicle's suspension. Suwa et al. [91] proposed a method to update the parameters and compensate for errors caused by camera movements. The 3D measurements of a stereo-based system are calculated using:

M_w = R_c M_c + t_c,    (2)

where M_w and M_c represent vectors in the world coordinate and camera coordinate systems, R_c is a rotation matrix, and t_c is a translation vector. A two-parameter sway model was used in [91]: the sway direction angle and the sway range. Incorporating the effect of the sway parameters leads to a modified model:

M_w = R_α R_φ (R_{-φ} R_θ M_c + t_c) + t_α,    (3)

where θ = -2 sin^{-1}(d / (2H)), d is the sway range, H is the height of the camera, φ denotes the sway direction angle, and α the setup tilt angle. The two sway parameters were estimated from corresponding points in image pairs with and without sway; the correspondences between the no-sway image and the sway image were obtained using correlation, and the estimation was done using least squares.

Bertozzi et al. [92] have also analyzed the parameter drifts and argued that vibrations mostly affect the extrinsic parameters, not the intrinsic parameters. A fast self-calibration method was considered to deal with this issue. Eight carefully designed markers were put on the hood, four for each of the two cameras. Since the world coordinates of the markers were known, determining their image coordinates was sufficient to compute the position and orientation of the cameras in the same reference system.

8.3 Critique of Motion-Based HG Methods

In general, motion-based methods can only detect objects on the basis of relative motion information. Obviously, this is a major limitation; for example, such methods cannot be used to detect static obstacles, which can represent a big threat. Despite this fact, employing motion information for HG has shown promising results; however, it is computationally intensive and its performance is affected by several factors. Generating a displacement vector for each pixel (the continuous approach) is time-consuming and impractical for a real-time system. In contrast, discrete methods based on image features such as color blobs [66] or local intensity minima and maxima [65] have shown good performance while being faster (a minimal sketch of this discrete, feature-based strategy is given after the list below). There have also been attempts to speed up motion-based computations using multiresolution schemes [90]. Several factors affect the computation of motion information [60], including:

. Displacements between consecutive frames. Fast movement of the host vehicle causes significant pixel displacements. Points in the image can move by more than five pixels when the car moves at a speed faster than 30 km/h. Consequently, aliasing in the computation of the temporal derivatives introduces errors into the computation of optical flow.

. Lack of textures. Large portions of the images represent the road bed, where gray-level variations are quite small, especially when driving on a country road. Significant instability can be introduced into the computation of the spatial derivatives due to this lack of texture.

. Shocks and vibrations. Image motion is the sum of a smooth component due to the car ego-motion and a high-frequency component due to camera shocks and vibrations. In the presence of shocks and vibrations, caused by mechanical instability of the camera, high-frequency noise is introduced into the intensity profile. This noise gets greatly amplified during the computation of the temporal derivatives. In general, the error introduced by shocks and vibrations is small if the camera is mounted on a high-quality antivibration platform and the vehicle is moving along usual roads. However, if the camera is mounted less carefully or the vehicle is driven on a bumpy road, the error can be 10 times larger.
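
The sketch referenced above shows one way the discrete, feature-based strategy can be realized with standard tools: track a sparse set of corners between consecutive frames with pyramidal Lucas-Kanade. The parameters are illustrative, and this is not the specific feature choice of [65] or [66].

import cv2
import numpy as np

def sparse_motion(prev_gray, curr_gray, max_corners=200):
    # Detect corners in the previous frame.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_corners,
                                  qualityLevel=0.01, minDistance=7)
    if pts is None:
        return np.empty((0, 2)), np.empty((0, 2))
    # A pyramidal solver copes better with the large inter-frame displacements
    # caused by fast ego-motion than a single-scale one.
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None,
                                              winSize=(21, 21), maxLevel=3)
    good = status.ravel() == 1
    p0 = pts.reshape(-1, 2)[good]
    p1 = nxt.reshape(-1, 2)[good]
    return p0, p1 - p0

Displacement vectors that deviate from the dominant ego-motion pattern are candidate moving obstacles.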

Among these factors, camera movement is the main reason that traditional differential methods fail. If we can counterbalance camera movements, then these methods could become very useful. This is the objective of another research direction, called “image stabilization” [93], [94]. Image stabilization is based on frame-to-frame registration. Taking the first frame of the image sequence as a reference, the stabilization method registers this frame to the next frame and computes the motion parameters from the current frame to the reference frame. Then, it uses the estimated parameters to warp the current frame to get the stabilized image, which can be considered as taken from a stationary camera. The motion model employed in [93] contains four parameters: two for translation, one for rotation, and one for scaling:

X2 = s( X1 cos θ + Y1 sin θ) + ΔX2,
Y2 = s(-X1 sin θ + Y1 cos θ) + ΔY2,    (4)

where (Xi, Yi) are the image frame coordinates of a point at time t_i, (ΔX2, ΔY2) is the translation measured in the image coordinate system of the frame at t_2, θ is the rotation angle between the two frames, and s is a scaling factor. This model was appropriate for image sequences of distant scenes, where perspective distortion could be neglected. The motion parameters were computed by matching a small number of feature points between two frames. The extraction of feature points was done by searching a predefined small window in the very top region of the image using correlation. By constraining the selection of feature points to the very top region, features that were too close to the camera could be avoided and, consequently, less distortion was introduced into the model. It should be mentioned that image stabilization methods would fail when an image contains close scenes, a common scenario when driving a vehicle downtown or during vehicle turns.
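
Because the model in (4) is linear in a = s cos θ, b = s sin θ and the translation, the four parameters can be recovered from matched feature points by ordinary least squares. The sketch below shows this standard linear formulation; it is not necessarily the exact estimator used in [93].

import numpy as np

def fit_similarity(p1, p2):
    # p1, p2: (N, 2) arrays of matched points in the reference and current frame.
    x1, y1 = p1[:, 0], p1[:, 1]
    x2, y2 = p2[:, 0], p2[:, 1]
    n = len(x1)
    # Unknowns: a = s*cos(theta), b = s*sin(theta), tx, ty.
    A = np.zeros((2 * n, 4))
    rhs = np.zeros(2 * n)
    A[0::2] = np.column_stack([x1, y1, np.ones(n), np.zeros(n)])
    rhs[0::2] = x2
    A[1::2] = np.column_stack([y1, -x1, np.zeros(n), np.ones(n)])
    rhs[1::2] = y2
    (a, b, tx, ty), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return np.hypot(a, b), np.arctan2(b, a), tx, ty   # s, theta, dX2, dY2

Warping the current frame with the inverse of the estimated transform yields the stabilized image, which can then be treated as coming from a stationary camera.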

8.4 Critique of HV Methods

Constructing explicit models of the objects to be recognized is very difficult when object appearance varies a lot. In general, appearance-based methods are more accurate than template-based methods; however, they are more costly due to classifier training. Nevertheless, appearance-based methods are becoming more and more popular due to the exponential growth in processor speed. Analyzing the pros and cons of the various appearance-based methods proposed in the literature is not simple. Most studies have been performed using different data sets and performance measures, making a fair evaluation of different feature extraction methods and classification schemes difficult, if not impossible. In a recent study [95], experimental results were reported using several feature extraction methods (i.e., PCA features, wavelet features, and Gabor features) and classifiers (i.e., Neural Networks (NNs) and Support Vector Machines (SVMs)). Testing was performed using a common data set obtained by driving Ford's concept vehicle under different traffic conditions (e.g., structured highways, complex urban streets, and varying weather conditions). The best approach in terms of accuracy was found to be Gabor features with SVMs, yielding an error rate of 5.33 percent with a false positive (FP) rate of 3.46 percent and a false negative (FN) rate of 1.88 percent. Combining Gabor and Haar wavelet features yielded a slightly better performance (i.e., an error rate of 3.89 percent with a FP rate of 2.29 percent and a FN rate of 1.6 percent) at the expense of higher time requirements. It should be mentioned that the error rate using Haar wavelet features with SVMs was 8.52 percent, with a FP rate of 6.50 percent and a FN rate of 2.02 percent. More systematic evaluations of various feature extraction methods and classification schemes are required in order to assess the performance of HV methods. For these comparisons to become more meaningful, it is imperative to first develop representative data sets (i.e., benchmarks) and carefully designed evaluation procedures (see Section 9.6).

9 CHALLENGES AHEAD

An important issue in the realization of successful driver assistance applications is the design of vehicle detection and tracking systems that yield a maximum level of reliability and robustness in real time. Although many efforts have been put into this research area, many algorithms and systems have been reported, and many prototype vehicles have been demonstrated, a highly robust, reliable, real-time system has yet to be demonstrated. Achieving these objectives requires addressing several challenges and solving quite different problems.

From a technical point of view, the success of an on-road vehicle detection system will depend on the number of correct detections versus the number of false alarms that it produces, assuming a certain processing rate and processor platform. Determining the desired level of accuracy for vehicle detection is not easy and depends on the nature of the application. For example, if vehicle detection is part of a warning system, then higher false positive rates can be tolerated. In contrast, systems involving active vehicle control need to be more conservative in terms of false alarms. There are a number of ways to significantly reduce the number of false positives while keeping accuracy high, including improved algorithmic solutions (e.g., using multiple cues and advanced statistical and learning models), sensor fusion (e.g., visible, IR, and radar), and telematics (e.g., vehicle-to-vehicle communication and GPS-based localization). We elaborate more on these issues next.

9.1 Algorithmic Advances

The design of computer vision algorithms that operate robustly and reliably in complex and widely varying environments (e.g., rain, fog, night, etc.) is a major challenge. Using on-board cameras makes some well-established computer vision techniques unsuitable (e.g., background subtraction is not appropriate due to fast background changes caused by camera motion) or not directly applicable unless certain assumptions are made or enhancements are added (e.g., stereo-based systems require frequent recalibration to account for camera movements caused by shocks and vibrations). Efficient implementations should also be considered (e.g., fast motion-based estimation) in order to meet real-time performance requirements.

Developing more powerful algorithms to deal with a variety of issues is thus essential. In doing so, it is important to first understand the requirements of on-road vehicle detection and design customized algorithmic solutions that meet these requirements while taking advantage of domain knowledge and inherent constraints (e.g., exploiting temporal continuity to improve accuracy and robustness, or assuming a flat road to simplify the mapping between image pixels and world coordinates). We have presented in Section 8 a number of issues associated with HG approaches and potential algorithmic solutions to deal with these issues effectively. More efforts are clearly required in this direction.


In HV, most research efforts have focused on feature extraction and classification based on learning and statistical models. Efforts in this direction should continue while capitalizing on recent advances in the statistical and machine learning areas. For example, one issue that has not been given enough attention in the vehicle detection literature is the selection of a good set of features. In most cases, a large number of features is employed to compensate for the fact that the relevant features are unknown a priori. However, without employing some kind of feature selection strategy, many of them will be either redundant or even irrelevant, which could seriously affect classification accuracy. In general, it is highly desirable to use only those features that have great separability power while ignoring or paying less attention to the rest. For instance, to allow a vehicle detector to generalize nicely, it would be desirable to exclude features encoding fine details that might be present in some vehicles only. Finding out which features to use for classification/recognition is referred to as feature selection. Recently, a few efforts addressing this issue in the context of vehicle detection have been reported in the literature [76], [96], [97], [98]. Several efforts have even been reported to improve tracking through feature selection [99]. We believe that more efforts are required in this direction, along with efforts to develop more powerful feature extraction and classification schemes. Recent advances in machine learning and statistics (e.g., kernel methods [100]) should be leveraged in this respect.
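
As a minimal illustration of separability-based selection (a generic example, not the selection methods of [76], [96], [97], [98]), features can be ranked by a Fisher-type score before training the vehicle/nonvehicle classifier:

import numpy as np

def fisher_scores(X, y):
    # Between-class separation over within-class spread, per feature,
    # for a two-class problem with labels y in {0, 1}. Higher is better.
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    var0, var1 = X0.var(axis=0), X1.var(axis=0)
    return (mu0 - mu1) ** 2 / (var0 + var1 + 1e-12)

# Keep only the k highest-scoring features before training:
# keep = np.argsort(fisher_scores(X_train, y_train))[::-1][:k]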

Combining multiple cues should also be explored more actively as a viable means to develop more reliable and robust systems. The main motivation is that the use of a single cue suitable for all conceivable scenarios seems to be impossible. Combining different cues has produced promising results (e.g., combining LOC, entropy, and shadow [44]; shape, symmetry, and shadow [101]; color and shape [102]; and motion with appearance [103]). Effective fusion mechanisms as well as cues that are fast and easy to compute are important research issues.

9.2 Sensor Advances

Employing more powerful sensors in vehicle detection applications can influence system performance considerably. Specific objectives include improving dynamic range, spectral sensitivity, and spatial resolution, and incorporating computational capabilities.

Traditional CCD cameras lack the dynamic range necessary to operate in traffic under adverse lighting conditions. Cameras with enhanced dynamic range are needed to enable daytime and nighttime operation without blooming. An example is Ford's proprietary low-light camera, which has been developed jointly by Ford Research Laboratory and SENTECH. It uses a Sony x-view CCD array with specifically designed electronic profiles to enhance the camera's dynamic range. Fig. 11a and Fig. 11c show the dynamic range of the low-light camera, while Fig. 11b and Fig. 11d show the same scenes captured under the same illumination conditions using a normal camera. The low-light camera has been employed in a number of studies including [77], [76], [104], [49], [96], [97]. Recently, several efforts have focused on using CMOS technology to design cameras with improved dynamic range.

Low-light cameras do not extend visual capabilities beyond the visible spectrum. In contrast, Infrared (IR) sensors allow us to sense important information in the nonvisible spectrum. IR-based systems are less sensitive to adverse weather or illumination changes; day and night snapshots of the same scene are more similar to each other. Several studies have been carried out to evaluate the feasibility and advantages of using IR for driver assistance systems [105], [106], [107], [108]. An interesting example is the miniaturized optical range camera developed in the project MINORA [109], [110]. It works in the near-IR; it is cheap, fast, and capable of providing 3D information with high accuracy. However, it has certain limitations such as low resolution and a narrow field of view. Fusing several sensors together could offer considerable performance improvements (see Section 9.3).

Improving camera resolution can offer significant benefits too. Over the last few years, the resolution of sensors has been drastically enhanced. A critical issue in this case is decreasing acquisition and transfer time. CMOS technology holds some potential in this direction (i.e., pixels can be addressed independently, like in traditional memories).

In conventional vision systems, data processing takes place at a host computer. Building cameras with internal processing power (i.e., vision chips) is an important goal. The main idea is to integrate photo-detectors with processors on a very large scale integration chip [111]. Vision chips have many advantages over conventional vision systems, for instance, high speed, small size, lower power consumption, and a wide brightness range. Several cameras available today allow some basic problems to be addressed and solved directly at the sensor level (e.g., image stabilization can now be performed during image acquisition).

9.3 Sensor Fusion

Fig. 11. Low-light camera versus normal camera. (a) Low-light camera daytime image. (b) Same scene captured using a normal camera. (c) Low-light camera nighttime scene. (d) Same nighttime scene captured using a normal camera.

Developing driver assistance systems suitable for urban areas, where traffic signs, crossings, traffic jams, and other participants (motorbikes, bicycles, pedestrians, or even livestock) may exist, poses extra challenges. Exclusively vision-based systems and algorithms are not yet powerful enough to deal with complex traffic situations. To extend the applicability of driver assistance systems, substantial research efforts are required to develop systems that effectively employ information from multiple sensors, both active and passive (see Fig. 12).

Sensor characteristics reveal that each sensor can only perceive certain characteristics of the environment; therefore, a single sensor is not sufficient to comprehensively represent the driving environment [67], [37]. A multisensor approach has the potential to yield a higher level of reliability and security. Methods for sensor fusion and integration are concerned with improving the sensing capacity by using redundant and complementary information from multiple sensors. Together, multiple sensors can capture environment features more accurately than is possible with any single sensor.

For example, acoustic sensors were fused with video sensors in [29] for both detection and tracking in order to take advantage of the complementary information available from the two sensors. In another study, a multisensor approach was adopted using sensor technologies with widely overlapping fields of view between different sensors [112]. The degree of sensor redundancy was varied depending on the relevance of the area covered: the rear of the vehicle is surveyed by means of a single laser sensor, the sides are each covered by two independent laser scanners and several overlapping short-range radar sensors, and the front of the car is covered by three powerful long-range sensors (i.e., stereo vision, laser, and radar). The sensor signals are combined by sensor fusion into a joint obstacle map. By considering confidence and reliability measures for each sensor, the obstacle map computed by sensor fusion was shown to be more precise and reliable than any of the individual sensor outputs alone.

Although sensor fusion has great potential to improve driver assistance systems, developing the actual multisensor platform requires dealing with a series of problems, including not only the conventional issues of sensor fusion and integration, but also some issues specific to driver assistance system design. Given common geometry and time frames, sensor fusion needs to be implemented at various levels:

. Registration level. To allow fusing the data from different sensors effectively, sensor data needs to be registered first.

. Encapsulation level. Registered data from different sensors can be fused to yield more accurate information for the detected vehicles based on the reliability/confidence levels of the attributes associated with the different sensors. For instance, a more accurate position-velocity estimate could be obtained by analyzing the registered radar and stereo-vision data (a minimal sketch of this kind of confidence-weighted fusion is given after this list). In other words, at this level, the same type of information is encapsulated together into a more accurate and concise representation.

. Perception-map level. Complementary information can be fused to infer new knowledge about the driving environment. The position-velocity information of detected vehicles and road geometry information (from vision) can be fused to produce a primary perception map, where vehicles can be characterized as being either stationary/moving or inside/outside the lane.

. Threat quantification level. Vehicle type, shape, distance, and speed information can be fused to quantify the threat level of a vehicle in the perception map to the host vehicle.
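
The encapsulation-level fusion mentioned above can be illustrated by the standard inverse-variance weighting of registered measurements of the same quantity; this generic sketch is not the fusion scheme of [112].

import numpy as np

def fuse_estimates(values, variances):
    # Fuse several measurements of one quantity (e.g., distance to a detected
    # vehicle from radar and from stereo vision), weighting by confidence.
    values = np.asarray(values, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = 1.0 / variances
    fused = np.sum(weights * values) / np.sum(weights)
    return fused, 1.0 / np.sum(weights)   # fused value and its variance

# Example: radar reports 24.0 m (variance 0.25), stereo reports 25.2 m
# (variance 1.0); the fused estimate, about 24.24 m, stays closer to the
# more confident radar reading.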

Most vehicle detection approaches have been implemented as “autonomous systems,” with all instrumentation and intelligence on board the vehicle. Significant performance improvements can be expected, however, by implementing vehicle detection as “cooperative systems,” where assistance is provided from external sources (i.e., the roadway, other vehicles, or both). Examples of roadway assistance include passive reference markers in the infrastructure and GPS-based localization. Vehicle-to-vehicle cooperation works by transmitting key vehicle parameters and intentions to close-by vehicles. Having this information available, as well as knowing the type of surrounding environment through GPS, might reduce the complexity of the problem and make vehicle detection more reliable and robust.

9.4 Software Issues

Vision-based vehicle detection systems should be modular, reconfigurable, and extensible to be able to deal with a wide variety of image processing tasks. The functionality of most vehicle detection systems today is achieved by a few algorithms that are hard-wired together. This is quite inefficient and cannot handle the complexities involved satisfactorily. Recently, there have been some efforts to develop a software architecture that can deal with different levels of abstraction, including sensor fusion, integration of various algorithms, economical use of resources, scalability, and distributed computing. For example, a multiagent-system approach was proposed in [13] (i.e., ANTS or Agent NeTwork System) to address these issues. In another study [85], a hard real-time operating system called “Maruti” was used to guarantee that the timing constraints on the various vision processes were satisfied. The dynamic creation and termination of tracking processes optimized the amount of computational resources spent and allowed fast detection and tracking of multiple cars. Obviously, more efforts in this area would be essential.

9.5 Hardware Issues

On-board vehicle detection systems have high computational requirements, as they need to process the acquired images in real time or close to real time to leave time for driver reaction. For nontrivial vehicle velocities, processing latency should be small (i.e., typically no larger than 100 ms), while processing frequency should be high (i.e., typically in excess of 15 frames per second). Due to the constraints of low latency and the difficulty of sending and receiving video data reliably, most image processing must be done on site on the vehicle.

Fig. 12. An example of sensor fusion.

Computer vision algorithms are generally very computationally intensive and require powerful resources in order to comply with real-time performance constraints. With the increasing computing power of standard PCs, several systems have been demonstrated using general-purpose hardware. For example, our group has developed a vehicle detection system that works at a frame rate of approximately 10 frames per second (NTSC: processing on average every third frame) using a standard PC (Pentium III, 1133 MHz) [49]. Although we expect the development of more powerful, low-cost, general-purpose processors in the near future, specialized hardware solutions using off-the-shelf components (e.g., clusters of PCs) seem to be the way to go at present.

Vehicle detection for precrash sensing requires a high enough sampling rate in order to provide a satisfactory solution. If the vehicle's speed is about 70 mph, then 10 Hz corresponds to roughly a 3 meter interval between processed frames. The most time-consuming step in our system is the computation of the vertical/horizontal edges. Most low-level image-processing algorithms employed for vehicle detection perform similar computations for all the pixels of an image and require only local information. Therefore, substantial speed-ups can be achieved by implementing them in appropriate hardware. Specialized hardware solutions are possible using low-cost general-purpose processors and Field Programmable Gate Arrays (FPGAs).
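
For concreteness, the sampling-distance figure quoted above follows directly from

v ≈ 70 mph ≈ 31.3 m/s,    Δx = v / f ≈ (31.3 m/s) / (10 Hz) ≈ 3.1 m per frame.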

Recent advances in computation hardware allow us to have systems that can deliver high computational power, with fast networking facilities, at an affordable price. Several studies have taken advantage of hardware implementations to speed up computations, including edge-based motion detection [113], hardware-based optical flow estimation [114], object tracking [115], as well as feature detection and point tracking [116]. Sarnoff has also developed a powerful image processing platform called VFE-200 [117]. The VFE-200 can perform several front-end vision functions in hardware simultaneously at video rates (e.g., pyramids, registration, and optical flow). It is worth mentioning that most of the hardware implementations that have appeared in the literature address smaller problems (e.g., motion detection, edge detection, etc.). Integrating all those hardware components together, as well as integrating hardware and software implementations seamlessly, requires more effort.

9.6 Benchmarks and Evaluation Methodology

The majority of vehicle detection systems reported in the literature have not been tested under realistic conditions (e.g., different traffic scenarios including simply structured highways, complex urban streets, and varying weather conditions). Moreover, evaluations are based on different data sets and performance measures, making comparisons between systems very difficult. Future efforts should focus on assessing system performance along a real collision timeline, taking into account driver perception-response times, braking rates, and various collision scenarios.

The field is lacking representative data sets (i.e., benchmarks) and specific procedures to allow comprehensive system evaluations and fair comparisons between different systems. To move things forward, exemplary strategies developed in related fields (e.g., face recognition [118] and surveillance [119]) should be adapted in order to develop, and make available to the broader scientific community, benchmarks and carefully designed evaluation procedures that enable performance evaluations in a consistent way. Relating the level of performance to the complexity of the driving scene is also of critical importance. Ideas developed in related fields (e.g., object recognition [120]) should be adapted to allow more effective designs and meaningful evaluations.

9.7 Failure Detection

An on-board vision sensor will face adverse operating conditions, and it may reach a point where it cannot provide data of sufficient quality to meet minimum system performance requirements. In these cases, the driver assistance system may not be able to fulfill its responsibilities correctly (e.g., it may issue severe false alerts). A reliable driver assistance system should be able to evaluate its own performance and disable its operation when it cannot provide reliable traffic information anymore. We refer to this function as “failure detection.” One possible option for failure detection is to use another sensor exclusively for this purpose, at the expense of additional cost. A better method might be to extract information for failure detection from the vision sensor itself. Some preliminary experiments have been reported in the scenario of distance detection using stereo vision [121], where the host vehicle and the subject were both stationary. Further exploration of this issue is yet to be carried out.
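
One simple form of self-diagnosis is to monitor the sensor's own output quality. The sketch below flags frames that are too dark, too flat, or too blurred to support reliable detection; the metrics and thresholds are illustrative assumptions, not the method of [121].

import cv2

def frame_usable(gray, min_contrast=15.0, min_sharpness=50.0):
    # Low standard deviation indicates a dark or washed-out frame; low
    # Laplacian variance indicates blur (e.g., rain or a dirty lens).
    contrast = float(gray.std())
    sharpness = float(cv2.Laplacian(gray, cv2.CV_64F).var())
    return contrast >= min_contrast and sharpness >= min_sharpness

When such a check fails for a sustained period, the driver assistance system could disable itself rather than issue unreliable alerts.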

10 CONCLUSIONS

We have presented a survey of vision-based on-road vehicle detection systems, one of the most important components of any driver assistance system. On-road vehicle detection using optical sensors is very challenging, and many practical issues must be considered. Depending on the range of interest, different methods seem to be more appropriate. In HG, stereo-based methods have gained popularity, but they suffer from a number of practical issues not found in typical applications. Edge-based methods, although much simpler, are quite effective, but they are not appropriate for distant vehicles. In HV, appearance-based methods are more promising, but recent advances in machine and statistical learning need to be leveraged. Fusing data from multiple cues and sensors should be explored more actively in order to improve robustness and reliability. A great deal of work should also be directed toward the enhancement of sensor capabilities and performance, including the improvement of gain control and sensitivity in extreme illumination conditions. Hardware-based solutions using off-the-shelf components should also be explored to meet real-time constraints while keeping cost low.

Although we have witnessed the introduction of the first vision products on board vehicles in the automobile industry (e.g., the Lane Departure Warning System available in Mercedes and Freightliner's trucks [7]), we believe that the widespread introduction of vision-based systems in the automobile industry is still several years away. From our perspective, the future holds promise for driver assistance systems that can be tailored to solve well-defined tasks that attempt to support, not replace, the driver. Even so, several orders of improvement in sensor performance and algorithm robustness are needed before these systems can be deployed effectively.

In spite of the technical challenges that lie ahead, we believe that some degree of optimism is justifiable based on the progress that this domain has seen over the last few years. Judging from the research activities in this field worldwide, it is certain that it will continue to be among the hottest research areas in the future. Major motor companies, government agencies, and universities are all expected to work together to make significant progress in this area over the next few years. Rapidly falling costs for sensors and processors, combined with increasing image resolution, provide the basis for continued growth of this field.

ACKNOWLEDGMENTS

This research was supported by Ford Motor Company under grant No. 2001332R, the University of Nevada, Reno, under an Applied Research Initiative (ARI) grant, and in part by the US National Science Foundation under CRCD grant No. 0088086.

REFERENCES

[1] W. Jones, “Keeping Cars from Crashing,” IEEE Spectrum, vol. 38, no. 9, pp. 40-45, 2001.
[2] W. Jones, “Building Safer Cars,” IEEE Spectrum, vol. 39, no. 1, pp. 82-85, 2002.
[3] E. Dickmanns, “The Development of Machine Vision for Road Vehicles in the Last Decade,” Proc. IEEE Intelligent Vehicle Symp., 2002.
[4] M. Bertozzi, A. Broggi, and A. Fascioli, “Vision-Based Intelligent Vehicles: State of the Art and Perspectives,” Robotics and Autonomous Systems, vol. 32, pp. 1-16, 2000.
[5] M. Bertozzi, A. Broggi, M. Cellario, A. Fascioli, P. Lombardi, and M. Porta, “Artificial Vision in Road Vehicles,” Proc. IEEE, vol. 90, no. 7, pp. 1258-1271, 2002.
[6] R. Bishop, “Intelligent Vehicle Applications Worldwide,” IEEE Intelligent Systems, 2000.
[7] D. Gavrila, U. Franke, C. Wohler, and S. Gorzig, “Real-Time Vision for Intelligent Vehicles,” IEEE Instrumentation and Measurement Magazine, pp. 22-27, 2001.
[8] IEEE Trans. Intelligent Transportation Systems, special issue on Vision Applications and Technology for Intelligent Vehicles: Part I (Infrastructure), vol. 1, 2000.
[9] IEEE Trans. Intelligent Transportation Systems, special issue on Vision Applications and Technology for Intelligent Vehicles: Part II (Vehicles), vol. 1, 2000.
[10] Image and Vision Computing, special issue on applications of computer vision to intelligent vehicles, vol. 18, 2000.
[11] IEEE Trans. Vehicular Technology, special issue on in-vehicle computer vision systems, vol. 53, 2004.
[12] F. Heimes and H. Nagel, “Towards Active Machine-Vision-Based Driver Assistance for Urban Areas,” Int'l J. Computer Vision, vol. 50, pp. 5-34, 2002.
[13] U. Franke et al., From Door to Door: Principles and Applications of Computer Vision for Driver Assistant Systems. Butterworth Heinemann, 2001.
[14] M. Schwarzinger, T. Zielke, D. Noll, M. Brauckmann, and W. von Seelen, “Vision-Based Car-Following: Detection, Tracking, and Identification,” Proc. IEEE Intelligent Vehicle Symp., pp. 24-29, 1992.
[15] B. Ulmer, “Vita: An Autonomous Road Vehicle (ARV) for Collision Avoidance in Traffic,” Proc. IEEE Intelligent Vehicles Symp., pp. 36-41, 1992.

[16] M. Bertozzi and A. Broggi, “Gold: A Parallel Real-Time Stereo Vision System for Generic Obstacle and Lane Detection,” IEEE Trans. Image Processing, vol. 7, pp. 62-81, 1998.
[17] M. Bertozzi, A. Broggi, and A. Fascioli, “Obstacle and Lane Detection on Argo Autonomous Vehicle,” IEEE Intelligent Transportation Systems, 1997.
[18] S. Tsugawa, “Vision-Based Vehicles in Japan: Machine Vision Systems and Driving Control Systems,” IEEE Trans. Industrial Electronics, vol. 41, no. 4, pp. 398-405, 1994.
[19] Vehicle-Highway Automation Activities in the United States. US Dept. of Transportation, 1997.
[20] C. Thorpe, J.D. Carlson, D. Duggins, J. Gowdy, R. MacLachlan, C. Mertz, A. Suppe, and C. Wan, “Safe Robot Driving in Cluttered Environments,” Proc. 11th Int'l Symp. Robotics Research, 2003.
[21] D. Pomerleau and T. Jochem, “Rapidly Adapting Machine Vision for Automated Vehicle Steering,” IEEE Expert, vol. 11, pp. 19-27, 1996.
[22] C. Thorpe and T. Kanade, “Vision and Navigation for the Carnegie Mellon Navlab,” Proc. DARPA Image Understanding Workshop, 1985.
[23] C. Thorpe, M. Hebert, T. Kanade, and S. Shafer, “Vision and Navigation for the Carnegie-Mellon Navlab,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 10, pp. 362-373, 1988.
[24] M. Walton, “Robots Fail to Complete Grand Challenge,” CNN News, Mar. 2004.
[25] M. Herbert, “Active and Passive Range Sensing for Robotics,” Proc. IEEE Int'l Conf. Robotics and Automation, vol. 1, pp. 102-110, 2000.
[26] S. Park, T. Kim, S. Kang, and K. Heon, “A Novel Signal Processing Technique for Vehicle Detection Radar,” 2003 IEEE MTT-S Int'l Microwave Symp. Digest, pp. 607-610, 2003.
[27] C. Wang, C. Thorpe, and A. Suppe, “Ladar-Based Detection and Tracking of Moving Objects from a Ground Vehicle at High Speeds,” Proc. IEEE Intelligent Vehicles Symp., 2003.
[28] J. Hancock, E. Hoffman, R. Sullivan, D. Ingimarson, D. Langer, and M. Hebert, “High-Performance Laser Ranger Scanner,” Proc. SPIE Int'l Conf. Intelligent Transportation Systems, 1997.
[29] R. Chellappa, G. Qian, and Q. Zheng, “Vehicle Detection and Tracking Using Acoustic and Video Sensors,” Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, pp. 793-796, 2004.
[30] H. Mallot, H. Bulthoff, J. Little, and S. Bohrer, “Inverse Perspective Mapping Simplifies Optical Flow Computation and Obstacle Detection,” Biological Cybernetics, vol. 64, no. 3, pp. 177-185, 1991.

[31] G. Marola, “Using Symmetry for Detecting and Locating Objects in a Picture,” Computer Vision, Graphics, and Image Processing, vol. 46, pp. 179-195, 1989.
[32] A. Kuehnle, “Symmetry-Based Recognition for Vehicle Rears,” Pattern Recognition Letters, vol. 12, pp. 249-258, 1991.
[33] T. Zielke, M. Brauckmann, and W. von Seelen, “Intensity and Edge-Based Symmetry Detection with an Application to Car-Following,” CVGIP: Image Understanding, vol. 58, pp. 177-190, 1993.
[34] W. von Seelen, C. Curio, J. Gayko, U. Handmann, and T. Kalinke, “Scene Analysis and Organization of Behavior in Driver Assistance Systems,” Proc. IEEE Int'l Conf. Image Processing, pp. 524-527, 2000.
[35] J. Crisman and C. Thorpe, “Color Vision for Road Following,” Proc. SPIE Conf. Mobile Robots, pp. 246-249, 1988.
[36] S.D. Buluswar and B.A. Draper, “Color Machine Vision for Autonomous Vehicles,” Int'l J. Eng. Applications of Artificial Intelligence, vol. 1, no. 2, pp. 245-256, 1998.
[37] D. Guo, T. Fraichard, M. Xie, and C. Laugier, “Color Modeling by Spherical Influence Field in Sensing Driving Environment,” Proc. IEEE Intelligent Vehicle Symp., pp. 249-254, 2000.
[38] H. Mori and N. Charkai, “Shadow and Rhythm as Sign Patterns of Obstacle Detection,” Proc. Int'l Symp. Industrial Electronics, pp. 271-277, 1993.
[39] E. Dickmanns et al., “The Seeing Passenger Car ‘Vamors-P’,” Proc. Int'l Symp. Intelligent Vehicles, pp. 24-26, 1994.
[40] C. Tzomakas and W. Seelen, “Vehicle Detection in Traffic Scenes Using Shadows,” Technical Report 98-06, Institut fur Neuroinformatik, Ruhr-Universitat Bochum, Germany, 1998.
[41] M. Bertozzi, A. Broggi, and S. Castelluccio, “A Real-Time Oriented System for Vehicle Detection,” J. Systems Architecture, pp. 317-325, 1997.
[42] N. Matthews, P. An, D. Charnley, and C. Harris, “Vehicle Detection and Recognition in Greyscale Imagery,” Control Eng. Practice, vol. 4, pp. 473-479, 1996.
[43] C. Goerick, N. Detlev, and M. Werner, “Artificial Neural Networks in Real-Time Car Detection and Tracking Applications,” Pattern Recognition Letters, vol. 17, pp. 335-343, 1996.
[44] U. Handmann, T. Kalinke, C. Tzomakas, M. Werner, and W. Seelen, “An Image Processing System for Driver Assistance,” Image and Vision Computing, vol. 18, no. 5, 2000.


[45] P. Parodi and G. Piccioli, “A Feature-Based Recognition Scheme for Traffic Scenes,” Proc. IEEE Intelligent Vehicles Symp., pp. 229-234, 1995.
[46] M. Betke, E. Haritaglu, and L. Davis, “Real-Time Multiple Vehicle Detection and Tracking from a Moving Vehicle,” Machine Vision and Applications, vol. 12, no. 2, 2000.
[47] N. Srinivasa, “A Vision-Based Vehicle Detection and Tracking Method for Forward Collision Warning,” Proc. IEEE Intelligent Vehicle Symp., pp. 626-631, 2002.
[48] T. Bucher, C. Curio, J. Edelbrunner, C. Igel, D. Kastrup, I. Leefken, G. Lorenz, A. Steinhage, and W. von Seelen, “Image Processing and Behavior Planning for Intelligent Vehicles,” IEEE Trans. Industrial Electronics, vol. 50, no. 1, pp. 62-75, 2003.
[49] Z. Sun, R. Miller, G. Bebis, and D. DiMeo, “A Real-Time Precrash Vehicle Detection System,” Proc. IEEE Int'l Workshop Application of Computer Vision, Dec. 2002.
[50] T. Kalinke, C. Tzomakas, and W. von Seelen, “A Texture-Based Object Detection and an Adaptive Model-Based Classification,” Proc. IEEE Int'l Conf. Intelligent Vehicles, pp. 143-148, 1998.
[51] R. Haralick, B. Shanmugam, and I. Dinstein, “Texture Features for Image Classification,” IEEE Trans. System, Man, and Cybernetics, pp. 610-621, 1973.
[52] R. Cucchiara and M. Piccardi, “Vehicle Detection under Day and Night Illumination,” Proc. Int'l ICSC Symp. Intelligent Industrial Automation, 1999.
[53] R. Mandelbaum, L. McDowell, L. Bogoni, B. Beich, and M. Hansen, “Real-Time Stereo Processing, Obstacle Detection, and Terrain Estimation from Vehicle-Mounted Stereo Cameras,” Proc. IEEE Workshop Applications of Computer Vision, pp. 288-289, 1998.
[54] U. Franke and I. Kutzbach, “Fast Stereo Based Object Detection for Stop and Go Traffic,” Intelligent Vehicles, pp. 339-344, 1996.
[55] G. Zhao and S. Yuta, “Obstacle Detection by Vision System for Autonomous Vehicle,” Intelligent Vehicles, pp. 31-36, 1993.
[56] M. Bertozzi and A. Broggi, “Vision-Based Vehicle Guidance,” Computer, vol. 30, no. 7, pp. 49-55, July 1997.
[57] C. Knoeppel, A. Schanz, and B. Michaelis, “Robust Vehicle Detection at Large Distance Using Low Resolution Cameras,” Proc. IEEE Intelligent Vehicle Symp., pp. 267-272, 2000.
[58] M. Okutomi and T. Kanade, “A Multiple-Baseline Stereo,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, pp. 353-363, 1993.
[59] T. Williamson and C. Thorpe, “A Trinocular Stereo System for Highway Obstacle Detection,” Proc. IEEE Int'l Conf. Robotics and Automation, 1999.

[60] A. Giachetti, M. Campani, and V. Torre, “The Use of Optical Flow for Road Navigation,” IEEE Trans. Robotics and Automation, vol. 14, no. 1, pp. 34-48, 1998.
[61] D. Fleet, J. Barron, and S. Beauchemin, “Performance of Optical Flow,” Int'l J. Computer Vision, vol. 12, pp. 43-77, 1994.
[62] W. Kruger, W. Enkelmann, and S. Rossle, “Real-Time Estimation and Tracking of Optical Flow Vectors for Obstacle Detection,” Proc. IEEE Intelligent Vehicle Symp., pp. 304-309, 1995.
[63] J. Weng, N. Ahuja, and T. Huang, “Matching Two Perspective Views,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 14, pp. 806-825, 1992.
[64] S. Smith and J. Brady, “ASSET-2: Real-Time Motion Segmentation and Shape Tracking,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, 1995.
[65] D. Koller, N. Heinze, and H. Nagel, “Algorithmic Characterization of Vehicle Trajectories from Image Sequences by Motion Verbs,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, pp. 90-95, 1991.
[66] B. Heisele and W. Ritter, “Obstacle Detection Based on Color Blob Flow,” Proc. IEEE Intelligent Vehicle Symp., pp. 282-286, 1995.
[67] T. Ito, K. Yamada, and K. Nishioka, “Understanding Driving Situations Using a Network Model,” Intelligent Vehicles, pp. 48-53, 1995.
[68] U. Regensburger and V. Graefe, “Visual Recognition of Obstacles on Roads,” Intelligent Robots and Systems, pp. 73-86, 1995.
[69] V. Graefe and U. Regensburger, “A Novel Approach for the Detection of Vehicles on Freeways by Real-Time Vision,” Intelligent Vehicles, 1996.
[70] A. Bensrhair, M. Bertozzi, A. Broggi, P. Miche, S. Mousset, and G. Moulminet, “A Cooperative Approach to Vision-Based Vehicle Detection,” IEEE Intelligent Transportation Systems, pp. 209-214, 2001.
[71] J. Wu and X. Zhang, “A PCA Classifier and Its Application in Vehicle Detection,” Proc. IEEE Int'l Joint Conf. Neural Networks, 2001.
[72] D. Noli, M. Werner, and W. von Seelen, “Real-Time Vehicle Tracking and Classification,” Intelligent Vehicles, pp. 101-106, 1995.
[73] H. Schneiderman and T. Kanade, “A Statistical Method for 3D Object Detection Applied to Faces and Cars,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, pp. 746-751, 2000.
[74] M. Weber, M. Welling, and P. Perona, “Unsupervised Learning of Models for Recognition,” Proc. European Conf. Computer Vision, pp. 18-32, 2000.
[75] C. Papageorgiou and T. Poggio, “A Trainable System for Object Detection,” Int'l J. Computer Vision, vol. 38, no. 1, pp. 15-33, 2000.

[76] Z. Sun, G. Bebis, and R. Miller, “Quantized Wavelet Features and Support Vector Machines for On-Road Vehicle Detection,” Proc. IEEE Int'l Conf. Control, Automation, Robotics, and Vision, Dec. 2002.
[77] Z. Sun, G. Bebis, and R. Miller, “On-Road Vehicle Detection Using Gabor Filters and Support Vector Machines,” Proc. IEEE Int'l Conf. Digital Signal Processing, July 2002.
[78] S. Agarwal and D. Roth, “Learning a Sparse Representation for Object Detection,” Proc. European Conf. Computer Vision, 2002.
[79] B. Leung, “Component-Based Car Detection in Street Scene Images,” technical report, Massachusetts Inst. of Technology, 2004.
[80] D. Lowe, “Object Recognition from Local Scale-Invariant Features,” Proc. Int'l Conf. Computer Vision, 1999.
[81] J. Ferryman, S. Maybank, and A. Worrall, “Visual Surveillance for Moving Vehicles,” Proc. IEEE Workshop Visual Surveillance, pp. 73-80, 1998.
[82] A. Rajagopalan and R. Chellappa, “Vehicle Detection and Tracking in Video,” Proc. IEEE Int'l Conf. Image Processing, pp. 351-354, 2000.
[83] X. Clady, F. Collange, F. Jurie, and P. Martinet, “Cars Detection and Tracking with a Vision Sensor,” Proc. IEEE Intelligent Vehicle Symp., pp. 593-598, 2003.
[84] X. Li, X. Yao, Y. Murphey, R. Karlsen, and G. Gerhart, “A Real Time Vehicle Detection and Tracking System in Outdoor Traffic Scenes,” Proc. Int'l Conf. Pattern Recognition, 2004.
[85] M. Betke, E. Haritaglu, and L. Davis, “Multiple Vehicle Detection and Tracking in Hard Real Time,” Proc. IEEE Intelligent Vehicles Symp., pp. 351-356, 1996.
[86] G. Lefaix, T. Marchand, and P. Bouthemy, “Motion-Based Obstacle Detection and Tracking for Car Driving Assistance,” Proc. 16th Int'l Conf. Pattern Recognition, pp. 74-77, 2002.
[87] C. Hoffman, T. Dang, and C. Stiller, “Vehicle Detection Fusing 2D Visual Features,” Proc. IEEE Intelligent Vehicles Symp., pp. 280-285, 2004.
[88] H. Gish and R. Mucci, “Weak Target Detection Using the Entropy Concept,” Proc. SPIE, vol. 1305, 1990.
[89] S. Avidan, “Support Vector Tracking,” Proc. IEEE CS Conf. Computer Vision and Pattern Recognition, 2001.
[90] L. Fletcher, L. Peterson, and A. Zelinsky, “Driver Assistance Systems Based on Vision In and Out of Vehicles,” Proc. IEEE Intelligent Vehicles Symp. (IV2003), 2003.
[91] M. Suwa, Y. Wu, M. Kobayashi, and S. Ogata, “A Stereo-Based Vehicle Detection Method under Windy Conditions,” Proc. IEEE Intelligent Vehicle Symp., pp. 246-249, 2000.
[92] M. Bertozzi, A. Broggi, and A. Fascioli, “Self-Calibration of a Stereo Vision System for Automotive Applications,” Proc. IEEE Int'l Conf. Robotics and Automation, pp. 3698-3703, 2001.

[93] C. Morimoto, D. Dementhon, L. Davis, R. Chellappa, and R.Nelson, “Detection of Independently Moving Object in PassiveVideo,” Proc. IEEE Intelligent Vehicle Symp., pp. 270-275, 1995.

[94] Y. Liang, H. Tyan, S. Chang, H. Liao, and S. Chen, “VideoStabilization for a Camcorder Mounted on a Moving Vehicle,”IEEE Trans. Vehicular Technology, vol. 53, 2004.

[95] Z. Sun, G. Bebis, and R. Miller, “On-Road Vehicle Detection UsingEvolutionary Gabor Filter Optimization,” IEEE Trans. IntelligentTransportation Systems, vol. 6, no. 2, pp. 125-137, 2005.

[96] Z. Sun, G. Bebis, and R. Miller, “Boosting Object Detection UsingFeature Selection,” Proc. IEEE Int’l Conf. Advanced Video and SignalBased Surveillance, 2003.

[97] Z. Sun, G. Bebis, and R. Miller, “Evolutionary Gabor FilterOptimization with Application to Vehicle Detection,” Proc. IEEEInt’l Conf. Data Mining, 2003.

[98] Z. Sun, G. Bebis, and R. Miller, “Object Detection Using FeatureSubset Selection,” Pattern Recognition, vol. 37, pp. 2165-2176, 2004.



[99] R. Collins and Y. Liu, “On-Line Selection of Discriminant Tracking Features,” Proc. IEEE Int’l Conf. Computer Vision, 2003.

[100] J. Taylor and N. Cristianini, Kernel Methods for Pattern Analysis. Cambridge Univ. Press, 2004.

[101] J. Collado, C. Hilario, A. de la Escalera, and J. Armingol, “Model-Based Vehicle Detection for Intelligent Vehicles,” Proc. IEEE Intelligent Vehicles Symp., 2004.

[102] K. She, G. Bebis, H. Gu, and R. Miller, “Vehicle Tracking Using On-Line Fusion of Color and Shape Features,” Proc. IEEE Int’l Conf. Intelligent Transportation Systems, 2004.

[103] J. Wang, G. Bebis, and R. Miller, “Overtaking Vehicle Detection Using Dynamic and Quasi-Static Background Modeling,” Proc. IEEE Workshop Machine Vision for Intelligent Vehicles, 2005.

[104] Z. Sun, G. Bebis, and R. Miller, “Improving the Performance of On-Road Vehicle Detection by Combining Gabor and Wavelet Features,” Proc. IEEE Fifth Int’l Conf. Intelligent Transportation Systems, Sept. 2002.

[105] L. Andreone, P. Antonello, M. Bertozzi, A. Broggi, A. Fascioli, and D. Ranzato, “Vehicle Detection and Localization in Infra-Red Images,” Proc. IEEE Fifth Int’l Conf. Intelligent Transportation Systems, 2002.

[106] P. Barham, X. Zhang, L. Andreone, and M. Vache, “The Development of a Driver Vision Support System Using Far Infrared Technology: Progress to Date on the Darwin Project,” Proc. IEEE Intelligent Vehicles Symp., 2000.

[107] D. Pomerleau and T. Jochem, “Automatic Vehicle Detection in Infrared Imagery Using a Fuzzy Inference-Based Classification System,” IEEE Trans. Fuzzy Systems, vol. 9, no. 1, pp. 53-61, 2001.

[108] M. Kagesawa, S. Ueno, K. Ikeuchi, and H. Kashiwagi, “Recognizing Vehicles in Infrared Images Using IMAP Parallel Vision Board,” IEEE Trans. Intelligent Transportation Systems, vol. 2, no. 1, pp. 10-17, 2001.

[109] K. Sobottka, E. Meier, F. Ade, and H. Bunke, “Towards Smarter Cars,” Proc. Int’l Workshop Sensor Based Intelligent Robots, pp. 120-139, 1999.

[110] T. Spirig, M. Marle, and P. Seitz, “The MultiTap Lock-In CCD with Offset Subtraction,” IEEE Trans. Electron Devices, pp. 1643-1647, 1997.

[111] K. Yamada, “A Compact Integrated Vision Motion Sensor for ITS Applications,” IEEE Trans. Intelligent Transportation Systems, vol. 4, no. 1, pp. 35-41, 2003.

[112] C. Stiller, J. Hipp, C. Rossig, and A. Ewald, “Multisensor Obstacle Detection and Tracking,” Image and Vision Computing, vol. 18, pp. 389-396, 2000.

[113] Y. Fang, M. Mizuki, I. Masaki, and B. Horn, “TV Camera-Based Vehicle Motion Detection and Its Chip Implementation,” Proc. IEEE Intelligent Vehicle Symp., pp. 134-139, 2000.

[114] W. Enkelmann, V. Gengenbach, and W. Kruger, “Obstacle Detection by Real-Time Optical Flow Evaluation,” Proc. IEEE Intelligent Vehicle Symp., pp. 97-102, 1994.

[115] S. Ghiasi, H. Moon, and M. Sarrafzadeh, “Collaborative and Reconfigurable Object Tracking,” Proc. Int’l Conf. Eng. of Reconfigurable Systems and Algorithms, pp. 13-20, 2003.

[116] A. Benedetti and P. Perona, “Real-Time 2-D Feature Detection on a Reconfigurable Computer,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1998.

[117] R. Mandelbaum, M. Hansen, P. Burt, and S. Baten, “Vision for Autonomous Mobility: Image Processing on the VFE-200,” Proc. IEEE Int’l Symp. Intelligent Control, 1998.

[118] J. Phillips, H. Moon, S. Rizvi, and J. Rauss, “The FERET Evaluation Methodology for Face-Recognition Algorithms,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, pp. 1090-1104, 2000.

[119] “Real-Time Event Detection Solutions (CREDS) for Enhanced Security and Safety in Public Transportation Challenge,” Proc. IEEE Int’l Conf. Advanced Video and Signal Based Surveillance, 2005.

[120] M. Boshra and B. Bhanu, “Predicting Object Recognition Performance under Data Uncertainty, Occlusion and Clutter,” Proc. IEEE Int’l Conf. Image Processing, 1998.

[121] M. Nishigaki, M. Saka, T. Aoki, H. Yuhara, and M. Kawai, “Fail Output Algorithm of Vision Sensing,” Proc. IEEE Intelligent Vehicle Symp., pp. 581-584, 2000.

Zehang Sun received the BEng degree in telecommunication and the MEng degree in digital signal processing from Northern Jiaotong University, China, in 1994 and 1997, respectively, the MEng degree in electrical and electronic engineering from Nanyang Technological University, Singapore, in 1999, and the PhD degree in computer science and engineering from the University of Nevada, Reno, in 2003. Dr. Sun joined eTreppid Technologies, LLC immediately upon his graduation. His expertise is in the area of real-time computer vision systems, statistical pattern recognition, artificial intelligence, digital signal processing, and embedded systems. He is a member of the IEEE.

George Bebis received the BS degree in mathematics and the MS degree in computer science from the University of Crete, Greece, in 1987 and 1991, respectively, and the PhD degree in electrical and computer engineering from the University of Central Florida, Orlando, in 1996. Currently, he is an associate professor in the Department of Computer Science and Engineering at the University of Nevada, Reno (UNR) and the director of the UNR Computer Vision Laboratory (CVL). His research interests include computer vision, image processing, pattern recognition, machine learning, and evolutionary computing. His research is currently funded by US NSF, NASA, ONR, and the Ford Motor Company. Dr. Bebis is an associate editor of the Machine Vision and Applications Journal and serves on the editorial board of the Pattern Recognition Journal and the International Journal on Artificial Intelligence Tools. He has served on the program committees of various national and international conferences and has organized and chaired several conference sessions. In 2002, he received the Lemelson Award for Innovation and Entrepreneurship. He is a member of the IEEE and the IAPR Educational Committee.

Ronald Miller received the BS degree in physics in 1983 from the University of Massachusetts and the PhD degree in physics from the Massachusetts Institute of Technology in 1988. His research has ranged from computational modeling of plasma and ionospheric instabilities to automotive safety applications. Dr. Miller heads a research program at Ford Motor Company in intelligent vehicle technologies focusing on advanced RF communication, radar, and optical sensing systems for accident avoidance and telematics.

