
Fusion Framework for Moving-Object Classification


R. Omar Chavez-Garcia∗, Trung-Dung Vu∗, Olivier Aycard∗ and Fabio Tango†

∗University of Grenoble 1, Grenoble, France. Email: ricardo.chavez-garcia, trung-dung.vu, [email protected]

†Centro Ricerche Fiat, Orbassano, Italy. Email: [email protected]

Abstract—Perceiving the environment is a fundamental task for Advanced Driver Assistance Systems. While simultaneous localization and mapping represents the static part of the environment, detection and tracking of moving objects aims at identifying the dynamic part. Knowing the class of the moving objects surrounding the vehicle is very useful information for correctly reasoning, deciding and acting according to each class of object, e.g. car, truck, pedestrian, bike, etc. Active and passive sensors provide useful information to classify certain kinds of objects, but perform poorly for others. In this paper we present a generic fusion framework based on Dempster-Shafer theory to represent and combine evidence from several sources. We apply the proposed method to the problem of moving-object classification. The method combines information from several lists of moving objects provided by different sensor-based object detectors. The fusion approach includes uncertainty from the reliability of the sensors and their precision to classify specific types of objects. The proposed approach takes into account the instantaneous information at the current time and combines it with fused information from previous times. Several experiments were conducted in highway and urban scenarios using a vehicle demonstrator from the interactIVe European project. The obtained results show improvements in the combined classification compared with the individual class hypotheses from the individual detector modules.

I. INTRODUCTION

Machine perception is the process of understanding and representing the environment by organizing and interpreting information from sensors. Intelligent vehicle applications such as Advanced Driver Assistance Systems (ADAS) help drivers perform complex driving tasks and avoid dangerous situations. ADAS generally have three components: perception, reasoning and decision, and control. We have to perceive from the sensors in order to model the current static and dynamic environment. Then, we use the perception output to reason and decide which actions are best, and finally perform those actions. In order to perform good reasoning and control we have to correctly model the surrounding environment [1].

Robotic perception is composed of two main tasks: simultaneous localization and mapping (SLAM) deals with modeling the static parts, while detection and tracking of moving objects (DATMO) is responsible for modeling the dynamic parts of the environment. In SLAM, when the vehicle location and the map are unknown, the vehicle generates a map of the environment while simultaneously localizing itself in that map given all the measurements from its sensors. DATMO aims at detecting and tracking the moving objects surrounding the vehicle and predicting their future behavior. SLAM and DATMO are considered correlated and aim at obtaining a holistic representation of the environment: static and moving objects [1], [2].

Once object detection and tracking is done, a classification step is needed in order to determine which classes of objects surround the vehicle. Knowledge about the class of surrounding moving objects can help to improve their tracking, to reason about their behavior and to decide what to do according to their nature. Moving-object classification enables a better understanding of driving situations. Object detectors that aim to recognize a specific type of object can be considered binary classifiers.

Current state-of-the-art approaches for object classification focus on only one class of object (e.g., pedestrians, cars, trucks) and rely on one type of sensor (active or passive) to perform this task. Including information from different types of sensors can improve object classification and allows the classification of multiple classes of objects [3]. Individual object classifications from specific sensors, like camera, lidar or radar, have different degrees of reliability according to the sensor's advantages and drawbacks. Therefore, we use the ability of each sensor to compensate for the deficiencies of the others and hence improve the final classification.

Fusion approaches for object classification should aim to complement the sensors' advantages and reduce their disadvantages to improve the classification result [4]. It is discussed in [1], [5] that, for classification purposes, when combining different sensor inputs we must be aware of the classification precision of each sensor and take it into account to get a more accurate result.

Geronimo et al. review the current state of the art in pedestrian detection for ADAS; their work focuses on camera-based approaches due to the high potential of visual features, spatial resolution and richness of texture and color cues [3]. They conclude that the advantages and drawbacks of camera sensors can be complemented with active sensors, like lidar, to improve the overall performance of pedestrian detection.

Vehicle detection based only on camera sensors is very challenging due to large intra-class differences in shape, size and color. Changes in illumination, object pose, and surrounding objects make appearance identification difficult. The works in [6] and [7] review different approaches for vehicle detection; they highlight the difficulty of relying only on camera sensors and suggest that the use of active sensors could improve detection performance.

Himmelsbach [8] and Azim [9] propose methods for tracking and classifying arbitrary objects using active sensors, and their methods show good results. Himmelsbach proposes a bottom-up/top-down approach that relies on a 3D lidar sensor and considers object appearance and motion history in the classification task. Its main limitation is that it needs top-down knowledge to properly classify the cloud of points. Azim proposes a three-dimensional representation of the environment to perform moving-object detection and classification, but the results show limited performance in cluttered environments due to the lack of discriminative information.

Smets [10] proposes an approach to joint tracking and classification using Dempster-Shafer theory. This approach uses classical Kalman filters for the tracking phase, while the classification part uses the transferable belief model (TBM) framework. Results show that when there is no one-to-one mapping between target behaviours and classes, TBM provides more intuitive results than a Bayesian classifier. However, this approach does not take into account multiple sources of evidence, which could help to improve tracking and classification accuracy; moreover, it relies entirely on the targets' behaviours to perform the classification, without taking appearance information into account.

In probability theory, evidence about a variable's value may be placed on any element of a set of possible values. In Dempster-Shafer (DS) theory, evidence can be placed not only on single elements but also on sets of elements, i.e., the domain of DS theory is the set of all subsets. DS theory provides tools for capturing ignorance, or an inability to distinguish between alternatives; in probability theory, this would be done by assigning an equal or uniform probability to each alternative [11]. The use of the power set as the frame of discernment allows a richer representation of hypotheses. Using combination rules, we can fuse the evidence from different sources into a final combined result [12].

In this paper we propose a generic fusion approach based on DS theory and apply it to the moving-object classification problem. Given a list of detected objects and a preliminary classification from different individual detectors (or classifiers), the proposed approach combines instantaneous information from the current environment state by applying a rule of combination based on the one proposed in [13]. The rule of combination can take into account classification evidence from different sources (object detectors/classifiers), the uncertainty coming from the reliability of the sensors, and each sensor's precision in detecting certain classes of objects. The proposed approach aims to improve the individual object classification provided by class-specific sensor detectors. After the instantaneous fusion is done, the proposed approach fuses it with the combination result from previous times. Its architecture allows giving more importance to the classification evidence according to its uncertainty factors.

Using DS theory we are able to represent the evidence coming from different sensors in a common representation based on propositions, i.e., object class hypotheses. The proposed fusion framework relies on two main parts: the instantaneous fusion obtained from sensor evidence at the current time, and the combined evidence from previous times. Instantaneous fusion is divided into two main layers: the individual evidence layer and the fusion layer. These features allow the proposed method to include several sensor inputs and different sets of object classes.

The main contributions of this work are: the definition of a generic fusion framework based on DS theory, applied here to the moving-object classification task; the inclusion of sensor reliability and sensor precision in detecting and classifying certain classes of objects as the main parameters of the fusion; and a conjunctive rule of combination that allows combining several sources of evidence using a common representation frame, including uncertainty from reliability and sensor classification precision, and managing conflict situations that could lead to counter-intuitive results.

Several experiments were carried out to analyze and compare the results obtained by the proposed fusion approach against the individual classification provided by three individual object detectors. We used real data from highways and urban areas obtained by a demonstrator from the interactIVe (Accident Avoidance by Active Intervention for Intelligent Vehicles) European project (http://www.interactive-ip.eu).

The rest of this paper is organized as follows. The next section reviews some concepts of Dempster-Shafer theory. Section III describes the vehicle demonstrator and the set of sensors we use to test our approach. In Section IV we define the proposed fusion framework used to combine classification information. The implementation of the proposed fusion framework uses the architecture defined in Section V. The experimental set-up and results are presented in Section VI. Finally, Section VII presents the conclusions.

II. DEMPSTER-SHAFER THEORY BACKGROUND

Dempster-Shafer theory is a generalization of the Bayesian theory of subjective probability. Whereas Bayesian theory requires probabilities for each question of interest, DS theory allows us to base degrees of belief for one question on probabilities for a related question [12]. DS theory is highly expressive, allows representing different levels of ignorance, does not need prior probabilities and manages conflict situations when opposing evidence appears.

DS theory represents the world as a set of mutually exclusive propositions known as the frame of discernment (Ω). It uses belief functions to distribute the evidence about the propositions over 2^Ω. The distribution of mass beliefs is done by the function m : 2^Ω → [0, 1], also known as the Basic Belief Assignment (BBA), which satisfies equation 1.

DS theory allows alternative scenarios beyond single hypotheses, such as considering equally the possible sets that have a non-zero intersection. We can combine hypotheses into a compound set, giving it a new semantic meaning; for example, the unknown hypothesis created by combining all the individual hypotheses. Moreover, a BBA can support a proposition A ⊆ Ω without supporting any sub-proposition of A, which allows representing partial knowledge.

\[
m(\emptyset) = 0, \qquad \sum_{A \subseteq \Omega} m(A) = 1 \tag{1}
\]

Any subset A of Ω with m(A) > 0 for a particular belieffunction is called a focal element of that function.
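To make the notation concrete, the following Python sketch (illustrative, not part of the paper) represents a BBA as a dictionary mapping frozensets of class labels to masses, using the car/truck/pedestrian/bike frame introduced later in the experiments; the validity check corresponds to equation 1.

```python
# Frame of discernment used later in the experiments (Sec. VI).
OMEGA = frozenset({"car", "truck", "pedestrian", "bike"})

# Example BBA: most evidence on {car}, some on the compound set {car, truck},
# the rest on Omega (total ignorance). The values are purely illustrative.
m = {
    frozenset({"car"}): 0.6,
    frozenset({"car", "truck"}): 0.2,
    OMEGA: 0.2,
}

def is_valid_bba(m):
    """Check the two conditions of equation 1: m(empty) = 0 and masses sum to 1."""
    return m.get(frozenset(), 0.0) == 0.0 and abs(sum(m.values()) - 1.0) < 1e-9

# Focal elements are the subsets with strictly positive mass.
focal_elements = [A for A, mass in m.items() if mass > 0]
print(is_valid_bba(m), focal_elements)
```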

In order to combine different sources of evidence, represented as belief functions over the same frame of discernment and with at least one focal element in common, a combination rule is required. Several fusion operators have been proposed within the DS framework for scenarios with different requirements. One of the most widely used is the rule proposed by Dempster [12]. Dempster's rule of combination assumes independence and reliability of both sources of evidence. This rule is defined as follows:

\[
m_{12}(A) = \frac{\sum_{B \cap C = A} m_1(B)\, m_2(C)}{1 - K_{12}}, \quad A \neq \emptyset, \qquad
K_{12} = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C) \tag{2}
\]

where K12 is known as the degree of conflict. Dempster's rule analyses each piece of evidence to find conflicts and uses the conflict mass to normalize the remaining masses in the set.
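A direct way to implement equation 2 with the dictionary-based BBAs sketched above is the following; this is a generic illustration, not code from the paper.

```python
def dempster_combine(m1, m2):
    """Dempster's normalized conjunctive rule (Eq. 2) for two BBAs."""
    combined = {}
    conflict = 0.0
    for B, mass_b in m1.items():
        for C, mass_c in m2.items():
            A = B & C
            if not A:                      # empty intersection contributes to K12
                conflict += mass_b * mass_c
            else:
                combined[A] = combined.get(A, 0.0) + mass_b * mass_c
    if conflict >= 1.0:
        raise ValueError("Total conflict: the sources are completely contradictory")
    # Normalize the remaining masses by 1 - K12.
    return {A: v / (1.0 - conflict) for A, v in combined.items()}
```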

III. VEHICLE DEMONSTRATOR

We use the CRF vehicle demonstrator, which is part of the European project interactIVe, to obtain datasets from highway and cluttered urban scenarios. The obtained datasets were used to test our proposed fusion framework. The demonstrator is a Lancia Delta car equipped from factory with electronic steering systems, two ultrasonic sensors located on the sides of the front bumper, and a front camera located between the windshield and the central rear-view mirror. Moreover, the demonstrator vehicle has been fitted with a scanning laser and a mid-range radar on the front bumper for the detection of obstacles ahead, as depicted in figure 1. Finally, two radar sensors have been installed on both sides of the rear bumper to cover the side and rear areas.

IV. FUSION APPROACH FOR MOVING-OBJECT CLASSIFICATION

This work proposes an information fusion framework that allows incorporating, in a generic way, information from different sources of evidence. The fusion approach is based on DS theory and aims to gather classification information about moving objects identified by several detector modules. These detector modules can use information from different kinds of sensors. The proposed approach provides as output a fused list of classified objects.

Fig. 1. Images of the CRF demonstrator.

Figure 2 shows the general architecture of the proposed fusion approach. The input of this method is composed of several lists of detected objects with their class information, the reliability of the sources of evidence and the detection precision for certain types of classes. We empirically assign an evidence value to each object (set element) based on these last two factors. Using the proposed conjunctive rule of combination, we combine the classification information from the detector modules at the current time to obtain an instantaneous combination; later, this instantaneous class information is fused with previous combinations in a process we call dynamic combination. The final output of the proposed method comprises a list of objects with combined class information.

Fig. 2. Schematic of the proposed fusion architecture.

A. Instantaneous Combination

Following Dempster-Shafer theory, let us define the frame of discernment Ω as the set of all the classes we want to identify, and the power set 2^Ω as the set of all possible class hypotheses for each source of evidence. Let m_b be the reference mass evidence and m_c the mass evidence from the source we want to combine with it. Finally, m_r represents the combined mass evidence.

In situations where the conflict mass is high, Dempster's combination rule generates counter-intuitive results. For this reason we adapt the combination rule proposed by Yager in [13] to obtain a more suitable rule of combination that avoids these counter-intuitive results by moving the conflict mass (K) to the set Ω. This means transferring the conflict mass to the ignorance state instead of normalizing the rest of the masses. We do this expecting that future mass evidence will help to resolve conflict states when two unreliable sources of evidence classify one object differently. The rule of combination used is stated as follows.

\[
m_r(A) = \sum_{B \cap C = A} m_b(B)\, m_c(C), \quad A \neq \emptyset; \qquad
K = \sum_{B \cap C = \emptyset} m_b(B)\, m_c(C); \qquad
m_r(\Omega) = m'_r(\Omega) + K \tag{3}
\]

where m′_r(Ω) is the conjunctive mass assigned to the ignorance state and m_r(Ω) includes the ignorance added from the conflict states. This rule considers both sources of evidence to be independent and reliable.
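A minimal sketch of the adapted rule of equation 3, again assuming the dictionary-based BBA representation: the conjunctive combination is left unnormalized and the conflict mass K is added to Ω.

```python
def combine_conflict_to_omega(mb, mc, omega):
    """Conjunctive combination with the conflict mass transferred to Omega (Eq. 3)."""
    combined = {}
    conflict = 0.0
    for B, mass_b in mb.items():
        for C, mass_c in mc.items():
            A = B & C
            if not A:
                conflict += mass_b * mass_c        # K: mass on conflicting hypotheses
            else:
                combined[A] = combined.get(A, 0.0) + mass_b * mass_c
    # Transfer the conflict mass to the ignorance state Omega instead of normalizing.
    combined[omega] = combined.get(omega, 0.0) + conflict
    return combined
```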

As we cannot guarantee the reliability of the evidence sources regarding the classification, due to sensor limitations or misclassifications, we propose to use a discounting factor for each source of evidence [14]. We believe this allows us to overcome this issue.

Let us define m_a as a reliable reference source of evidence and m′_b as a relatively reliable source of evidence. We define r_ab ∈ [0, 1] as the reliability factor of m′_b with respect to m_a. To make m_b reliable we apply r_ab over the BBA m′_b. The evidence removed from the subsets of 2^Ω after applying the reliability factor should be considered ignorance, and is therefore transferred to the set Ω:

\[
m_b(A) = r_{ab} \times m'_b(A), \quad A \subseteq \Omega,\ A \neq \Omega; \qquad
m_b(\Omega) = m'_b(\Omega) + \sum_{A \subseteq \Omega,\, A \neq \emptyset,\, A \neq \Omega} (1 - r_{ab})\, m'_b(A) \tag{4}
\]

This means that we adjust the mass evidence of m_b according to how reliable it is compared with the reference source of evidence m_a. When m_b is as reliable as m_a (r_ab = 1) we recover the original BBA m′_b:

\[
m_b(A) = m'_b(A), \qquad m_b(\Omega) = m'_b(\Omega) \tag{5}
\]

There are scenarios where one of the sources of evidence is more precise at identifying the class of a specific subset of the frame of discernment. We can include this uncertainty using an approach similar to the one proposed above for the reliability, but focused on specific subsets of the frame of discernment.

Let f_i ∈ [0, 1] be the precision factor for the i-th subset (hypothesis) of a particular belief function m_a. The greater the value, the more precise the source of evidence is about the mass assigned to that subset:

\[
m_a(A_i) = m'_a(A_i) \times f_i, \quad A_i \subseteq \Omega,\ A_i \neq \emptyset,\ A_i \neq \Omega; \qquad
m_a(\Omega) = m'_a(\Omega) + \sum_{A_i \subseteq \Omega,\, A_i \neq \emptyset,\, A_i \neq \Omega} (1 - f_i)\, m'_a(A_i) \tag{6}
\]

where m′_a represents the reliable BBA. All the unallocated evidence is placed on the Ω state because it is considered ignorance.
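A corresponding sketch for the precision factors of equation 6, where factors holds per-hypothesis values f_i (hypotheses without an entry are assumed fully precise, f_i = 1); this is an illustration, not the paper's code.

```python
def apply_precision(m_a, factors, omega):
    """Scale each hypothesis by its precision factor f_i, moving the rest to Omega (Eq. 6)."""
    adjusted = {}
    transferred = 0.0
    for A, mass in m_a.items():
        if A == omega:
            continue
        f = factors.get(A, 1.0)
        adjusted[A] = f * mass
        transferred += (1.0 - f) * mass        # unallocated evidence becomes ignorance
    adjusted[omega] = m_a.get(omega, 0.0) + transferred
    return adjusted
```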

Once we have applied the reliability and precision factors, the combination rule in equation 3 can be used. Several sources can be combined by applying this rule of combination iteratively, using the fused evidence as the reliability reference source.

The final fused evidence contains the transferred evidence from the different sources. The criterion we use to determine the final hypothesis is the highest mass value in the combined set, though it can be modified to be based on belief or plausibility degrees.
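The decision step can then be as simple as picking the hypothesis with the maximum combined mass, as in the sketch below; as noted above, belief or plausibility could be used instead.

```python
def decide(m):
    """Return the (hypothesis, mass) pair with the maximum mass in the combined BBA."""
    return max(m.items(), key=lambda item: item[1])
```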

B. Dynamic Combination

Since we perform the combination of different sources of evidence at the current time t, we call this the instantaneous fusion. One can notice that including information from previous combinations can add valuable evidence to the currently available evidence. Regarding this, and taking advantage of the proposed general framework architecture, we introduce equation 7 as an extension of the instantaneous fusion to include mass evidence from previous combinations (e.g., time t − 1).

\[
m_{r_t}(A) = m_r(A) \otimes m_{r_{t-1}}(A) \tag{7}
\]

where m_r(A) represents the instantaneous fusion at time t. The operator ⊗ follows the same combination rule defined in equation 3, which is also used to obtain the instantaneous fusion. Following this extension, the combined mass for the list of objects from all previous times is represented in m_{r_{t−1}}(A).
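Using the earlier sketches, the dynamic combination of equation 7 reduces to reapplying the same conflict-to-Ω rule between the accumulated BBA and the instantaneous fusion; the handling of the first frame below is an assumption for illustration.

```python
def dynamic_update(m_prev, m_instantaneous, omega):
    """Combine the accumulated BBA with the instantaneous fusion (Eq. 7)."""
    if m_prev is None:                     # first frame: nothing accumulated yet (assumed)
        return dict(m_instantaneous)
    return combine_conflict_to_omega(m_prev, m_instantaneous, omega)
```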

V. FRONTAL OBJECT PERCEPTION

Figure 3 shows the general frontal object perception (FOP) architecture used in the interactIVe project for the vehicle demonstrator described in section III. FOP takes raw information from three different sensors to detect static and moving objects in the surrounding environment. While the lidar processing detects and tracks moving objects, the pedestrian and vehicle detectors focus on regions of interest provided by the lidar processing to provide additional classification evidence. The three detector modules provide a list of moving objects and their preliminary classification. The objective of the proposed fusion approach defined in section IV is to take the three classification hypotheses provided by the three object detectors, combine them and obtain a final classification for each moving object. The proposed fusion approach focuses only on the class information provided by the object detector modules. Raw data processing for object detection and moving-object tracking is performed by the frontal object perception module described in detail in [15].

Lidar Target Detection and Tracking

Fig. 3. General architecture of the FOP module for the CRF demonstrator.

Raw lidar scans and vehicle state information are processed to recognize static and moving objects, which are maintained for tracking purposes. We employ a grid-based fusion approach, originally presented in [16], which incrementally integrates discrete lidar scans into a local occupancy grid map representing the environment surrounding the ego-vehicle. In this representation, the environment is discretized into a two-dimensional lattice of rectangular cells; each cell is associated with a measure indicating the probability that the cell is occupied by an obstacle or not. A high occupancy value indicates that the cell is occupied and a low value means that the cell is free. Using this grid-based representation, the noise and sparseness of raw lidar data can be handled inherently and no data association is required. We analyze each new lidar measurement to determine whether static or moving objects are present.
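As a rough illustration of the kind of grid update involved (a generic log-odds sketch, not the grid fusion method of [16]), each lidar impact point can increase the occupancy evidence of the cell it falls in:

```python
import numpy as np

class OccupancyGrid:
    """Minimal square occupancy grid with a log-odds update (illustrative only)."""
    def __init__(self, size_m=50.0, resolution_m=0.2):
        n = int(size_m / resolution_m)
        self.resolution = resolution_m
        self.log_odds = np.zeros((n, n))          # 0 -> p(occupied) = 0.5 (unknown)

    def update(self, points_xy, l_occ=0.85):
        """points_xy: (N, 2) array of impact points in grid coordinates (meters)."""
        idx = (points_xy / self.resolution).astype(int)
        valid = (idx >= 0).all(axis=1) & (idx < self.log_odds.shape[0]).all(axis=1)
        for i, j in idx[valid]:
            self.log_odds[i, j] += l_occ          # accumulate evidence the cell is occupied

    def probabilities(self):
        """Convert log-odds back to occupancy probabilities."""
        return 1.0 - 1.0 / (1.0 + np.exp(self.log_odds))
```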

Fig. 4. Occupancy grid representation obtained by processing raw lidar data. From left to right: reference image from the camera; occupancy grid obtained by combining all raw lidar measurements; local occupancy grid from the latest lidar measurements; and identification of static and dynamic objects (green bounding boxes).

By projecting the impact points from the latest laser measurements onto the local grid, each impact point is classified as static or dynamic. Points observed in free space are classified as dynamic, whereas the rest are classified as static. Using a clustering process we identify clouds of points that could describe moving objects. The list of possible moving objects is then passed to the tracking module. Figure 4 shows an example of the evolution of the moving-object detection process using an occupancy grid representation.

We represent the moving objects of interest by simple geometric models, i.e., a rectangle for vehicles and bicycles and a small circle for pedestrians. Considering the detection and tracking of moving objects simultaneously as a batch optimization problem over a sliding window of a fixed number of data frames, we follow the work proposed by Vu et al. [17]. It interprets the laser measurement sequence by all the possible hypotheses of moving-object trajectories over a sliding window of time. Generated object hypotheses are then put into a top-down process (a global view) taking into account all object dynamics models, sensor models and visibility constraints. The data-driven Markov chain Monte Carlo (DDMCMC) technique is used to sample the solution space effectively and find the optimal solution.

Vehicle and Pedestrian Detectors

Regions of interest provided by the lidar processing are taken by the vehicle and pedestrian detectors to perform camera-based object classification. The vehicle detector uses both radar and video outputs, at two different steps, to perform robust vehicle detection. Radar sensors have a good range resolution but a crude azimuth estimation, whereas camera sensors are able to give a precise lateral estimation while having an uncertain range estimation. The pedestrian detector module detailed in [15] scans the image using a sliding window of fixed size to detect pedestrians. For each window, visual features are extracted and a classifier (trained off-line) is applied to decide whether an object of interest is contained inside the window. A modified version of histogram-of-oriented-gradients features (called sparse-HOG), which focuses on important areas of the samples, powers the pedestrian and vehicle representations at training and detection time. Given the computed features for positive and negative samples, we use the discrete AdaBoost approach proposed in [18] to build the vehicle and pedestrian classifiers. Its trade-off between performance and quality makes it suitable for real-time requirements. The idea of boosting-based classifiers is to combine many weak classifiers to form a powerful one, where the weak classifiers are only required to perform better than chance and can therefore be very simple and fast to compute.
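The following sketch illustrates sliding-window detection with HOG features and a boosted classifier in the spirit described above, using scikit-image and scikit-learn; the window size, stride, HOG parameters and training data are hypothetical, and this is not the sparse-HOG/AdaBoost implementation of [15], [18].

```python
import numpy as np
from skimage.feature import hog
from sklearn.ensemble import AdaBoostClassifier

WIN_H, WIN_W, STRIDE = 128, 64, 16            # assumed window geometry

def windows(image):
    """Yield (top, left, patch) for every fixed-size window in a grayscale image."""
    for top in range(0, image.shape[0] - WIN_H + 1, STRIDE):
        for left in range(0, image.shape[1] - WIN_W + 1, STRIDE):
            yield top, left, image[top:top + WIN_H, left:left + WIN_W]

def detect(image, clf):
    """Return windows the boosted classifier labels as containing an object."""
    hits = []
    for top, left, patch in windows(image):
        feat = hog(patch, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
        if clf.predict(feat.reshape(1, -1))[0] == 1:
            hits.append((top, left, WIN_H, WIN_W))
    return hits

# Training on precomputed HOG features of positive/negative samples (X, y):
# clf = AdaBoostClassifier(n_estimators=100).fit(X, y)
```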

Several images, extracted from the Daimler Pedestrian Benchmark datasets and from manually labeled datasets, were used as training samples to build the vehicle and pedestrian classifiers. Despite the drawbacks of a vision-based detector, we experimentally noticed that it identifies pedestrians better.

Figure 5 shows an output example from the pedestrian and vehicle detectors. The pedestrian and vehicle detector outputs consist of a list of classified objects or regions of interest. These regions were provided by the lidar processing as inputs, and after performing class-specific detection the detectors determine whether each region is a vehicle (car, truck), a pedestrian, a bike or an unknown object.

Classification Information Fusion

The lists of objects provided by the lidar processing, the vehicle detector and the pedestrian detector are taken as inputs for the proposed classification fusion framework. The fusion process takes into account the reliability of the sensors and their precision in identifying particular kinds of objects. The fusion is done in two steps: first it performs an instantaneous fusion using the individual classifications from the different sensors at the current time; secondly, it performs a dynamic fusion between the classification information obtained from previous frames and the instantaneous fusion result. The result of the final fusion is stored over time. This process outputs a final list of objects plus a likelihood of the class of each object.

Fig. 5. Output example from the pedestrian (top) and vehicle (bottom) detectors after processing the inputs (regions of interest).

VI. EXPERIMENTS AND RESULTS

Experiments were conducted using four short datasets provided by the vehicle demonstrator described in section III. We tested our approach in two main scenarios: urban and highway. The objective of these experiments was to verify whether the results of the proposed approach improve the preliminary classification results provided by the individual object detector modules.

First of all, we define the frame of discernment Ω = {car, truck, pedestrian, bike} and therefore the set 2^Ω of all possible classification hypotheses for each source of evidence, i.e., each object detector output.

Following the fusion scheme presented in figure 2 and the general architecture from figure 3, we processed the individual lists of preliminarily classified objects provided by the three different detector modules to obtain three lists of BBAs. Each individual list of objects may contain both static and moving objects; the proposed fusion approach focuses on the moving objects. The BBAs were defined empirically after analyzing the results from the individual object detectors on datasets with zero, one or several objects of interest. The reliability and precision factors were chosen according to the performance of each sensor processing module on datasets from real driving scenarios.

The lidar target detector is able to identify all classes of objects using the cloud of points and the model-based approach. It has a good performance for identifying cars and trucks but a poor performance when it comes to pedestrians or bikes. We represent this behavior by setting individual precision factors as shown in equation 6: the precision factor is high for cars and trucks and low for pedestrians and bikes. The uncertainty about the object class, for the lidar detector, decreases when more information (frames) is available. This means, for example, that when the current lidar data indicates that a moving object is a car, the object could in fact be either a car or a truck, and vice-versa. When a car or truck is detected, we set a high mass value on the respective individual set, {car} or {truck}, and split the remaining evidence between the compound set {car, truck} and Ω. If a pedestrian or bike is detected, we perform the same mass assignment process, but according to the individual precision factors the mass values on the individual sets {pedestrian} and {bike} are decreased.
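As an illustration, the lidar mass assignment described above could look as follows; the numeric values and the helper are hypothetical, since the paper sets the masses empirically.

```python
OMEGA = frozenset({"car", "truck", "pedestrian", "bike"})

def lidar_bba(detected_class):
    """Illustrative BBA for a lidar detection; detected_class is one of the four classes."""
    if detected_class in ("car", "truck"):
        # High mass on the detected class, remainder split between the
        # compound set {car, truck} and Omega.
        return {frozenset({detected_class}): 0.6,
                frozenset({"car", "truck"}): 0.25,
                OMEGA: 0.15}
    # Same scheme for pedestrian/bike, but with less mass on the individual
    # set, reflecting the lidar's lower precision for these classes.
    return {frozenset({detected_class}): 0.3,
            frozenset({"pedestrian", "bike"}): 0.3,
            OMEGA: 0.4}
```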

For each region of interest provided by the lidar target detector, the vehicle detector identifies a car, a truck or neither of them. One can notice that some parts of vehicles are very similar, for example the rear part of a car and of a truck; for this reason there is uncertainty in the vehicle detection result. When a car is detected we put most of the evidence on the hypothesis {car}, and the rest of the evidence is placed on the hypothesis that the object could be a car or a truck, {car, truck}, and on the ignorance set Ω. We use the same evidence assignment when a truck is detected. If no vehicle is detected in the region of interest, we put all the evidence on the ignorance set Ω.

The pedestrian detector's belief assignment is done in a similar way as for the vehicle detector: when a pedestrian is detected we put a high mass value on the hypothesis {pedestrian} and divide the rest of the evidence between the set {pedestrian, bike} and the ignorance set Ω.

Each time we perform a combination between two BBAs we have to define reliability factors relative to the current reference evidence source. Firstly, we set the lidar BBA as the reference evidence source and combine it with the vehicle detector BBAs. Secondly, we set the combined BBA as the reference evidence before combining it with the pedestrian detector BBA to obtain the instantaneous fusion. For the final combination step, we set the combined mass from previous times as the reference evidence source and fuse it with the current instantaneous fusion. Individual precision factors are defined only for the individual BBAs.
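Putting the pieces together, an illustrative per-object fusion following this ordering could look like the sketch below, reusing the helper functions sketched in section IV (discount_reliability, apply_precision, combine_conflict_to_omega, decide); all reliability and precision values are hypothetical.

```python
OMEGA = frozenset({"car", "truck", "pedestrian", "bike"})

# Assumed lidar precision factors: lower for pedestrians and bikes.
LIDAR_PRECISION = {frozenset({"pedestrian"}): 0.4, frozenset({"bike"}): 0.4}

def fuse_object(m_lidar, m_vehicle, m_pedestrian, m_previous=None):
    # Per-source precision factors are applied to the individual BBAs only.
    m_ref = apply_precision(m_lidar, LIDAR_PRECISION, OMEGA)
    # Instantaneous fusion: lidar as reference, then the running result.
    m = combine_conflict_to_omega(
        m_ref, discount_reliability(m_vehicle, 0.8, OMEGA), OMEGA)
    m = combine_conflict_to_omega(
        m, discount_reliability(m_pedestrian, 0.7, OMEGA), OMEGA)
    # Dynamic fusion: the previous combined mass is the reference (Eq. 7).
    if m_previous is not None:
        m = combine_conflict_to_omega(m_previous, m, OMEGA)
    return decide(m)
```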

Figure 6 shows the results of the fusion approach for moving-object classification. We tested our proposed approach in several urban and highway scenarios and obtained good results in both compared with the individual classification inputs. We are currently conducting further tests in order to obtain quantitative values of the improvements achieved.

Figure 6 (a) shows how the proposed approach identifies the class of the two moving objects present in the scene: a car and a bike. On the contrary, in figure 6 (b) one pedestrian is missed because none of the object detector modules provided evidence to support its class.

The car in figure 6 (b) and the truck in figure 6 (c) are not classified by the lidar processing because they appeared in the field of view only a few frames earlier; however, using the evidence derived from the lack of classification and from the vehicle detector, the proposed approach can correctly identify both at an early time. The mass evidence supporting these two classification hypotheses becomes higher in later frames, when the lidar processing provides evidence about the correct class of the objects.

Figure 6 (d) shows how, despite the lack of texture in the image to identify the two vehicles in front, evidence from the lidar processing helps to correctly identify them.

Fig. 6. Results from the proposed fusion approach for moving-object classification in (a, b) urban and (c, d) highway scenarios. The left side of each image shows the top-view representation of the scene (static and moving objects) displayed in the right-side image. Bounding shapes and class tags on the right side of each image represent classified moving objects.

TABLE I. Number of vehicle/truck misclassifications from the individual moving-object detectors and from the proposed fusion approach.

Dataset      Lidar processing   Vehicle detector   Fusion approach
highway 1           9                  10                 4
highway 2           8                  12                 6
urban 1            15                  19                10
urban 2            18                  23                12

TABLE II. Number of pedestrian/bike misclassifications from the individual moving-object detectors and from the proposed fusion approach. The highway datasets do not contain any pedestrians or bikes.

Dataset      Lidar processing   Pedestrian detector   Fusion approach
urban 1            10                    8                   5
urban 2            13                    7                   3

Tables I and II show a comparison between the results obtained by the proposed fusion approach and the individual sources of evidence for moving-object classification, in terms of the total number of object misclassifications per dataset. We used four short datasets (about 1 minute long each) provided by the vehicle demonstrator described in section III, two from highway scenarios and two from urban scenarios. We can see that the proposed fusion approach reduces the number of misclassifications in all the datasets. Due to the specific nature of the pedestrian and vehicle detectors, we divided the experiments to analyse separately the improvement in pedestrian/bike and car/truck classifications. One can see how different the individual performances of the evidence sources are at detecting specific object classes; this is highly related to the nature of the sensor information they use to provide class hypotheses. The proposed fusion approach combines the class evidence provided by each source of evidence, together with its reliability and precision factors, to obtain a better classification of the moving objects.

VII. CONCLUSIONS

In this work we presented a generic fusion framework based on DS theory to combine the class information of lists of moving objects provided by different object detectors. The architecture of the proposed approach allows including several sources of evidence as inputs regardless of their nature. Given that the performance of the object detectors varies according to the type of sensor used and its specifications, we included two factors in the mass belief assignment: the reliability of the sources and their precision in classifying certain types of objects. We used a rule of combination to fuse several sources of evidence and manage situations with high levels of conflict without obtaining counter-intuitive results; such situations appear very often when different kinds of sensors are used and when the ego-vehicle is placed in cluttered scenarios. Finally, the proposed fusion approach is able to fuse information from previous combinations with the combined class information at the current time (instantaneous fusion).

We used several datasets from urban and highway scenarios to test our proposed approach. These experiments showed improvements over the individual moving-object classification provided by the lidar, pedestrian and vehicle object detectors. It is important to mention that these are preliminary results, since the interactIVe project is still in its evaluation phase. We are currently developing a standard ground-truth dataset to evaluate the whole perception platform and will then conduct experiments to obtain more quantitative results supporting the improvements of the proposed classification fusion approach presented in this paper.

ACKNOWLEDGMENT

This work was also supported by the European Commission under interactIVe, a large-scale integrating project, part of the FP7-ICT for Safety and Energy Efficiency in Mobility. The authors would like to thank all partners within interactIVe for their cooperation and valuable contribution.

REFERENCES

[1] T.-D. Vu, "Vehicle perception: Localization, mapping with detection, classification and tracking of moving objects," Ph.D. dissertation, LIG-INRIA, 2009.
[2] C. Wang, C. Thorpe, S. Thrun, M. Hebert, and H. Durrant-Whyte, "Simultaneous localization, mapping and moving object tracking," The International Journal of Robotics Research, vol. 26, no. 9, pp. 889–916, 2007.
[3] D. Geronimo, A. M. Lopez, A. D. Sappa, and T. Graf, "Survey of pedestrian detection for advanced driver assistance systems," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 7, pp. 1239–1258, July 2010.
[4] M. Darms, P. Rybski, and C. Urmson, "Classification and tracking of dynamic objects with multiple sensors for autonomous driving in urban environments," in 2008 IEEE Intelligent Vehicles Symposium, June 2008, pp. 1197–1202.
[5] Q. Baig, "Multisensor data fusion for detection and tracking of moving objects from a dynamic autonomous vehicle," Ph.D. dissertation, University of Grenoble 1, 2012.
[6] Z. Sun, G. Bebis, and R. Miller, "On-road vehicle detection using optical sensors: a review," in Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems, 2004, pp. 585–590.
[7] M. Rohrbach, M. Enzweiler, and D. Gavrila, "High-level fusion of depth and intensity for pedestrian classification," in DAGM-Symposium, J. Denzler, G. Notni, and H. Süße, Eds. Springer, 2009, pp. 101–110.
[8] M. Himmelsbach and H. Wuensche, "Tracking and classification of arbitrary objects with bottom-up/top-down detection," in Intelligent Vehicles Symposium (IV), 2012, pp. 577–582.
[9] A. Azim and O. Aycard, "Detection, classification and tracking of moving objects in a 3D environment," in 2012 IEEE Intelligent Vehicles Symposium, June 2012, pp. 802–807.
[10] P. Smets and B. Ristic, "Kalman filter and joint tracking and classification based on belief functions in the TBM framework," Information Fusion, vol. 8, no. 1, pp. 16–27, 2007.
[11] Q. Baig, O. Aycard, T. D. Vu, and T. Fraichard, "Fusion between laser and stereo vision data for moving objects tracking in intersection like scenario," in IEEE Intelligent Vehicles Symposium, Baden-Baden, Germany, June 2011.
[12] P. Smets, "The transferable belief model for belief representation," in Gabbay and Smets (Eds.), DRUMS II, vol. 6156, pp. 1–24, 1999.
[13] R. R. Yager, "On the Dempster-Shafer framework and new combination rules," Information Sciences, vol. 41, no. 2, pp. 93–137, 1987.
[14] P. Smets, "Data fusion in the transferable belief model," in Proceedings of the Third International Conference on Information Fusion (FUSION 2000), vol. 1, July 2000, pp. PS21–PS33.
[15] O. Chavez-Garcia, J. Burlet, T.-D. Vu, and O. Aycard, "Frontal object perception using radar and mono-vision," in 2012 IEEE Intelligent Vehicles Symposium (IV), Alcala de Henares, Spain, 2012, pp. 159–164 (poster).
[16] A. Elfes, "Using occupancy grids for mobile robot perception and navigation," Computer, vol. 22, pp. 46–57, June 1989.
[17] T.-D. Vu and O. Aycard, "Laser-based detection and tracking moving objects using data-driven Markov chain Monte Carlo," in ICRA, 2009, pp. 3800–3806.
[18] R. Schapire and Y. Singer, "Improved boosting algorithms using confidence-rated predictions," Machine Learning, 1999.