
VIDEO-BASED ROAD TRAFFIC MONITORING

A DISSERTATION SUBMITTED TO THE UNIVERSITY OF MANCHESTER
FOR THE DEGREE OF MASTER OF SCIENCE
IN THE FACULTY OF ENGINEERING AND PHYSICAL SCIENCES

2013

By
LI LI

School of Computer Science

Contents

Abstract 8

Declaration 9

Copyright 10

Acknowledgements 11

1 Introduction 12
1.1 Aim and Objectives 13
1.2 VRTM System Overview 14
1.3 Scope and Limitation 14
1.4 Contribution 15
1.5 Dissertation Overview 16

2 Background Research 18
2.1 Challenges 18
2.2 Vehicle Detection 19
2.2.1 Frame Differential Method 20
2.2.2 Optical Flow Field Method 21
2.2.3 Background Subtraction 21
2.2.4 Scan-Line Based Method 25
2.3 Occlusion Removal 25
2.3.1 Feature Model 26
2.3.2 Reasoning Model 27
2.3.3 3D Model 27
2.4 Vehicle Tracking 28
2.4.1 Template matching 28
2.4.2 Kalman Filter 29
2.5 Vehicle Classification 30
2.5.1 Feature Selection based on prior knowledge 31
2.5.2 Scale-invariant feature transform (SIFT) 31
2.5.3 Bag-of-Feature Model 33
2.6 Chapter Summary 34

3 Vehicle Detection 36
3.1 Nonparametric Background Generation 36
3.2 Vehicle Segmentation 39
3.2.1 Morphological Processing 39
3.2.2 Blob Searching 40
3.3 Occlusion Removal 41
3.3.1 Occlusion Detection 41
3.3.2 Occlusion Handling 43
3.4 Results 44

4 Vehicle Tracking 46
4.1 Configuration of Kalman Filter 46
4.2 Region-based vehicle tracking algorithm 48
4.3 Results 50

5 Vehicle Classification 51
5.1 Edge-based Feature Extraction 51
5.1.1 Edge Detection 52
5.1.2 Descriptor and Feature Extraction 56
5.1.3 Feature Generation 56
5.2 Vehicle Type Recognition 60
5.2.1 Feature Pool Generation 60
5.2.2 Modelling 61

6 System Design and Experiments 63
6.1 Image Processing Library 63
6.1.1 OpenCV Library 63
6.1.2 C# Implementation 64
6.2 System Design 65
6.2.1 Algorithms 65
6.2.2 Graphic User Interface 65
6.3 Experiment 68
6.3.1 Traffic Data Sets 68
6.3.2 Parameter Selection 69
6.3.3 Evaluation Result 70

7 Conclusion and Future Work 76
7.1 Project Summary 76
7.2 Limitation 77
7.3 Future Work 77
7.3.1 Shadow Removal 77
7.3.2 Classification Model 78
7.3.3 Evaluation Improvement 78

Bibliography 79

A Traffic Datasets 84

Word Count: 14,856

List of Tables

6.1 Definition of different levels of traffic flow 68
6.2 Background training time for different traffic scenes 71
6.3 Comparison of modified occlusion removal method and the original one 72
6.4 Performance of vehicle detection module under different traffic flows 72
6.5 Performance of vehicle detection module under different weather conditions 73
6.6 Performance of vehicle tracking module under different traffic flows 73
6.7 Confusion matrices produced by edge-based shape model 74
6.8 Confusion matrices produced by Bag of Feature model 74

List of Figures

1.1 Function diagram of VRTM System 14
2.1 Various challenging traffic situations 19
2.2 Various types of vehicle occlusion 26
2.3 Generation of Keypoint descriptor [Low99a] 33
2.4 Generation of Codeword [Sch11] 34
2.5 Histograms of testing images [SL12] 34
3.1 Algorithm of Background Generation Method [Liu06] 38
3.2 Background Generation Results 38
3.3 Flowchart of Vehicle Segmentation Process 39
3.4 Result of 6×6 regional searching algorithm 40
3.5 Convex hulls of occluded and unoccluded vehicle 42
3.6 Possible cutting points and possible cutting lines 44
3.7 Vehicle detection result 45
4.1 Flowchart of region-based vehicle tracking algorithm 47
4.2 Overall design of tracking control list [Qia09] 49
4.3 Vehicle Tracking Results 50
5.1 Flowchart of vehicle classification algorithm 52
5.2 Two kernels of Sobel operator 53
5.3 Edge images generated by Sobel and Canny operator 55
5.4 Results of edge detection 57
5.5 Descriptor examples. The horizontal axis shows the number of each bin, and the vertical axis presents the proportion of each bin's gradient magnitude. 58
5.6 Some examples of edge-based features and DoG features 59
6.1 UML diagram of image processing library 64
6.2 Flow Chart of VRTM System 66
6.3 Graphic User Interface of Background Generation Module 67
6.4 Graphic User Interface of Vehicle Tracking 67
6.5 Graphic User Interface of Vehicle Classification 68
6.6 Vehicle detection result by deploying different minimal differentials 70
6.7 Classification result by deploying different combinations of threshold Thra and SIFT size ratio 71
6.8 Performance Comparison to Bag of Feature model. X-axis is the number of training samples. Y-axis is the average error rate. 75
A.1 One-way Street 84
A.2 High Way 1 84
A.3 High Way 2 85
A.4 Cross Road 85
A.5 Parking Space 85

Abstract

The rapid growth of computer vision techniques has made video-based traffic monitoring an excellent alternative to traditional monitoring measures. At present, vehicle detection, tracking and classification are the three main tasks for most road monitoring systems. However, little research has focused on systems that integrate all of these functionalities.

This dissertation presents feasible methods for vehicle detection, tracking and classification, used to build a video-based road traffic monitoring (VRTM) system. The system monitors the traffic flow in video shot from a stationary camera, and outputs vehicle trajectories, counts and types. In addition, in order to build a robust and effective system, numerous methods proposed in the literature have been reviewed and extended where appropriate.

The object detection module is based on a most reliable background mode (MRBM) method, which is able to extract an idealized background image under changing conditions and systematic noise. For blob tracking, the proposed method is based on the Kalman filter, which estimates the current state of a blob given the current input measurements and the previous state. Regarding classification, we propose an edge-based feature model built on the scale-invariant feature transform (SIFT) descriptor.

Our prototype has been tested on ten video clips. It achieves an accuracy rate of 96% for vehicle tracking by generating a robust background image. In terms of vehicle classification, it has a true positive rate of 91.5% and a false positive rate of 1.3%.

Declaration

No portion of the work referred to in this dissertation has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.

Copyright

i. The author of this thesis (including any appendices and/or schedules to this thesis) owns certain copyright or related rights in it (the "Copyright") and s/he has given The University of Manchester certain rights to use such Copyright, including for administrative purposes.

ii. Copies of this thesis, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has from time to time. This page must form part of any such copies made.

iii. The ownership of certain Copyright, patents, designs, trade marks and other intellectual property (the "Intellectual Property") and any reproductions of copyright works in the thesis, for example graphs and tables ("Reproductions"), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication and commercialisation of this thesis, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP Policy (see http://documents.manchester.ac.uk/DocuInfo.aspx?DocID=487), in any relevant Thesis restriction declarations deposited in the University Library, The University Library's regulations (see http://www.manchester.ac.uk/library/aboutus/regulations) and in The University's policy on presentation of Theses.

Acknowledgements

The writing of this dissertation has been one of the greatest difficulties I have ever had to face, so I would like to express my deepest appreciation to the people who supported me during this period. Firstly, my supervisor, Dr Aphrodite Galata, is very kind, wise and responsible. She always conveys a positive attitude towards research and an enthusiasm for teaching. Every time I felt depressed with my dissertation she would encourage me and keep me carrying on. Without her guidance and encouragement, this dissertation would not have been possible.

I would also like to take this opportunity to thank my parents. They have always been my support throughout my life. Thank you for your trust and encouragement.

Chapter 1

Introduction

Nowadays, with the growth of motorization and urbanisation, traffic congestion and traffic violations have become two urgent problems in transport and traffic management, as they greatly increase the burden on governments and local authorities. Generally speaking, a good solution to these problems is to emphasise infrastructure construction, which may enhance the traffic capacity of the road network. However, given the restrictions of limited space, money and other resources, people are now focusing on improving operational efficiency by employing various high-technology systems, such as Intelligent Transportation Systems (ITS) [WP00].

ITS is a worldwide research hotspot in the domain of transportation analysis. It combines advanced computer processing technology, information technology and electronic control technology. These advanced technologies aim to provide innovative services for different modes of transportation and traffic management, such as building a real-time, accurate and highly efficient integrated traffic management system [PoJ10]. Typically, traffic data can be collected via radar, magnetic sensors, LIDAR or video cameras [JA12]. Compared to approaches based on other data sources, the video-based method has the following advantages:

    • Detect multiple vehicles on different lanes at the same time.

• Compared with other methods, a stationary camera has a more extensive field of vision.

• Low cost of installation and maintenance, both of which can be carried out without interrupting traffic [KW11].

• Obtain traffic information, including traffic volume, vehicle speed and vehicle types, synchronously.

    • Easy to store, retrieve and maintain traffic data through the Internet and intranet.

The remainder of this chapter is organized as follows: the aims and objectives of this project are presented in section 1.1, followed by a definition of the VRTM system's overall structure in section 1.2. Sections 1.3 and 1.4 describe the scope and contributions of this project in detail. Finally, an outline of this dissertation is given in section 1.5.

    1.1 Aim and Objectives

The aim of this project is to develop a Video-based Road Traffic Monitoring (VRTM) system which is able to detect vehicles, gather traffic information and classify vehicle types by analysing the traffic video.

The objectives of our project are as follows:

• Identify a robust background generation method that is able to produce high-quality background images by overcoming the interference of weather variations, illumination changes and stationary objects.

• Improve the background subtraction algorithm in order to obtain more complete vehicle blobs and overcome the blob overlapping problem.

• Deploy the Kalman filter [Del98] and a region-based matching algorithm [Qia09] to track the vehicle blobs and obtain the count and trajectories of vehicles.

• Build a classification model that is able to classify the vehicles into three types: car, van and lorry. In addition, the accuracy and true positive rate are expected to be more than 95%.

• The vehicle tracking and classification components should satisfy the real-time requirement by using algorithms of low computational complexity [CC95].

• Users are able to gather detailed traffic information and a certain vehicle's trajectory through interacting with the interface.

    1.2 VRTM System Overview

Since the quality and comprehensiveness of traffic data is crucial to ITS, a lot of important traffic information, such as vehicle trajectories, the number of vehicles passing along the road, average velocity, vehicle types, lane occupancy ratio, occurrence of accidents, etc. [HZ04], should be collected effectively by the traffic surveillance system. To achieve this, the VRTM system is composed of three components: an object detection component, a blob tracking component and a vehicle classification component. The object detection component is responsible for segmenting foreground objects of interest from the background. The tracking component matches the same image blob as it appears in a sequence of frames. The classification component identifies the type of a segmented vehicle according to its unique features. The detailed functionalities of VRTM are shown in figure 1.1.

Figure 1.1: Function diagram of VRTM System

These three modules can be further divided into six parts: background generation, vehicle segmentation, shadow removal, vehicle tracking, feature selection and feature matching. The detailed algorithm flow chart is shown in chapter 6.

    1.3 Scope and Limitation

This project relies on some well-grounded vision-based theories which have been proposed by others and verified by our experiments. In addition, the scope of this project is limited to the capability of the program that implements these methodologies.

Traffic video acquisition problems and hardware implementation issues are not considered in this project. Some basic requirements of the testing videos are as follows:

• Traffic videos are supposed to have a fine resolution and be longer than five minutes in length, so that the system can obtain plenty of samples for building a strong classification model.

    • Traffic video should be shot by a stationary camera without frequent shaking.

    • Any traffic scenes with insufficient light are not considered in our project.

With regard to vehicle tracking, only visible vehicles are supposed to be tracked. Unobservable occlusion issues will not be addressed in this project. However, we will give a brief introduction to feasible solutions in Chapter 2.

As for vehicle classification, due to the diversity of vehicles' shapes and sizes, we focus on identifying cars, vans and lorries with a fixed view angle. The identification of other objects, as well as other techniques for vehicle classification, will be briefly described in Chapter 2 and Chapter 6.

    1.4 Contribution

This project illustrates how computer vision-based methodologies can be applied in a road monitoring system. In order to achieve this, we investigate the capabilities of background extraction, object tracking, and feature extraction and modelling algorithms in processing diverse traffic data. More specific contributions of this project are as follows:

• Studying the wider context of the project, including background knowledge, methodologies and other existing work related to the areas of vehicle detection, tracking and classification.

• For vehicle detection, we compare the characteristics of different background subtraction methods, the frame differential method, etc., and adopt a mean-shift based background subtraction method to adapt to changeable traffic scenes. We also propose a novel method for blob segmentation which can greatly suppress the image noise.

• Regarding vehicle tracking, we introduce a region-based tracking method associated with a Kalman filter, which helps us to obtain the vehicle count in a real-time situation.

• In terms of vehicle classification, we compare the performance of the edge-based model proposed by Ma and Grimson [XM] and the bag-of-words model [Sch11]. The evaluation shows a promising result for the former.

• Developing a system prototype that integrates all the techniques, and enhancing it based on prior knowledge of the traffic area and actual experimental results.

• As for the experimental results of vehicle counting, the average accuracy rate obtained from ten traffic video clips reaches 96%, and the true positive rate for the vehicle classification experiment reaches 91.3%. For more details, see Chapter 6.

In summary, this project illustrates some satisfactory approaches for gathering traffic data. The experimental results also show that the video-based technique is a promising method for road monitoring and transportation analysis.

    1.5 Dissertation Overview

    The structure of this dissertation is organized as follows:

• Chapter 2 Background Research: This chapter reviews background information related to each component of the VRTM system. It describes the underlying principles of vision-based techniques that are commonly used in the image processing area. In order to clarify their practicality for our project, the merits and demerits of each technique are also listed. The information given in this chapter is essential to the later understanding and design of the VRTM system.

• Chapter 3 Vehicle Detection: This chapter specifies the detailed methodologies and algorithms used for implementing the background generation module and the vehicle segmentation module. It shows how we extend the existing techniques and develop our own algorithms.

• Chapter 4 Vehicle Tracking: This chapter describes the algorithms applied in the second step of the whole traffic monitoring process. The tracking methods can be divided into two parts: target prediction and target matching.

• Chapter 5 Vehicle Classification: Feature generation and object modelling are the two main steps in the classification process. This chapter presents a promising way of building a strong classifier model and achieving a high accuracy rate.

• Chapter 6 System Design and Experiments: This is a central chapter which presents the way we build the VRTM system and the experimental results obtained from it. In the first section, we introduce the adopted program library, the workflow of each component and the GUI design of the system. The second section illustrates the parameter selection and the experimental results.

• Chapter 7 Conclusion and Future Work: The achievements and limitations of the project are summarised in this chapter. In addition, some suggestions for future work are proposed for solving existing defects.

Chapter 2

Background Research

In this chapter, we describe some unavoidable challenges in implementing the system. To address these problems, we discuss some commonly used theories and algorithms related to each module from a wider perspective. Specifically, we illustrate the respective characteristics of these background theories and summarize the advantages and shortcomings of each of them. In this way, we may select techniques that are suitable for the system and further improve them.

    2.1 Challenges

Some major challenges in implementing the VRTM system are listed below:

• The background model can be greatly affected by changes in the external environment and by system noise, such as illumination change, stationary objects, camera shake, etc. Figure 2.1(a) shows a distinct luminance difference between different areas. As a result of changing sunlight, the background image may lose its efficacy within a short period.

• Vehicle occlusions are prevalent from most observation angles and are one of the most difficult factors to overcome. Occlusion usually occurs when one vehicle appears next to another and obscures it partially or completely [MW08], which results in miscounting, as the system takes two occluded vehicles as one object. Figure 2.1(b) shows a typical occlusion problem which poses a great threat to the blob-based object detection method.

• Vehicles may appear only partially in the binary image, as some pixels may have grey values similar to those of the background image. The resulting image hollows may lead to failures in vehicle segmentation and tracking. Figure 2.1(c) illustrates a vehicle whose partial exterior colour is similar to that of the road due to a large specular reflection.

Figure 2.1: Various challenging traffic situations ((a) illumination variance, (b) vehicle occlusion, (c) specular reflection, (d) swerve)

• When a vehicle changes its velocity or moving direction substantially, the predictive tracking algorithm easily generates a misleading result. Figure 2.1(d) illustrates this situation, where a white lorry is turning.

• Since vehicles of the same type may have various shapes and colours, it is difficult for the system to classify each of them.

    2.2 Vehicle Detection

The main purpose of vehicle detection is to extract the vehicle image from a traffic video and to remove interference as much as possible. Owing to the existence of interference factors in real situations, such as lighting changes, interference from moving background objects, vehicle shadows, vibration and occlusion by other vehicles, the accuracy and robustness of a vehicle detection algorithm can be affected to a great extent. Therefore, we have to take these situations into account when we are building the VRTM system.

Among all the procedures in traffic data collection, vehicle detection is the key problem, since an accurate segmentation of moving objects will greatly enhance the performance of vehicle tracking and classification. Currently, many methods have been adopted for moving object detection, such as optical flow [BR78], HSV colour background modelling [ZL03] and background subtraction based on time difference [Liu06]. Even though many effective methods have been provided in the literature [YS08, MW09], the fundamental problems concerning the precision of detecting vehicles have still not been completely solved. In the following sections, we describe four commonly used vehicle detection methods: the Frame Differential Method [Sek00], the Optical Flow Field Method [Gib50], Background Subtraction [Liu06] and the Detection Line Method [T84].

    2.2.1 Frame Differential Method

The theory of the Frame Differential Method is based on the close relationship between a sequence of motion images [Sek00]. The algorithm is as follows: firstly, we subtract the grey values of frame k−1 from the grey values of frame k in the video sequence; then we select a threshold and transform the difference image into a binary image. If the grey-value difference at a pixel is higher than the given threshold, the pixel is treated as foreground, otherwise as background. The blobs selected from the foreground pixels are considered as candidates for actual moving vehicles. The two main equations are as follows:

D_k(x,y) = |F_k(x,y) - F_{k-1}(x,y)|    (2.1)

V_k(x,y) = \begin{cases} 1 & \text{if } D_k(x,y) > t \\ 0 & \text{otherwise} \end{cases}    (2.2)

where D_k(x,y) indicates the absolute value of the difference between the two adjoining frames [KW11], F_k(x,y) refers to the grey value of pixel (x,y) in frame k, and the foreground mask V_k(x,y) is decided via the threshold t.
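As an illustration only (the prototype described in Chapter 6 is written in C# with OpenCV, not Python), equations 2.1 and 2.2 can be sketched in a few lines of NumPy. The function name and the threshold value t = 25 are assumptions, not values from the dissertation:

```python
import numpy as np

def frame_difference_mask(frame_k, frame_prev, t=25):
    """Binary foreground mask from two consecutive greyscale frames (eqs. 2.1-2.2)."""
    # D_k(x, y) = |F_k(x, y) - F_{k-1}(x, y)|, computed in a signed type to avoid wrap-around
    diff = np.abs(frame_k.astype(np.int16) - frame_prev.astype(np.int16))
    # V_k(x, y) = 1 where the difference exceeds the threshold t, else 0
    return (diff > t).astype(np.uint8)
```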

As for its advantages, the Frame Differential Method is insensitive to illumination, and it is suited to dynamic environments and real-time systems due to its low computational complexity. Nevertheless, this algorithm also has some disadvantages. Owing to its dependence on motion, it is difficult for the algorithm to detect stationary objects or objects moving at low speed. As for high-speed objects, the segmentation result will be split apart and the interior of the detected blob is likely to contain a lot of noise and hollows. An improved algorithm has been presented by Collins [CR99], which uses multiple-frame differencing instead of the two-frame version. In addition, Collins adopted an adaptive statistical optimal binarisation method instead of using a fixed threshold [CR99].

    2.2.2 Optical Flow Field Method

The concept of the Optical Flow Field Method was brought out by Gibson in 1950 [Gib50]. Kun Wu proposed that the optical flow represents the velocity of modes within an image [KW11]. The optical flow field is a two-dimensional instantaneous velocity field which is the projection of the visible three-dimensional velocity vectors onto the imaging plane. The Optical Flow Method takes the image within the detecting area as a vector field of velocity; each vector represents the transient variation of the position of a pixel in the scene. Therefore, the optical flow field carries abundant information about relative motion and the three-dimensional structure of the related scene. It can be used to decide whether there are moving vehicles in the detecting area.

By deploying the Optical Flow Method, we can detect moving targets even from a moving camera. However, this method also needs other features, such as colour, grey level, edges, etc., to enhance the accuracy of segmentation, which makes it very sensitive to system noise. Therefore, it cannot be utilized in a real-time processing system without special hardware support, and the inner points of a large homogeneous object (a vehicle with a single colour) cannot be characterised by the optical flow [HZ04].

    2.2.3 Background Subtraction

The Background Subtraction Method [Liu06] is a classical yet practical approach for vehicle detection. The principle of the algorithm is very simple: build a background model (the mean is the simplest model) from a sequence of frames to get the background image, and then subtract the background mask from the current frame. After that we select a threshold and transform the subtracted image into a binary image, in the same way as the Frame Differential Method. The pixels whose grey level is greater than the threshold are considered as foreground points.

Background subtraction approaches can typically be classified into parametric methods and nonparametric methods [Liu06]. In terms of parametric background generation algorithms, the most commonly used method is the Gaussian mixture model. In [YS08], the algorithm classifies the Gaussian distributions into reliable and unreliable distributions, and an unreliable distribution can evolve into a reliable distribution by accumulating pixel values. As a matter of fact, the performance of the Gaussian mixture model is not steady when objects move at low speed or make a whistle stop. As for nonparametric techniques, M. Elgammal [MED00] builds a nonparametric background model by kernel density estimation, which is robust enough to handle situations where the background is affected by small motions, like tree branches and litter.

Furthermore, the background subtraction method is insensitive to the speed of moving objects. Therefore, it is able to get a correct segmentation result for excessively fast and slow objects, and even for static objects. Typically, the foreground information provided by the Background Subtraction Method is more complete than that of other methods, and it has great practical value due to its low computational complexity. However, this method is sensitive to illumination, weather and other extraneous factors. As a result, it is necessary to build different background generation models and adaptive updating methods to fit in with the surroundings. In this section, we mainly describe three commonly used background generation models: the Average Model [Y12], the Gaussian Mixture Model [PK01] and the Kernel Density Estimation Model [MED00].

    Average Model

The Average Model takes the average grey value of a pixel over a sequence of frames as the background value of that pixel [Y12]. The equation is as follows:

B_k = \frac{1}{N} (f_{k-N+1} + f_{k-N+2} + \dots + f_k)    (2.3)

where B_k represents the updated background image, N is the number of training frames, and f_k is the k-th frame image.

In terms of background updating, as the pixels of the current frame can be divided into foreground and background, we use the background pixels of the current frame to correct the current background mask. In addition, a weight should be set to act as the updating rate. The update of the background model is denoted as follows:

B_{k+1} = (1 - \beta) B_k + \beta f_k    (2.4)

where β is the updating rate. A small value of β makes the background change slowly, as the previous background image has a larger proportion than the current one. Conversely, the background image will be updated with a high frequency. Normally, taking a value of 0.05 achieves a good updating effect [KW11]. The background should be updated in two situations (a small sketch of this model is given after the list):

1. Reduce the deviation caused by lighting changes.

2. Update the background when occlusion occurs.
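A minimal NumPy sketch of equations 2.3 and 2.4 is given below. It is an illustration rather than the dissertation's implementation; the function names are assumptions, and only the β = 0.05 value and the selective update of background pixels come from the text:

```python
import numpy as np

def initial_background(frames):
    """Eq. 2.3: mean of the N training frames (greyscale arrays)."""
    return np.mean(np.stack(frames).astype(np.float32), axis=0)

def update_background(background, frame, foreground_mask, beta=0.05):
    """Eq. 2.4, applied only to pixels classified as background in the current frame."""
    blended = (1.0 - beta) * background + beta * frame.astype(np.float32)
    # keep the previous estimate where the pixel was classified as foreground
    return np.where(foreground_mask > 0, background, blended)
```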

    Gaussian Mixture Model

The algorithm used in the Gaussian Mixture Model [PK01] can be divided into two parts. The first part is adaptive background modelling, which describes the background pixels using multiple Gaussian models and specifies how to update the model using new pixels. The second part is the Expectation Maximization (EM) algorithm. EM can be seen as an improvement upon the updating strategy of the background modelling.

Within the adaptive background image, each pixel is described by K Gaussian distribution models. The method that builds such models is called Gaussian mixture modelling. For each pixel, its Gaussian mixture models are updated constantly over time. We assume that there are K Gaussian models for one pixel, and the weight of the k-th model is W_k. Therefore, at time N, the normal distribution of the model is η(X_N; θ_k), and the background model can be described as:

p(X_N) = \sum_{j=1}^{K} W_j \, \eta(X_N; \theta_j)    (2.5)

After we obtain the background model, each pixel within it is converted into a sequence of Gaussian models with weights W_k. Regarding all these background models, we sort them by the value of W_k/θ_k so as to pick the first B models whose cumulative weight is larger than a threshold T, while the other models are considered as invalid models. This procedure helps us to update the background models and filter out invalid ones. B can be expressed mathematically as:

B = \arg\min_b \Big( \sum_{j=1}^{b} W_j > T \Big)    (2.6)

For a pixel X_N in a new frame N, we traverse the sequence of background models to find the first matching model in the ranking, i.e. the one that satisfies the condition |X_N − µ_k| ≤ 2.5σ_k. The updating strategy of the Gaussian mixture model is as follows:

W'_k = (1 - \alpha) W_k + \alpha p
\mu'_k = (1 - \alpha) \mu_k + \rho x_N
\Sigma'_k = (1 - \alpha) \Sigma_k + \rho (x_N - \mu'_k)(x_N - \mu'_k)^T    (2.7)

Among these equations, the parameter ρ equals αη(x_N; µ_k, Σ_k). As for the parameter p, if the pixel matches the background model, p equals 1; otherwise, p equals 0. The parameter α is the learning rate, which controls the proportion of new pixels used in updating. If we set a large value for it, the Gaussian background models will be updated at a faster speed, and the previous background will be eliminated more quickly. If no matching model is found, the lowest-ranked model will be replaced by a model whose centre is located at X_N.
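For readers who want to experiment with this family of methods, OpenCV ships an adaptive per-pixel Gaussian mixture background subtractor (MOG2) whose update scheme follows the same idea as equations 2.5-2.7. The snippet below is an off-the-shelf illustration, not the dissertation's own implementation; the video file name and the parameter values are assumptions:

```python
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=False)

cap = cv2.VideoCapture("traffic.avi")   # hypothetical input clip
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # learningRate plays the role of alpha in eq. 2.7; -1 lets OpenCV choose it automatically
    foreground_mask = subtractor.apply(frame, learningRate=-1)
cap.release()
```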

    Kernel Density Estimation Method

Elgammal proposed a nonparametric background model based on kernel density estimation [MED00]. This method evaluates the video sample data through kernel functions and selects the sample value that has the maximal probability density as the background. Different from the Gaussian Mixture Model, Kernel Density Estimation makes the best use of the preceding frames to build the background model. It is able to handle frequent shifts of pixel values within a short time, which lets the algorithm obtain a more precise result. However, since noise and uninteresting foreground points are also estimated, the algorithm is more complex than other methods.

We assume that there are M pixels in a video frame and each pixel has N background samples. At time t, the value of pixel i is x(t)_i, and the value of pixel i in the j-th background sample is x(t)_{i,j}. The probability of pixel i can be calculated by the following equation:

P(x(t)_i) = \frac{1}{N} \sum_{j=1}^{N} K\big( x(t)_i - x(t)_{i,j} \big)    (2.8)

where K is the kernel estimator. Assuming that K obeys a Gaussian distribution, we take the R, G, B components as eigenvalues. If they are independent of each other, the kernel density estimate over the N samples can be written as:

P(x(t)_i) = \frac{1}{N} \sum_{j=1}^{N} \prod_{m=1}^{d} \frac{1}{\sqrt{2\pi}\,\sigma_{i,m}} \exp\!\left( -\frac{\big( x(t)_{i,m} - x(t)_{(i,m),j} \big)^2}{2\sigma_{i,m}^2} \right)    (2.9)

where the parameter d is the feature dimension and σ_{i,m} is the kernel width of feature m. If the probability satisfies P(x(t)_i) < T_f, the pixel x(t)_i is considered as foreground. T_f is the global threshold over the whole image.
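The per-pixel decision of equations 2.8-2.9 can be sketched as follows. This is an illustrative NumPy sketch under assumed names and an assumed threshold value; the dissertation does not prescribe these:

```python
import numpy as np

def kde_is_foreground(pixel_value, samples, sigma, t_f=1e-4):
    """Eqs. 2.8-2.9 for a single pixel.

    pixel_value: length-d array (e.g. the R, G, B values at time t)
    samples:     (N, d) array of background samples for this pixel
    sigma:       length-d kernel widths sigma_{i,m}
    Returns True if the pixel is classified as foreground (P < T_f).
    """
    diff = samples - pixel_value                         # (N, d) differences
    norm = 1.0 / (np.sqrt(2.0 * np.pi) * sigma)          # per-channel normaliser
    kernel = norm * np.exp(-(diff ** 2) / (2.0 * sigma ** 2))
    prob = np.mean(np.prod(kernel, axis=1))              # average of the d-dimensional product kernels
    return prob < t_f
```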

    2.2.4 Scan-Line Based Method

Like an induction coil sensor, the detection line should be set at an appropriate position, perpendicular to the vehicles' direction of travel. The algorithm stores all the information on the detection line, such as the background image and edges. When an object crosses the line, the image in the detecting area is overlapped. Therefore, if the length of the overlapped area is larger than the threshold, we may consider this object to be a moving vehicle.

Abramczuk et al. [T84] adopted an artificial detection line whose width is 3 pixels, while another scan-line concept, adopted by Koller et al. [KD93], is to consider the white stripes on the road as marks for detecting vehicles. After obtaining the binary edge image of the labelled region, we compare it with the reference edge image in order to verify whether the road marks have been covered. If the differential value is greater than the threshold, we may conclude that the covering object is a moving vehicle. In addition, this method is able to eliminate the interference of shadow.

Typically, the Scan-Line Based Method is a simple vehicle detection algorithm that is suitable for real-time applications and certain traffic conditions, such as highways and one-way roads. However, this method does not perform well at complex crossroads, as vehicles move in different directions.

    2.3 Occlusion Removal

With regard to the type of occluding object, a vehicle can be occluded by other moving objects or by background objects. Meanwhile, from a geometric point of view, object occlusion can be classified into partial and full occlusion [WZ08]. Figure 2.2 illustrates some instances of vehicle occlusion.

To tackle these situations, scholars have proposed numerous methods for occlusion detection and removal. Typically, these methods can be classified into feature, 3D and reasoning models [WZ08]. The following subsections give a brief overview of these techniques.

Figure 2.2: Various types of vehicle occlusion ((a) partial occlusion, (b) full occlusion, (c) object-background occlusion)

    2.3.1 Feature Model

The partial occlusion problem can be addressed by analysing reasoning features generated from the remainder of the vehicle image. These features may include the centroid, gradient, intensity variance, vehicle contour, etc. Typically, there are three main steps in a feature-based object tracking algorithm: feature extraction, feature matching and computing motion information.

C. Gentile and O. Camps [CGS04] analysed the correlation between different features of the partial vehicle image and the tracking performance, and selected the feature which has the highest correlation with tracking errors. Thus occlusion can be handled by tracking segments composed of these features.

Coifman et al. [BCM] used a Kalman filter [Del98] to track a vehicle's corner points throughout the video frames, and clustered the feature points as a single vehicle after they leave the tracking area. Since the Kalman filter does not require additional past information, we may obtain a simple model of low computational complexity.

The feature model is the most efficient way of tackling the partial occlusion problem, and it has broader development space than other methods. However, its algorithm is comparatively computationally intensive, and it can be easily affected by irrelevant objects in a complex environment.

    2.3.2 Reasoning Model

Veeraraghavan et al. [HVP] took the moving blob as a basic tracking unit, which can be used to detect occlusion when two moving vehicles share one or more blobs. Background occlusion is detected when a vehicle is entering or leaving a background structure.

In [YKJH], Jung et al. proposed a method which compares the expected area and the measured area of vehicle blobs. An evident difference between the two areas is taken as either partial or full occlusion. The detailed equation is as follows:

\mathrm{result} = \begin{cases}
\text{full occlusion} & \text{if } \dfrac{|E_a - M_a|}{\max(E_a, M_a)} > T_{high} \\
\text{partial occlusion} & \text{if } T_{low} < \dfrac{|E_a - M_a|}{\max(E_a, M_a)} < T_{high} \\
\text{no occlusion} & \text{otherwise}
\end{cases}    (2.10)

where E_a is the expected area of the blob, M_a is the measured area of the blob, T_{low} is the low threshold and T_{high} is the high threshold [YKJH].

In summary, the reasoning model is highly dependent on the vehicle's prior knowledge, such as trajectory, position, etc. Therefore, the reasoning model performs better on scenes with low traffic flow; however, it may not be robust enough for complex traffic situations.
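Equation 2.10 amounts to a simple decision rule on the relative area discrepancy of a blob. A minimal sketch is shown below; the two threshold values are illustrative assumptions, since [YKJH] is not quoted with concrete numbers here:

```python
def occlusion_state(expected_area, measured_area, t_low=0.2, t_high=0.5):
    """Eq. 2.10: classify a blob by the relative discrepancy between expected and measured area."""
    ratio = abs(expected_area - measured_area) / max(expected_area, measured_area)
    if ratio > t_high:
        return "full occlusion"
    if t_low < ratio <= t_high:
        return "partial occlusion"
    return "no occlusion"
```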

    2.3.3 3D Model

Pang and Lam [CCCPY] constructed a 3D model for each detected vehicle, and measured the occlusion by comparing the model's dimensions (height, width and length) with those of the detected vehicle. The vehicle's contour is used to generate a set of curvature points, according to which occluded vehicles can be divided into individual vehicles.

The 3D model-based method is computationally intensive since it uses a 3D model to describe the vehicle's detailed structure. In addition, its algorithms are established based on the geometric constraints of the traffic scene, so it is hard to apply in practice.


    2.4 Vehicle Tracking

The main purpose of vehicle tracking is to obtain certain traffic information, such as velocity and traffic volume. Most vehicle tracking methods observe a fundamental principle, which is to use a spatial distance to judge whether two blobs in adjacent frames describe the same vehicle. This helps us to achieve vehicle tracking in the time domain. Here the spatial distance refers not only to the Euclidean distance but also to other criteria, such as the Hausdorff distance. In our system, the tracking algorithm should be capable of addressing certain kinds of situations, like vehicle occlusion and image hollows.

Typically, vehicle tracking methods can be divided into different types, as a vehicle can be described in many ways, such as by model, blob, edge or features [DCl09]. The procedure of vehicle tracking can be seen as the process of matching targets across successive frames based on the vehicles' model, area, edge or features. Several typical mathematical tools, such as template matching and the Kalman filter, are widely used in the object tracking domain. Below are respective introductions to these technologies.

    2.4.1 Template matching

The tracking algorithm based on template matching finds a known pattern in the testing vehicle images [CY96]. It can be applied to vehicle detection and tracking for both static images and motion images. Typically, the template matching method is easy to implement but is also computationally intensive [CY96].

We assume that the dimensions of the template image T and the testing image S are M×N and L×W respectively. Image T is the search window which overlaps image S, and the search region is called the subgraph S_{i,j}, where (i, j) are the coordinates of the top-left corner of the subgraph in image S. The ranges of i and j are 1 ≤ i ≤ L−M+1 and 1 ≤ j ≤ W−N+1. We use the following equation [CY96] to measure the similarity of T and S_{i,j}:

D(i,j) = \sum_{m=1}^{M} \sum_{n=1}^{N} \big[ S_{i,j}(m,n) - T(m,n) \big]^2    (2.11)

Expanding 2.11 and normalising the cross term, we obtain the correlation measure 2.12:

R(i,j) = \frac{\sum_{m=1}^{M} \sum_{n=1}^{N} S_{i,j}(m,n)\, T(m,n)}{\Big( \sum_{m=1}^{M} \sum_{n=1}^{N} \big[ S_{i,j}(m,n) \big]^2 \Big)^{1/2} \Big( \sum_{m=1}^{M} \sum_{n=1}^{N} \big[ T(m,n) \big]^2 \Big)^{1/2}}    (2.12)

According to the Cauchy-Schwarz inequality [Bit01], we know that 0 ≤ R(i,j) ≤ 1. If the value of R(i,j) is bigger than the threshold, then S_{i,j} is matched with T; otherwise, it is not.

The Template Matching Method can overcome differences in brightness. However, it has a high computational complexity.
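A brute-force sketch of the search implied by equations 2.11-2.12 is given below, purely as an illustration (function name and the 0.9 threshold are assumptions). In practice the same computation is usually delegated to an optimised routine such as OpenCV's normalised cross-correlation template matching, precisely because the exhaustive loop below is expensive:

```python
import numpy as np

def best_match(S, T, threshold=0.9):
    """Slide template T over image S and return the best (i, j, R) by eq. 2.12.

    S: (L, W) greyscale array, T: (M, N) greyscale array.
    Returns None if no position reaches the threshold.
    """
    L, W = S.shape
    M, N = T.shape
    T = T.astype(np.float64)
    t_norm = np.sqrt(np.sum(T ** 2))
    best = None
    for i in range(L - M + 1):
        for j in range(W - N + 1):
            sub = S[i:i + M, j:j + N].astype(np.float64)
            s_norm = np.sqrt(np.sum(sub ** 2))
            if s_norm == 0 or t_norm == 0:
                continue
            r = np.sum(sub * T) / (s_norm * t_norm)   # eq. 2.12, 0 <= r <= 1
            if best is None or r > best[2]:
                best = (i, j, r)
    return best if best is not None and best[2] >= threshold else None
```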

    2.4.2 Kalman Filter

The tracking algorithm based on the Kalman filter [Del98] does not require additional past information, which makes it different from the Template Matching Method. Instead, according to the measured value of the current state and the previous states, the object's current state can be calculated through a set of recursive formulas. It should be noted that the Kalman filter has low computational complexity and requires little dedicated space, which makes it suitable for a real-time system.

In order to give a good understanding of its principle, we introduce a discrete control system, which can be described by a linear stochastic difference equation:

    X(k) = A×X(k−1)+B×U(k)+W (k) (2.13)

where X(k) is the system's state and U(k) is the control vector. Matrices A and B are the state transition model and the control-input model respectively. The system's observation at time k can be expressed as follows:

    Z(k) = H×X(k)+V (k) (2.14)

where Z(k) is the measured value at time k and matrix H is the observation model of the system, while W(k) and V(k) are the process noise and observation noise respectively. Both are assumed to be white Gaussian noise.

The recursive process of Kalman filtering can be divided into two phases: a prediction phase and an updating phase. The two important formulas that support the prediction phase are shown in 2.15 and 2.16:

    X(k | k−1) = AX(k−1 | k−1)+BU(k) (2.15)

    P(k | k−1) = AP(k−1 | k−1)A′+Q (2.16)

    2.17, 2.18 and 2.19 show three important formulas for the update phase [Del98]:

    X(k | k) = X(k | k−1)+Kg(k)(Z(k)−HX(k | k−1)) (2.17)


Kg(k) = \frac{P(k \mid k-1) H'}{H P(k \mid k-1) H' + R}    (2.18)

    P(k | k) = (1−Kg(k)H)P(k | k−1) (2.19)

where X(k | k) refers to the a posteriori state estimate at time k given the observation at time k, and P(k | k) is the corresponding covariance of X(k | k). In addition, Kg is the Kalman gain. We now obtain the optimal estimate X(k | k) through the preceding procedure. When the system moves on to step k+1, P(k | k) becomes the P(k−1 | k−1) in 2.16.
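The recursion in equations 2.13-2.19 is compact enough to sketch directly. The following Python/NumPy class is an illustration, not the dissertation's C# tracking module; the constant-velocity centroid model and all numeric values in the usage lines are assumptions:

```python
import numpy as np

class KalmanFilter:
    """Minimal linear Kalman filter implementing eqs. 2.13-2.19."""

    def __init__(self, A, B, H, Q, R, x0, P0):
        self.A, self.B, self.H, self.Q, self.R = A, B, H, Q, R
        self.x, self.P = x0, P0

    def predict(self, u):
        # eqs. 2.15 and 2.16: a priori state and covariance
        self.x = self.A @ self.x + self.B @ u
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.x

    def update(self, z):
        # eq. 2.18: Kalman gain
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        # eq. 2.17: corrected state; eq. 2.19: corrected covariance
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(self.P.shape[0]) - K @ self.H) @ self.P
        return self.x

# Hypothetical constant-velocity model for a blob centroid (x, y, vx, vy), no control input.
dt = 1.0
A = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]], float)
B = np.zeros((4, 1))
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)
kf = KalmanFilter(A, B, H, Q=np.eye(4) * 0.01, R=np.eye(2), x0=np.zeros(4), P0=np.eye(4))
kf.predict(u=np.zeros(1))
kf.update(z=np.array([120.0, 45.0]))   # measured blob centroid in pixels
```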

Dellaert [Del98] used the Kalman filter to predict the search region in the next frame, which greatly reduces the search band and diminishes the computational complexity to a great extent. If there is partial occlusion of the blob, the estimates produced by the Kalman filter can replace the optimal matching points. Therefore, the system may not lose the vehicle even if part of it has been covered by interference. Generally speaking, the Kalman filter has better accuracy and stability in most traffic scenarios.

Yuan et al. [YJW06] proposed a tracking algorithm which is based on the grey predicting model GM(1,1). This algorithm is able to find the law of motion of the target by updating the grey predicting model continuously. This novel method overcomes the deficiency of the Kalman filter, which needs to make assumptions about the object's motion and noise characteristics, and it can produce a faster and more precise tracking result.

    2.5 Vehicle Classification

There are two main steps in the vehicle classification module: feature selection and feature matching. The main purpose of feature selection is to extract particular image features of different types of vehicles and then pass them to the feature matching module. Two typical methods of feature selection are based on a priori knowledge [RR05] and the scale-invariant feature transform (SIFT) [Low99b].

In terms of feature matching, the Support Vector Machine (SVM) [CC95] and the Bag-of-Features model [Sch11] are two commonly used supervised learning algorithms for image classification. The Bag-of-Features model derives from the bag-of-words model [LFFT09], which describes the frequencies of words from a dictionary, and the SVM is considered a strong classifier for classifying two-class or multi-class non-linear data sets. Below are respective introductions to these technologies.


    2.5.1 Feature Selection based on prior knowledge

Roya Rad et al. [RR05] proposed a feature selection method based on a priori knowledge which contains the width, length, dimension and velocity of the detected vehicle. Before we start to collect the vehicle features, the system should recognize the overall direction of the moving vehicle so as to adjust the weights of the features. The overall feature of a vehicle can be expressed by 2.20:

F(i) = \alpha W(i) + \beta L(i) + \gamma D(i) + \mu V(i)    (2.20)

where W(i) is the width of the blob, L(i) is the length of the blob, D(i) is the dimension and V(i) is the velocity of the detected blob. In addition, the parameters α, β, γ and µ are the weights of the above features. Each vehicle blob can be expressed by the value of F(i). By clustering the training data, we may obtain some typical values of F(i) for different types of vehicles.

Since the clustering algorithm can be affected by noise and outliers, we calculate the frequency of occurrence of each vehicle type and select the type with the highest frequency of occurrence as the true vehicle type.

This method is easy to implement and suitable for a real-time system due to its low computational complexity. However, the algorithm only works well under certain traffic conditions, such as highways and one-way roads, and it needs a priori knowledge about the direction of the moving vehicles, which can be obtained through manual input or a vehicle tracking algorithm.
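Equation 2.20 and the prototype comparison can be sketched in a few lines. All weights, prototype values and function names below are hypothetical, used only to show the shape of the computation:

```python
import numpy as np

def overall_feature(width, length, dimension, velocity,
                    alpha=0.3, beta=0.3, gamma=0.2, mu=0.2):
    """Eq. 2.20: weighted combination of blob measurements (weights are illustrative)."""
    return alpha * width + beta * length + gamma * dimension + mu * velocity

def classify_by_prototype(f_value, prototypes):
    """Assign the vehicle type whose clustered F(i) prototype is nearest to this blob."""
    # prototypes: dict of typical F(i) values per type, e.g. {"car": 120.0, "van": 210.0, "lorry": 350.0}
    return min(prototypes, key=lambda t: abs(prototypes[t] - f_value))
```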

    2.5.2 Scale-invariant feature transform (SIFT)

Lowe proposed an image feature generation method which transforms an image into a large number of feature vectors [Low99a]. SIFT is able to solve the matching problems caused by image scaling, rotation and affine transformation. The SIFT descriptor is constructed by the following steps:

Firstly, we generate the scale space by using the Gaussian kernel G(x,y,σ):

L(x,y,\sigma) = G(x,y,\sigma) * I(x,y)    (2.21)

where σ is the scale value in the scale space and * denotes convolution. L(x,y,σ) represents the Gaussian-blurred image, whose degree of blurring is decided by the Gaussian kernel applied. Lowe introduced the Difference of Gaussians (DoG) in order to identify preliminary keypoints of the image. The equation of the DoG space is shown in 2.22:

    D(x,y,σ) = L(x,y,kσ)−L(x,y,σ) (2.22)

The preliminary keypoints are defined as the extreme points within the DoG space, i.e. points whose pixel value is larger or smaller than that of all their neighbours. In the next step, since these preliminary keypoints are still unstable, we need to specify the position and scale of the keypoints more accurately. Specifically, the accurate locations of the extrema can be calculated from the Taylor expansion of 2.22, which is shown as follows:

D(\mathbf{x}) = D + \frac{\partial D^{T}}{\partial \mathbf{x}} \mathbf{x} + \frac{1}{2} \mathbf{x}^{T} \frac{\partial^{2} D}{\partial \mathbf{x}^{2}} \mathbf{x}    (2.23)

In the next step, the algorithm should eliminate keypoints which have high edge responses but low stability. A second-order Hessian matrix, shown in 2.24, expresses the curvatures across and along the edge:

H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}    (2.24)

The trace of H is D_{xx} + D_{yy}, while its determinant can be expressed as D_{xx} D_{yy} - D_{xy}^{2}. The ratio R and the threshold against which it is compared are shown below:

R = \frac{\mathrm{Tr}(H)^{2}}{\mathrm{Det}(H)} = \frac{(r_{th}+1)^{2}}{r_{th}}    (2.25)

If R is bigger than this threshold, the corresponding point is rejected. Generally, we adopt r_{th} = 10. After that, in order to achieve rotation invariance, the algorithm assigns orientation parameters to each keypoint. The gradient magnitude and orientation of a point (x,y) are shown in 2.26 and 2.27:

m(x,y) = \sqrt{ \big( L(x+1,y) - L(x-1,y) \big)^{2} + \big( L(x,y+1) - L(x,y-1) \big)^{2} }    (2.26)

\theta(x,y) = \operatorname{atan2}\big( L(x,y+1) - L(x,y-1),\ L(x+1,y) - L(x-1,y) \big)    (2.27)

where m(x,y) represents the gradient magnitude and θ(x,y) indicates the orientation of point (x,y). An orientation histogram with 36 bins (10 degrees per bin) is built by accumulating the orientations of the points within the keypoint's neighbourhood, weighted by their gradient magnitudes. In this histogram, the peak corresponds to the dominant orientation, and any bin that reaches 80% of the maximum bin is also considered as an auxiliary orientation [Low99a].

Figure 2.3: Generation of Keypoint descriptor [Low99a]

In the final step, the SIFT keypoint descriptor is generated based on the location, scale and rotation of a keypoint. We take an 8×8 neighbouring region as the sampling window and calculate a histogram of 8 directions over each 2×2 neighbouring region, which forms 4×4 seed points to describe a keypoint. Therefore, each feature vector has 4×4×8 = 128 dimensions.
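For experimentation, the whole pipeline described above (DoG extrema, keypoint refinement, orientation assignment and the 128-dimensional descriptor) is available off the shelf in recent OpenCV builds. This is an illustrative snippet, not the dissertation's C# implementation, and the image file name is hypothetical:

```python
import cv2

image = cv2.imread("vehicle_blob.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input image
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(image, None)
# Each descriptor is a 128-dimensional vector, one per detected keypoint.
if descriptors is not None:
    print(len(keypoints), descriptors.shape)
```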

    2.5.3 Bag-of-Feature Model

The bag-of-words model [SL12] was originally used in document classification, and its codebook is the set of occurrence counts of words, which can be seen as a histogram over the vocabulary. In the domain of computer vision, a bag of visual words is the set of occurrence counts of local image features; therefore, it is also called the Bag-of-Features model. The key stages of this method are as follows:

Firstly, all descriptors produced by SIFT are collected together and prepared for processing. Then, the K-means clustering algorithm is deployed to generate a codebook for all the training images; similar features are clustered into a single codeword, as shown in figure 2.4. After that, codewords are used to represent each training image by counting the number of times each codeword appears, giving the histograms shown in figure 2.5. In the final step, according to the value of each bin in the histogram of a new testing image, we count the frequency with which each codeword appears in the image, and assign the class corresponding to the most frequent codeword to the testing data.

Figure 2.4: Generation of Codeword [Sch11]

Figure 2.5: Histograms of testing images [SL12]

The Bag-of-Features model performs well at classifying images according to the objects they contain [Sch11]. Moreover, it is invariant to the position and orientation of the object within the image [Sch11]. On the other hand, the BoW model ignores the spatial relationships among the visual words, which makes it poor at localizing objects within an image [SSC06].
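The codebook construction and histogram representation can be sketched as follows. The use of scikit-learn, the codebook size k = 200 and the function names are assumptions made for illustration only:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(all_descriptors, k=200):
    """Cluster the SIFT descriptors of all training images into k codewords."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_descriptors)

def bof_histogram(codebook, descriptors):
    """Represent one image as a normalised histogram of codeword occurrences."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(np.float64)
    return hist / hist.sum()
```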

    2.6 Chapter Summary

In practice, some of the methods introduced above are not applicable for the VRTM system. Specifically, for vehicle detection, the Frame Differential Method has a relatively low accuracy when the system is analysing a complex traffic situation, and the Optical Flow Field Method places high requirements on the system's hardware. Therefore, Background Subtraction is a good compromise. Regarding occlusion removal, since the feature-based model produces comparatively satisfactory results, we consider the vehicle's convex shape as an important feature. In terms of vehicle tracking, the Kalman filter is the first choice as it has been widely used in the object tracking domain; it has also been shown that the Kalman filter can meet the needs of a real-time system [Qia09]. As for vehicle classification, the proposed methods need to be evaluated through this project.

Although the particular advantages of these methods can help to achieve the overall goals of vehicle detection, tracking and classification, we should not apply the algorithms unchanged, because our system has its own performance requirements and operating environments. As a result, we take their basic ideas as references and improve them in order to achieve the objectives of our system.

  • Chapter 3

    Vehicle Detection

According to the merits and demerits of the preceding methodologies related to vehicle detection, we adopt parts of them and propose new algorithms to meet the practical needs of the VRTM system. The flowchart of the vehicle detection module is shown in figure 6.2. The primary algorithms for vehicle detection are illustrated in the following sections.

    3.1 Nonparametric Background Generation

In terms of background generation, we adopt a nonparametric method proposed by Liu [Liu06]. The basic computational model of this method is mean shift [CM02], which is an efficient way to find the modes of the underlying density, where the gradient is zero [Liu06]. The detailed algorithm is as follows:

1. Select preliminary samples
Users should specify the number of frames used for background generation. Here we introduce equation 3.1 [Liu06] to represent the value of a pixel within a video sequence.

    S = {xi}, i = 1, · · · ,n (3.1)

where xi is the grey level of pixel x in frame i, and n is the number of samples.

2. Select representative points
In order to reduce the computational cost of the algorithm, we calculate local means over blocks of samples and denote the set of means by equation 3.2 [Liu06].

    P = {pi}, i = 1, · · · ,m (3.2)



where m ≪ n.

3. Apply mean shift
By applying mean shift to the representative points in P, we obtain m convergence points. As some convergence points are very similar or even identical to each other, we cluster them together into classes. Equation 3.3 gives the cluster centres for q classes [Liu06].

    C = {{ci,wi}}, i = 1, · · · ,q (3.3)

where ci is the grey level and wi is the weight of each cluster centre, and the value of q is much smaller than m. The weight wi is calculated by equation 3.4 [Liu06].

wi = li / m, i = 1, · · · , q (3.4)

    where li is the number of points for each cluster.

4. Obtain the most reliable background mode
The background mask is generated by selecting the ci which has the highest weight wi. The final result is the most reliable background mode.

Figure 3.1 shows the overall algorithm of the nonparametric method for one pixel over a sequence of frames. The initial samples are reduced to representative points, which then converge under mean shift and are clustered into candidate background modes. Finally, the grey level of the cluster with the highest weight is selected as the background pixel.
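A simplified per-pixel sketch of steps 1-4, assuming a flat (uniform) mean shift kernel; the bandwidth and block size are illustrative choices, not values from the dissertation.

import numpy as np

def mean_shift_1d(point, data, bandwidth=10.0, max_iters=20):
    # Shift a grey-level value towards the nearest density mode of `data`.
    for _ in range(max_iters):
        window = data[np.abs(data - point) <= bandwidth]
        if window.size == 0:
            break
        new_point = window.mean()
        if abs(new_point - point) < 1e-3:
            break
        point = new_point
    return point

def background_grey_level(samples, block=10, bandwidth=10.0):
    # samples: grey levels of one pixel over n frames (equation 3.1).
    samples = np.asarray(samples, dtype=float)
    # Step 2: local means of consecutive blocks form the representative set P.
    m = len(samples) // block
    reps = samples[:m * block].reshape(m, block).mean(axis=1)
    # Step 3: converge each representative point and cluster nearby modes.
    modes = [mean_shift_1d(p, samples, bandwidth) for p in reps]
    centres, counts = [], []
    for mode in modes:
        for i, c in enumerate(centres):
            if abs(mode - c) <= bandwidth:
                counts[i] += 1
                break
        else:
            centres.append(mode)
            counts.append(1)
    # Step 4: the centre with the highest weight (count / m) is the background value.
    return centres[int(np.argmax(counts))]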

To adapt dynamically to illumination changes and the influence of static objects, the background image should be updated periodically. Each pixel of a new frame is allocated to its nearest cluster centre, so the weight of each cluster is updated, which may change the value of the corresponding background pixel.

Figure 3.2 illustrates two background images generated from 400 video frames. It can be seen that the nonparametric method performs robustly on videos with intensive traffic flow.


    Figure 3.1: Algorithm of Background Generation Method [Liu06]

    (a) video frame (b) background image

    (c) video frame (d) background image

    Figure 3.2: Background Generation Results


    Figure 3.3: Flowchart of Vehicle Segmentation Process

    3.2 Vehicle Segmentation

In the vehicle segmentation module, we adopt the method described in section 2.2.3: the background mask is subtracted from the current video frame and the foreground image is converted into a binary image. Figure 3.7(b) shows an example of background subtraction. It can be seen that the foreground image contains considerable noise around the edges, and there are numerous discontinuous hollows inside the vehicle regions. To solve this problem, we apply several morphological operations to the grey image. In addition, in order to tackle the vehicle occlusion problem, we introduce a novel method based on analysis of the vehicle's convex shape. Figure 3.3 shows a detailed flowchart of the vehicle segmentation process.

    3.2.1 Morphological Processing

The basic morphological operations are erosion and dilation [BK08]. Erosion eliminates the boundary points of an object, shrinking its region by one pixel along the perimeter [BK08]; dilation merges the object with its connected background points, expanding the object region [BK08]. The more general morphological operations are opening and closing, which are different combinations of erosion and dilation. Opening applies dilation after erosion; it is used to eliminate small objects, separate objects connected by thin regions and smooth the boundaries of large objects while retaining their dimensions [BK08]. Closing applies erosion after dilation, which fills small hollows within an object, connects adjacent objects and smooths boundaries while retaining object dimensions [BK08]. In the vehicle segmentation module, we apply consecutive


opening and closing operations to eliminate noise and fill the small hollows within the vehicle blob.

In our experiment, a Gaussian filter with a 7×5 window is first applied to smooth the subtracted image. Then we apply the opening operation once and the closing operation three times. An example is presented in figure 3.7(c).
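A sketch of this smoothing and morphology step with OpenCV, assuming greyscale frame and background images; the binarisation threshold and the 3×3 structuring element are illustrative, as the text does not specify them.

import cv2
import numpy as np

frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)            # hypothetical files
background = cv2.imread("background.png", cv2.IMREAD_GRAYSCALE)

# Background subtraction followed by a 7x5 Gaussian smoothing window.
foreground = cv2.absdiff(frame, background)
smoothed = cv2.GaussianBlur(foreground, (7, 5), 0)

# Binarise, then apply one opening and three closings.
_, binary = cv2.threshold(smoothed, 30, 255, cv2.THRESH_BINARY)
kernel = np.ones((3, 3), np.uint8)
mask = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=3)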

    3.2.2 Blob Searching

After applying a threshold to the grey image, we obtain a binary image with multiple connected regions. The usual method for searching these regions works at the pixel level, which can be greatly affected by noise. To overcome this problem, we introduce a new searching method based on N×N-pixel regions.

Firstly, we split each frame into numerous N×N regions and scan them in order from left to right and top to bottom. Then we calculate the proportion of foreground pixels in each region. If the proportion is larger than a predefined threshold, the region is considered a foreground region. Finally, we combine all connected regions and identify valid blobs whose size is larger than a predefined threshold Thmin−size. The optimal value of Thmin−size differs for each traffic video due to the various shooting angles; normally, 10% to 15% of the video frame size satisfies most cases. Figure 3.4 shows that a 6×6 regional searching window produces a finer vehicle segmentation result.

    Figure 3.4: Result of 6×6 regional searching algorithm
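A minimal sketch of the regional searching step, assuming a binary foreground mask; the foreground-proportion threshold of 0.5 is an illustrative value, as the text does not state it.

import numpy as np

def foreground_regions(binary_mask, n=6, ratio_threshold=0.5):
    # Mark each n x n block as foreground when the proportion of
    # foreground pixels inside it exceeds `ratio_threshold`.
    h, w = binary_mask.shape
    region_map = np.zeros((h // n, w // n), dtype=bool)
    for row in range(h // n):
        for col in range(w // n):
            block = binary_mask[row * n:(row + 1) * n, col * n:(col + 1) * n]
            region_map[row, col] = (block > 0).mean() > ratio_threshold
    return region_map

Connected foreground regions in region_map can then be grouped (for example with cv2.connectedComponents) and filtered against Thmin−size.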

The proposed method enhances the accuracy and robustness of the vehicle segmentation algorithm by diminishing the influence of noise, and helps us to merge several blobs that belong to one vehicle and to split a blob containing multiple vehicles into


different pieces by analysing the minimal rectangles of the detected blobs, as shown in figure 3.7(d).

    3.3 Occlusion Removal

Since the shape of a partially occluded vehicle is explicitly different from that of an unoccluded vehicle (see figure 3.5(c) and figure 3.5(d)), we adopt and extend Zhang and Wu's method, which detects partial occlusion between two vehicles by analysing the blob's convex shape [WZ08]. The algorithm is as follows:

    1. Calculate convex hull of detected blob.

    2. Calculate the ratio of convex difference’s area to convex hull’s area, and take itas a basis for judging occlusion.

3. For occluded vehicles, analyse the convex hull and calculate the optimal cutting line which separates them into single vehicles.

    Some definitions are given as follows:

1. The convex hull (CH) is the minimal convex polygon that encloses all points of the blob.

2. The convex difference (CD) is the area difference between the vehicle blob and its convex hull.

    An example of these two notions is shown in figure 3.5(f).

    3.3.1 Occlusion Detection

According to figure 3.5, an unoccluded vehicle fits its convex hull closely, whereas there are gaps between occluded vehicles and their convex hull. By analysing these gaps, we can easily detect the occurrence of vehicle occlusion.

    The ratio of convex difference’s area to convex hull’s area is presented as follow:

RA = Areacd / Areach = (Areach − Areav) / Areach (3.5)

where Areacd is the area of the convex difference, Areav is the area of the vehicle blob and Areach is the area of the convex hull. For an unoccluded vehicle, we obtain a small


(a) original image of unoccluded vehicle

(b) original image of occluded vehicles

(c) blob of unoccluded vehicle (d) blob of occluded vehicles

(e) convex hull (red) (f) convex hull (red) and convex difference (green)

    Figure 3.5: Convex hulls of occluded and unoccluded vehicle


value of RA, whereas for occluded vehicles the value of RA is much higher. We therefore set a threshold THRA to detect occlusion: if RA > THRA, the blob is considered to be two occluded vehicles; otherwise, it represents a single vehicle. The value of THRA is determined by the experiments illustrated in Chapter 6.
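A minimal sketch of this occlusion test based on equation 3.5, using OpenCV's convex hull routines; the threshold value is illustrative, since THRA is tuned experimentally in Chapter 6.

import cv2

def occlusion_ratio(blob_contour):
    # R_A = (Area_ch - Area_v) / Area_ch  (equation 3.5)
    hull = cv2.convexHull(blob_contour)
    area_hull = cv2.contourArea(hull)
    area_blob = cv2.contourArea(blob_contour)
    if area_hull == 0:
        return 0.0
    return (area_hull - area_blob) / area_hull

def is_occluded(blob_contour, th_ra=0.2):
    # th_ra is an illustrative value for TH_RA.
    return occlusion_ratio(blob_contour) > th_ra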

    3.3.2 Occlusion Handling

    To separate the occluded vehicles, we attempt to find an optimal cutting line accordingto the shape of occluded vehicles.

Typically, the convex difference is composed of several disjoint regions, among which we denote the region with the largest area as CDmax. We then select the vertex that lies deepest in the interior of CDmax and denote it as the cutting point.

Since the overlapping edge between CDmax and CH is a straight line, the vertex that lies deepest inside CDmax can be obtained by calculating the distance between each vertex and the edge of CH. The vertex with the maximum distance is the cutting point, denoted as point A. An example of a cutting point is shown in figure 3.6(b).

To decide the optimal cutting line, Zhang and Wu proposed an exhaustive method which scans all possible cutting lines at different angles. In order to satisfy the real-time requirements of the VRTM system, we propose a new method with lower computational complexity. The algorithm is as follows:

    1. Select the remaining convex difference region with maximum area.

    2. Calculate the most interior vertex point and denote it as point B.

3. Repeat steps 1 and 2, denoting the new vertex point as C.

The line segments AB and AC are the two candidate cutting lines, as shown in figure 3.6(c). In order to select the optimal one, we define an area ratio TA, similar to RA:

TA = Areav / Areach (3.6)

where Areav is the area of the vehicle blob and Areach is the area of the convex hull. Clearly, the value of TA is always less than 1. For an unoccluded vehicle, TA is close to 1, whereas for an occluded vehicle blob it is much smaller. Consequently, we select as the optimal cutting line the candidate whose resulting TA is closer to 1, as shown in figure 3.6(a).


(a) an optimal "cutting line" (b) "cutting point" A (c) all possible "cutting lines"

    Figure 3.6: Possible cutting points and possible cutting lines

    3.4 Results

Figure 3.7 shows the procedure for detecting vehicles in a video frame.

After we obtain the original frame, shown in figure 3.7(a), and the background image, the foreground mask illustrated in figure 3.7(b) can easily be produced using the background subtraction method. As shown in figure 3.7(c), morphological operations, including smoothing, opening and closing, diminish the image noise and fill the image hollows. After applying the N×N region-based blob searching algorithm, we obtain a distinct binary image of all the foreground objects (shown in figure 3.7(d)). According to figure 3.7(e), two occluded vehicles, circumscribed by red frames, are cleanly separated using the shape-based occlusion removal method.


    (a) original image (b) background subtraction

    (c) morphological processing (d) blob searching

    (e) occlusion removal

    Figure 3.7: Vehicle detection result

  • Chapter 4

    Vehicle Tracking

Since the tracking module works in a real-time setting, it is not feasible for the system to scan every pixel of the frame to find the position of the matching object. Moreover, some model-matching algorithms and methods based on prior knowledge are also unsuitable for the system due to their high computational complexity. In order to simplify the matching process and improve computational efficiency, we estimate the vehicle's current position by analysing its past trajectory, and search for the matching target near the predicted point. The substantial reduction of the search area not only decreases the computational complexity but also diminishes the impact of other moving objects. The proposed vehicle tracking method is reliable in most traffic scenarios and robust enough to overcome the most significant interferences.

Since vehicle blobs are collected from the object detection module, a tracking method based on image blobs should be adopted. Consequently, a commonly used tracking technique, the Kalman filter, is applied in our system. The prediction produced by the Kalman filter not only reduces the search area but also helps to overcome partial occlusion when the VRTM system tries to find the matching point of a certain blob. In addition, we introduce a novel tracking algorithm based on regional analysis. The overall flowchart of the tracking algorithm is shown in figure 4.1.

    4.1 Configuration of Kalman Filter

Before applying the Kalman Filter to the detected blobs, we need to gather information from each blob: the position of its centroid and the blob's length and width. These records define the system state used in the Kalman Filter. The state X(k) is a four-dimensional vector (x(k), y(k), Vx(k), Vy(k)), whose components are the



    Figure 4.1: Flowchart of region-based vehicle tracking algorithm


position of the target's centroid and its velocity along the x and y axes. According to equation 2.13, we define the transition model A and the control-input model B as follows:

A = [ 1  0  ∆t  0
      0  1  0   ∆t
      0  0  1   0
      0  0  0   1 ]        B = [ 0
                                 0
                                 0
                                 ∆t ]        (4.1)

    where ∆t is the time interval between two adjacent frames. In addition, we define theobservation matrix H as equation 4.2.

H = [ 1  0  0  0
      0  1  0  0 ]        (4.2)

According to the work of Zhang et al. [ZYh05], the noise covariances Q and R can be set as follows:

R = [ 0.03   0.005
      0.005  0.3 ]        Q = 0.01 × E        (4.3)

    The detailed equations deployed in Kalman Filter algorithm are illustrated in section2.4.2.
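A minimal NumPy sketch of one predict/correct cycle under these settings. Assumptions: ∆t corresponds to a 25 fps video, E in equation 4.3 is taken as the 4×4 identity matrix, and the control term B·u is omitted for simplicity.

import numpy as np

dt = 1.0 / 25.0  # assumed frame interval (25 fps)

# Model matrices from equations 4.1-4.3 (E assumed to be the 4x4 identity).
A = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]])
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]])
Q = 0.01 * np.eye(4)
R = np.array([[0.03, 0.005],
              [0.005, 0.3]])

def kalman_step(x, P, z):
    # x: state (x, y, Vx, Vy); P: state covariance; z: measured centroid (x, y).
    x_pred = A @ x                       # predict state
    P_pred = A @ P @ A.T + Q             # predict covariance
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new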

    4.2 Region-based vehicle tracking algorithm

The process of object tracking is to build a relationship between targets in two adjacent frames. Typically, the whole tracking process can be divided into three parts: extraction of moving regions, region prediction, and region matching and updating. The former two have been handled by the vehicle segmentation module and the Kalman Filter. Here we discuss the third part in detail.

In order to store the status parameters of target blobs, we set up a chain list denoted the tracking control list. Specifically, we consider an image sequence F = {f1, f2, · · · , fn}, where frame fn contains m motion targets denoted S_n^1, S_n^2, · · · , S_n^m. S_n^j is defined as follows:

S_n^j = {X_n^j, Y_n^j, L_n^j, W_n^j} (4.4)

where X_n^j and Y_n^j are the horizontal and vertical coordinates of target j in frame n, and L_n^j and W_n^j are the length and width of the minimal rectangle of the target region. The motion statuses of all target regions in each frame are stored in a tracking list where each node


    Figure 4.2: Overall design of tracking control list [Qia09]

represents a target region.

Figure 4.2 shows the overall design of the tracking list: the nodes listed along the vertical direction store the vehicle's status parameters, such as the size of the minimal bounding rectangle, the coordinates of the centroid and the velocity, while the nodes listed along the horizontal direction record the vehicle's trajectory.

After obtaining the status parameters of one motion target, we match it with the detected blobs within an estimated region produced by the Kalman filter. The matching criterion is given by equation 4.5 [Qia09].

R(i, j) = α × D(i, j) + β × S(i, j)

D(i, j) = √[(x_k^i − x_{k+1}^j)² + (y_k^i − y_{k+1}^j)²]

S(i, j) = |L_k^i · W_k^i − L_{k+1}^j · W_{k+1}^j| (4.5)

where x_k^i, y_k^i, L_k^i, W_k^i are the position, length and width of the ith moving object in frame k. D(i, j) is the Euclidean distance between the centroid of object i in frame k and that of candidate j in frame k+1, and S(i, j) is the difference between their dimensions. α and β are two coefficients which weight the distance and the degree of deformation in the matching criterion R(i, j). The selection of these two coefficients is illustrated in Chapter 6.

To filter out invalid matching candidates for moving blob i, we set a threshold Thmax and reject objects whose value of R(i, j) exceeds it. Thmax is defined as follows:

Thmax = α × Dmax(i, j) + β × Amax(i, j) (4.6)


where Dmax(i, j) is the height of blob i and Amax(i, j) is its area. Among the remaining candidates, the blob with the minimal value of R(i, j) is selected as the matched target for object i. If there are no candidates left, the moving blob i is registered as a new vehicle.
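A minimal sketch of the matching rule in equations 4.5 and 4.6, assuming each blob is represented as (x, y, length, width); the coefficients α = β = 1 and the use of the blob's length as its height in Thmax are illustrative assumptions (Chapter 6 selects the actual coefficients).

import numpy as np

def matching_cost(blob_k, blob_k1, alpha=1.0, beta=1.0):
    # Equation 4.5: R(i, j) = alpha * D(i, j) + beta * S(i, j).
    xi, yi, li, wi = blob_k
    xj, yj, lj, wj = blob_k1
    d = np.hypot(xi - xj, yi - yj)   # centroid distance D(i, j)
    s = abs(li * wi - lj * wj)       # dimension difference S(i, j)
    return alpha * d + beta * s

def match_blob(blob_k, candidates, alpha=1.0, beta=1.0):
    # Reject candidates above Th_max (equation 4.6), keep the cheapest match.
    _, _, length, width = blob_k
    th_max = alpha * length + beta * (length * width)
    valid = [(matching_cost(blob_k, c, alpha, beta), c) for c in candidates]
    valid = [(cost, c) for cost, c in valid if cost <= th_max]
    if not valid:
        return None  # no match: blob_k is treated as a new vehicle
    return min(valid, key=lambda pair: pair[0])[1]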

    4.3 Results

Figure 4.3 illustrates some tracking results obtained with the region-based method:

    (a) frame 285 (b) frame 301

    (c) frame 320 (d) frame 342

    Figure 4.3: Vehicle Tracking Results

As the output of vehicle tracking, we circumscribe each vehicle blob and assign a unique number to it, which makes it much easier to observe the tracking process of each moving vehicle. According to figure 4.3, in each video frame only some of the vehicles are detected, because the others are either leaving the picture or too small to detect. The minimal size of a valid vehicle blob is discussed in Section 3.2.2.

  • Chapter 5

    Vehicle Classification

There are two main steps in vehicle classification: feature extraction and feature classification. In the first step, the extracted features should be characteristic of the different types of vehicles. For some ideal road conditions, such as highways and one-lane routes, it would be sufficient to use the length or dimensions of the blob and the average velocity to classify vehicle types if the system only works with one stationary camera. However, in real situations the VRTM system should be able to handle various conditions. Therefore, we need to find a rich representation that produces repeatable and discriminative features from a limited number of vehicles. According to the rich representation proposed by Ma and Grimson [XM], edge points are more repeatable than other feature points, and sufficient discriminability can be achieved by associating edge points with their similar neighbouring regions.

Regarding the second step, Ma and Grimson proposed a constellation model for feature classification [XM]. By modelling feature descriptors as a single Gaussian, we can obtain a reliable probability for each class.

    Figure 5.1 shows an overall flowchart of vehicle classification module.

    5.1 Edge-based Feature Extraction

Repeatability and discriminability are two significant factors in object recognition [XM]. In traffic video, a vehicle's contour is an explicit and unique feature for identifying vehicle types, whereas other features, such as colour, size and velocity, are highly variable and hard to measure. Therefore, we adopt edge points as the basis for extracting features, due to their comparatively high repeatability and sufficient discriminability. The overall algorithm of edge-based feature extraction is as follows:



    Figure 5.1: Flowchart of vehicle classification algorithm

    1. Transform original image into binary edge image.

    2. Calculate a descriptor for each edge point.

    3. Segment edge points into several point groups.

    4. Generate features from point groups.

    5.1.1 Edge Detection

An edge is defined as a group of points where the grey levels of neighbouring pixels change sharply. Edges exist between different targets, marking the end of one region and the beginning of another. Edge detection plays an important role in image segmentation, texture analysis and image recognition.

Currently, some classical differential operators that are sensitive to step changes in grey level are employed to detect image edges. Edge operators based on first-order derivatives, such as the Sobel [SF] and Prewitt operators [Pre70], use either a 2×2 or a 3×3 mask convolved with each pixel to extract edges. The Canny edge detector [Can] is an optimal edge operator derived under certain constraint conditions. Here we introduce the Sobel operator and the Canny edge detector in detail and give their detection results.


    Figure 5.2: Two kernels of Sobel operator

    Sobel Operator

The Sobel operator calculates the image gradient at a pixel by analysing its neighbouring region; weak edges are then rejected by thresholding the gradient magnitude. The Sobel operator is defined as follows:

S(i, j) = |∆x f| + |∆y f|
        = |(f(i−1, j−1) + 2f(i−1, j) + f(i−1, j+1)) − (f(i+1, j−1) + 2f(i+1, j) + f(i+1, j+1))|
        + |(f(i−1, j−1) + 2f(i, j−1) + f(i+1, j−1)) − (f(i−1, j+1) + 2f(i, j+1) + f(i+1, j+1))| (5.1)

Figure 5.2 illustrates the two 3×3 convolution kernels, which correspond to vertical and horizontal edges respectively. The output at a pixel is the maximum of these two convolutions, and the final result is an edge magnitude image. The Sobel edge detector not only produces fine edge results but is also easy to implement in the spatial domain, and it is relatively insensitive to noise.

The Sobel operator is able to smooth noise and produces accurate information about edge orientation. However, it also detects many false edges, which reduces the positioning accuracy. Figure 5.3 shows some edge images processed by the Sobel operator.
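A short OpenCV sketch of the Sobel step; the input file name and the magnitude threshold are hypothetical.

import cv2
import numpy as np

image = cv2.imread("vehicle.png", cv2.IMREAD_GRAYSCALE)

# Horizontal and vertical 3x3 Sobel responses (the two kernels of figure 5.2).
grad_x = cv2.Sobel(image, cv2.CV_64F, 1, 0, ksize=3)
grad_y = cv2.Sobel(image, cv2.CV_64F, 0, 1, ksize=3)

# Combine the responses as in equation 5.1 and suppress weak edges.
magnitude = np.abs(grad_x) + np.abs(grad_y)
edges = (magnitude > 100).astype(np.uint8) * 255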

    Canny Operator

The aim of the Canny operator is to discover an optimal edge detection algorithm [Can]. It adopts the calculus of variations in order to find optimal edges. Firstly, a Gaussian function


is utilised to smooth the image, as follows:

fs(x,y) = f(x,y) × G(x,y)

G(x,y) = ∂²G/∂x² + ∂²G/∂y² = (1 / (πσ⁴)) · ((x² + y²)/σ² − 1) · exp(−(x² + y²) / (2σ²)) (5.2)

where x and y are the coordinates of each pixel and σ is the standard deviation of the Gaussian distribution. A Gaussian filter with a small kernel produces little blurring and can be used to detect sharp, small edges, while one with a large kernel can be used to detect smoother, thicker edges.

Secondly, a first-order differential operator is used to calculate the derivatives of the smoothed image. The gradient of fs(x,y) is approximated as follows:

P(i, j) ≈ (fs(i, j+1) − fs(i, j) + fs(i+1, j+1) − fs(i+1, j)) / 2

Q(i, j) ≈ (fs(i, j) − fs(i+1, j) + fs(i, j+1) − fs(i+1, j+1)) / 2 (5.3)

Finally, the magnitude and orientation at each pixel are calculated by transforming rectangular coordinates into polar coordinates, as follows:

M(i, j) = √(P(i, j)² + Q(i, j)²)

θ(i, j) = arctan(Q(i, j) / P(i, j)) (5.4)

where M(i, j) represents the edge magnitude and θ(i, j) indicates the edge orientation. The overall orientation of the edge is given by the θ(i, j) at which M(i, j) reaches a local maximum.

The directivity of the Canny operator makes it more accurate than other operators in terms of edge detection and edge localisation. Since Canny uses upper and lower thresholds to detect strong and weak edges, it is less sensitive to image noise and more likely to discover genuinely weak edges. For traffic scenes, the Canny operator is much more effective than other methods due to its high accuracy in detecting weak edges. Based on our experimental results, we set the two thresholds to τ1 = 100 and τ2 = 150. Figure 5.3 shows some edge images processed by the Canny operator.
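A corresponding OpenCV sketch of the Canny step with the thresholds quoted above; the 5×5 Gaussian smoothing window is an illustrative choice.

import cv2

image = cv2.imread("vehicle.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file

# Smooth first, then apply Canny with hysteresis thresholds 100 and 150.
smoothed = cv2.GaussianBlur(image, (5, 5), 0)
edges = cv2.Canny(smoothed, 100, 150)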


    (a) Original image (b) Canny operator (c) Sobel operator

    (d) Original image (e) Canny operator (f) Sobel operator

    Figure 5.3: Edge images generated by Sobel and Canny operator

    Short Edge Removal

It is obvious that the Canny edge detector produces a more complete vehicle contour than its counterpart. However, numerous textures on the vehicle's bodywork are also detected, which may greatly decrease the repeatability of vehicle features. To solve this problem, we introduce a variance-based method for removing short edges.

Since most irrelevant edges have a concentrated distribution, we can use the variance of the edge points' coordinates to describe their concentration, as shown in equation 5.5 [Zhu]:

C = Σ_{i=1}^{N} (Xi − X̄)² + Σ_{i=1}^{N} (Yi − Ȳ)² (5.5)

where N is the number of edge points and C is the sum of the variances of the vertical and horizontal coordinates. A smaller value of C represents a more concentrated distribution of edge points. A threshold τ is set to reject edges with a concentrated distribution. Similar to [Zhu], we found that a value of τ = 30 performs well in most cases. Figure 5.4 gives some final results of edge detection in which the vehicles' overall contours are


    prominent.
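A minimal sketch of the variance criterion in equation 5.5, assuming each detected edge is already available as a list of (x, y) points (for instance, contours obtained from cv2.findContours); τ = 30 follows the value quoted above.

import numpy as np

def remove_short_edges(edges, tau=30.0):
    # Keep an edge only if the spread of its points (equation 5.5) exceeds tau;
    # concentrated point clouds are treated as bodywork texture and discarded.
    kept = []
    for points in edges:                      # each entry: array of (x, y) points
        pts = np.asarray(points, dtype=float).reshape(-1, 2)
        c = np.sum((pts[:, 0] - pts[:, 0].mean()) ** 2) \
            + np.sum((pts[:, 1] - pts[:, 1].mean()) ** 2)
        if c > tau:
            kept.append(points)
    return kept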

    5.1.2 Descriptor and Feature Extraction

In order to achieve sufficient discriminability of the extracted features, we adopt the SIFT descriptor to represent each edge point. As described in section 2.5.2, an orientation histogram is formed by calculating the gradient magnitude and orientation of each point in a subregion. To adapt the descriptor to vehicle classification, Ma and Grimson make several key modifications to the SIFT algorithm:

1. Unlike SIFT, which considers orientations in the range 0° to 360°, this method treats orientations that differ by 180° as the same. Therefore, the orientation histogram contains 18 bins (10 degrees per bin) instead of 36 bins. Consequently, the vectors are robust against contrast differences and illumination changes [XM].

2. In order to further suppress the influence of specular reflections and illumination changes, shown in figures 2.1(a) and 2.1(c), we threshold the gradient magnitude rather than the SIFT vector [XM].

3. Lowe used the Euclidean distance to measure the similarity between two SIFT vectors. However, the Euclidean distance overlooks the differences between small bins and only emphasises the differences between large bins. To solve this problem, we adopt the χ²-distance between two descriptors, since it gives a more comprehensive comparison of two histogram distributions [XM] (a minimal sketch of this distance is given after this list).
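A standard form of the χ²-distance between two descriptor histograms (the exact normalisation used by Ma and Grimson may differ; this is a common variant):

import numpy as np

def chi_square_distance(h1, h2, eps=1e-10):
    # Small bins contribute proportionally more than under the Euclidean distance.
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))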

Figure 5.5 illustrates some descriptors for edge points near a vehicle's trunk. It can be seen that, for the same vehicle type, the descriptors of edge points near the trunk are quite similar, which gives good repeatability, while the histograms of edge points belonging to different vehicle types show high discriminability. In this example, the system can easily recognise the differences between the 9th and the 10th elements of each descriptor.

    5.1.3 Feature Generation

Due to image noise and texture variations among similar vehicles, the individual edge points of similar vehicles still show obvious differences, as illustrated in figure 5.4. Therefore, they are not comprehensive and repeatable enough for building vehicle features.


    (a) Original image (b) Canny operator (c) Short edges removal

    (d) Original image (e) Canny operator (f) Short edges removal

    (g) Original image (h) Canny operator (i) Short edges removal

    (j) Original image (k) Canny operator (l) Short edges removal

    Figure 5.4: Results of edge detection


    (a) car1 (b) car2

    (c) van1 (d) van2

    Figure 5.5: Descriptor examples. The horizontal axis shows the numbers of each bin,and vertical axis presents the proportions of each bin’s gradient magnitude.


    (a) Edge-based(42 features) (b) Edge-based(44 features) (c) Edge-based(31 features)

    (d) DoG(14 features) (e) DoG(33 features) (f) DoG(21 features)

    Figure 5.6: Some examples of edge-based features and DoG features

In terms of spatial location, edge points that are close to each other usually have similar SIFT vectors, so we can cluster these points into groups in order to enhance the repeatability of vehicle features and obtain a more accurate model [XM]. Here we adopt the mean shift algorithm [CM02] to segment the descriptors and spatial positions of the edge points, and the vehicle feature is formed from these segmentation results.

We denote the feature F of a vehicle image as follows:

F = {fi}, i = 1, · · · , N

fi = {{pij}, {sij}, cij} (5.6)

where pij is the coordinate of the jth point in the ith segment, sij is the SIFT descriptor of the jth edge point in segment i, cij is the mean SIFT descriptor of segment i, and N is the number of features in the image sample. Figure 5.6 shows some feature examples extracted by the edge-based method and by Difference of Gaussians (DoG) [Low99a] respectively. It is obvious that the edge-based features, shown as lines in different colours, have much higher repeatability and are more numerous than the DoG features.
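A sketch of this grouping step using scikit-learn's mean shift; how the spatial coordinates and descriptors are combined, and the bandwidth and descriptor weight, are illustrative assumptions rather than choices specified in the text.

import numpy as np
from sklearn.cluster import MeanShift

def group_edge_points(points, descriptors, bandwidth=20.0, desc_weight=0.1):
    # Cluster edge points jointly on position and (down-weighted) descriptor,
    # then summarise each segment i by its mean SIFT descriptor c_i.
    joint = np.hstack([np.asarray(points, dtype=float),
                       desc_weight * np.asarray(descriptors, dtype=float)])
    labels = MeanShift(bandwidth=bandwidth).fit_predict(joint)
    features = []
    for label in np.unique(labels):
        idx = labels == label
        features.append({
            "points": np.asarray(points)[idx],            # {p_ij}
            "descriptors": np.asarray(descriptors)[idx],  # {s_ij}
            "mean_descriptor": np.asarray(descriptors)[idx].mean(axis=0),  # c_i
        })
    return features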


    5.2 Vehicle Type Recognition

For vehicle type recognition, our main purpose is to construct a reliable and concise model for classifying vehicle types. To achieve this, we accumulate the features extracted from all the training samples and cluster them to obtain a feature pool, which highlights frequent features and suppresses rare ones. This subsection introduces some promising approaches for building the feature pool and learning the optimal model parameters.

5.2.1 Feature Pool Generation