
SIViP
DOI 10.1007/s11760-013-0557-8

ORIGINAL PAPER

OTWC: an efficient object-tracking method

Maryam Koohzadi · Mohammadreza Keyvanpour

Received: 2 August 2012 / Revised: 13 September 2013 / Accepted: 13 September 2013
© Springer-Verlag London 2013

Abstract Detection and tracking of moving objects is an important topic in computer vision and has turned into an active field of research with remarkable recent progress. This paper proposes OTWC, an efficient object-tracking method based on the Code Book (CB) model: a high-speed method with proper accuracy for detecting and tracking moving objects. Our proposed method combines the CB algorithm (Sigari and Fathy in Proceedings of the international multiconference of engineers and computer scientists, pp 19–21, 2008), the density-based spatial clustering of applications with noise (DBSCAN) algorithm (Ester et al. in Proceedings of 24th VLDB conference, 1998), and the incremental bulk DBSCAN algorithm in order to effectively track moving objects. We claim that the CB algorithm models the background and foreground simultaneously, so that the information wasted in the background estimation step can be used to incrementally detect and track the foreground. We have also compared our method with other available alternatives, including a general tracking algorithm, on two datasets. The results demonstrate that our proposed algorithm has an acceptable performance in terms of both accuracy and speed.

Keywords Object detection and tracking · Incremental bulk DBSCAN algorithm · Code book model · Video analysis

M. Koohzadi · M. Keyvanpour (B)
Department of Computer Engineering, Alzahra University, Tehran, Iran
e-mail: [email protected]

M. Koohzadi
e-mail: [email protected]

1 Introduction

With massive amounts of digital video content on the internet, users increasingly need more assistance to access digital videos; consequently, a lot of research has been conducted on 'video summarization' and 'semantic video analysis' to serve such needs. We too introduced a comprehensive analytical framework in our previous work on video analysis methods [3].

The concept of tracking moving objects is part of the wider field of computer vision; video surveillance data can be regarded as an important class of video data, and indeed video surveillance analysis has received much attention over the last 20 years. There is a large body of literature in this field, for instance Hu et al. [4], Moeslund et al. [5,6], Valera and Velastin [7], and Alahi et al. [8]. Computer-vision-based tracking has established its place in several real-world applications, among which one can mention visual surveillance, sports analysis, video editing, tracking laboratory animals, human–computer interfaces, and cognitive systems.

Object detection, discriminative appearance modeling, tracking different objects, and handling their occlusions are a number of tasks that any object-tracking method must perform, but monocular views and real-time constraints make things even more challenging.

This article presents an efficient method for detection and tracking of objects in video surveillance. We claim that our proposed method has the following novelties:

• A new object-tracking method, using a combination of popular techniques, namely the Code Book (CB)-based background estimation, DBSCAN, and incremental bulk DBSCAN algorithms.

• Object modeling based on deleted code words; the two-layered CB model is used for modeling the background:


A simple data structure containing two CBs is defined per pixel. The CBs contain some code words (CWs); the first layer is the main CB and the other one is the cache CB, both containing some CWs related to a pixel. During the input sequence, the main CB models the current background, while the cache CB models the new background. So far, the CB model has been used only for background modeling, that is, estimating the background and constantly refining the data structures. We, however, claim that the information deleted in this process corresponds to the foreground; thus, by using the same information, foreground and background can be estimated with the same speed in the same process.

• In the present research, a method is proposed for tracking objects using the data wasted in the CB model. It is shown that the trajectories of objects can be obtained by applying an appropriate incremental clustering method to the deleted code words. Since our method reduces the number of steps followed in common tracking methods, and the same data structure used in the CB model is also used for tracking, there is no need to extract additional features from the detected objects for the tracking process; objects are detected and tracked at once (and not in two distinct stages), so the speed of extracting the trajectories of objects increases. The type of clustering algorithm also plays an important role in increasing the speed of our proposed method.

• To cluster the data, a criterion for measuring similarity is required. The data used in this clustering are CWs. Hence, we need a similarity measure that, given the structure of the CWs, detects the CWs belonging to one object as members of one cluster. For measuring the distance between code words in the clustering, a multi-level criterion corresponding to the data of the CB model is proposed.

• The performance of the two-layered CB depends on three thresholds. The threshold values differ between videos, and determining appropriate values is a time-consuming task. Thus, new relations are required for calculating the parameters of the two-layered CB model automatically. The conducted tests confirm the correctness of these relations.

In the rest of the article, we compare and evaluate our object detector and tracker against other currently popular techniques. The article is organized as follows: First, we review the previous works in this field. Then, in Sect. 3, we introduce our detection and tracking method, as well as its important modules. Section 4 shows the experimental results. Finally, the discussion and conclusions are presented in Sect. 5.

2 Related work

More closely related to our work are the following two publications. Kim et al. [13] introduced the layered CB to model the multiple backgrounds, moving background, and scene-illumination variations in a compact data structure. Their proposed algorithm is a color model and one of the best methods in this field. Compared with the 'mixture of Gaussians' (MOG) and kernel-based models [13], the basic CB model is very fast and usually more efficient in memory. They have shown that the basic CB model, which uses less computational complexity and memory capacity, gives better results in foreground–background segmentation than MOG and kernel-based models. To improve the basic CB model, Kim et al. proposed layered modeling and an adaptive CB, called the adaptive layered CB. The adaptive layered CB can model a new background during the input sequence and is relatively robust against illumination variations. Moreover, they noted that although the adaptive layered CB is robust against illumination variations, it cannot model very slow scene-illumination variations over extended time frames.

We used an advanced version of this algorithm: Sigari and Fathy [1] introduced a two-layered CB model, constructed by sampling the scene over extended time frames. For each pixel, there exists a CB containing some CWs which are a compact representation of the background.

Recent literature shows growing attention toward more complex detection-based tracking approaches [15,16], where trained object detectors identify target objects in every frame for further temporal analysis and tracking.

In this paper, we propose a tracking algorithm that combines currently applied techniques, namely the two-layered CB model, DBSCAN [2], and incremental bulk DBSCAN [2].

We also compared our method against two trackers, from Roth et al. [14] and Dedeoglu [15]. Roth et al. [14] introduced a novel visual object-tracking method and an event-based performance evaluation metric for its assessment. Their tracking framework uses Bayesian per-pixel classification to segment an image into foreground and background objects, based on observations of object appearances and motions, in real time. Dedeoglu [15] introduced a basic, simple tracker that uses a foreground and background detection method to determine the foreground region in each frame before using a matching method to track the objects.

3 Methodology

A frequently used method for detecting moving objects is 'background subtraction.' The basic idea behind this method is to obtain the difference between the image and a reference background model.



Fig. 1 An overview of the proposed object detection and tracking method. 'Frames' indicates the first n frames, to which the 'background model construction' (CB model) step is applied. Using the results of this step, the threshold values and the main background model are provided for the 'background subtraction' step, and the DelCWs removed during the refinement of the background model are sent to the 'DBSCAN clustering' step; the initial clusters (InitClusters) are created and sent to the 'incremental bulk DBSCAN' step. In the background subtraction step, once the specified number of frames in the sequence has been received, the background model is updated by seeing new frames. The DelCWs are then sent to 'incremental bulk DBSCAN.' Clusters that show no growth (ObjCWs) are sent to the 'ST_Information extraction' step to extract the trajectory (ST-Info)

This difference shows the moving parts and objects. The reference background model is created and updated by observing the sequence of input frames. The main challenge in object detection is to find a precise background model for accurate detection and tracking of the objects in the scene. Our proposed method belongs to the background subtraction group. The architecture of the proposed system for detection and tracking of moving objects is illustrated in Fig. 1. In the figure, DelCWs means deleted CWs, ObjCWs means the CWs of an object, Main means the main CB, InitCluster means the initial clustering, and ST-Info means spatio-temporal information.

The proposed method depends neither on the speed of the objects nor on their distance; in addition, it is assumed that the video context does not change during the input frames.

The CB algorithm is a real-time algorithm for segmenting the foreground and background, in which the sample background values at each pixel are quantized into CBs that represent a compressed form of the background model for a frame sequence. In the two-layered CB model, some CWs are deleted as new frames arrive, which causes the constructed background to be updated. For each frame, the deleted CWs fall into three groups: (1) CWs belonging to the foreground, (2) CWs representing noise in the frames, and (3) CWs that previously belonged to the background model and were deleted during a background alteration. Since the final objective of background modeling is to find the foreground and the parts corresponding to the moving objects, this research considers those deleted CWs that belong to the foreground and uses this information, together with appropriate clustering algorithms, to track the moving objects.

The present research considers the surveillance domain and assumes that the background does not move and the camera is stationary. According to the experiments, the CWs which belong to the foreground form spatially compact, high-density groups, while the background information is


Fig. 2 The spatial position of the clustered CWs related to the moving object. The objects are marked by a yellow circle and the pixels of the clusters by a red circle (color figure online)

sparse and scattered. This means that the deleted CWs which belong to the foreground have generally been removed from the CBs of adjacent, neighboring pixels. Hence, to separate the CWs which belong to the foreground from the sparse background noise, we applied the DBSCAN clustering method together with the proposed distance relation, which combines the temporal, spatial, and frequency distances and is used for identifying the sparse noise and detecting the compact clusters which belong to the foreground.

It is also assumed that the dimensions of the moving objects are more than 1 × 1 pixel. If a CW has a frequency of one, it has been observed only once and will therefore be deleted; by this definition, noisy pixels in the image caused by imaging disturbances certainly produce noisy CWs, and such CWs are removed before entering the clustering step. Thus, we suggest the system shown in Fig. 1. First, using the first few frames of the video, an elementary background model is created. Next, using the output of this model (i.e., the deleted CWs), the initial clusters are built. Then, the subsequent frames enter the system one by one and the background is separated from the foreground. Deleted CWs that are repeated more than once are retained, considered to belong to foreground pixels, and then participate in the clustering process. When the number of read frames reaches a certain fixed value, the deleted CWs are clustered with the incremental bulk technique. Clusters that show no member increase are removed from the process and subsequently used for extracting the trajectories of the objects in the foreground. Figure 2 shows the spatial position of the CWs which belong to one cluster.

3.1 Background modeling

The CB algorithm adopts a quantization/clustering technique to construct a background model from long observation sequences. For each pixel, it builds a CB which consists of one or more code words. Samples at each pixel are clustered into a set of code words based on a color distortion metric and brightness bounds. Not all pixels have the same number of code words. The background is encoded on a pixel-by-pixel basis [13].

In this paper, as mentioned above, a two-layered code book was utilized for modeling the background. This method models the observed colors for each pixel in a data structure called the CB, which consists of CWs. Each CW can model a cluster of pixel samples and create a part of the background. When detecting the background, if the gray level of the input pixel is within the domain of any CW, the pixel will be considered a background pixel; otherwise, a foreground pixel. The method, then, is composed of two different stages: background modeling and foreground detection. As shown in Fig. 3, the background modeling stage consists of two steps: initial CB modeling and CB refinement [1].

In the initial CB model construction for each pixel, suppose that X = {I1, I2, . . . , IN} is the training sequence of gray levels of the pixel, where It is the gray level of the input pixel in frame t of the training sequence, and C = {c1, c2, . . . , cL} is the CB containing L CWs. The data structure of the ith CW is the 8-tuple ci = (Ki, Ji, fi, λi, pi, qi, x, y); ci denotes the gray-level bounds and spatio-temporal variables of the CW. Ki and Ji are, respectively, the minimum and maximum gray levels of the pixel;



Fig. 3 An overview of the background modeling with code book model

fi denotes the frequency of this CW in the training sequence. The longest interval during which this CW does not occur in the sequence is defined as the maximum negative run length (MNRL) and is denoted by λi; pi and qi are, respectively, the first and last times of occurrence in the sequence; and x, y is the position of the CW. Alpha is a real positive parameter whose best value is approximately 10: from a theoretical perspective, alpha designates the gray-level interval that a code word can cover, so when alpha is equal to 10, each code word covers an interval of gray levels whose length is 10.
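For illustration only, the code word described above can be held in a small record; the following Python sketch is our own rendering (the class name, field names, and the matching/update rules are assumptions, not the authors' implementation), with the α-based gray-level test hedged accordingly.

from dataclasses import dataclass

@dataclass
class CodeWord:
    """One code word ci = (Ki, Ji, fi, lambda_i, pi, qi, x, y) of a per-pixel CB (sketch)."""
    K: int    # minimum gray level observed for this CW
    J: int    # maximum gray level observed for this CW
    f: int    # frequency: how often the CW occurred in the sequence
    lam: int  # MNRL: longest interval during which the CW did not occur
    p: int    # first time (frame index) of occurrence
    q: int    # last time (frame index) of occurrence
    x: int    # pixel column of the CW
    y: int    # pixel row of the CW

    def matches(self, gray: int, alpha: int = 10) -> bool:
        # Assumed matching test: the input gray level lies inside the CW's
        # gray-level interval, widened by the alpha parameter (interval length ~ 10).
        return self.K - alpha <= gray <= self.J + alpha

    def update(self, gray: int, t: int) -> None:
        # Absorb a matching sample: widen the bounds, bump the frequency,
        # refresh the MNRL (lambda) and the last occurrence time q.
        self.K = min(self.K, gray)
        self.J = max(self.J, gray)
        self.f += 1
        self.lam = max(self.lam, t - self.q)
        self.q = t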

The algorithm of the initial CB construction is depicted in Fig. 4.

After constructing the initial CB, moving objects and noise are coded in some CWs of the initial CB. Therefore, the initial CB is fat and needs to be refined through temporal filtering. Fixed and moving backgrounds usually recur in a sequence with a short period, while foregrounds recur with a long period. This distinction between foreground and background was used for refining the CB: the λ of each CW records the longest interval during which it does not occur in the training sequence. CWs with large λ belong to the foreground, while the others belong to the background. Therefore, the CB can be refined as follows [1].

M = {cm | cm ∈ C, λm < Tm}   (1)

where M is the refined CB that models the background, and Tm is calculated using the equation below:

Tm = round(numTrainImg / 2) + 1   (2)
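A minimal sketch of the temporal filtering in Eqs. (1) and (2), assuming the CodeWord record sketched above; it simply keeps the code words whose MNRL λ stays below Tm (the end-of-training wrap-around update of λ is omitted here).

def refine_codebook(codebook, num_train_img):
    """Temporal filtering of an initial per-pixel codebook (Eqs. 1-2, sketch)."""
    # Tm = round(numTrainImg / 2) + 1                    (Eq. 2)
    t_m = round(num_train_img / 2) + 1
    # M = {c in C | lambda_c < Tm}: keep recurring CWs   (Eq. 1)
    return [cw for cw in codebook if cw.lam < t_m]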

3.2 Detection of the foreground

In order to model new backgrounds in the input sequence at higher speed, two code books are used in the two-layered model [1]: the Main CB and the Hidden CB. While both have the same structure, the Main CB is used for modeling a stable background, whereas the Hidden CB is used for modeling an unstable background. The Main CB is created in the training phase, while the Hidden CB initially remains empty. During the input sequence, the foreground is separated from the background and the CBs are updated. Figure 5 shows the foreground and background detection algorithm used in the

Fig. 4 The initial code book construction algorithm [1]


Fig. 5 The two-layered background modeling algorithm [1]

Algorithm: two-layered background modeling algorithm
Input: numberOfframe, MainCodeBook, HiddenCodeBook, Th, Tadd, Tdelete
Output: deletedCodeWords, MainCodeBook, HiddenCodeBook
1: Read the numberOfframe frame
2: p = 1 (p is the index of pixels in frame f)
3: For an input pixel p, find a matching CW in MainCodeBook.
4: If a CW was found in MainCodeBook
5:   Update the CW
6: else
7:   Find a matching CW in HiddenCodeBook.
8:   If a CW was found in HiddenCodeBook,
9:     Update the CW
10:   else
11:     Create a new CW in HiddenCodeBook and put this pixel in it.
12: Move CWs with λ > Th from HiddenCodeBook to deletedCWs.
13: Move all CWs staying longer than Tadd in HiddenCodeBook from HiddenCodeBook to MainCodeBook.
14: Move all CWs not appearing for longer than Tdelete from MainCodeBook to deletedCWs.
15: end of reading the pixels of the frame
16: End

two-layered CB model; the deleted CWs (DelCWs) at each step of this background estimation procedure are regarded as the output for our proposed algorithm.

To refine the code books, this model uses three important thresholds: Tadd, Tdelete, and Th. If the λ of a CW in the Hidden CB is longer than Th, it will be deleted from the Hidden CB; if a CW stays in the Hidden CB longer than Tadd, it will be moved to the Main CB; and if a CW of the Main CB is not observed for longer than Tdelete, it will be deleted from the Main CB [1].
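The three threshold rules can be summarized, per pixel and per frame, roughly as in the sketch below (assuming the CodeWord record from Sect. 3.1; the exact bookkeeping of the authors' implementation may differ).

def maintain_codebooks(main_cb, hidden_cb, deleted_cws, t, t_h, t_add, t_delete):
    """Apply the Th / Tadd / Tdelete rules after matching the pixel at time t (sketch)."""
    # Hidden CB: drop CWs whose MNRL exceeds Th (they do not recur often enough).
    for cw in [c for c in hidden_cb if c.lam > t_h]:
        hidden_cb.remove(cw)
        deleted_cws.append(cw)        # the deleted CWs feed the tracking stage
    # Hidden CB -> Main CB: promote CWs that have survived longer than Tadd.
    for cw in [c for c in hidden_cb if t - c.p > t_add]:
        hidden_cb.remove(cw)
        main_cb.append(cw)
    # Main CB: delete CWs not observed for longer than Tdelete.
    for cw in [c for c in main_cb if t - c.q > t_delete]:
        main_cb.remove(cw)
        deleted_cws.append(cw)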

In this research, the following relations are suggested to determine suitable values for these threshold limits.

1. The threshold of the presence time (Tadd): CWs are held in the Hidden CB in order to detect the CWs which belong to the background and add them to the Main CB. The CWs existing in the Main CB will remain there only if they constantly update the value of q, so that the distance between q and the current time stays below the threshold (t − q < Tdelete). Since every CW has an emergence time p, it could be said that the lifetime of all CWs in the Main CB must exceed a minimum amount. On the other hand, according to the condition λ < Th, the q values of the CWs in the Hidden CB must also be constantly updated for the CWs to remain in the Hidden CB. Therefore, the CWs in the Hidden CB are stable only if their q values are updated. If the lifetime of CWs in the Hidden CB increases sufficiently, they will be transferred to the Main CB. Hence, for Tadd we suggest the most frequent lifetime of the CWs which belong to the Main CB:

Tadd = Mode{(q − p) | q, p ∈ Main}   (3)

2. The threshold of non-presence (Tdelete): CWs which are kept as the background model in the Main CB ought to keep recurring as background. Hence, after selecting the CWs which belong to the background, this property is constantly checked in order to delete those CWs which no longer recur in the background. When producing the initial background model, this property is handled by updating λ and checking its value; λ denotes the longest time during which a CW is not observed in the background model. Thus, for the threshold Tdelete, the maximum λ can be suggested, as in the following relation:

Tdelete = Max{λ | λ ∈ Main}   (4)

3. The threshold of the longest non-presence time (Th): when constructing the initial background model, a parameter (Tm) with the same role was needed; this parameter was given a value based on the existing resources, using the relation below:

Th = Tm = round(numTrainImg / 2) + 1   (5)
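Equations (3)–(5) translate directly into a few lines of Python; the sketch below computes the three thresholds from the statistics of the Main CB built during training (function and variable names are ours).

from statistics import mode

def estimate_thresholds(main_cb, num_train_img):
    """Estimate Tadd, Tdelete and Th from the Main CB statistics (Eqs. 3-5, sketch)."""
    t_add = mode(cw.q - cw.p for cw in main_cb)      # Eq. 3: most frequent lifetime
    t_delete = max(cw.lam for cw in main_cb)         # Eq. 4: maximum MNRL
    t_h = round(num_train_img / 2) + 1               # Eq. 5: Th = Tm
    return t_add, t_delete, t_h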


An example of the two-layer CB algorithm is depicted in Figs. 6 and 7.

In Fig. 6, after observing the color of the first pixel, the Hidden CB is empty and the input color value is checked against the Main CB, in which there is a corresponding CW. Thus, the input is selected as background. This sequence is repeated for the color value of the next pixel. For the next pixel, as there is no correspondence in the Main CB, a new CW is added to the Hidden CB. For the next input, the input color value is first checked against the Hidden CB, and since there is a corresponding CW, the Hidden CB remains unchanged. For the next two inputs, the observed color value corresponds to the Main CB, and the Hidden CB is cleared after observing the next input; thus, the input color value corresponds with the Main CB.

In Fig. 7, after observing the color of the first pixel, the Hidden CB is empty and the input color value is checked against the Main CB, in which there is a corresponding CW. Thus, the input is selected as background. This sequence is repeated for the color value of the next pixel. For the third pixel, as there is no correspondence in the Main CB, a new CW is added to the Hidden CB. For the fourth input, the input color value is first checked against the Hidden CB, and since there is a corresponding CW, the Hidden CB remains unchanged. For the next input, the observed color value corresponds to the Hidden CB, and as one color has been repeated for three sequential frames, a CW is added to the Main CB. After observing the following inputs, the input color value corresponds to the Main CB and is thus selected as background.

3.3 Clustering by the proposed distance criterion

DBSCAN is one of the first algorithms introduced for density-based clustering. It is based on the fact that clusters are regions of the data space with a high density, separated by regions with lower densities. It is, moreover, able to detect clusters with different shapes and sizes in bulky and noisy data and is also one of the quickest algorithms for clustering large datasets [2].
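As a generic illustration of this behavior (not the authors' code), the following toy example clusters a handful of 2-D pixel positions with scikit-learn's DBSCAN; dense groups receive cluster labels, while the isolated point is labeled as noise.

import numpy as np
from sklearn.cluster import DBSCAN

# Two dense groups of (x, y) positions plus one isolated point.
points = np.array([[10, 10], [11, 10], [10, 11], [11, 11],   # dense group 1
                   [50, 40], [51, 40], [50, 41], [51, 41],   # dense group 2
                   [90, 90]])                                 # sparse point

labels = DBSCAN(eps=3, min_samples=3).fit_predict(points)
print(labels)   # e.g. [0 0 0 0 1 1 1 1 -1]; label -1 marks the point rejected as noise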

As mentioned earlier, according to the experiments conducted in this field, the CWs which belong to the foreground form compact, high-density groups, while the background information is sparse and scattered. Hence, to separate the CWs which belong to the foreground from the sparse background noise, we applied the DBSCAN clustering method together with the proposed distance relation.

Fig. 6 An example of the two-layered background modeling algorithm; the background is constant

Fig. 7 The two-layered background modeling algorithm; the background is changing


This relation combines the temporal, spatial, and frequency distances and is used for identifying the sparse noise and detecting the compact clusters which belong to the foreground.

The obtained clusters are expected to represent the moving objects and their trajectories. Therefore, the selected distance criterion should be able to place the CWs which belong to one moving object in a single foreground cluster while identifying sparse noise. Hence, the distance criterion ought to have the following characteristics:

1. Spatial similarity: CWs which belong to one object originate from neighboring pixels.

2. Temporal similarity: CWs which belong to one object have similar creation and last-presence times.

3. Frequency similarity: CWs which belong to one object are close to each other in the number of times they have been observed.

Hence, the following multi-level relation is proposed for measuring the closeness of two CWs:

if √((x2 − x1)² + (y2 − y1)²) < SEps then
  if |p2 − p1| < PEps then
    if |q2 − q1| < QEps then
      if |f2 − f1| < FEps then
        these points are similar.   (6)

In Eq. 6, SEps, PEps, QEps, and FEps are the thresholds that decide whether two CWs are similar; the spatial, temporal, and frequency similarities of the CWs are checked one after another, and each level acts as a tentative filter. When measuring the distance of two CWs, we first compare the Euclidean distance of their x and y parameters; if the CWs are close enough, we then compare the distance of the p parameter, then the q parameter, and finally the f parameter. Only if the two CWs pass all of these checks are they assigned to the same cluster.
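Read as code, the multi-level test of Eq. (6) is a short cascade of checks; the sketch below assumes the CodeWord fields (x, y, p, q, f) introduced in Sect. 3.1 and is meant as an illustration, not the authors' implementation.

def similar_cws(c1, c2, s_eps, p_eps, q_eps, f_eps):
    """Multi-level similarity test of Eq. (6): spatial, then temporal, then frequency."""
    # 1. Spatial similarity: Euclidean distance between the pixel positions.
    if ((c2.x - c1.x) ** 2 + (c2.y - c1.y) ** 2) ** 0.5 >= s_eps:
        return False
    # 2. Temporal similarity: first (p) and last (q) presence times.
    if abs(c2.p - c1.p) >= p_eps or abs(c2.q - c1.q) >= q_eps:
        return False
    # 3. Frequency similarity: number of observations (f).
    return abs(c2.f - c1.f) < f_eps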

3.4 Incremental bulk DBSCAN

Incremental DBSCAN is a popular incremental algorithm in which data can be added to, or deleted from, the existing clusters one point at a time. The incremental bulk DBSCAN algorithm is capable of adding points in bulk to the existing set of clusters. In this algorithm, the data points to be added are first clustered using the DBSCAN algorithm, and the newly formed clusters are then merged with the existing clusters. That is, rather than points, we add clusters incrementally; the incremental bulk DBSCAN algorithm produces the same clusters as incremental DBSCAN.

Fig. 8 Summary of the incremental bulk DBSCAN step: the new deleted CWs are clustered with DBSCAN into partial clusters, the points affected by (intersecting with) the previous clustering result are found, and incremental bulk DBSCAN is applied only to the affected points

One of the major advantages of this approach is that it allows us to see the clustering patterns of the new data alongside the existing clustering patterns. Moreover, we can see the merged clusters as well. The incremental bulk DBSCAN algorithm is considerably more efficient than incremental DBSCAN when the number of intersection points is small and when there is little or no noise. In the setting of this research, the number of intersection points is low, so with respect to execution time, using the incremental bulk DBSCAN algorithm is preferable to the simple incremental algorithm [2]. Before providing more details of the algorithm, a few definitions are necessary; a summary and the pseudocode of the algorithm are shown in Figs. 8 and 9.

Definition 1 (Intersection data points) Let D and D′ be databases of points and p be some point such that p ∈ D′ and p ∉ D. We define p as an intersection point if there is at least one D-member object in Nε(p), where Nε(p) is the subset of D contained in the neighborhood of a given radius ε of p [2].

Definition 2 (Affected data points) Let D be a database of points and p be some point (either in D or not in D). We define the set of points in D affected by the insertion of p as [2]:

AffectedD(p) = Nε(p) ∪ {q | ∃ o ∈ Nε(p) ∧ q <D∪{p} o}   (7)
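For Definition 1, a small sketch of the intersection test is given below (Euclidean ε-neighborhoods over 2-D points; Definition 2 additionally involves density-reachability in D ∪ {p}, which is omitted here).

def eps_neighborhood(p, points, eps):
    """N_eps(p): all points of `points` within radius eps of p (Euclidean)."""
    return [q for q in points
            if ((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2) ** 0.5 <= eps]

def intersection_points(new_points, old_points, eps):
    """Definition 1 (sketch): new points with at least one old point in their eps-neighborhood."""
    return [p for p in new_points if eps_neighborhood(p, old_points, eps)]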

3.5 Spatio-temporal information extraction

Each cluster produced by the incremental clustering of DelCWs corresponds to one object. The clusters obtained by this algorithm are assessed by their amount of growth at each step; when no growth is observed, the spatio-temporal information of the related object is extracted from the cluster. The cluster is then removed and no longer participates in the subsequent steps of the incremental clustering.


Fig. 9 Incremental bulk DBSCAN algorithm [2]

Algorithm: IncrementalBulkDBSCAN
Input: oldClustering, newClustering, point, Eps, MinPts
Output: type, clusterId
1: Get the neighborhood points of the point in both clusterings
2: For every neighborhood point of the given point
3:   If the neighborhood point is a core point
4:     If the neighborhood point was not a core point earlier
5:       If the neighborhood point belongs to a cluster
6:         Add the neighborhood point to a list 'Change' to process later
7:       Else
8:         Mark to change the cluster later
9:       Get the neighborhood points of the neighborhood point and add them to a list Np
10:      For every point in the list Np
11:        If the point is noise
12:          Mark to change later
13:        If the point belongs to a cluster
14:          Add the point to the list 'Change' to process later
15:        Else
16:          Add the neighborhood point to the list 'Change' to process later
17: If no new core points
18:   Assign the cluster ID as noise
19: Else
20:   If the 'Change' list has no elements then
21:     Assign all marked points to a new cluster
22:   Else if the 'Change' list has only one element
23:     Add the point to the 'Change' cluster
24:   Else
25:     Update all the cluster IDs of the points in the 'Change' list

Fig. 10 Proposed algorithm for extracting the trajectory of objects

Algorithm: ExtractTrajectoryOfObjects
Input: CwsOfCluster
Output: Trajectory
1: minTime = min{pi}
2: maxTime = max{qi}
3: For L = minTime to maxTime repeat the following operations
4:   Find all CWs in CwsOfCluster with p < L and q > L
5:   Calculate the center of these CWs as the mean of their location information, and save it as the location of the object at time L

Figure 10 illustrates the steps required to extract the spatio-temporal information of the clusters. Figures 11 and 12 show a sample of the extracted trajectory.
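A direct Python transcription of the pseudocode in Fig. 10 might look as follows (assuming the CodeWord fields p, q, x, y from Sect. 3.1; the strict inequalities follow the pseudocode).

def extract_trajectory(cws_of_cluster):
    """Derive the object trajectory from the code words of a finished cluster (Fig. 10)."""
    min_time = min(cw.p for cw in cws_of_cluster)
    max_time = max(cw.q for cw in cws_of_cluster)
    trajectory = []
    for t in range(min_time, max_time + 1):
        # Code words whose lifetime [p, q] covers time t.
        active = [cw for cw in cws_of_cluster if cw.p < t and cw.q > t]
        if active:
            # The object position at time t is the mean of the active CW positions.
            cx = sum(cw.x for cw in active) / len(active)
            cy = sum(cw.y for cw in active) / len(active)
            trajectory.append((t, cx, cy))
    return trajectory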

4 Experimental results

In this section, we evaluate the proposed method: we first describe the datasets used, then the compared methods, the evaluation criteria, and the test procedure, and finally present the results.


Fig. 11 A sample of the extracted trajectories from PETS 2001. The trajectories of a man and a car are shown with white lines, and the objects are marked with yellow circles (color figure online)

Fig. 12 A sample of the extracted trajectory from CAVIAR. The trajectory of a man is shown with a yellow line and the object is marked with the white circle (color figure online)


4.1 Datasets

PETS is a standard dataset for performance evaluation of tracking and surveillance systems and has been used in several studies [17–25]. To evaluate the accuracy of our proposed method, the results of the present paper are compared with the ground-truth trajectories. We therefore selected CAVIAR and PETS 2001 from this collection of datasets, because all of their frames were labeled manually and the labels are available in XML format. CAVIAR was used in [22,23] and PETS 2001 was used in [24,25].

4.2 Comparing the methods

To evaluate the proposed tracking method, we compared our method with a recent method [14], labeled APT in the figures, and a base method [15], labeled MOT in the figures. Our method is labeled ObjTrackingWithCBM in the figures.

We compare our proposed method with [14] because it is a novel tracker with appropriate results that uses Bayesian per-pixel classification to segment an image into foreground and background objects, based on observations of object appearances and motions, in real time. Their approach is pixel-based (like our approach); hence, its execution time is close to that of our proposed method.

Dedeoglu [15] adopts a general approach to tracking: a foreground and background detection method first determines the foreground region in each frame, and a matching method is then used to track the objects. This approach was selected because we wanted to test the efficiency of our proposed method against a basic method. To achieve this, in the first step of this basic approach, we use the two-layered CB method for foreground and background detection, so the major difference between our method and this approach lies in the second step. Therefore, the comparison proved to be meaningful.

4.3 Evaluation criteria and test method

We used precision, recall, and the F1-measure to evaluate and compare our proposed method. Moreover, we compared the execution time of our proposed method with the rival methods using the MATLAB 2008 profiler.

As the execution time of object detection and tracking methods depends on the video content, each method is applied to multiple frame sequences in the video data and the average result is reported. In this paper, we randomly selected a sequence of neighboring frames, applied the methods 10 times, and reported the mean value [26]. In the first set of performance experiments, the methods were evaluated in terms of precision, recall, and the F1-measure.
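For reference, the combined criterion is the usual F1-measure, i.e., the harmonic mean of precision and recall; the numbers in the example below are illustrative only and are not taken from the experiments.

def f1_measure(precision, recall):
    """F1-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1_measure(0.93, 0.92), 3))   # illustrative values -> 0.925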

For the incremental DBSCAN and DBSCAN algorithms, we set the values of SEps, PEps, QEps, FEps, and minPts to 3, 2, 2, 2, and 7 for the PETS 2001 dataset and to 6, 5, 5, 5, and 30 for the CAVIAR dataset. For this parameter setting, we tested different values for each parameter and observed which values caused the CWs of a moving object to be grouped as a single cluster in the results.


Fig. 13 Outputs of the foreground detection algorithm [1]. a The original frame. b The output of the two-layered code book algorithm whose parameters are initialized with our proposed relations; foreground pixels are shown in white and background pixels in black. c The output of the two-layered code book algorithm whose parameters are initialized inappropriately. d The output of the two-layered code book algorithm whose parameters are initialized with larger values than our proposed values (color figure online)

If a cluster includes the CWs of one moving object, we infer that the parameters are suitably set; if a cluster includes more than one object, the values of the parameters should be decreased; and if a cluster includes only a partial part of an object, the values of the parameters should be increased. Finally, the values which group the CWs of a moving object into one cluster were selected.
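The parameter values quoted above can be kept in one place so that the configuration is swapped per dataset; the dictionary layout below is our own convenience structure, with the values taken from the text.

CLUSTERING_PARAMS = {
    "PETS2001": {"SEps": 3, "PEps": 2, "QEps": 2, "FEps": 2, "minPts": 7},
    "CAVIAR":   {"SEps": 6, "PEps": 5, "QEps": 5, "FEps": 5, "minPts": 30},
}

params = CLUSTERING_PARAMS["PETS2001"]   # e.g., select the PETS 2001 configuration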

4.4 Results

4.4.1 Survey of the correctness of the proposed relations for determining the thresholds

Figure 13 shows the effect of the thresholds on the correct diagnosis of the foreground. The VS-PETS dataset, in which the players are running, is used. Using the proposed relations, the values of Tadd, Tdelete, and Th for this sequence of frames are calculated as 20, 11, and 11, respectively. With these values, the foreground is diagnosed correctly, as shown in Fig. 13b.

If, as shown in Fig. 13c, the values of Tadd, Tdelete, and Th are initialized as 20, 11, and 2, the foreground of the current frame is not diagnosed correctly; instead, the foreground of the previous frame is detected as the foreground of the current frame. The accuracy is poor because the value of Tdelete is low; consequently, the CWs which contain the background information but are not observed during two frames are removed, and the background model changes too quickly.

If, as shown in Fig. 13d, the values of (Tadd, Tdelete, Th) are initialized as (25, 30, 30), (30, 11, 11), or (20, 2, 11), the foreground is diagnosed correctly.

It can be concluded that, in videos with a constant background, Th and Tadd do not contribute much to the foreground detection. If the Tdelete value is less than the value obtained using the proposed relation, the results will not be acceptable, whereas for greater values, correct results are obtained.

4.4.2 Evaluation of the proposed tracking method

The results of a comparison between the proposed method and the other methods in terms of accuracy and execution time are presented in this section; the comparison was first made on the PETS 2001 database.

As shown in Fig. 14, our proposed algorithm obtained a better result than the other algorithms in terms of the precision, recall, and F1-measure criteria.

Fig. 14 Comparison of the precision of the proposed algorithm with the two other object-tracking algorithms on the PETS 2001 database


Fig. 15 Comparison of the execution time between the proposed algorithm and the other algorithms on PETS 2001 for 100 frames. The prominence of the proposed algorithm is demonstrated

Fig. 16 Comparison of the precision of the proposed algorithm with the two other object-tracking algorithms on the CAVIAR data

However, when the intersected trajectories are removed for all methods, its advantage becomes less salient.

In the following experiment, we compared the execution time of these algorithms. The results showed that our proposed algorithm had a shorter execution time and better accuracy. The speed of the proposed method is good thanks to the two-layered CB model data structure serving as the input features for clustering, as well as the DBSCAN and incremental bulk DBSCAN clustering algorithms, which all perform well on large datasets. The precision of the proposed method is also good because of the suitable distance criterion used in clustering and the appropriate features existing in the CWs.

As Fig. 15 shows, in addition to having a much lower execution time than the other two methods, our proposed method was also better in terms of precision. The good speed of the algorithm can be attributed firstly to the incremental bulk DBSCAN and DBSCAN algorithms, which have an acceptable capability in clustering bulky data, and secondly to the use of the two-layered CB model. The enhanced precision of the proposed algorithm can be attributed to the suitable distance criterion used in the clustering and to the suitable features hidden in the CW structures.

As Fig. 16 shows, in terms of all three factors, i.e., precision, recall, and the combined criterion, the proposed algorithm is better than the other two algorithms.

Fig. 17 Comparison of the execution time of the proposed algorithm with the two other object-tracking algorithms on the CAVIAR data for 100 frames. The prominence of the proposed algorithm is demonstrated

Of course, because the occlusion paths are ignored, this prominence is not very significant.

In Fig. 17, the results obtained on the CAVIAR data also indicate the prominence of our proposed algorithm compared with the two other algorithms in terms of execution time and accuracy. The results obtained on the CAVIAR data are weaker than those on the PETS 2001 data. This can be attributed to the difference in data context, in terms of the number of objects and the complexity of the trajectories. The best F1-measure result is 92.44 for PETS 2001 and 90.07 for CAVIAR.

5 Conclusion

We proposed an efficient algorithm for detection and tracking of moving objects using an appropriate combination of popular algorithms: the code book model, DBSCAN, and incremental bulk DBSCAN.

The main advantage of this technique is that it uses the information wasted in the background estimation step to detect and track the foreground. In addition to simplifying and reducing the number of steps, it uses the data and results produced by the CB model to provide access to rich information about the moving objects and can accurately extract their trajectories.

Among the other innovations of this research are the use of the DBSCAN clustering method, which is one of the most efficient methods for clustering large data, and the application of the incremental bulk DBSCAN clustering, which grows the already existing clusters by adding new clusters instead of individual members and thus significantly increases the speed.

In addition, relations for automatically setting the values of the thresholds in the two-layered code book are proposed.

The results show that, compared with other available algorithms, our proposed algorithm has an acceptable performance in terms of both accuracy and speed. In future research, we will extend our method to check whether two objects have overlapping paths, and we will also change


the structure of the CW so that the information it contains will be more useful and rich for object tracking.

References

1. Sigari, M.H., Fathy, M.: Real-time background modeling/subtraction using two-layer CB model. In: Proceedings of the International MultiConference of Engineers and Computer Scientists, pp. 19–21 (2008)

2. Ester, M., Kriegel, H.-P., Sander, J., Wimmer, M., Xu, X.: Incremental clustering for mining in a data warehousing environment. In: Proceedings of 24th VLDB Conference (1998)

3. Koohzadi, M., Keyvanpour, M.R.: An analytical framework for event mining in video data. Artif. Intell. Rev. (2012). doi:10.1007/s10462-012-9315-5

4. Hu, W., Tan, T., Wang, L., Maybank, S.: A survey on visual surveillance of object motion and behaviors. IEEE Trans. Syst. Man Cybern. 34, 334–352 (2004)

5. Moeslund, T.B., Granum, E.: A survey of computer vision-based human motion capture. Comput. Vis. Image Underst. 81(3), 231–268 (2001)

6. Moeslund, T.B., Hilton, A., Kruger, V.: A survey of advances in vision-based human motion capture and analysis. Comput. Vis. Image Underst. 104(2), 90–126 (2006)

7. Valera, M., Velastin, S.: Intelligent distributed surveillance systems: a review. Proc. IEEE Vis. Image Signal Process. 152(2), 192–204 (2005)

8. Alahi, A., Vandergheynst, P., Bierlaire, M., Kunt, M.: Cascade of descriptors to detect and track objects across any network of cameras. Comput. Vis. Image Underst. 114(6), 624–640 (2010)

9. Leibe, B., Schindler, K., Van Gool, L.: Coupled detection and trajectory estimation for multi-object tracking. In: IEEE 11th International Conference on Computer Vision, pp. 1–8 (2007)

10. Wu, B., Nevatia, R.: Tracking of multiple, partially occluded humans based on static body part detection. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 1, 951–958 (2006)

11. Zhao, T., Aggarwal, M., Kumar, R., Sawhney, H.: Real-time wide area multi-camera stereo tracking. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 976–983 (2005)

12. Lanz, O.: Approximate Bayesian multibody tracking. IEEE Trans. Pattern Anal. Mach. Intell. 28(9), 1436–1449 (2006)

13. Kim, K., Chalidabhongse, T.H., Harwood, D., Davis, L.: Real-time foreground–background segmentation using CB model. Real-Time Imaging 11, 172–185 (2005)

14. Roth, D., Koller-Meier, E., Van Gool, L.: Multi-object tracking evaluated on sparse events. Multimed. Tools Appl. 50(1), 29–47 (2010)

15. Dedeoglu, Y.: Moving object detection, tracking and classification for smart video surveillance. Master's thesis, Bilkent University, Ankara (2004)

16. Xie, L., Yan, R.: Extracting semantics from multimedia content: challenges and solutions. In: Divakaran, A. (ed.) Multimedia Content Analysis: Theory and Applications, pp. 1–31. Springer, New York (2008)

17. Kim, K., Chalidabhongse, T.H., Harwood, D., Davis, L.: Real-time foreground–background segmentation using CB model. Real-Time Imaging 11(3), 172–185 (2005)

18. Sigari, M.H., Fathy, M.: Real-time background modeling/subtraction using two-layer CB model. In: Proceedings of the International MultiConference of Engineers and Computer Scientists (2008)

19. Rasid, L.N., Suandi, S.A.: Versatile object tracking standard database for security surveillance. In: 10th International Conference on Information Science, Signal Processing and Their Applications (2010)

20. Hakeem, A., Shah, M.: Learning, detection and representation of multi-agent events in videos. Artif. Intell. 171(8–9), 586–605 (2007)

21. Zhang, C., Chen, X., Zhou, L., Chen, W.-B.: Semantic retrieval of events from indoor surveillance video databases. Pattern Recognit. Lett. 30(12), 1067–1076 (2009)

22. Khalid, S., Naftel, A.: Classifying spatiotemporal object trajectories using unsupervised learning of basis function coefficients. In: Proceedings of the Third ACM International Workshop on Video Surveillance and Sensor Networks (2005)

23. Khalid, S.: Motion-based behavior learning, profiling and classification in the presence of anomalies. Pattern Recognit. 43(1), 173–186 (2010)

24. Wijnhoven, R.G.J., de With, P.H.N.: Experiments with patch-based object classification. In: IEEE International Conference on Advanced Video and Signal Based Surveillance (2007)

25. Somasundaram, G.: Object classification in traffic scenes using multiple spatio-temporal features. In: IEEE 20th Mediterranean Conference on Control and Automation (MED) (2012)

26. Liu, J., Tong, X., Li, W., Wang, T., Zhang, Y., Wang, H.: Automatic player detection, labeling and tracking in broadcast soccer video. Pattern Recognit. 30(2), 103–113 (2009)
