
[IEEE 2006 International Symposium on Intelligent Signal Processing and Communications - Yonago, Japan (2006.12.12-2006.12.15)] 2006 International Symposium on Intelligent Signal Processing


2006 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS2006)
Yonago Convention Center, Tottori, Japan

An Efficient Implementation of the Nearest Neighbor Based Visual Objects Tracking

Kairoek Choeychuen†, Pinit Kumhom† and Kosin Chamnongthai†

†Computer Vision Lab, Department of Electronics and Telecommunication Engineering
King Mongkut's University of Technology Thonburi, Bangkok 10140 Thailand
Tel: +66-0-2470-9064, Fax: +66-0-2427-9636
E-mail: [email protected]

0-7803-9733-9/06/$20.00 ©2006 IEEE

Abstract- Tracking visual objects independently is less reliable than tracking them with data association. This paper describes a tracking method based on nearest neighbor (NN) data association, which has a lower computational cost than multiple hypothesis tracking (MHT) or the joint probabilistic data association filter (JPDAF) but gives low reliability when the number of targets increases. This reliability can be increased by selecting an appropriate visual object model. To keep computation low while handling non-rigid objects, we propose an object model that combines the threshold of accumulated object region and the object bounding box. The elements of the association matrix are values of a distance function that is proposed as a mixture of the distance functions of the object models. The combination of these distance functions is an important mechanism for determining the appropriate state of object correspondence, which can be divided into six groups: updated track, missing track, new track, grouped track, merged track and complex track. The missing track is handled by the track lifetime criterion, while the grouped, merged and complex tracks are resolved by applying the proposed NN algorithm again. Experimental results on surveillance image sequences show correct tracking in various correspondence situations.

I. INTRODUCTION

In this paper, we consider object tracking applied to automated video surveillance with a fixed camera. Scene environments can be both indoor and outdoor. Although various foreground region detection methods have been proposed, the detected object regions may be distorted or split, or may contain spurious objects. Some erroneous foreground regions are shown in Fig. 1. In addition, foreground regions have to face the non-rigid object problem, which requires an appropriate object model.

Fig. 1. The top row shows some erroneous regions. The bottom row shows the raw images. (a): the split region, (b): the distorted region and (c): the spurious object.

Multiple object tracking methods can be divided into two groups: tracking as recognition and the data association method. In the first group, objects are represented as templates [1]. A static template is defined in a database. If the object translates or rotates, the best matched object is found by resizing or rotating its template, respectively. This method is appropriate for tracking rigid objects whose appearance can be predicted in future frames. To handle the non-rigid problem, the template is adapted all the time, which is called the dynamic template. For instance, Haritaoglu et al. [2] use a second order model of the bounding box to track isolated human motion and use a temporal textural template with weighted similarity computation to track humans after separation or reappearance (occlusion). Senior et al. [3] use an RGB color model with a probabilistic mask as the temporal textural template. Finally, Yilmaz et al. [4] propose image energy and shape energy with online updating of the contour energy function to handle the occlusion problem. These methods can be exploited when the object shape changes slowly, and they require a perfect foreground region. Moreover, these methods are unclear in cases of ambiguous correspondence such as the split region or the missing region.

To handle ambiguous correspondence, data association methods are used. In the earliest period, the tracking process in radar applications aimed to find corresponding targets in a radar signal sequence that includes background and thermal noise. Most target tracking methodologies concentrate on data association techniques such as NN, MHT and JPDAF, which perform data filtering and manage track states that include track initiation, track update and track termination [5]. For NN, the nearest neighbor of the predicted target is updated, and Munkres' assignment algorithm is used to solve for the local and global minimum of the distance function. This approach has a lower computational cost than MHT and JPDAF but gives low reliability when the number of targets increases. JPDAF is a special case of MHT; it has a lower computational cost than MHT, but its initiation state is less reliable than that of MHT.

For data association on image sequences, a preprocessing process is used to reduce some image noise for the feature extraction process, so the complexity and reliability of the data association techniques for the tracking process can be improved by selecting an appropriate object model.

For instance, Cox et al. [6] resolve ambiguous correspondence by using multiple hypothesis tracking (MHT). Because the hypothesis branches grow rapidly, Cox et al. use the Mahalanobis distance and the cross correlation coefficient of corner image intensities to reduce the number of hypotheses. This method does not organize the corner points into an object model, so when the regions of two or more objects overlap, the dynamic corner intensity may lead to false detection results and to feature points that are difficult to match.

Yang et al. [7] detect the merged region (occlusion situation) using a data association technique. The system globally tracks the merged region as a single blob. When the merged region splits, a color distance matrix is computed to terminate the global bounding box, and the system continues tracking the isolated objects. This method requires additional object models and cannot be used in silhouette environments such as self eclipse or dusk.

To handle object non-rigidity and noisy images, Bremond et al. [8] represent an object as five generic points on the bounding box while keeping its average width and height. For ambiguous cases, Bremond et al. define three types of compound target: split, merge and mixed compound targets. The information of ambiguous targets is maintained and frozen, and the process tracks a temporary compound target instead of the ambiguous targets. This method manages the tracking problem well under noisy images. However, while the ambiguous state persists, the maintained and frozen information may change. The ambiguous state is hard to eliminate and may lead to tracking errors. We therefore propose local tracking instead of global tracking.

The remainder of the paper is organized as follows. The proposed approach is described in Section 2. Section 3 shows the experimental results. Finally, the conclusion is discussed in Section 4.

II. THE PROPOSED APPROACH

The steady state tracking process consists of foreground region detection, feature extraction, association matrix forming and the resolving of ambiguous correspondence.

A. The Foreground Region Detection Process

We detect foreground regions by background subtraction with an adaptive background model. Our background model uses a bi-modal Gaussian [4]. The foreground region pixels, $F_t(x)$, are computed as follows:

$$F_t(x) = \begin{cases} 0, & (I_t(x) - \min(x)) < kD \ \text{or}\ (\max(x) - I_t(x)) < kD \\ 1, & \text{otherwise} \end{cases} \tag{1}$$

The bi-modal Gaussian allows each pixel $I(x)$ to have one or two intensity means. In the case of two intensity means, $\min(x)$, $\max(x)$ and $D$ are the minimum value of the 1st Gaussian, the maximum value of the 2nd Gaussian and the mean of the maximum intensity difference at pixel $x$, respectively. These values are updated every N frames to compensate for illumination changes.

However, unexpected noise still remains. Gaussian noise is removed by a low pass filter, and small objects are removed by a morphological operator and connected component analysis. Finally, spurious objects are managed by the process of resolving ambiguous correspondence.

B. The Feature Extraction Process (Object Modeling Process)

To handle non-rigid objects, this paper uses a mixture of object models that includes the threshold of accumulated object region and the object bounding box.

a) The Threshold of Accumulated Object Region

The threshold of accumulated object region, $A_t(x)$, is defined as:

$$A_t(x) = \begin{cases} 1, & (1 - w) \times A_{t-1}(x) + w \times F_t(x) > Th_1 \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

The recursion in Eq. (2) between the previously updated threshold of accumulated object region at pixel $x$, $A_{t-1}(x)$, and the foreground region pixels shows the trend of continued existence of the object region at pixel $x$, which is controlled by the weight $w$ and the threshold $Th_1$. With a larger $w$, $A_t(x)$ tends more toward $F_t(x)$, while $Th_1$ sets how much of the expected object region $A_t(x)$ covers.

b) The Object Bounding Box

The object bounding box consists of the four corner points of the bounding box of the object region and its center point. This model is used to define a newly detected object region (observation), and the best matching points together with the average width and height are used to update the corresponding object. This object model is related to the object shape, may be called a dynamic template, and also has a low computational cost. In addition, it can handle cases in which an object is hard to identify by its color feature alone, and it allows us to track a subset of the points of the object bounding box for a track under ambiguous correspondence.

C. The Prediction Method

The motion parameters are modeled by a prediction method. Prediction methods can be separated into two groups: Bayes estimation and fitting techniques. Bayes estimation, such as the Kalman filter, is performed in the parameter domain. The Kalman filter estimate changes slowly, which is appropriate for smooth motion or for temporary sudden changes. Recently, the MCMC particle filter was presented in [9] as an efficient prediction method, but if the system uses multiple object models, this kind of prediction becomes complicated. The other approach, the fitting technique, such as least squares estimation, is performed in the measurement domain. The estimated function of least squares estimation (LS estimation) is computed to balance the sum of squared errors of the data set.
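The accumulation rule of Eq. (2) can be illustrated with a minimal sketch. It assumes that both the previous accumulated region and the foreground mask are flat lists of 0/1 pixel values; the function name `accumulate_region` and the values chosen for `w` and `th1` are illustrative assumptions and do not come from the paper.

```python
from typing import List


def accumulate_region(prev_acc: List[int], fg_mask: List[int],
                      w: float, th1: float) -> List[int]:
    """Eq. (2): A_t(x) = 1 if (1 - w) * A_{t-1}(x) + w * F_t(x) > Th1, else 0.

    prev_acc holds A_{t-1}(x) and fg_mask holds F_t(x), one 0/1 value per pixel.
    """
    return [1 if (1.0 - w) * a + w * f > th1 else 0
            for a, f in zip(prev_acc, fg_mask)]


# Four pixels covering every combination of A_{t-1}(x) and F_t(x).
prev_acc = [0, 1, 1, 0]
fg_mask = [1, 0, 1, 0]

# With a large w the result follows the current foreground mask.
follows_foreground = accumulate_region(prev_acc, fg_mask, w=0.6, th1=0.5)

# With a threshold below both w and (1 - w), a pixel stays set once it was set.
keeps_history = accumulate_region(prev_acc, fg_mask, w=0.5, th1=0.3)
```

Because both inputs are binary in this sketch, the weighted sum can only take the values 0, w, 1 - w and 1, so the pair (w, Th1) effectively selects which of these combinations keep a pixel in the accumulated region.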


In practice, the distortion of the detected object shape is not permanent; it always behaves as a temporary sudden change. This paper therefore uses multiple prediction values on a 3x3 window based on Kalman filtering, and the best matching value among them is selected. The prediction is computed separately for the x and y coordinates with a second order motion model of the object.

D. The Distance Function and the Association Matrix

The association matrix is used to match the updated existing tracks (objects) with the newly detected objects in the current frame. Moreover, this matrix can be used to detect ambiguous correspondence. In this matrix, the rows (r) represent the updated existing tracks and the columns (c) represent the newly detected objects. The elements of this matrix are values of the distance function, which is defined as follows:

$$\mathrm{Distance}(r,c) = (1 - \mathrm{Area\ gating}) \times \mathrm{Norm.dis.} \tag{3}$$

where

$$\mathrm{Area\ gating} = \frac{\mathrm{Track}_{area}(r) \cap \mathrm{Observation}_{area}(c)}{\min(\mathrm{Track}_{area}(r),\ \mathrm{Observation}_{area}(c))} \tag{4}$$

$$\mathrm{Norm.dis.} = \frac{(x_p - x_o)^2}{\sigma_x^2} + \frac{(y_p - y_o)^2}{\sigma_y^2} \tag{5}$$

The first term of Eq. (3) is used to match the sizes of the predicted and the newly detected object. In one extreme case, if the predicted area of the updated track ($\mathrm{Track}_{area}$) and the newly detected object area ($\mathrm{Observation}_{area}$) are fully matched, the distance is zero. In the opposite extreme case, if $\mathrm{Track}_{area}$ does not intersect $\mathrm{Observation}_{area}$, only the normalized distance is used. In Norm.dis., $x_p$, $x_o$ and $\sigma_x^2$ represent the predicted x coordinate, the newly detected x coordinate and the variance of the prediction error, respectively. The system selects the best match of the feature points on the bounding box according to Norm.dis. This distance function allows objects to be tracked under noisy images (especially in the split region case). This paper sets the threshold of the distance function to $Th_2 = 3.0$.

In the next step, each element of the association matrix is compared with $Th_2$. The 2nd association matrix keeps the comparison results: if the distance function is less than $Th_2$, the result is set to one, and to zero otherwise. The margin column and the margin row are the sum of the scores over all columns for each row and over all rows for each column, respectively.

The matching results can be separated into four types as follows:

- Updated Track: if the score of the 2nd association matrix[Track(i), Observation(j)] = MarginColumn[Track(i)] = MarginRow[Observation(j)] = 1. The 2nd association matrix is modified by eliminating track number i and newly detected object number j.

- Missing Track: if the score of MarginColumn[Track(i)] = 0. This case can be interpreted as one of three events: a spurious object, occlusion by the static environment, or an object moving outside the camera view. In the cases of a spurious object and of an object moving outside the camera view, if the missing track persists for a period of N frames, the system deletes the missing track and this row is eliminated from the 2nd association matrix. In the remaining case, this paper uses contextual information on the static environment [3] to guide the tracking process so that the object is tracked continuously.

- New Track: if the score of MarginRow[Observation(j)] = 0. This observation becomes a new track for the next frame and the column is eliminated from the 2nd association matrix.

- Ambiguous Correspondence: if the score of the 2nd association matrix[Track(i), Observation(j)] = 1 but MarginColumn[Track(i)] or MarginRow[Observation(j)] > 1. At this point, only ambiguous tracks remain in the 2nd association matrix, and they are resolved next.

E. The Resolving of Ambiguous Correspondence

The ambiguous correspondence tracks can be divided into three groups as follows:

- Grouped Track: the value of MarginColumn[Track(i)] gives the number of newly detected objects that correspond to track (i). Thanks to the distance function and an appropriately large threshold, newly detected object regions that are far enough from the predicted corresponding track are updated as new tracks. In the grouped case, the newly detected objects are interpreted as parts of the predicted track, and all of them are grouped. Track (i) is updated with the bounding box of the current predicted track (the best matching corner point, average width and average height). The 2nd association matrix is modified by eliminating track number i and all corresponding newly detected objects.

- Merged Track: the value of MarginRow[Observation(j)] gives the number of tracks inside the merged region. The ambiguous correspondence is resolved in the following steps:

Step 1: The distance function values of the merged tracks calculated in the first association matrix are selected according to the remaining elements of the 2nd association matrix and defined as the temporary association matrix. The global bounding box is modeled for the merged region.

Step 2: Find the pair (track (i), observation (j)) that gives the global minimum of the distance function in the temporary association matrix. This is the best match between track (i) and observation (j). Track (i) is updated with its own bounding box, except for the reference corner point, which is updated with the reference corner point of the best matching observation (j).

Step 3: The temporary association matrix is modified by eliminating track number i, and observation (j) is cut with the bounding box of track (i).

Step 4: If the temporary association matrix is empty, the resolving of the merged track stops; otherwise repeat from Step 1.

- Complex Track: if many tracks correspond to many observations (j), these tracks are defined as a complex track. All observations (j) are defined as a single observation and managed with the strategy of the Merged Track.
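The distance function of Eqs. (3) to (5) and the four matching types can be sketched as below. This is only an illustration under several assumptions that are not in the paper: boxes are `(x_min, y_min, x_max, y_max)` tuples, the prediction error variances default to 25.0, "best match of feature points" is read as the minimum of Eq. (5) over the five corresponding points (four corners and center), and all function names and the label dictionary are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def box_area(b: Box) -> float:
    """Area of a box; inverted (non-overlapping) boxes count as empty."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def area_gating(track: Box, obs: Box) -> float:
    """Eq. (4): overlap area divided by the smaller of the two areas."""
    overlap = (max(track[0], obs[0]), max(track[1], obs[1]),
               min(track[2], obs[2]), min(track[3], obs[3]))
    smaller = min(box_area(track), box_area(obs))
    return box_area(overlap) / smaller if smaller > 0 else 0.0


def feature_points(b: Box) -> List[Tuple[float, float]]:
    """Four corners plus the center of the bounding box."""
    cx, cy = (b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0
    return [(b[0], b[1]), (b[2], b[1]), (b[0], b[3]), (b[2], b[3]), (cx, cy)]


def norm_dis(track: Box, obs: Box, var_x: float, var_y: float) -> float:
    """Eq. (5), evaluated on the best matching pair of corresponding points."""
    return min((xp - xo) ** 2 / var_x + (yp - yo) ** 2 / var_y
               for (xp, yp), (xo, yo)
               in zip(feature_points(track), feature_points(obs)))


def distance(track: Box, obs: Box, var_x: float = 25.0, var_y: float = 25.0) -> float:
    """Eq. (3): (1 - area gating) * normalized distance."""
    return (1.0 - area_gating(track, obs)) * norm_dis(track, obs, var_x, var_y)


def classify(tracks: List[Box], observations: List[Box], th2: float = 3.0):
    """Build the thresholded (2nd) association matrix and label its entries."""
    binary = [[1 if distance(t, o) < th2 else 0 for o in observations]
              for t in tracks]
    margin_column = [sum(row) for row in binary]  # matches per track (row)
    margin_row = ([sum(col) for col in zip(*binary)] if binary
                  else [0] * len(observations))   # matches per observation
    labels: Dict[tuple, str] = {}
    for i, count in enumerate(margin_column):
        if count == 0:
            labels[("track", i)] = "missing"
    for j, count in enumerate(margin_row):
        if count == 0:
            labels[("obs", j)] = "new"
    for i, row in enumerate(binary):
        for j, hit in enumerate(row):
            if hit:
                unique = margin_column[i] == 1 and margin_row[j] == 1
                labels[("pair", i, j)] = "updated" if unique else "ambiguous"
    return binary, margin_column, margin_row, labels


# One track that is re-detected, one track with no observation, one new object.
tracks = [(0, 0, 10, 10), (100, 100, 110, 110)]
observations = [(1, 0, 11, 10), (200, 200, 210, 210)]
binary, margin_column, margin_row, labels = classify(tracks, observations)

# A track whose region was split into two observations becomes ambiguous.
_, split_margin, _, split_labels = classify([(0, 0, 10, 20)],
                                            [(0, 0, 10, 9), (0, 11, 10, 20)])
```

In the split example, each fragment lies entirely inside the predicted box, so the area gating term is 1 and the distance is 0 for both fragments. The track's margin column becomes 2, which is the situation the grouped track rule is meant to handle.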

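The selection loop in Steps 1 to 4 of the merged track procedure can be sketched as a greedy search for the global minimum. This is a simplified illustration, not the authors' implementation: the temporary association matrix is assumed to be a dictionary from `(track_id, observation_id)` to a distance value, and the bounding box update of Step 2 and the cutting of the observation in Step 3 are omitted. The name `resolve_merged` and the sample distances are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (track_id, observation_id)


def resolve_merged(temporary: Dict[Pair, float]) -> List[Pair]:
    """Greedily match tracks inside a merged region, smallest distance first."""
    remaining = dict(temporary)  # work on a copy of the temporary matrix
    matches: List[Pair] = []
    while remaining:  # Step 4: stop once the temporary matrix is empty
        # Step 2: the pair with the global minimum distance is the best match.
        track_id, obs_id = min(remaining, key=remaining.get)
        matches.append((track_id, obs_id))
        # Step 3 (simplified): eliminate the matched track from the matrix.
        remaining = {pair: d for pair, d in remaining.items()
                     if pair[0] != track_id}
    return matches


# Tracks 1 and 5 both fall inside merged observation 0.
order = resolve_merged({(1, 0): 0.8, (5, 0): 0.3})
```

A fuller version would recompute the remaining distances after each cut of the observation, as Step 1 is repeated in the text; the sketch keeps the initial distances fixed.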


III. EXPERIMENTAL RESULTS

The reliability and validity of the proposed approach are tested on a Pentium III 1.0 GHz with various patterns of ambiguous correspondence. For sequence number 1, the image sequences are captured from an NVC TKI070E at 5 fps and 416x312 pixels and converted to 256 gray levels. Sequence number 2 is Meet-Split-3rdGuy-jpg from the CAVIAR dataset.

Tracking results of sequence number 1 are shown in Fig. 2. The large image shows an overview of the tracking results. Inside the large image there are two rows: the top row shows the tracking results and the bottom row shows the corresponding object regions for frame numbers 59, 61 and 62. Frame number 59 shows the updated track state of T3 and the beginning of the merged track state of T1 and T5. Frame numbers 61 and 62 show the merged track state of T1 and T5, while the state of T3 is grouped track. This sequence includes 80 frames, in which the grouped state occurs thirteen times and the merged state occurs twice.

Fig. 2. Tracking results of three objects (frames #59, #61 and #62): T1 and T5 represent boys on bicycle. T3 represents the boys' mother.

Tracking results of sequence number 2 are shown in Fig. 3. The first two rows (#227, #240, #255 and #275) show T1 and T2, which split and merge during the whole sequence. T5 is formed in the third and fourth rows (#319). Frame number 458 shows an incorrect result because T1 corresponds to a spurious object that is generated by the foreground region detection.

Fig. 3. Tracking results of Meet-Split-3rdGuy of the CAVIAR dataset (frames #227, #240, #255, #275, #319, #383, #458 and #500).

IV. CONCLUSION

This paper proposes a visual object tracking method based on nearest neighbor data association for automatic video surveillance. The proposed method attempts to resolve ambiguous correspondence by using object models with low computational cost and by defining the appropriate track state from the distance function of a mixture of object models.

The proposed method can satisfactorily deal with real-time conditions, noisy images, imperfect foreground region detection and occlusion handling, as the experimental results show. However, tracking may fail when the object shape is uncertain, because this affects the reliability of the prediction method. If the prediction method lacks reliability, the resolving of ambiguous correspondence does too. Finally, the tracking performance of the proposed method can be improved by various techniques, which are left for future work. For example, the error in frame number 458 of Fig. 3 can be solved by feeding back the spurious region to update the background model in the foreground region detection, or by adding a color distribution to the object model.

REFERENCES

[1] D. Koller, K. Daniilidis, and H. Nagel, "Model-Based Object Tracking in Monocular Image Sequences of Road Traffic Scenes," Int. J. Computer Vision, Vol. 10, No. 3, pp. 257-281, 1993.
[2] I. Haritaoglu, D. Harwood, and L. Davis, "W4: Real-Time Surveillance of People and Their Activities," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 8, pp. 809-830, 2000.
[3] A. Senior, A. Hampapur, Y. Tian, L. Brown, S. Pankanti, and R. Bolle, "Appearance Models for Occlusion Handling," Proceedings 2nd IEEE International Workshop on Performance Evaluation of Tracking and Surveillance Systems, 2001.
[4] A. Yilmaz, X. Li, and M. Shah, "Contour-Based Object Tracking with Occlusion Handling in Video Acquired Using Mobile Cameras," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 11, pp. 1531-1536, 2004.
[5] S. Blackman, "Multiple-Target Tracking with Radar Applications," Artech House, Inc., 1986.
[6] I. Cox and S. Hingorani, "An Efficient Implementation of Reid's Multiple Hypothesis Tracking Algorithm and Its Evaluation for the Purpose of Visual Tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 18, No. 2, pp. 138-150, 1996.
[7] T. Yang, S. Li, Q. Pan, and J. Li, "Real-time Multiple Objects Tracking with Occlusion Handling in Dynamic Scenes," IEEE International Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. 970-975, 2005.
[8] F. Bremond and T. Thonnat, "Tracking Multiple Nonrigid Objects in Video Sequences," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 8, No. 5, pp. 585-591, 1998.
[9] Z. Khan, T. Balch, and F. Dellaert, "MCMC-Based Particle Filter for Tracking a Variable Number of Interacting Targets," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 11, November 2005.
