[IEEE 2009 IEEE-RIVF International Conference on Computing and Communication Technologies - Danang City, Viet Nam (2009.07.13-2009.07.17)] 2009 IEEE-RIVF International Conference on

Adaptive Hybrid Mean Shift and Particle Filter

Phong Le, Duong Anh Duc, Vu Hai Quan Faculty of Information Technology Ho Chi Minh University of Science

Ho Chi Minh city, Vietnam {lphong, daduc, vhquan}@fit.hcmuns.edu.vn

Nam Trung Pham INRIA/IRISA

Campus de Beaulieu Rennes, France [email protected]

Abstract— Changes in the dynamic model of a moving object can cause large errors in the state estimation algorithms used for object tracking. In this paper, we propose a method, Adaptive Hybrid Mean Shift and Particle Filter (AHMSPF), to solve this problem. AHMSPF consists of three stages. First, the mean shift algorithm is employed to search for an object candidate near the target state. Then, if this candidate is good enough, it is used to adapt the particle filter parameters. Finally, the particle filter estimates the target state based on these new parameters. Experimental results show that our method performs better than the traditional particle filter.

Keywords- object tracking, particle filter, mean shift.

I. INTRODUCTION

Object tracking is an important component of many computer vision systems, such as surveillance, smart rooms and navigation. One of the main objectives of object tracking is to estimate the trajectory of the target.

Among tracking methods, two successful approaches to robust tracking are based on particle filtering and on variational methods [1]. The particle filter [2] is a parametric method that performs a random search guided by a stochastic dynamic model to obtain an estimate of the posterior distribution describing the configuration of the object. Alternatively, an object can be localized by minimizing a cost function, and this minimum can be found with variational methods. Mean shift [3], a typical and popular variational method, is a robust non-parametric method that climbs density gradients to find the peak of a probability distribution. The two methods follow different search paradigms: the former is stochastic and model-driven, while the latter is deterministic and data-driven.

Both methods have their respective strengths and weaknesses. The main strength of the particle filter is its ability to solve non-linear and non-Gaussian state estimation problems under weak assumptions [2]. However, its performance depends significantly on the dynamic model and on the number of particles. If the dynamic model does not match the actual motion of the target, a huge number of particles is required. Moreover, the number of particles needed increases exponentially with the dimensionality of the state space. Both lead to high computational demands. Mean shift, on the other hand, is a low-complexity algorithm that is independent of the dynamic model of the target. Unfortunately, it may converge to a local maximum.

To overcome the weaknesses of the particle filter, Shan et al. proposed the mean shift embedded particle filter (MSEPF) [4]. In this method, after the particle filter spreads the particles, mean shift is applied to each particle to move it toward a local maximum. This method, however, does not adapt the dynamic model of the target. Moreover, because mean shift guides the movement of the particles, the accuracy of the estimate depends heavily on the effectiveness of mean shift.

Maggio and Cavallaro proposed a hybrid particle filter and mean shift tracker (TSHT) [5]. The main idea is similar to [4], except that the dynamic model is updated after each iteration according to the variability of the target in the previous frames. Unfortunately, this adaptation can also be a drawback: if the velocity of the target increases abruptly while the covariance of the process noise is small, the track can be lost.

In this paper, we propose an adaptive hybrid mean shift and particle filter (AHMSPF) designed to overcome the weaknesses of MSEPF and TSHT. Figure 1 gives an overview of our approach. The rest of the paper is organized as follows. Section II describes the dynamic model and the observation model. Section III describes the proposed tracker. Section IV presents experimental results, and Section V draws conclusions.

II. DYNAMIC MODEL AND OBSERVATION MODEL

A. Dynamic model

We model the target as an ellipse, and the target state represents this ellipse by

$$\mathbf{s} = (x, y, h)^T \tag{1}$$

where $(x, y)$ is the center of the ellipse and $h$ is the ratio of the size of the ellipse to the size of the target template.

We employ a constant velocity dynamic model

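$$\mathbf{s}_t = \mathbf{s}_{t-1} + \mathbf{u}_{t-1} + \mathbf{w}_{t-1} \tag{2}$$

where $\mathbf{u}_{t-1}$ is the control vector and $\mathbf{w}_{t-1} \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}_{t-1})$ is the process noise. Both $\mathbf{u}_{t-1}$ and $\mathbf{Q}_{t-1}$ are adapted during the tracking process.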
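As a minimal sketch of the constant velocity model (2), the function below propagates one state $(x, y, h)$ by the control vector plus Gaussian process noise. It assumes a diagonal $\mathbf{Q}_{t-1}$ given as a list of per-component variances, matching the diagonal form used later in (17) and (21); the function name `propagate` is hypothetical.

```python
import random


def propagate(state, u, q_diag, rng=random):
    """Sample s_t = s_{t-1} + u_{t-1} + w_{t-1} with w ~ N(0, diag(q_diag)).

    state, u : (x, y, h) tuples
    q_diag   : per-component process-noise variances (diagonal Q assumed)
    rng      : anything with a gauss(mu, sigma) method, e.g. random.Random(seed)
    """
    return tuple(s + du + rng.gauss(0.0, var ** 0.5)
                 for s, du, var in zip(state, u, q_diag))


# Usage: one particle moved by a control vector, with noise variances
# taken from the Q_default diagonal reported in the experiments.
rng = random.Random(0)
particle = propagate((120.0, 80.0, 1.0), (5.0, -2.0, 0.0), (200.0, 200.0, 0.0169), rng)
```

With zero variances the step reduces to the deterministic shift $\mathbf{s}_{t-1} + \mathbf{u}_{t-1}$, which is what AHMSPF exploits when it concentrates particles around a good mean shift candidate.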

B. Observation model

We represent the target template and the object candidates as color histograms, as in [3] and [6]. Each color histogram is computed in the RGB space using 10×10×10 bins.

978-1-4244-4568-4/09/$25.00 ©2009 Crown

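Figure 1. The AHMSPF diagram: each video frame is first searched by mean shift; if the estimate is good, the particle filter (PF) parameters are updated, otherwise the default PF parameters are set; the particle filter then outputs the estimated state.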

Let us define the normalized color histogram of the template by $\mathbf{q} = \{q_u\}_{u=1,\dots,m}$, where $m$ is the number of bins. Each object candidate at location $\mathbf{y}$ is represented by a normalized color histogram $\mathbf{p}(\mathbf{y}) = \{p_u(\mathbf{y})\}_{u=1,\dots,m}$, computed as

$$p_u(\mathbf{y}) = C \sum_{i=1}^{\eta} \delta[b(\mathbf{x}_i) - u] \tag{3}$$

where $\{\mathbf{x}_i\}_{i=1,\dots,\eta}$ are the $\eta$ pixels in the candidate region, $b(\mathbf{x}_i)$ maps pixel $\mathbf{x}_i$ to its histogram bin, $\delta[\cdot]$ is the Kronecker delta function, and $C$ is a normalization factor. The color histogram of the template is computed in the same way, using (3). The similarity of two histograms is measured by the Bhattacharyya distance as

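$$d[\mathbf{p}(\mathbf{y}), \mathbf{q}] = \sqrt{1 - \rho[\mathbf{p}(\mathbf{y}), \mathbf{q}]} \tag{4}$$

where

$$\rho[\mathbf{p}(\mathbf{y}), \mathbf{q}] = \sum_{u=1}^{m} \sqrt{p_u(\mathbf{y})\, q_u} \tag{5}$$

is the Bhattacharyya coefficient.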
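The histogram (3) and the Bhattacharyya measures (4) and (5) can be sketched in plain Python as follows. This is an illustration under assumptions: pixels are 8-bit `(r, g, b)` tuples, the 10×10×10 binning splits each channel into equal-width intervals, and $C$ is taken as one over the number of pixels so that the histogram sums to one. All function names are hypothetical.

```python
import math

BINS = 10  # bins per RGB channel (10 x 10 x 10 in total)


def bin_index(rgb):
    """b(x): map an 8-bit (r, g, b) pixel to one of BINS**3 histogram bins."""
    r, g, b = (c * BINS // 256 for c in rgb)
    return (r * BINS + g) * BINS + b


def color_histogram(pixels):
    """Eq. (3): per-bin pixel counts scaled by C = 1 / number of pixels."""
    hist = [0.0] * BINS ** 3
    for px in pixels:
        hist[bin_index(px)] += 1.0
    total = sum(hist)
    return [v / total for v in hist] if total else hist


def bhattacharyya_coefficient(p, q):
    """Eq. (5): rho[p, q] = sum_u sqrt(p_u * q_u)."""
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))


def bhattacharyya_distance(p, q):
    """Eq. (4): d = sqrt(1 - rho); max() guards against tiny negative rounding."""
    return math.sqrt(max(0.0, 1.0 - bhattacharyya_coefficient(p, q)))


# Usage: identical colors give distance 0, disjoint colors give distance 1.
red = color_histogram([(250, 10, 10)] * 8)
blue = color_histogram([(10, 10, 250)] * 8)
d_same, d_diff = bhattacharyya_distance(red, red), bhattacharyya_distance(red, blue)
```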

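Now, given a color image $I$ and an object candidate $\mathbf{s}_t^{(i)}$, the likelihood is computed as a Gaussian distribution

$$p(I \mid \mathbf{s}_t^{(i)}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{d^2}{2\sigma^2}\right\} \tag{6}$$

where $d = d[\mathbf{p}(\mathbf{s}_t^{(i)}), \mathbf{q}]$ is the Bhattacharyya distance (4) between the histogram of the candidate and that of the template.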
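A sketch of the likelihood (6) and of turning it into normalized particle weights. The value `sigma = 0.2` in the usage line is an illustrative assumption, not the setting used in the paper.

```python
import math


def likelihood(distance, sigma):
    """Eq. (6): Gaussian likelihood of a candidate given its Bhattacharyya distance."""
    return math.exp(-distance ** 2 / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


def normalized_weights(distances, sigma):
    """Likelihoods of several candidates, rescaled to sum to one."""
    raw = [likelihood(d, sigma) for d in distances]
    total = sum(raw)
    return [w / total for w in raw]


# Usage: the candidate with the smallest distance receives the largest weight.
weights = normalized_weights([0.1, 0.3, 0.5], sigma=0.2)
```

The constant factor $1/(\sqrt{2\pi}\sigma)$ cancels once the weights are normalized, so only the exponential term affects the relative weights; $\sigma$ controls how sharply the weights favor small distances.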

III. TRACKER

A. Mean shift

Comaniciu et al. introduced mean shift color tracking in [3]. It is a local optimization tracking method whose main idea is to minimize the distance in (4) through an iterative process. Assume that the previous location of the target is $\mathbf{y}_0$. Each pixel location $\mathbf{x}_i$ in the candidate region is associated with a weight

$$w_i = \sum_{u=1}^{m} \sqrt{\frac{q_u}{p_u(\mathbf{y}_0)}}\; \delta[b(\mathbf{x}_i) - u] \tag{7}$$

The new object location is computed by

$$\mathbf{y}_1 = \frac{\sum_i \mathbf{x}_i w_i}{\sum_i w_i} \tag{8}$$

The process is repeated until there is no change in the new location.
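The iteration (7)-(8) can be sketched as below. Assumptions for illustration only: the image is a list of rows of 8-bit `(r, g, b)` tuples, the candidate region is an axis-aligned ellipse with fixed half-axes, every pixel in it is weighted uniformly as in (3) (no spatial kernel), and the loop stops when the shift drops below `eps` pixels or after `max_iter` iterations. The helper names are hypothetical.

```python
import math

BINS = 10  # bins per RGB channel


def bin_index(rgb):
    """b(x): map an 8-bit (r, g, b) pixel to one of BINS**3 bins."""
    r, g, b = (c * BINS // 256 for c in rgb)
    return (r * BINS + g) * BINS + b


def region_pixels(image, center, axes):
    """List ((x, y), rgb) for the pixels inside an axis-aligned ellipse."""
    cx, cy = center
    ax, ay = axes
    inside = []
    for y in range(max(0, int(cy - ay)), min(len(image), int(cy + ay) + 1)):
        for x in range(max(0, int(cx - ax)), min(len(image[0]), int(cx + ax) + 1)):
            if ((x - cx) / ax) ** 2 + ((y - cy) / ay) ** 2 <= 1.0:
                inside.append(((x, y), image[y][x]))
    return inside


def mean_shift(image, q, start, axes, max_iter=20, eps=0.5):
    """Move the ellipse center from `start` toward a local minimum of (4)."""
    y0 = start
    for _ in range(max_iter):
        pixels = region_pixels(image, y0, axes)
        if not pixels:
            break
        counts = [0] * BINS ** 3  # candidate histogram p(y0), unnormalized
        for _, rgb in pixels:
            counts[bin_index(rgb)] += 1
        n = len(pixels)
        sw = sx = sy = 0.0
        for (x, y), rgb in pixels:
            u = bin_index(rgb)
            w = math.sqrt(q[u] / (counts[u] / n))  # eq. (7); counts[u] >= 1 here
            sw += w
            sx += w * x
            sy += w * y
        if sw == 0.0:  # no template color inside the region
            break
        y1 = (sx / sw, sy / sw)  # eq. (8)
        if math.hypot(y1[0] - y0[0], y1[1] - y0[1]) < eps:
            return y1
        y0 = y1
    return y0


# Usage: a 10x10 red square on a black 40x40 image, search started off-center.
image = [[(255, 0, 0) if 20 <= x < 30 and 20 <= y < 30 else (0, 0, 0)
          for x in range(40)] for y in range(40)]
q = [0.0] * BINS ** 3
q[bin_index((255, 0, 0))] = 1.0  # template: pure red
center = mean_shift(image, q, start=(22.0, 22.0), axes=(6.0, 6.0))
```

Pixels whose color is absent from the template get zero weight, so the center moves toward the centroid of template-colored pixels inside the current region; if the target has left that region entirely, every weight is zero and the search cannot recover.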

From (8), if the center of the target in the current frame does not lie in the image area covered by the target model in the previous frame, the track can be lost. Therefore, mean shift is not well suited to tracking small, fast-moving objects.

B. Particle filter

The particle filter [2] is a technique for implementing a recursive Bayesian filter by Monte Carlo simulation. It addresses object tracking problems in which the evolution of the target state is given by

$$\mathbf{s}_t = f_t(\mathbf{s}_{t-1}, \mathbf{w}_{t-1}) \tag{9}$$

and the measurement equation is

$$\mathbf{z}_t = g_t(\mathbf{s}_t, \mathbf{v}_t) \tag{10}$$

where $f_t(\cdot)$ and $g_t(\cdot)$ are time-varying functions, and $\{\mathbf{w}_t\}_{t=1,\dots}$ and $\{\mathbf{v}_t\}_{t=1,\dots}$ are an i.i.d. process noise sequence and an i.i.d. measurement noise sequence, respectively. The particle filter represents the posterior pdf $p(\mathbf{s}_t \mid \mathbf{z}_{1:t})$ by a set of random samples with associated weights, and the target state is estimated from this sample set. The procedure is as follows:

• Assume that at time $t-1$ the posterior pdf $p(\mathbf{s}_{t-1} \mid \mathbf{z}_{1:t-1})$ is approximated by a set of $N_s$ random samples with associated weights $\{\mathbf{s}_{t-1}^{(i)}, \omega_{t-1}^{(i)}\}_{i=1}^{N_s}$:

$$p(\mathbf{s}_{t-1} \mid \mathbf{z}_{1:t-1}) \approx \sum_{i=1}^{N_s} \omega_{t-1}^{(i)}\, \delta(\mathbf{s}_{t-1} - \mathbf{s}_{t-1}^{(i)}) \tag{11}$$

where $\delta(\cdot)$ is the Dirac delta function.

• Predictor: draw a new sample set $\{\mathbf{s}_t^{(i)}\}_{i=1}^{N_s}$ from a given importance density $q(\mathbf{s}_t \mid \mathbf{s}_{t-1}, \mathbf{z}_t)$.

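• Corrector: compute the new associated weights

$$\omega_t^{(i)} \propto \omega_{t-1}^{(i)}\, \frac{p(\mathbf{z}_t \mid \mathbf{s}_t^{(i)})\; p(\mathbf{s}_t^{(i)} \mid \mathbf{s}_{t-1}^{(i)})}{q(\mathbf{s}_t^{(i)} \mid \mathbf{s}_{t-1}^{(i)}, \mathbf{z}_t)} \tag{12}$$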

To avoid the degeneracy problem [2], a resampling step is applied:


• Resampling: resample $\{\mathbf{s}_t^{(i)}, \omega_t^{(i)}\}_{i=1}^{N_s}$ to obtain $\{\mathbf{s}_t^{(i)*}, 1/N_s\}_{i=1}^{N_s}$ satisfying $\Pr(\mathbf{s}_t^{(i)*} = \mathbf{s}_t^{(j)}) = \omega_t^{(j)}$.
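A minimal sketch of the resampling step, using multinomial resampling (one of several schemes that satisfy the condition above; the text does not say which scheme is used). Python's `random.choices` draws index $j$ with probability proportional to $\omega_t^{(j)}$. The optional `n_out` argument, a hypothetical extension, lets the number of output particles differ from the input, which is convenient when the particle count changes between frames.

```python
import random


def resample(particles, weights, n_out=None, rng=random):
    """Multinomial resampling.

    Each output particle is a copy of particles[j] with probability
    weights[j] / sum(weights); all returned weights equal 1 / n_out.
    """
    n_out = len(particles) if n_out is None else n_out
    picked = rng.choices(particles, weights=weights, k=n_out)
    return picked, [1.0 / n_out] * n_out


# Usage: 4 weighted particles resampled down to 2 equally weighted ones.
new_particles, new_weights = resample(["a", "b", "c", "d"], [0.1, 0.6, 0.2, 0.1],
                                      n_out=2, rng=random.Random(0))
```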

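To estimate the target state, the expected a posteriori (EAP) estimate is widely used:

$$\hat{\mathbf{s}}_t = E[\mathbf{s}_t \mid \mathbf{z}_{1:t}] \approx \frac{1}{N_s} \sum_{i=1}^{N_s} \mathbf{s}_t^{(i)} \tag{13}$$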
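Putting (11)-(13) together, one iteration of the filter can be sketched as follows. This assumes the common bootstrap choice of importance density, $q(\mathbf{s}_t \mid \mathbf{s}_{t-1}, \mathbf{z}_t) = p(\mathbf{s}_t \mid \mathbf{s}_{t-1})$, for which (12) reduces to $\omega_t^{(i)} \propto \omega_{t-1}^{(i)}\, p(\mathbf{z}_t \mid \mathbf{s}_t^{(i)})$; the text does not fix this choice. `transition` and `likelihood` are caller-supplied callbacks (hypothetical names), and states are tuples.

```python
import math
import random


def sir_step(particles, weights, transition, likelihood, rng=random):
    """One predictor/corrector/resampling iteration with an EAP estimate.

    transition(s, rng) draws s_t ~ p(s_t | s_{t-1} = s)
    likelihood(s)      returns p(z_t | s_t = s) for the current frame
    """
    n = len(particles)
    # Predictor: bootstrap proposal, i.e. sample from the dynamic model
    predicted = [transition(s, rng) for s in particles]
    # Corrector: eq. (12) with q equal to the transition prior
    w = [wp * likelihood(s) for wp, s in zip(weights, predicted)]
    total = sum(w)
    w = [x / total for x in w] if total > 0 else [1.0 / n] * n
    # Resampling: multinomial, so Pr(s*_i = s_j) = w_j
    resampled = rng.choices(predicted, weights=w, k=n)
    # EAP estimate, eq. (13): plain average of the equally weighted particles
    dim = len(resampled[0])
    estimate = tuple(sum(s[k] for s in resampled) / n for k in range(dim))
    return resampled, [1.0 / n] * n, estimate


def random_walk(s, r):
    """Toy 1-D dynamic model: s_t = s_{t-1} + N(0, 1)."""
    return (s[0] + r.gauss(0.0, 1.0),)


def observe_five(s):
    """Toy (unnormalized) likelihood of a measurement z_t = 5 with unit noise."""
    return math.exp(-0.5 * (s[0] - 5.0) ** 2)


# Usage: 500 particles spread over [-10, 10] lock onto the measurement at 5.
rng = random.Random(1)
particles = [(rng.uniform(-10.0, 10.0),) for _ in range(500)]
weights = [1.0 / 500] * 500
for _ in range(10):
    particles, weights, est = sir_step(particles, weights, random_walk, observe_five, rng)
```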

In theory, as the number of particles $N_s$ tends to infinity, the particle filter approaches the optimal Bayesian estimate. However, a real-time system requires $N_s$ to be as small as possible while keeping the error acceptably low. This goal can be achieved when the models match the true motion and observations of the target.

C. Adaptive Hybrid Mean shift and Particle filter

Tracking objects that can change their direction and velocity abruptly and significantly is difficult for both mean shift and the particle filter. As discussed above, mean shift is unsuitable because the center of the target in the current frame may not lie in the image area covered by the target model in the previous frame. The particle filter can overcome this problem, but its computational load can be high because the dynamic model has to cover the worst case. We therefore propose an approach named Adaptive Hybrid Mean shift and Particle filter (AHMSPF) to solve this task effectively.

Our method consists of three stages. Assume that at time $t-1$ the estimated state is $\hat{\mathbf{s}}_{t-1}$ and the particle filter has a set of samples $\{\mathbf{s}_{t-1}^{(i)}, \omega_{t-1}^{(i)}\}_{i=1}^{N_{s,t-1}}$. First, AHMSPF searches for the best object candidate by mean shift. To make the search more effective, mean shift is applied to each hypothesis candidate in the set $\{\mathbf{s}_{MS}^{(i)}\}_{i=1}^{N_{MS}}$ drawn from $\mathcal{N}(\hat{\mathbf{s}}_{t-1}, \mathbf{A}_{t-1})$, where $\mathbf{A}_{t-1}$ is a matrix satisfying

$$\mathbf{A}_{t-1} \propto \mathrm{diag}\left(\frac{1}{n_{MS}} \sum_{i=1}^{n_{MS}} (\hat{\mathbf{s}}_{t-i} - \hat{\mathbf{s}}_{t-i-1})\right) \tag{14}$$

where $n_{MS}$ is the number of previous steps used to estimate $\mathbf{A}_{t-1}$ and $\hat{\mathbf{s}}_{t-i}$ is the estimated state at time $t-i$. Denoting $\hat{\mathbf{s}}_{MS}^{(i)} = \mathrm{meanshift}(\mathbf{s}_{MS}^{(i)})$, we select

$$\hat{\mathbf{s}}_{MS,t} = \arg\min_{\hat{\mathbf{s}}_{MS}^{(i)},\; i=1,\dots,N_{MS}} d_{Bhattacharyya}[\mathbf{p}(\hat{\mathbf{s}}_{MS}^{(i)}), \mathbf{q}] \tag{15}$$
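Stage 1 can be sketched as below. Equation (14) writes the diagonal of $\mathbf{A}_{t-1}$ as an average of state differences; since a covariance diagonal must be non-negative, this sketch assumes element-wise absolute differences, a proportionality constant `scale`, and a small `floor` (all three are assumptions, not values from the paper). `mean_shift` and `distance` are caller-supplied callbacks standing for meanshift(·) and $d_{Bhattacharyya}[\mathbf{p}(\cdot), \mathbf{q}]$; all names are hypothetical.

```python
import random


def candidate_covariance(history, n_ms, scale=1.0, floor=1e-6):
    """Diagonal of A_{t-1} following eq. (14).

    history[-1] is s_hat_{t-1}, history[-2] is s_hat_{t-2}, and so on;
    at least n_ms + 1 past estimates are required.
    """
    dim = len(history[-1])
    diag = [0.0] * dim
    for i in range(1, n_ms + 1):
        newer, older = history[-i], history[-i - 1]  # s_hat_{t-i}, s_hat_{t-i-1}
        for k in range(dim):
            diag[k] += abs(newer[k] - older[k]) / n_ms  # |.| is an assumption
    return [max(scale * d, floor) for d in diag]


def search_candidates(s_prev, a_diag, n_candidates, mean_shift, distance, rng=random):
    """Eq. (15): best mean-shift-refined hypothesis drawn from N(s_prev, diag(a_diag))."""
    best, best_d = None, float("inf")
    for _ in range(n_candidates):
        hypothesis = tuple(rng.gauss(m, var ** 0.5) for m, var in zip(s_prev, a_diag))
        refined = mean_shift(hypothesis)  # s_hat_MS^(i) = meanshift(s_MS^(i))
        d = distance(refined)
        if d < best_d:
            best, best_d = refined, d
    return best, best_d


# Usage with toy callbacks: a stand-in "mean shift" that always converges to x = 6.
history = [(0.0, 0.0, 1.0), (2.0, 1.0, 1.0), (4.0, 2.0, 1.0)]
a_diag = candidate_covariance(history, n_ms=2)
best, best_d = search_candidates(history[-1], a_diag, 20,
                                 mean_shift=lambda s: (6.0, s[1], s[2]),
                                 distance=lambda s: abs(s[0] - 6.0),
                                 rng=random.Random(0))
```

Spreading the starting points according to the recent motion of the target gives mean shift several chances to start inside the basin of the true target, which addresses the failure case described for a single mean shift run.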

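Secondly, if $d_{Bhattacharyya}[\mathbf{p}(\hat{\mathbf{s}}_{MS,t}), \mathbf{q}] \le T_{MS}$, where $T_{MS}$ is a given threshold, $\hat{\mathbf{s}}_{MS,t}$ is considered to be near the true target state, and the control vector $\mathbf{u}_{t-1}$, the covariance matrix $\mathbf{Q}_{t-1}$ of the process noise and the number of particles $N_{s,t}$ are updated as follows:

$$\mathbf{u}_{t-1} = \hat{\mathbf{s}}_{MS,t} - \hat{\mathbf{s}}_{t-1} \tag{16}$$

$$\mathbf{Q}_{t-1} \propto \mathrm{diag}\left(\frac{1}{n_{pn}} \sum_{i=1}^{n_{pn}} (\hat{\mathbf{s}}_{MS,t-i} - \hat{\mathbf{s}}_{t-i})\right) \tag{17}$$

$$N_{s,t} = N_{small} \tag{18}$$

Otherwise,

$$\mathbf{u}_{t-1} = \mathbf{u}_{default}, \quad \mathbf{Q}_{t-1} = \mathbf{Q}_{default}, \quad N_{s,t} = N_{default} \tag{19}$$

where $\mathbf{u}_{default}$, $\mathbf{Q}_{default}$, $N_{small}$, $N_{default}$ and $n_{pn}$ are given, $N_{small} < N_{default}$, and $\hat{\mathbf{s}}_{MS,t-i}$ is the best object candidate detected by the mean shift search at time $t-i$. Finally, the particle filter is employed to estimate the target state as follows: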

• Resample $\{\mathbf{s}_{t-1}^{(i)}, \omega_{t-1}^{(i)}\}_{i=1}^{N_{s,t-1}}$ to obtain $\{\mathbf{s}_{t-1}^{(i)}, 1/N_{s,t}\}_{i=1}^{N_{s,t}}$.

• Apply the predictor and corrector steps to obtain $\{\mathbf{s}_t^{(i)}, \omega_t^{(i)}\}_{i=1}^{N_{s,t}}$.

• Estimate the target state by

$$\hat{\mathbf{s}}_t = \sum_{i=1}^{N_{s,t}} \omega_t^{(i)}\, \mathbf{s}_t^{(i)} \tag{20}$$
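Stages 2 and 3 can be sketched as follows, with the parameter values of (21) collected in a configuration dictionary. Because (17) is a diagonal of state differences, this sketch assumes element-wise absolute differences, a proportionality constant of one, and a small variance floor `q_floor`; it also assumes a bootstrap proposal in the corrector. The histories are assumed to be aligned so that `ms_hist[-i]` and `est_hist[-i]` hold $\hat{\mathbf{s}}_{MS,t-i}$ and $\hat{\mathbf{s}}_{t-i}$. All names are hypothetical.

```python
import random

CONFIG = {  # values reported in (21); q_floor is an added assumption
    "T_MS": 0.6, "n_pn": 1, "N_small": 20, "N_default": 150,
    "u_default": (0.0, 0.0, 0.0), "Q_default": (200.0, 200.0, 0.0169),
    "q_floor": 1e-6,
}


def adapt_parameters(d_best, s_ms_best, s_prev, ms_hist, est_hist, cfg=CONFIG):
    """Stage 2: return (u_{t-1}, diag(Q_{t-1}), N_{s,t}) per eqs. (16)-(19)."""
    if d_best <= cfg["T_MS"]:
        u = tuple(a - b for a, b in zip(s_ms_best, s_prev))  # eq. (16)
        n = cfg["n_pn"]
        q = [max(sum(abs(ms_hist[-i][k] - est_hist[-i][k])
                     for i in range(1, n + 1)) / n, cfg["q_floor"])
             for k in range(len(s_prev))]  # eq. (17), |.| assumed
        return u, q, cfg["N_small"]  # eq. (18)
    return cfg["u_default"], list(cfg["Q_default"]), cfg["N_default"]  # eq. (19)


def ahmspf_filter_step(particles, weights, u, q_diag, n_s, likelihood, rng=random):
    """Stage 3: resample to n_s particles, predict with (2), correct, estimate with (20)."""
    chosen = rng.choices(particles, weights=weights, k=n_s)  # resampling
    predicted = [tuple(s + du + rng.gauss(0.0, var ** 0.5)
                       for s, du, var in zip(p, u, q_diag)) for p in chosen]
    w = [likelihood(s) for s in predicted]  # bootstrap corrector
    total = sum(w)
    w = [x / total for x in w] if total > 0 else [1.0 / n_s] * n_s
    estimate = tuple(sum(wi * s[k] for wi, s in zip(w, predicted))
                     for k in range(len(u)))  # eq. (20)
    return predicted, w, estimate


# Usage: a good candidate (distance 0.3 <= T_MS) shrinks the filter to N_small particles.
u, q, n_s = adapt_parameters(0.3, (12.0, 9.0, 1.0), (10.0, 10.0, 1.0),
                             ms_hist=[(10.5, 10.0, 1.0)], est_hist=[(10.0, 10.0, 1.0)])
particles, weights, est = ahmspf_filter_step([(10.0, 10.0, 1.0)] * 150, [1.0 / 150] * 150,
                                             u, q, n_s, likelihood=lambda s: 1.0,
                                             rng=random.Random(0))
```

Resampling directly from $N_{s,t-1}$ to $N_{s,t}$ particles is what lets the filter switch between the small and default particle budgets from one frame to the next.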

Intuitively, if mean shift detects a good object candidate, we can assume that the true target state is near this candidate, and the particle filter concentrates its search on this area with a small number of particles. Otherwise, the particle filter has to search the area driven by the default dynamic model. Therefore, the average number of particles decreases and the particle filter estimates the target state more accurately.

IV. EXPERIMENTAL RESULTS

In this section, we present an experimental comparison between our approach and three other methods: the particle filter (PF), MSEPF [4] and TSHT [5]. The comparison was done on three image sequences, tennis¹, hand² and blueball³, in which the targets often change their direction and velocity abruptly and significantly. The three sequences have 53, 186 and 315 frames, respectively. In the first and second sequences, the target shape is affected by camera blur; the third sequence is corrupted by noise. All tests were run on a 1.4 GHz Centrino CPU with 768 MB of RAM.

For a fair comparison, the target representation and the dynamic model are the same for all methods, with the following parameters:

$$\mathrm{diag}(\mathbf{Q}_{default}) = (200, 200, 0.0169),\ \ \mathbf{u}_{default} = \mathbf{0},\ \ N_{small} = 20,\ \ N_{default} = 150,\ \ N_{MS} = 20,\ \ T_{MS} = 0.6,\ \ n_{MS} = 2,\ \ n_{pn} = 1,\ \ \sigma = 0.\ldots \tag{21}$$

The ground truth positions and sizes of each target were annotated by hand, and each method was initialized from the ground truth. The results are shown in Table I, and some comparisons are shown in Figs. 2 and 3.

¹ http://media.xiph.org/video/derf/

² http://www.elec.qmul.ac.uk/staffinfo/andrea/HY-MP.html

³ http://casbah.ee.ic.ac.uk/~mpsha/humanoids/


Table I shows that our method achieves the highest position accuracy, while its size accuracy is comparable to that of the other methods. On average, AHMSPF uses about 80% fewer particles than PF and runs about 1.5 times faster. This is due to the effective mean shift search: the particle filter in AHMSPF spends most of its time searching small areas around the true target state rather than the large areas driven by the default dynamic model. Therefore, the average number of particles is small but the particle filter still maintains high accuracy, which leads to the high frame rate of AHMSPF. The fourth rows of Figs. 2 and 3 show that AHMSPF often yields quite good estimates, while the first rows show that PF sometimes fails to track the target well.
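The averages quoted above can be checked against Table I with a few lines of arithmetic. The sketch copies the PF and AHMSPF rows (particles per frame and frame rate) and averages the per-sequence ratios; averaging ratios rather than totals is an assumption about how the averages were formed.

```python
# PF and AHMSPF rows of Table I: (particles per frame, speed in fps).
TABLE_I = {
    "Blueball": {"PF": (150, 31), "AHMSPF": (20.8, 45)},
    "Tennis": {"PF": (150, 41), "AHMSPF": (32.5, 76)},
    "Hand": {"PF": (150, 32), "AHMSPF": (28.4, 40)},
}

reductions = [1 - row["AHMSPF"][0] / row["PF"][0] for row in TABLE_I.values()]
speedups = [row["AHMSPF"][1] / row["PF"][1] for row in TABLE_I.values()]

mean_reduction = sum(reductions) / len(reductions)
mean_speedup = sum(speedups) / len(speedups)
print(f"{mean_reduction:.1%} fewer particles, {mean_speedup:.2f}x faster")
# prints: 81.8% fewer particles, 1.52x faster
```

The per-sequence speedups range from 1.25 (hand) to about 1.85 (tennis), so the 1.5 figure is an average over quite different gains.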

Surprisingly, even with the same number of particles, the position accuracy of MSEPF and TSHT is lower than that of PF. The reason is that the target representation (see (3)) is not as good as those used in [4] and [5]. Because every pixel inside the target ellipse is treated equally, noisy pixels near the boundary of the ellipse strongly affect the color histogram. As a result, mean shift forces the particles to coalesce around parts near the target boundary rather than the target center. This phenomenon can be seen in the second and third rows of Figs. 2 and 3. For example, in Fig. 2, the balls estimated by MSEPF and TSHT tend to cover parts near the ball boundary. This is a drawback of MSEPF and TSHT. In contrast, AHMSPF maintains good performance throughout the tracking, as shown in the fourth rows of Figs. 2 and 3.
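To make the boundary argument concrete, the sketch below measures how much of a disc-shaped region's histogram mass comes from its outer ring (normalized radius above 0.8). It compares the uniform pixel weighting of (3) with an Epanechnikov-profile weighting $1 - r^2$, assumed here as a stand-in for the kernel-weighted histograms of [3]; the function name and ring threshold are illustrative. For a continuous disc the outer ring carries $1 - 0.8^2 = 36\%$ of the mass under uniform weighting but only $(1 - 0.8^2)^2 \approx 13\%$ under the kernel, so boundary pixels have roughly a third of the influence.

```python
def boundary_mass(radius, ring=0.8, kernel=False):
    """Fraction of histogram mass contributed by pixels whose normalized
    distance from the center exceeds `ring`, for a disc of the given radius."""
    inner = outer = 0.0
    for y in range(-radius, radius + 1):
        for x in range(-radius, radius + 1):
            r2 = (x * x + y * y) / radius ** 2  # squared normalized distance
            if r2 > 1.0:
                continue  # outside the disc
            w = 1.0 - r2 if kernel else 1.0  # Epanechnikov profile vs. uniform
            if r2 > ring ** 2:
                outer += w
            else:
                inner += w
    return outer / (inner + outer)


uniform_share = boundary_mass(50)
kernel_share = boundary_mass(50, kernel=True)
print(f"outer-ring share: uniform {uniform_share:.2f}, kernel {kernel_share:.2f}")
```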

V. CONCLUSIONS

In this paper, we proposed a new method named Adaptive Hybrid Mean shift and Particle filter. The main idea of our method is that an effective mean shift search is applied to detect a good object candidate near the target state; this candidate is then used to adapt the particle filter parameters. Experimental results show that our method, with a much smaller number of particles, is more accurate and runs faster than the three other methods: the traditional particle filter, MSEPF and TSHT. Our future work includes improving the sampling step of the particle filter.

REFERENCES

[1] J. Sullivan and J. Rittscher, "Guiding Random Particles by Deterministic Search", ICCV, 2001, pp. 323-330.

[2] S. Maskell and N. Gordon, "A Tutorial on Particle Filters for On-line Nonlinear/Non-Gaussian Bayesian Tracking", in Proc. IEE Workshop 'Target Tracking: Algorithms and Applications', Oct. 2001.

[3] D. Comaniciu, V. Ramesh and P. Meer, "Kernel-based object tracking", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 5, pp. 564-575, 2003.

[4] C. Shan, Y. Wei, T. Tan and F. Ojardias, "Real Time Hand Tracking by Combining Particle Filtering and Mean Shift", in International Conference on Automatic Face and Gesture Recognition, 2004.

[5] E. Maggio and A. Cavallaro, "Hybrid particle filter and Mean Shift tracker with adaptive transition model", in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, 2005.

[6] K. Nummiaro, E. Koller-Meier and L. Van Gool, "A color-based particle filter", in Proc. of the 1st Workshop on Generative-Model-Based Vision, June 2002, pp. 53-60.

Figure 2. Tracking results on the tennis sequence. Rows one to four show the results of PF, MSEPF, TSHT and AHMSPF, respectively (left to right: frames 7, 30, 36, 39).

Figure 3. Tracking results on the hand sequence. Rows one to four show the results of PF, MSEPF, TSHT and AHMSPF, respectively (left to right: frames 15, 19, 48, 52).

TABLE I. COMPARISON BETWEEN FOUR METHODS

Sequence   Method   Ns/frame   APE     ASE    Speed (fps)
Blueball   PF       150        2.79    2.25   31
Blueball   MSEPF    150        3.16    1.92   28
Blueball   TSHT     150        3.37    1.99   24
Blueball   AHMSPF   20.8       2.52    2.26   45
Tennis     PF       150        3.56    2.03   41
Tennis     MSEPF    150        3.84    2.15   34
Tennis     TSHT     150        4.59    2.85   30
Tennis     AHMSPF   32.5       2.70    2.44   76
Hand       PF       150        8.87    5.97   32
Hand       MSEPF    150        LT      LT     25
Hand       TSHT     150        11.10   6.51   17
Hand       AHMSPF   28.4       8.2     5.96   40

APE: Average Position Error, ASE: Average Size Error, LT: Lost Track
