
. RESEARCH PAPER .

SCIENCE CHINA Information Sciences

November 2012 Vol. 55 No. 11: 2635–2645

doi: 10.1007/s11432-012-4701-9

© Science China Press and Springer-Verlag Berlin Heidelberg 2012  info.scichina.com  www.springerlink.com

A variational method for contour tracking via covariance matching

WU YuWei, MA Bo∗ & LI Pei

Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, Beijing 100081, China

Received February 10, 2012; accepted March 28, 2012

Abstract  This paper presents a novel formulation for contour tracking. We model the second-order statistics of image regions and perform covariance matching under the variational level set framework. Specifically, the covariance matrix is adopted as the visual object representation for partial differential equation (PDE) based contour tracking. Log-Euclidean calculus is used as the covariance distance metric instead of the Euclidean distance, which is unsuitable for measuring the similarity between covariance matrices because the matrices typically lie on a non-Euclidean manifold. A novel image energy functional is formulated by minimizing the distance metric between the candidate object region and a given template, and maximizing the one between the background region and the template. The corresponding gradient flow is then derived according to a variational approach, enabling PDE-based contour tracking. Experiments on several challenging sequences prove the validity of the proposed method.

Keywords  contour tracking, covariance region descriptor, level set, Log-Euclidean Riemannian metric

Citation  Wu Y W, Ma B, Li P. A variational method for contour tracking via covariance matching. Sci China Inf Sci, 2012, 55: 2635–2645, doi: 10.1007/s11432-012-4701-9

1 Introduction

Object tracking is a well-known topic in the computer vision community, and has found applications in a wide range of areas, including visual surveillance, medical imaging, human computer interaction, and autonomous vehicles. Different tracking strategies have been implemented and applied to overcome the related difficulties, from Bayesian-based trackers [1,2] and kernel-based methods [3,4] to online-learning-based methods [5–8] and detection-based methods [9,10]. A thorough review can be found in [11,12].

Most of the methods mentioned above model the shapes of objects as simple geometric primitives, such as rectangles or ellipses, hence leading to inaccurate extraction of the object contour. In addition, such simple geometric shapes cannot be used for high-level motion analysis such as pose recognition. In contrast, contour-based methods aim to obtain the accurate contour of an object in each frame instead of its rough location. One technique that ideally provides precise localization of the target and its boundary is the use of an implicit contour, or level set, to represent the boundary of the object [13–16].

∗Corresponding author (email: [email protected])

Recently, an important feature known as the region covariance descriptor has been proven to be a very effective tool for visual tracking [17], pedestrian detection [18], face recognition [19] and texture classification. By taking into account multiple-feature fusion, the region covariance descriptor is able to capture both the spatial and statistical properties of each pixel in an object region with a low-dimensional representation. This paper proposes covariance matching for partial differential equation-based (PDE-based) contour tracking. To further reduce the computational complexity, the Log-Euclidean Riemannian metric [20] is employed as the similarity measure between covariance matrices. Based on the Log-Euclidean metric, a novel image energy functional incorporating both the similarities and the dissimilarities between the features of an object and a template is presented. Specifically, the similarity is defined by the distance between the covariance of the internal image region enclosed by the evolving contour and a predefined template covariance; the dissimilarity is the corresponding distance between the covariance of the external image region and the template. Starting from a variational approach, we derive the gradient flow, enabling PDE-based contour tracking. A group of tracking experiments that prove the validity of the proposed contour tracking method is presented.

The rest of the paper is organized as follows. Section 2 provides a brief review of previous contour tracking studies that use the active contour model and covariance features. Section 3 presents the proposed method: we first introduce the covariance matrix as a region-level or object-level feature descriptor, then present the Log-Euclidean calculus used to measure the similarity between covariance matrices, then design image energies for contour tracking that, in essence, seek optimal covariance matching over an image domain given a known object template, and finally derive the corresponding gradient flow equation. Section 4 presents experimental results on several real image sequences using the proposed method. Section 5 concludes the paper.

2 Related work

A number of methods that use a parameterized or explicit representation of contours have been proposed [21,22] for active contour tracking. The traditional active contour framework involves parameterizations of curves for performing visual tracking. Isard and Blake [1] proposed the condensation algorithm, which employs a particle filter to estimate the nonlinear probability density function of the state of the tracked contour, represented by a B-spline. Ray and Acton [21] presented motion gradient vector flow to track both slow- and fast-rolling leukocytes by minimizing an energy functional involving the motion direction and the image gradient magnitude.

Other models for representing contours are geometric active contours [23,24], based on the level set method, in which a contour is represented as the zero level set of a higher-dimensional function. Paragios and Deriche [25] proposed a geodesic active region framework for multiple-object tracking. In their work, several visual cues are integrated within an objective function that contains a change-detection foreground/background separation energy, an edge-driven tracking energy and a visual consistency energy. Bertalmio et al. [26] presented morphing geometric active contours for visual tracking. Given two consecutive frames, the morphing active contours deform the first frame toward the second one via a PDE, and track the deformation of the curves of interest in the first frame with an additional coupled PDE. Mansouri [27] used the optical flow constraint for geometric active contour evolution. Sundaramoorthi et al. [28] developed Sobolev active contours to perform coarse-to-fine segmentation and visual tracking. Based on a well-structured Riemannian metric, Sobolev active contours evolve more globally and are less attracted to certain intermediate local minima than traditional active contours. By simply switching elements between two linked lists instead of solving any PDEs, Shi and Karl [29] proposed a fast implementation of the level set method for real-time visual tracking. Like [14], Bibby and Reid [30] introduced a probabilistic framework for real-time tracking using pixel-wise posteriors; later, they extended their method to achieve real-time tracking of multiple occluding objects [15].

To make active contour tracking more robust to noise, occlusion and illumination changes, the adoption of shape priors in the contour evolution process has been shown to be effective. Cremers [14] developed a Bayesian formulation for level set-based tracking of walking people, in which dynamical statistical shape models are introduced for implicitly represented shapes. Rathi et al. [31] combined the geometric active contour with particle filtering for visual tracking; the dynamic shape prior is obtained using the locally linear embedding algorithm, which enables the exploration of local neighboring linear structures within the training shape dataset. Yilmaz et al. [32] proposed an energy functional that combines color features with the responses of spatial orientation selective filters using probability theory. Although the results obtained in [32] are desirable, one significant assumption made in that approach is not accurate enough: it is assumed that the pixels in the foreground and background regions are independent of one another. Freedman et al. [16] used the geometric active contour for distribution tracking and adopted Euclidean similarity transformations and nonrigid transformations to constrain object shapes.

Figure 1  The flow chart of covariance matching for contour tracking.

Our work is mainly inspired by a number of recent works [17,24]. Porikli et al. [17] proposed a covariance-based object description and a Lie algebra-based update mechanism for visual tracking. They represented an object window as the covariance matrix of features, enabling the capture of spatial and statistical properties, as well as their correlation, within the same representation. Tracking is performed by globally or locally searching for the image region whose covariance matrix best matches a given template covariance matrix. The distance metric uses the sum of the squared logarithms of the generalized eigenvalues to compute the similarity between covariance matrices. Our work differs from [17] in that we track the covariance matrix using the geometric active contour, instead of performing a direct search on the image domain. As a result, our method can track deformable objects while inheriting the advantages of the covariance matrix as the visual object descriptor.

3 Our method

Given an image I(X), where X = (x, y) denotes pixel coordinates, we compute the covariance matrix of the features of a predefined image region as a template covariance matrix. In the current frame, we search for the region whose covariance matrix optimally matches the template using the geometric active contour model. Figure 1 illustrates the flow of covariance matching for contour tracking. In this framework, the covariance matrix is defined as an image region descriptor. We then construct a region-based energy functional that uses the second-order statistics of both the candidate foreground region and the background region for contour tracking. The corresponding gradient flow is derived under the variational level set framework, in accordance with the variational approach [24]. Finally, combining the image energy term and the shape energy term, we present a complete energy functional and its corresponding gradient flow equation.

3.1 Visual object representation

Around each pixel on the image plane, we can extract a feature vector f whose components can include the pixel coordinates, image gray level or color, image gradients, edge magnitude, edge orientation, filter responses, etc. For example,

$$f = \left[x,\, y,\, I,\, I_x,\, I_y,\, \sqrt{I_x^2 + I_y^2}\right]^{\mathrm T}.$$
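For instance, the construction of such per-pixel feature vectors can be sketched in NumPy as follows; the function name and the particular choice of six features are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def pixel_features(gray):
    """Stack per-pixel features [x, y, I, Ix, Iy, |grad I|] into an (H*W, 6) array.

    A sketch of the feature vector f described above; the gradient operator
    (np.gradient) and the feature selection are illustrative choices.
    """
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)      # pixel coordinates
    iy, ix = np.gradient(gray.astype(float))       # Iy along rows, Ix along columns
    mag = np.sqrt(ix ** 2 + iy ** 2)               # edge magnitude sqrt(Ix^2 + Iy^2)
    feats = np.stack([xs, ys, gray.astype(float), ix, iy, mag], axis=-1)
    return feats.reshape(-1, 6)

# tiny synthetic image: intensity increases along x, so Ix is 1 and Iy is 0 everywhere
img = np.tile(np.arange(5, dtype=float), (4, 1))
F = pixel_features(img)
```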

Given the extracted feature vector f defined on an image plane Ω, we suppose that the image domain is divided into two parts (i.e., the interior region R and the exterior region R^c, with Ω = R ∪ R^c) by an evolving contour C(s), where s is the arc length parameter. The d × d covariance matrix S_R for a given image region R is computed by

$$S_R = \frac{\int_R (f - \mu_R)(f - \mu_R)^{\mathrm T}\, dX}{\int_R dX},\qquad(1)$$

where μ_R = ∫_R f dX / ∫_R dX. The covariance matrix represents not only the variance of each feature, but also their respective correlations. By taking into account multiple-feature fusion, the region covariance descriptor is able to capture both the spatial and statistical properties of each pixel in an object region with a low-dimensional representation [17].
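Eq. (1) discretizes directly by replacing the integrals with sums over the pixels of R. A minimal NumPy sketch (the function name is ours, not from the paper):

```python
import numpy as np

def region_covariance(features, mask):
    """Covariance matrix S_R of Eq. (1) over the pixels where mask is True.

    features: (N, d) array of per-pixel feature vectors f; mask: (N,) boolean.
    A direct discretization of the integrals; assumes R contains several pixels.
    """
    f = features[mask]                  # feature vectors of the pixels inside R
    mu = f.mean(axis=0)                 # mean vector mu_R
    diff = f - mu
    return diff.T @ diff / len(f)       # (1/|R|) * sum (f - mu)(f - mu)^T

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 3))
S = region_covariance(feats, np.ones(100, dtype=bool))
```

The result is a d × d symmetric positive semi-definite matrix, independent of the number of pixels in R.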

3.2 Image energy term

Given a template covariance matrix S_T and a candidate image region in the current frame, the distance between S_T and the covariance S_R of the candidate image region needs to be computed. However, covariance matrices lie on a Riemannian manifold rather than in Euclidean space; therefore, an arithmetic subtraction of two matrices does not measure the distance between the corresponding regions. To address this issue, Porikli et al. [24] developed the sum of the squared logarithms of the generalized eigenvalues of the covariance matrices as a distance metric. This metric is the affine-invariant Riemannian metric expressed in Eq. (2); however, the computational cost of the Riemannian mean under this metric grows linearly as time progresses:

$$\rho(S_T, S_R) = \sqrt{\sum_{k=1}^{d} \ln^2 \lambda_k(S_T, S_R)},\qquad(2)$$

where λ_k is the kth generalized eigenvalue of S_T and S_R.

Directly deriving the gradient flow from Eq. (2) is difficult. To simplify the derivation of the gradient flow for visual tracking, we adopt the Log-Euclidean Riemannian metric proposed by Arsigny et al. [20]. The Riemannian means take a much simpler form under the Log-Euclidean metric than under the affine-invariant metric. Under the Log-Euclidean Riemannian metric, the distance between two points X and Y is calculated by ‖log(Y) − log(X)‖. Thus, the energy functional for geometric active contour tracking can be defined as

$$E_{i,F} = \|\ln S_R - \ln S_T\|_F,\qquad(3)$$

where ‖·‖_F denotes the Frobenius norm of a matrix. Let φ(x, y) be an embedding function whose zero level set corresponds to the evolving curve C(s), and let H(φ) be the Heaviside function. Suppose that the set of points {x ∈ Ω : H(φ(x)) ⩾ 0} corresponds to the interior region R surrounded by the curve C(s). The energy defined in Eq. (3) can then be regarded as a functional of the level set function φ. The expression for S_R as a matrix function of φ is given by

$$S_R(\phi) = \frac{\int_\Omega H(\phi)\,(f - \mu_R(\phi))(f - \mu_R(\phi))^{\mathrm T}\, dX}{\int_\Omega H(\phi)\, dX},\qquad(4)$$

where μ_R(φ) = ∫_Ω H(φ) f dX / ∫_Ω H(φ) dX is the mean vector within region R. By minimizing the image energy functional defined in Eq. (3), we try to deform the evolving contour toward the moving object of interest.
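Since the covariance matrices involved are symmetric positive-definite, their matrix logarithms can be computed from an eigendecomposition, which gives a compact way to evaluate the Log-Euclidean distance of Eq. (3). The following sketch is illustrative (helper names are ours):

```python
import numpy as np

def spd_log(S):
    """Matrix logarithm of a symmetric positive-definite matrix via eigendecomposition:
    if S = V diag(w) V^T, then log(S) = V diag(log w) V^T."""
    w, v = np.linalg.eigh(S)
    return (v * np.log(w)) @ v.T

def log_euclidean_distance(S1, S2):
    """Log-Euclidean metric || log(S1) - log(S2) ||_F used in Eq. (3)."""
    return np.linalg.norm(spd_log(S1) - spd_log(S2), ord='fro')

A = np.diag([1.0, 4.0])     # log(A) = diag(0, ln 4)
B = np.eye(2)               # log(B) = 0
d = log_euclidean_distance(A, B)
```

Unlike the affine-invariant metric, distances in the log-domain reduce to ordinary Euclidean operations, which is what simplifies the gradient flow derivation.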

Figure 2  The selection of background region for computing E_{i,B}.

In general, good tracking results can be obtained by minimizing Eq. (3). However, when a local area of the object region produces the same covariance as the entire object region does, minimizing this image energy functional may fail to extract the complete object boundary, since the functional only takes into account the statistics of the candidate foreground region. Thus, based on information from both the foreground and the background, the modified energy functional is defined as

$$E_{i,B} = \alpha \,\|\ln S_R - \ln S_T\|_F - \beta \,\|\ln S_{R^c} - \ln S_T\|_F,\qquad(5)$$

where S_{R^c} is the covariance matrix of region R^c, and α and β are adjusting parameters. As a functional of the level set function φ, S_{R^c} can be expressed as

$$S_{R^c}(\phi) = \frac{\int_\Omega H(-\phi)\,(f - \mu_{R^c}(\phi))(f - \mu_{R^c}(\phi))^{\mathrm T}\, dX}{\int_\Omega H(-\phi)\, dX},\qquad(6)$$

where μ_{R^c}(φ) = ∫_Ω H(−φ) f dX / ∫_Ω H(−φ) dX is the mean vector of the background region R^c. To better model the distributions both inside and outside the object curve, we empirically extract a bounding box of 2H × 2W pixels as the region of interest, as shown in Figure 2. The background is defined as the surrounding region containing (2H × 2W) − area pixels, where area denotes the area of the object. Figure 2 illustrates the process of selecting the background. Intuitively, Eq. (5) minimizes the covariance distance between the foreground region and the object template while maximizing the covariance distance between the background and the object template. Note that when β = 0, Eq. (5) degenerates to the functional of Eq. (3).
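The foreground/background energy of Eq. (5) can then be sketched as follows; the α, β values and the helper names are placeholders, not the authors' settings:

```python
import numpy as np

def spd_log(S):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(S)
    return (v * np.log(w)) @ v.T

def image_energy(S_R, S_Rc, S_T, alpha=1.0, beta=1.0):
    """E_{i,B} of Eq. (5): pull the foreground covariance S_R toward the template
    S_T while pushing the background covariance S_Rc away from it."""
    d_fg = np.linalg.norm(spd_log(S_R) - spd_log(S_T), ord='fro')
    d_bg = np.linalg.norm(spd_log(S_Rc) - spd_log(S_T), ord='fro')
    return alpha * d_fg - beta * d_bg

S_T = np.eye(2)
good = image_energy(np.eye(2), np.diag([9.0, 9.0]), S_T)   # fg matches, bg differs
bad = image_energy(np.diag([9.0, 9.0]), np.eye(2), S_T)    # fg differs, bg matches
```

A well-placed contour (foreground matching the template, background dissimilar) yields a lower energy than the reverse configuration, which is exactly what the gradient flow below exploits.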

Minimizing the energy functional of Eq. (5), the level set formulation is expressed as

$$\frac{\partial \phi_{i,B}}{\partial t} = -\frac{\partial E_{i,B}}{\partial \phi}
= -\frac{\alpha}{\|\Theta_1\|_F}\sum_{i=1}^{d}\sum_{j=1}^{d}(\Theta_1)_{i,j}\left(S_R^{-1}(\phi)\,\frac{\partial S_R(\phi)}{\partial \phi}\right)_{i,j}
+ \frac{\beta}{\|\Theta_2\|_F}\sum_{i=1}^{d}\sum_{j=1}^{d}(\Theta_2)_{i,j}\left(S_{R^c}^{-1}(\phi)\,\frac{\partial S_{R^c}(\phi)}{\partial \phi}\right)_{i,j},\qquad(7)$$

where

$$\begin{cases}
\Theta_1 = \ln S_R - \ln S_T,\\[2pt]
\Theta_2 = \ln S_{R^c} - \ln S_T,\\[2pt]
\dfrac{\partial S_R(\phi)}{\partial \phi} = \dfrac{\delta(\phi)}{A_R}\left(ff^{\mathrm T} - \dfrac{1}{A_R}\displaystyle\iint_\Omega H(\phi)\, ff^{\mathrm T}\, d\Omega - f\mu_R^{\mathrm T} - \mu_R f^{\mathrm T} + 2\mu_R\mu_R^{\mathrm T}\right),\\[6pt]
\dfrac{\partial S_{R^c}(\phi)}{\partial \phi} = \dfrac{\delta(\phi)}{A_{R^c}}\left(-ff^{\mathrm T} + \dfrac{1}{A_{R^c}}\displaystyle\iint_\Omega H(-\phi)\, ff^{\mathrm T}\, d\Omega + f\mu_{R^c}^{\mathrm T} + \mu_{R^c} f^{\mathrm T} - 2\mu_{R^c}\mu_{R^c}^{\mathrm T}\right).
\end{cases}\qquad(8)$$

Here δ(φ) denotes the delta function, A_R = ∬_Ω H(φ) dΩ, and A_{R^c} = ∬_Ω H(−φ) dΩ. Please visit http://mcislab.cs.bit.edu.cn/member/wuyuwei/Research.htm for the detailed derivation.

3.3 Shape energy term

To make active contour tracking more robust to noise, occlusion and illumination changes, the adoption of shape priors in the contour evolution process has been shown to be effective. Given a single curve template, determined by the zero level set of a template embedding function φ̂(X), where X = (x, y) denotes pixel coordinates, we can minimize the following energy functional to drive the evolving contour close to the curve template, up to a Euclidean similarity transformation:

$$E_{sh} = \int_\Omega \left[H(\phi) - H(\hat\phi(\gamma \Lambda_r X + \Lambda_t))\right]^2 dX,\qquad(9)$$

where γ, Λ_r and Λ_t are the scaling, rotation, and translation parameters, respectively. The Euclidean similarity transformation adopted in our method transforms the template as

$$\hat\phi(X) \mapsto \hat\phi(\gamma \Lambda_r X + \Lambda_t).\qquad(10)$$

By Eq. (10), the corresponding gradient descent flow can be derived as

$$\frac{\partial \phi_{sh}}{\partial t} = -\frac{\partial E_{sh}}{\partial \phi} = -2\,\delta(\phi)\left[H(\phi) - H(\hat\phi(\gamma \Lambda_r X + \Lambda_t))\right].\qquad(11)$$

For each current frame, we choose the similarity parameters to keep φ̂(γΛ_r X + Λ_t) as close as possible to the tracking result of the previous frame. Let φ_{n−1} and φ_{n−2} be the final level set functions of frame n−1 and frame n−2, respectively, and let the interior region of a level set function φ be {X : φ(X) < 0}. We can then estimate γ = A(φ_{n−1})/A(φ_{n−2}), where A(φ_{n−1}) and A(φ_{n−2}) denote the areas of the internal regions of the final curves in frame n−1 and frame n−2, respectively. The translation vector Λ_t is estimated from φ_{n−1} and φ_{n−2} as

$$\Lambda_t = \frac{\sum_{\phi_{n-1}(X)<0} X}{\sum_{\phi_{n-1}(X)<0} 1} - \frac{\sum_{\phi_{n-2}(X)<0} X}{\sum_{\phi_{n-2}(X)<0} 1}.\qquad(12)$$

The rotation matrix Λ_r is determined by the Procrustes method [33]. For more details, we refer the reader to [34].
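The scaling and translation estimates (γ and Eq. (12)) amount to an area ratio and a centroid shift between the interiors of the two previous level set functions. A hedged NumPy sketch, with the rotation (Procrustes) step omitted and function names of our own choosing:

```python
import numpy as np

def similarity_params(phi_prev, phi_prev2):
    """Estimate scaling gamma and translation Lambda_t from the interior
    regions {phi < 0} of the two previous frames, as in Eq. (12)."""
    in1 = phi_prev < 0
    in2 = phi_prev2 < 0
    gamma = in1.sum() / in2.sum()          # area ratio A(phi_{n-1}) / A(phi_{n-2})
    c1 = np.argwhere(in1).mean(axis=0)     # centroid of the newer interior region
    c2 = np.argwhere(in2).mean(axis=0)     # centroid of the older interior region
    return gamma, c1 - c2                  # translation = centroid shift

# two square interiors of equal size, shifted by (1, 2)
phi2 = np.ones((10, 10)); phi2[2:5, 2:5] = -1
phi1 = np.ones((10, 10)); phi1[3:6, 4:7] = -1
gamma, t = similarity_params(phi1, phi2)
```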

3.4 Complete energy functional

By integrating the image energy term and the shape energy term, the complete energy functional is expressed as a linear combination of E_i(φ) and E_sh(φ):

$$E(\phi) = \lambda E_i(\phi) + (1 - \lambda) E_{sh}(\phi),\qquad(13)$$

where E_i = E_{i,F} or E_{i,B} and E_sh denote the image energy functional and the shape energy functional, respectively. We set E_i = E_{i,B} in our experiments. In practice, tuning the weighting factor λ is not easy and cannot be done automatically; we give more explanation about setting λ in Subsection 4.1. The gradient descent flow of Eq. (13) for the level set function is

$$\frac{\partial \phi}{\partial t} = \lambda \frac{\partial \phi_i}{\partial t} + (1 - \lambda)\frac{\partial \phi_{sh}}{\partial t}.\qquad(14)$$

In implementing Eq. (14), it is numerically necessary to periodically re-initialize the level set function to a signed distance function during the evolution. Specifically, re-initialization of an embedding function φ is the process of making |∇φ| = 1 while keeping the zero level set of the curve C unchanged. One efficient method for re-initialization is the fast marching method [36].
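As an illustration of what re-initialization produces, the brute-force sketch below rebuilds a signed distance function from the sign of φ alone; it is only a reference implementation under our own assumptions, not the fast marching method cited in the text, which achieves the same result far more efficiently:

```python
import numpy as np

def reinitialize(phi):
    """Rebuild phi as an (approximate) signed distance function while keeping the
    sign pattern, and hence the zero level set, fixed: each pixel receives the
    distance to the nearest pixel on the other side of the contour, negated inside."""
    inside = phi < 0
    pts_in = np.argwhere(inside).astype(float)
    pts_out = np.argwhere(~inside).astype(float)
    sdf = np.empty(phi.shape)
    for i in range(phi.shape[0]):
        for j in range(phi.shape[1]):
            other = pts_out if inside[i, j] else pts_in
            d = np.sqrt(((other - (i, j)) ** 2).sum(axis=1)).min()
            sdf[i, j] = -d if inside[i, j] else d
    return sdf

# arbitrary magnitudes on a 3x9 grid, inside (-5.0) for columns 0-3, outside (7.0) after
phi = np.where(np.arange(9)[None, :].repeat(3, axis=0) < 4, -5.0, 7.0)
sdf = reinitialize(phi)
```

After re-initialization the magnitudes grow linearly away from the interface (unit gradient along rows here), whereas the original φ had arbitrary values.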

4 Experiments

The performance of the proposed tracking method was demonstrated on several real image sequences. All the experiments were run on a Pentium machine operating at 1.97 GHz, using a MATLAB implementation; no code optimization was performed. For the original color images, we extract the feature vectors

$$f = \left[R,\, G,\, B,\, I_x,\, I_y,\, I_{xx},\, I_{yy},\, I_{xy},\, \sqrt{I_x^2 + I_y^2}\right]^{\mathrm T},$$

whereas for gray images the first three components (R, G, B) are replaced by a single component, the gray value. To solve Eq. (14), the Heaviside function H is approximated by the smooth function H_ε(φ) = 1/2 + arctan(φ/ε)/π. Its derivative gives a regularized version of the delta function, δ_ε(φ) = ε/[π(ε² + φ²)]. As ε → 0, H_ε(φ) → H(φ) and δ_ε(φ) → δ(φ). In all our tests, we set ε = 1.5. Various experimental results (in rmvb format) shown in this article can be found at http://mcislab.cs.bit.edu.cn/member/wuyuwei/Research.htm.
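The smoothed Heaviside and delta functions above are straightforward to implement; a sketch using the paper's ε = 1.5 as the default:

```python
import numpy as np

def heaviside_eps(phi, eps=1.5):
    """Smoothed Heaviside H_eps(phi) = 1/2 + arctan(phi/eps)/pi used to solve Eq. (14)."""
    return 0.5 + np.arctan(phi / eps) / np.pi

def delta_eps(phi, eps=1.5):
    """Its exact derivative, the regularized delta delta_eps(phi) = eps / (pi (eps^2 + phi^2))."""
    return eps / (np.pi * (eps ** 2 + phi ** 2))

phi = np.linspace(-50.0, 50.0, 5)
H = heaviside_eps(phi)     # smoothly transitions from near 0 to near 1
```

Because H_ε is differentiable everywhere, the gradient flow equations remain well defined on the whole image domain, not just on the contour.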


Figure 3  A comparison of the image energy E_{i,B} (top row) with the image energy E_{i,F} (bottom row) on the flag sequence.

4.1 Setting parameter λ

In this section, we test the performance of our approach with respect to λ in the presence of disturbing factors (e.g., pose changes). Generally speaking, the larger a weight, the more important its corresponding term becomes in the minimization. When tracking a nonrigid object (e.g., the female skater), we may set a large weight on the shape prior term, since it prevents the contour from being wrongly attracted to background objects. When tracking a rigid object (e.g., the Sylvester sequence), we may set a large weight on the image energy term. In conclusion, there is no evidence of a single best way to choose a value for λ; it is therefore decided according to the application, after a trial-and-error procedure.

4.2 Qualitative comparison

To demonstrate the performance of our tracking model, we first test the energy functional with the image energy E_{i,B}. Compared with the image energy E_{i,F}, E_{i,B} also integrates knowledge of the background region, and in most cases it may outperform E_{i,F}. Another choice for this image energy is replacing the subtraction with a division, in which case the control parameter vanishes. In addition, a number of recent studies show that minimizing these image energies cannot guarantee good tracking results; a possibly better alternative is to make the image energy approach as close as possible to a constant learned beforehand.

The top row of Figure 3 illustrates the tracking results on the flag sequence, which exhibits a certain deformation, using E_{i,B}. The main difficulty with this sequence is that the covariance of some internal areas is similar to that of the flag as a whole. Therefore, minimizing only E_{i,F} together with the shape energy term often drives the final contour to converge to a smaller internal region within the flag, as shown in the bottom row of Figure 3. In contrast, the top row of Figure 3 indicates that good tracking results are obtained when both matching similarities and matching dissimilarities are considered using Eq. (5). We adopt the image energy E_{i,B} for all the subsequent experiments.

We provide a qualitative comparison between our method and the results obtained with the distribution matching tracker [16]. The main reason we choose the method in [16] for comparison is as follows: both distributions [16] and the region covariance descriptor have proven to be very effective tools for non-contour tracking, and Freedman et al. proved the feasibility of distribution matching for contour tracking in their work. Motivated by this, we propose covariance matching for contour tracking under the level set framework; it is therefore natural to evaluate our method by comparison with the distribution matching method. We adopted the Bhattacharyya distance as the criterion, and set the bins to 16 × 16 × 16 in the distribution matching-based active contour algorithm.

In the first experiment, Figure 4 presents the comparison results on the face sequence, in which occlusion occurs. Our proposed method achieves better tracking results. In both methods, we set the shape parameters to zero, because inferior tracking results may be obtained when severe occlusion occurs.

We have tested our method extensively on other challenging sequences. The female skater sequence is obtained from http://www.cise.ufl.edu/smshahed/. It contains over 150 frames, and the dazzling performance is accompanied by large pose variations. Figure 5 demonstrates some representative tracking results. We can see that when the tracked skater turns her body, the distribution matching tracker deviates from the object center at frame 51 (e.g., part of the object is outside the contour). Overall, our method is able to track the skater well and provides much more accurate and consistent tracking contours. In this experiment, the parameter λ is set at 0.2.

Figure 4  A comparison of our tracker (top row) with the distribution matching-based active contour tracker [16] (bottom row) on the face sequence. From left to right: frames 17, 40, 67, 130 and 157.

Figure 5  A comparison of our tracker (top row) with the distribution matching-based active contour tracker [16] (bottom row) on the female skater sequence. From left to right: frames 2, 21, 31, 51, 71 and 91.

Figure 6  A comparison of our tracker (top row) with the distribution matching-based active contour tracker [16] (bottom row) on the Sylvester sequence. From left to right: frames 7, 73, 127, 223, 307 and 400.

Figure 7  A comparison of our method with the Bha tracker [16] on the CAVIAR sequence. Top: Bha tracker results; bottom: our tracker results. From left to right: frames 45, 63, 80, 103, 121, and 137.

Next, the Sylvester sequence shows a moving animal doll and is obtained from http://www.cs.toronto.edu/ross/ivt/. In this sequence, the target frequently changes its pose as well as its scale, and the changing lighting conditions also make the target hard to distinguish. Our method achieves better tracking results than distribution matching-based active contour tracking, as shown in Figure 6. In this experiment, the parameter λ is set at 0.6. Figure 7 shows the results of tracking a pedestrian on the CAVIAR1) sequence. The color of the clothes is very similar to that of the background regions in this sequence, which makes accurate extraction of the object region more difficult. From Figure 7, it is clear that our method achieves desirable results, whereas the distribution matching-based tracker loses the target. In this experiment, the parameter λ is set at 0.3.

1) http://homepages.inf.ed.ac.uk/rbf/CAVIAR/

Figure 8  Comparisons of the Jaccard similarity coefficient of our method with the distribution-based tracker [16].

4.3 Quantitative comparison

To make a quantitative comparison between the proposed method and the distribution matching tracker [16], we adopted a simple measure for tracker evaluation, namely the Jaccard similarity coefficient [35]. The Jaccard similarity coefficient is defined as the area of the intersection of the extracted contour with the ground truth divided by the area of their union:

$$\rho = \frac{C_{\mathrm{tracking\ contour}} \cap C_{\mathrm{ground\ truth}}}{C_{\mathrm{tracking\ contour}} \cup C_{\mathrm{ground\ truth}}}.\qquad(15)$$

For each tracking task, we manually mark the ground truth every three frames. The quantitative performance of our method and the distribution matching tracker [16] with respect to this error measurement is summarized in Figure 8. As is evident in Figure 8, our method obtains a higher similarity coefficient with the ground truth on all four sequences.
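On binary region masks, Eq. (15) can be sketched as:

```python
import numpy as np

def jaccard(mask_a, mask_b):
    """Jaccard similarity of Eq. (15): intersection area over union area of
    two binary region masks (tracked contour interior vs. ground truth)."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union

a = np.zeros((6, 6), dtype=bool); a[1:4, 1:4] = True   # 3x3 tracked region
b = np.zeros((6, 6), dtype=bool); b[2:5, 2:5] = True   # 3x3 ground truth, shifted
score = jaccard(a, b)   # overlap 4 pixels, union 14 pixels
```

The coefficient is 1 for a perfect match and decreases toward 0 as the regions drift apart, which is why it penalizes both over- and under-segmentation.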

5 Conclusions

In this work, we have proposed a variational tracking method using a level set formulation that models the second-order statistics of the visual object and image region. Under the Log-Euclidean Riemannian metric, the region energy functional tries to maximize the covariance distance between the region outside the evolving contour and the template while minimizing the distance between the internal image region and a given template covariance. The shape constraints make our model more robust in cluttered environments. Experiments on real image sequences demonstrate the effectiveness of our method.


Acknowledgements

This work was supported by the National High-tech Research & Development Program of China (863 Program) (Grant No. 2009AA01Z323) and the Major State Basic Research Development Program of China (Grant No. 2012CB720003). The authors appreciate the anonymous reviewers for their invaluable comments and suggestions.

References

1 Isard M, Blake A. Condensation conditional density propagation for visual tracking. Int J Comput Vis, 1998, 29: 5–28

2 Li P, Zhang T, Ma B. Unscented Kalman filter for visual curve tracking. Image Vis Comput, 2004, 22: 157–164

3 Comaniciu D, Ramesh V, Meer P. Kernel-based object tracking. IEEE Trans Pattern Anal Mach Intell, 2003, 25:

564–575

4 Yilmaz A. Kernel based object tracking using asymmetric kernels with adaptive scale and orientation selection. Mach

Vis Appl, 2011, 22: 255–268

5 Collins R, Liu Y, Leordeanu M. Online selection of discriminative tracking features. IEEE Trans Pattern Anal Mach

Intell, 2005, 27: 1631–1643

6 Babenko B, Yang M, Belongie S. Robust object visual tracking with online multiple instance learning. IEEE Trans

Pattern Anal Mach Intell, 2011, 33: 1619–1632

7 Grabner H, Leistner C, Bischof H. Semi-supervised on-line boosting for robust tracking. In: Forsyth D, Torr P,

Zisserman A, eds. Proceedings of the 10th European Conference on Computer Vision (ECCV). Berlin, Heidelberg:

Springer-Verlag, 2008. 234–247

8 Ross D, Lim J, Lin R, et al. Incremental learning for robust visual tracking. Int J Comput Vis, 2008, 77: 125–141

9 Andriluka M, Roth S, Schiele B. People-tracking-by-detection and people-detection-by-tracking. In: Proceedings of

IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, 2008. 1–8

10 Breitenstein M, Reichlin F, Leibe B, et al. Online multiperson tracking-by-detection from a single, uncalibrated

camera. IEEE Trans Pattern Anal Mach Intell, 2011, 33: 1820–1833

11 Yilmaz A, Javed O, Shah M. Object tracking: A survey. ACM Comput Surv, 2006, 38: 13–57

12 Cannons K. A Review of Visual Tracking. Technical Report CSE-2008-07, York University, 2008

13 Paragios N, Deriche R. Geodesic active regions and level set methods for motion estimation and tracking. Comput Vis

Image Underst, 2005, 97: 259–282

14 Cremers D. Dynamical statistical shape priors for level set-based tracking. IEEE Trans Pattern Anal Mach Intell,

2006, 28: 1262–1273

15 Bibby C, Reid I. Real-time tracking of multiple occluding objects using level sets. In: Proceedings of the 23rd IEEE

Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, 2010. 1307–1314

16 Freedman D, Zhang T. Active contours for tracking distributions. IEEE Trans Image Process, 2004, 13: 518–526

17 Porikli F, Tuzel O, Meer P. Covariance tracking using model update based on Lie algebra. In: Proceedings of IEEE

Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), New York, 2006. 728–735

18 Tuzel O, Porikli F, Meer P. Pedestrian detection via classification on Riemannian manifolds. IEEE Trans Pattern Anal

Mach Intell, 2008, 30: 1713–1727

19 Pang Y, Yuan Y, Li X. Gabor-based region covariance matrices for face recognition. IEEE Trans Circuits Syst Video

Technol, 2008, 18: 989–993

20 Arsigny V, Fillard P, Pennec X, et al. Fast and simple calculus on tensors in the Log-Euclidean framework. In:

Duncan J S, Gerig G, eds. Proceedings of the 14th International Conference on Medical Image Computing and

Computer-Assisted Intervention (MICCAI). Berlin, Heidelberg: Springer-Verlag, 2005. 115–122

21 Ray N, Acton S. Motion gradient vector flow: An external force for tracking rolling leukocytes with shape and size

constrained active contours. IEEE Trans Med Imaging, 2004, 23: 1466–1478

22 Kass M, Witkin A, Terzopoulos D. Snakes: Active contour models. Int J Comput Vis, 1988, 1: 321–331

23 Caselles V, Kimmel R, Sapiro G. Geodesic active contours. Int J Comput Vis, 1997, 22: 61–79

24 Chan T F, Vese L A. Active contours without edges. IEEE Trans Image Process, 2001, 10: 266–277

25 Paragios N, Deriche R. Geodesic active regions for motion estimation and tracking. In: Proceedings of the Seventh

IEEE International Conference on Computer Vision, Kerkyra, 1999. 688–694

26 Bertalmio M, Sapiro G, Randall G. Morphing active contours. IEEE Trans Pattern Anal Mach Intell, 2000, 22: 733–737

27 Mansouri A R. Region tracking via level set PDEs without motion computation. IEEE Trans Pattern Anal Mach

Intell, 2002, 24: 947–961

28 Sundaramoorthi G, Yezzi A, Mennucci A. Coarse-to-fine segmentation and tracking using Sobolev active contours.

IEEE Trans Pattern Anal Mach Intell, 2008, 30: 851–864

29 Shi Y, Karl W C. Real-time tracking using level sets. In: Proceedings of IEEE Computer Society Conference on

Computer Vision and Pattern Recognition (CVPR), San Diego, 2005. 34–41


30 Bibby C, Reid I. Robust real-time visual tracking using pixel-wise posteriors. In: Forsyth D, Torr P, Zisserman A,

eds. Proceedings of the 10th European Conference on Computer Vision (ECCV). Berlin, Heidelberg: Springer-Verlag,

2008. 831–844

31 Rathi Y, Vaswani N, Tannenbaum A. A generic framework for tracking using particle filter with dynamic shape prior.

IEEE Trans Image Process, 2007, 16: 1370–1382

32 Yilmaz A, Li X, Shah M. Contour-based object tracking with occlusion handling in video acquired using mobile

cameras. IEEE Trans Pattern Anal Mach Intell, 2004, 26: 1531–1536

33 Dryden I, Mardia K. Statistical Shape Analysis. Chichester: John Wiley & Sons, 1998

34 Zhang T, Freedman D. Tracking objects using density matching and shape priors. In: Proceedings of the Ninth IEEE

International Conference on Computer Vision (ICCV), Nice, 2003. 1056–1062

35 Udupa J, Leblanc V, Zhuge Y, et al. A framework for evaluating image segmentation algorithms. Comput Med

Imaging Graph, 2006, 30: 75–87

36 Sethian J A. Level Set Methods and Fast Marching Methods. Cambridge: Cambridge University Press, 1999