Robust Incremental Subspace Learning for Object Tracking
Gang Yu, Zhiwei Hu, and Hongtao Lu
MOE-Microsoft Laboratory for Intelligent Computing and Intelligent
Systems, Department of Computer Science and Engineering,
Shanghai Jiao Tong University, Shanghai, 200240, China
{skicy,huzhiwei,htlu}@sjtu.edu.cn
Abstract. In this paper, we introduce a novel incremental subspace based object tracking algorithm. The two major contributions of our work are a Robust PCA based occlusion handling scheme and a revised incremental PCA algorithm. The occlusion handling scheme exploits the merits of Robust PCA and achieves promising results under occlusion, clutter, noise and other complex conditions in the object tracking task. In addition, the incremental PCA algorithm simplifies the subspace updating process and offers several benefits over traditional R-SVD based updating methods. Experiments show that our proposed algorithm copes with common object tracking tasks efficiently and effectively, and exhibits strong robustness due to the introduction of Robust PCA.
1 Introduction
Over the past decades, object tracking has developed rapidly because it is widely used in many areas, such as surveillance, human-computer interfaces, augmented reality and so forth. The accuracy and robustness of object tracking directly influence the performance of these applications. Object tracking has therefore been a hot research area in computer vision, and much research has been devoted to it. The main challenge of object tracking is handling the appearance variability of a target object. Appearance variabilities fall into two categories, intrinsic and extrinsic. The intrinsic ones mainly include shape deformation and pose variation of a target object; changes in illumination, changes in viewpoint and partial occlusion belong to the extrinsic ones. All these appearance variabilities pose great challenges to accurately locating the target object, even for well-known methods [1,2]. Subspace based methods, which address this problem by modeling such appearance variabilities in a low-dimensional space, have proved efficient and effective [3,4]. [3] first used an eigenspace to model appearance changes of target objects. The advantages of this subspace representation are severalfold. First, it provides a compact notion of the "thing" to be tracked rather than "stuff", meaning that structural information is fully utilized in the appearance representation. Besides, this
C.S. Leung, M. Lee, and J.H. Chan (Eds.): ICONIP 2009, Part I, LNCS 5863, pp. 819–828, 2009.
© Springer-Verlag Berlin Heidelberg 2009
820 G. Yu, Z. Hu, and H. Lu
method also survives large appearance changes. However, the need to train the appearance model before tracking starts and to solve complex optimization problems limits its use. Later, Lim et al. improved this method in [4] by using R-SVD [7] to incrementally update the subspace and a particle filter to replace the complex optimization steps. Owing to the merits of stochastic methods, the local-minimum problem caused by deterministic optimization is largely avoided. Based on [3], [15] employs a Rao-Blackwellized particle filter and achieves promising results in cluttered environments. Lin et al. [16] further optimize the framework of [4] following the idea of Fisher Discriminant Analysis (FDA); the introduction of a second subspace makes the method more discriminative through the use of background appearance. Meanwhile, Ho et al. [11] replace the traditional L2 reconstruction error norm with a uniform L2 reconstruction error norm and achieve promising experimental results. Recently, Zhang et al. [12] utilized the framework of graph embedding and proposed a new discriminative subspace representation. In addition, the Log-Euclidean Riemannian subspace [14] and the tensor subspace [13] have been introduced to handle appearance variabilities. Although the theory and experimental results of these methods are attractive, their overall frameworks are almost the same and similar to [4]; the differences lie in the subspace representation and the corresponding R-SVD based updating algorithm. The introduction of the Log-Euclidean and tensor subspaces strengthens the robustness and accuracy of the tracking results. However, they also add complexity to the tracking framework, and the tracking speed may suffer from the complicated subspace representations. These disadvantages may limit the wide use of these methods. Our method, on the other hand, avoids elaborate subspace representations and adopts the traditional PCA-based representation. To obtain robust and accurate tracking performance, Robust PCA [5] is utilized in our framework.
Two components are essential in subspace based methods: the subspace representation and the algorithm that updates the subspace incrementally. Although different subspaces have been used to model appearance variabilities, almost all the above methods update their subspace with the R-SVD algorithm. In this paper, however, we adopt a new incremental subspace updating algorithm that possesses several advantages over R-SVD. Furthermore, the experiments of previous subspace methods show that, although occlusion may be handled well in simple situations, performance deteriorates when the scene becomes complex. Moreover, if the subspace is updated while occlusion is happening, outliers will bias the subspace and may make the tracking result drift away from the target region. To cope with these problems, a novel occlusion handling scheme, based on Robust PCA, is proposed in this paper.
The rest of the paper is structured as follows. Section 2 describes our subspace representation. The updating scheme (incremental PCA) and our algorithm framework are discussed in Section 3. Section 4 gives some experimental results of our method and Section 5 concludes the paper.
2 Robust PCA Based Learning
Principal Component Analysis (PCA) is one of the traditional dimension reduction methods and has been widely used in the computer vision community. Since PCA minimizes the L2 reconstruction error, it is also considered one of the most successful reconstructive methods. By projecting a new sample into a pretrained subspace, the reconstruction error indicates whether the new sample is similar to the objects in the training set. This intrinsic property makes PCA practical for object tracking, and numerous works exploit its merits in tracking systems; some fundamental and influential ones are [3,4]. Although many works try to further improve the discriminability of subspace based methods, robustness is actually one of the essential problems that currently limit their wide application. In our experiments it is easy to observe that tracking methods lose their targets not for lack of discriminative ability but for lack of robustness, especially in complex situations where occlusion or fast motion of the target occurs. Our method is therefore proposed not to strengthen the discriminability of subspace based methods but to increase the robustness of PCA based methods.
2.1 Robust PCA
Let n be the number of images in the training set, each having m pixels. The training set can be represented by X = [x1, x2, ..., xn], xi ∈ R^m, where xi is the i-th training image. We use U ∈ R^{m×k} for the truncated eigenbasis, where k is the number of bases we keep. For traditional PCA, U = [u1, u2, ..., uk], ui ∈ R^m is calculated by minimizing the reconstruction error

E = ∑_{i=1}^{n} || xi − ∑_{j=1}^{k} aij uj ||^2    (1)

where aij = uj^T xi. Eq. 1 can be solved by either eigendecomposition or SVD (Singular Value Decomposition). The goal of reconstructive methods is to find the coefficients ai once a new sample arrives. In traditional PCA, ai = U^T xi. However, when there are outliers or noise in the image xi, the coefficients ai may be corrupted by the contaminated pixels. Robust PCA, on the other hand, limits the influence of outliers and noise and achieves robust results.
In the following, a brief discussion of Robust PCA is presented; for details, the reader is referred to [5]. To achieve robustness, subsampling is employed in the calculation of the coefficients a. The full process can be viewed as a hypothesize-and-select paradigm that uses only subsets of image pixels. There are two major steps: generating hypotheses and selection.
First of all, suppose U has been calculated from a training set, and return to Eq. 1. Since only a subset of the pixels of a new image sample is considered, we seek the solution a that minimizes

E(r) = ∑_{i=1}^{q} ( x_{ri} − ∑_{j=1}^{k} aj u_{j,ri} )^2    (2)

where r = [r1, r2, ..., rq], k < q < m, indexes q points selected from the m pixels of a new image x. The minimization of Eq. 2 is easily solved by least squares.
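As a concrete illustration, the subset least-squares problem of Eq. 2 can be sketched as follows (a minimal sketch; the function name and the way the index set `idx` is supplied are our own, not from the paper):

```python
import numpy as np

def solve_subset_coeffs(x, U, idx):
    """Least-squares solution of Eq. 2: fit the coefficients a using
    only the pixel subset idx (|idx| = q, with k < q < m)."""
    U_r = U[idx, :]   # rows of the basis at the selected pixels (q x k)
    x_r = x[idx]      # selected pixel values (q,)
    # Solve min_a ||x_r - U_r a||^2 in the least-squares sense.
    a, *_ = np.linalg.lstsq(U_r, x_r, rcond=None)
    return a
```

With clean data any full-rank subset recovers the same coefficients as a full projection; robustness comes from how the subsets are chosen and trimmed, discussed next.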
Then, in the first step of Robust PCA, several hypotheses are generated, each referring to a subset of points r. For each hypothesis, each minimization step yields a temporary solution of the coefficients a and the corresponding reconstruction error at each point, ξi = xi − ∑_{j=1}^{k} aj uji. By trimming the points whose ξi are above a threshold, a new solution of a is obtained from the trimmed set of points. This iterative step continues until the number of points in the hypothesis falls below a predefined threshold. For the final hypothesis, a set of compatible points is defined as follows:

D = { j | ξj^2 < θ },  where θ = (2/m) ∑_{i=k+1}^{n} λi  (λi is the i-th eigenvalue)    (3)

The cardinality of the compatible point set, denoted s = |D|, provides useful information for the selection step.
In this way, several candidate hypotheses for a are generated (each a represents a potential coefficient vector computed from a subset of sample points with Eq. 2). The optimal one is selected by maximizing

ci = K1 si − K2 ||ξ||_{Di},  where ||ξ||_{Di} = ∑_{j∈Di} ξj^2

where si and ||ξ||_{Di} are the number of compatible points and the reconstruction error over the set Di (Di is the set of pixels of one image from which the coefficients a are computed), and K1 and K2 are parameters.
2.2 Occlusion Handling Scheme
In this subsection, a carefully designed occlusion handling scheme is discussed. Though the scheme is mainly intended for occlusion, it is also useful in complex situations such as out-of-plane rotation and cluttered backgrounds.
For the tracked object, there are three possible states in each frame. The first is the normal state, in which all conditions are normal and no special arrangement is needed. The second is partial occlusion (POcclusion), mainly used to prevent false updating of the subspace when occlusion happens. The last is full occlusion (FOcclusion), in which we increase the particle number and the state variance in order to relocate the target object; since the target may be fully occluded, we do not update the target position with the new estimate. For simplicity, we denote by t the frame number in the video.

Fig. 1. State transition graph

Fig. 1 is a simple description of the possible transitions among the three states. At frame t, the five transitions fire when the following requirements are met:

T1: γt > θ1    T2: γt > kθ1    T3: γt > kθ1    T4: γt ≤ kθ1    T5: δt < θ2
If none of these requirements is met, the target stays in its current state. In the requirements above, θ1 and θ2 are thresholds deciding whether the current state is occluded, and k (k > 1) is a coefficient. γt and δt are the reconstruction difference based on the robust coefficients at and the discounted sum of these differences, calculated as follows:

δt = ∑_{j=t−θ3+1}^{t} α^{t−j} γj,    γt = ||xt − Ut at||^2    (4)

where θ3 is the number of frames considered and α is a forgetting factor (α = 0.9 in our experiments).
Intuitively, the five transitions can be interpreted as follows. When no occlusion happens, the robust reconstructed image differs little from the original image, so the requirement of T1 is not met. Once occlusion happens, γt becomes larger than θ1 and the target falls into the partial occlusion state, in which the subspace should not be updated because of the noise and outliers in the new image samples. At the same time, a higher threshold kθ1 serves as the indicator of full occlusion. When full occlusion happens, we keep the target object at its last position until the full occlusion state ends; in addition, to relocate the target object when it reappears, the variance of the state variable and the number of particles are increased. Once a state meets the requirement of T4, the variance of the state variable and the number of particles return to their original values. If the requirement of T5 is met, i.e., the latest sum of reconstruction differences is below the threshold θ2, the target is no longer occluded and the updating step can start again.
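The state machine above can be sketched as follows. Note an assumption: the wiring of T1-T5 among the three states (T1: Normal→POcclusion, T2: POcclusion→FOcclusion, T3: Normal→FOcclusion, T4: FOcclusion→POcclusion, T5: POcclusion→Normal) is our reading of Fig. 1, since the figure itself did not survive extraction; the function names are ours.

```python
import numpy as np

NORMAL, POCC, FOCC = "Normal", "POcclusion", "FOcclusion"

def gamma(x, U, a):
    """Eq. 4: squared robust reconstruction error of frame x."""
    return float(np.sum((x - U @ a) ** 2))

def delta(gammas, theta3, alpha=0.9):
    """Eq. 4: forgetting-factor sum of the last theta3 errors."""
    window = gammas[-theta3:]
    t = len(window) - 1
    return float(sum(alpha ** (t - j) * g for j, g in enumerate(window)))

def next_state(state, gamma_t, delta_t, theta1, theta2, k=1.5):
    """One step of the occlusion state machine (transitions T1-T5)."""
    if state == NORMAL:
        if gamma_t > k * theta1:   # T3: large error, straight to full occlusion
            return FOCC
        if gamma_t > theta1:       # T1: moderate error, partial occlusion
            return POCC
    elif state == POCC:
        if gamma_t > k * theta1:   # T2: error keeps growing, full occlusion
            return FOCC
        if delta_t < theta2:       # T5: accumulated error low again, back to Normal
            return NORMAL
    elif state == FOCC:
        if gamma_t <= k * theta1:  # T4: target visible again, partial occlusion
            return POCC
    return state                   # no requirement met: keep the current state
```

The subspace would then be updated only while the returned state is Normal, as Sec. 3 describes.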
Fig. 2. Tracking results based on occlusion handling scheme
We illustrate the state transitions with two examples in Fig. 2. In the first row, the first and fifth images show the normal state, the second and fourth images the partial occlusion state, and the third image the full occlusion state. Each image contains two subimages, showing the target object in the frame and the target object reconstructed from the robust coefficients. Clearly, updating with the reconstructed target object prevents noise and outliers from biasing the subspace. The second row shows another successful example of our occlusion handling scheme.
3 Proposed Tracking Algorithm
3.1 Overview of the Approach
The framework of our method is similar to [4]. It has two major components: locating the target region and updating the subspace. The visual tracking problem can be formulated as an inference problem in a Hidden Markov Model, where Xt and It denote the hidden state variable (target region) and the observed variable (video frame), respectively. Let Xt = (xt, yt, θt, st, αt, φt), where the components denote x translation, y translation, rotation angle, scale, aspect ratio and skew direction at time t. By Bayes' theorem, we have:

p(Xt|It) ∝ p(It|Xt) ∫ p(Xt|Xt−1) p(Xt−1|It−1) dXt−1    (5)

Since the posterior probability p(Xt|It) is difficult to calculate directly, stochastic approximation methods such as the particle filter [1] are adopted to approximate it with a stochastically generated set of weighted samples. The dynamic model p(Xt|Xt−1) is modeled by a Gaussian distribution:

p(Xt|Xt−1) = N(Xt; Xt−1, ψ)    (6)
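The Gaussian random walk of Eq. 6 amounts to adding independent noise to each particle. A minimal sketch, assuming (as is common, though the paper does not say) a diagonal covariance ψ given as a vector of per-component variances:

```python
import numpy as np

def propagate_particles(particles, psi, rng=None):
    """Eq. 6: sample X_t ~ N(X_{t-1}, psi) for every particle.
    particles: N x 6 array of states (x, y, theta, s, alpha, phi);
    psi: length-6 vector of per-component variances (diagonal covariance)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, np.sqrt(psi), size=particles.shape)
    return particles + noise
```

During full occlusion the scheme of Sec. 2.2 would simply pass a larger psi and a larger particle count to this step.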
To give a probabilistic interpretation, the observation model is formulated with PPCA [10], which is widely used in subspace based object tracking methods. For simplicity, the observation model in our method is governed by a Gaussian distribution:

p(It|Xt) = N(It; μ, UU^T + εI) ∝ exp(−||(It − μ) − U at||^2)    (7)

where I is an identity matrix, ε is the additive Gaussian noise in the observation process, μ is the mean of the training images and at is the coefficient vector of Xt output by Robust PCA. A detailed proof can be found in [8].
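Under Eq. 7, a particle's weight is, up to normalization, the exponential of its negative reconstruction error. A sketch (the function names are ours; in the actual method the coefficients a come from Robust PCA rather than a plain projection):

```python
import numpy as np

def observation_loglik(patch, U, mu, a):
    """Eq. 7 up to a constant: log p(I_t | X_t) is proportional to the
    negative reconstruction error under the robust coefficients a."""
    return -float(np.sum(((patch - mu) - U @ a) ** 2))

def particle_weights(patches, U, mu, coeffs):
    """Normalized particle weights from the observation model."""
    logw = np.array([observation_loglik(p, U, mu, a)
                     for p, a in zip(patches, coeffs)])
    w = np.exp(logw - logw.max())   # subtract the max for numerical stability
    return w / w.sum()
```

Working in log space and subtracting the maximum before exponentiating avoids underflow when reconstruction errors are large.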
3.2 Incremental PCA
Once a new target location is estimated, the subspace needs to adapt to the new appearance unless the target is labeled as occluded by our occlusion handling scheme. Although almost all previous works adopt R-SVD [7] based incremental methods, we use a different incremental scheme, IPCA [6]. The benefits are severalfold. The first is computational efficiency. IPCA is also more extensible and flexible because spatial and temporal weights can be integrated. Furthermore, IPCA integrates seamlessly with Robust PCA and our occlusion scheme, and it does not require any training images of the target object before tracking starts. The IPCA procedure is given in Algorithm 1. For convenience, we assume t > k in our algorithm framework; the case t ≤ k is easy to deduce from the algorithm.
Algorithm 1. Incremental PCA
Input: subspace eigenvectors Ut ∈ R^{m×k}, eigenvalues Dt ∈ R^k, coefficients At ∈ R^{k×t}, mean vector μt ∈ R^m, new input image xt+1 ∈ R^m
Output: Ut+1, Dt+1, At+1, μt+1
1. Compute the robust coefficients of xt+1 on the current subspace: a = RobustPCA(xt+1, Ut, Dt, μt) with Eq. 2
2. Reconstruct the image: y = Ut a + μt
3. Compute the reconstruction residual r ∈ R^m: r = xt+1 − y
4. Form the augmented basis: U' = [Ut, r/||r||]
5. Form the augmented coefficient matrix A' = [At, a; 0, ||r||]; if t ≥ θ4, discard the oldest column: A' = A'(:, 2:end)
6. Perform PCA on A', obtaining the mean μ'', the eigenvectors U'' and the eigenvalues Dt+1; discard the trailing columns of U'' and denote U* = U''(:, 1:k)
7. At+1 = U*^T (A' − μ''1^T), Ut+1 = U' U*, μt+1 = μt + U' μ''
Since previous target information is well preserved in the sample coefficients At, the calculation of the subspace bases does not depend on storing the previous samples xi, i = 1, ..., t. This greatly reduces the memory footprint. Moreover, to reduce the impact of the earliest frames and increase the influence of the latest ones, the earliest frame is dropped once the number of stored frame coefficients exceeds a threshold θ4. Note also that the updating sample is not the original image, which may contain noise and outliers; we use the image reconstructed from the robust PCA coefficients. This is feasible only in the IPCA framework, in which most operations are based on the sample coefficients.
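Algorithm 1 can be sketched as follows. Two hedges: step 1 here uses a plain projection in place of the paper's RobustPCA call, and dropping the oldest column when t ≥ θ4 follows our reading of step 5 of the algorithm.

```python
import numpy as np

def ipca_update(U, A, mu, x_new, theta4):
    """One IPCA step (Algorithm 1). U: m x k basis, A: k x t stored
    coefficients, mu: mean image, x_new: incoming image (assumes t > k)."""
    m, k = U.shape
    t = A.shape[1]
    # 1. Coefficients on the current subspace (plain projection here;
    #    the paper obtains them with Robust PCA).
    a = U.T @ (x_new - mu)
    # 2-3. Reconstruction and residual.
    y = U @ a + mu
    r = x_new - y
    rn = float(np.linalg.norm(r))
    # 4. Augmented basis: the residual direction becomes a new basis vector.
    extra = r / rn if rn > 1e-12 else np.zeros(m)
    U1 = np.hstack([U, extra[:, None]])
    # 5. Augmented coefficient matrix; drop the oldest frame if full.
    A1 = np.block([[A, a[:, None]], [np.zeros((1, t)), np.array([[rn]])]])
    if t >= theta4:
        A1 = A1[:, 1:]
    # 6. PCA on the (k+1)-dimensional coefficients.
    mu2 = A1.mean(axis=1)
    C = A1 - mu2[:, None]
    Uc, s, _ = np.linalg.svd(C, full_matrices=False)
    U_star = Uc[:, :k]
    D_new = (s[:k] ** 2) / A1.shape[1]
    # 7. Rotate everything back to image space.
    A_new = U_star.T @ C
    U_new = U1 @ U_star
    mu_new = mu + U1 @ mu2
    return U_new, D_new, A_new, mu_new
```

Because the PCA in step 6 runs on (k+1) x t coefficient matrices rather than m x t image matrices, the cost per update is independent of the image size m except for the projections.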
3.3 Summary of Our Tracking Algorithm
The two major components, target location estimation and online
updating scheme, are seamlessly embedded into our algorithm
framework with occlusion handling scheme to increase the robustness
of our algorithm. To get a general idea of how our method works, a
summary of our tracking algorithm is depicted in Algorithm 2. The
first three steps in Algorithm 2 are similar with traditional
particle filter based algorithms except the introduction of the
results of Robust PCA in Eq. 7. The addition of the final two steps
increase the robustness of our algorithms with the help of
occlusion handling scheme.
Algorithm 2. Summary of the Proposed Algorithm
For each frame It:
1. Generate the particle set {Xt^(i)}, i = 1:N, with the dynamic model p(Xt|Xt−1) (Eq. 6)
2. Compute the weight of each particle with Eq. 7
3. Find the particle with the largest weight and mark it as Xt^opt
4. Decide the target state according to the occlusion handling scheme and take the corresponding measures (Section 2.2)
5. If the target stays in the Normal state, update the subspace with IPCA (Algorithm 1)
4 Experimental Results
Numerous videos have been tested with our proposed algorithm. Due to space limitations, only one compelling example is illustrated here (Fig. 3). The first row shows the results of our proposed algorithm. To illustrate the state transitions of our method, we draw the particle information in the second row. When full occlusion happens, in the fourth and fifth images, the number of particles and the variance of the particle state variables are both increased. Once the target is relocated, these variables return to their normal values, as shown in the sixth image. The third row shows the tracking results of ISL, which fails when occlusion happens. Some quantitative results are also given in the following table, in which the first row of numbers shows the average location error (in pixels) of ISL and the second row that of our method. The videos and ground truth files were downloaded from [9].
        faceocc  faceocc2  coke11   sylv     tiger1   tiger2
ISL     42.5432  35.8175   31.1371  16.3283  40.3083  53.6643
Ours    12.0424  19.3204   27.7316  15.8316  34.5523  49.6800
Fig. 3. Tracking results of our robust method (first row) and ISL (third row). The second row shows the particle information of our method. The frame numbers are 106, 115, 117, 119, 126 and 129.
5 Conclusion
We have presented a robust incremental subspace based object tracking algorithm whose efficiency and robustness are demonstrated in our experiments. The two major contributions of our method are the occlusion handling scheme and the revised incremental PCA algorithm. With the help of Robust PCA, the occlusion handling scheme contributes greatly to the robustness of our method: it not only handles the occlusion problem but also improves the tracking results in noisy and cluttered scenes. Furthermore, instead of the traditional R-SVD based updating methods, the incremental PCA algorithm gives our method more flexibility and efficiency.

Although the experiments show promising results, our method still has several shortcomings. Tracking speed remains a common problem for subspace based tracking algorithms. Besides, our method may fail when the target object undergoes fast out-of-plane movements or large illumination changes. We aim to address these issues in future work.
Acknowledgement
This work was supported by the Open Project Program of the National Laboratory of Pattern Recognition (NLPR), the 863 Program of China (No. 2008AA02Z310) and NSFC (No. 60873133).
References
1. Isard, M., Blake, A.: Condensation: conditional density
propagation for visual tracking. International Journal of Computer
Vision 1, 5–28 (1998)
2. Avidan, S.: Ensemble tracking. In: Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 494–501 (2005)
3. Black, M.J., Jepson, A.D.: Eigentracking: Robust matching and tracking of articulated objects using a view-based representation. International Journal of Computer Vision 26, 63–84 (1998)
4. Lim, J., Ross, D., Lin, R.S., Yang, M.H.: Incremental learning
for visual tracking. Advances in Neural Information Processing
Systems 1, 793–800 (2004)
5. Leonardis, A., Bischof, H.: Robust recognition using
eigenimages. Computer Vision and Image Understanding 78, 99–118
(2000)
6. Skocaj, D., Leonardis, A.: Weighted and robust incremental
method for subspace learning. In: International Conference on
Computer Vision, vol. 2, pp. 1494–1501 (2003)
7. Levy, A., Lindenbaum, M.: Sequential Karhunen-Loeve Basis
Extraction and its Application to Images. IEEE Transactions on
Image processing 9, 1371–1374 (2000)
8. Ross, D.A., Lim, J., Lin, R.-S., Yang, M.-H.: Incremental Learning for Robust Visual Tracking. International Journal of Computer Vision 77, 125–141 (2008)
9. http://vision.ucsd.edu/~bbabenko/project_miltrack.shtml
10. Tipping, M.E., Bishop, C.M.: Probabilistic Principal Component
Analysis. Journal of the Royal Statistical Society 61, 611–622
(1999)
11. Ho, J., Lee, K.C., Yang, M.H., Kriegman, D.: Visual Tracking
Using Learned Linear Subspaces. In: Computer Vision and Pattern
Recognition, pp. 782–789 (2004)
12. Zhang, X., Hu, W., Maybank, S., Li, X.: Graph Based
Discriminative Learning for Robust and Efficient Object Tracking.
In: International Conference on Computer Vision (2007)
13. Li, X., Hu, W., Zhang, Z., Zhang, X., Luo, G.: Robust Visual
Tracking Based on Incremental Tensor Subspace Learning. In:
International Conference on Computer Vision (2007)
14. Li, X., Hu, W., Zhang, Z., Zhang, X., Zhu, M., Cheng, J.:
Visual Tracking Via Incremental Log-Euclidean Riemannian Subspace
Learning. In: CVPR (2008)
15. Khan, Z., Balch, T., Dellaert, F.: A Rao-Blackwellized particle filter for eigentracking. In: CVPR, pp. 980–986 (2004)
16. Lin, R.-S., Ross, D., Lim, J., Yang, M.-H.: Adaptive discriminative generative model and its applications. In: Advances in Neural Information Processing Systems, pp. 801–808 (2004)