
Robust Incremental Subspace Learning for Object Tracking

Gang Yu, Zhiwei Hu, and Hongtao Lu
MOE-Microsoft Laboratory for Intelligent Computing and Intelligent Systems, Department of Computer Science and Engineering,
Shanghai Jiao Tong University, Shanghai, 200240, China {skicy,huzhiwei,htlu}@sjtu.edu.cn
Abstract. In this paper, we introduce a novel incremental subspace based object tracking algorithm. The two major contributions of our work are a Robust PCA based occlusion handling scheme and a revised incremental PCA algorithm. The occlusion handling scheme makes full use of the merits of Robust PCA and achieves promising results under occlusion, clutter, noise and other complex situations in the object tracking task. Besides, the introduction of incremental PCA facilitates the subspace updating process and possesses several benefits compared with traditional R-SVD based updating methods. The experiments show that our proposed algorithm copes with common object tracking tasks efficiently and effectively, with particularly strong robustness due to the introduction of Robust PCA.
1 Introduction
During the past decades, object tracking has developed rapidly since it is widely used in many different areas, such as surveillance, human-computer interfaces, augmented reality and so forth. The accuracy and robustness of object tracking influence the performance of these applications. Therefore, object tracking has been a hot research area in computer vision and a lot of research has been carried out in this area. The main challenge of object tracking is the difficulty in handling the appearance variability of a target object. There are two categories of appearance variability: intrinsic and extrinsic. Intrinsic appearance variability mainly includes shape deformation and pose variation of a target object. On the other hand, changes in illumination, changes in viewpoint and partial occlusion belong to extrinsic variability. All these appearance variabilities pose great challenges to accurately locating the target object, including for the well-known methods [1,2]. However, subspace based methods, which solve this problem by modeling such appearance variability in a low-dimensional space, prove to be efficient and effective in [3,4]. [3] first brought in the eigenspace to model appearance changes of target objects. The advantages of this subspace representation are severalfold. Firstly, the subspace representation provides a compact notion of the "thing" to be tracked rather than "stuff", which means structure information is fully utilized in the appearance representation. Besides, this
C.S. Leung, M. Lee, and J.H. Chan (Eds.): ICONIP 2009, Part I, LNCS 5863, pp. 819–828, 2009. © Springer-Verlag Berlin Heidelberg 2009
method also survives large appearance changes. But the need to train the appearance model before the program starts and to solve complex optimization problems limits the use of this method. Later, Lim et al. improved this method in [4] by using R-SVD [7] to incrementally update the subspace and a Particle Filter to replace the complex optimization steps. Due to the merits of stochastic methods, the local minimum problem caused by deterministic optimization methods is well avoided. Based on [3], [15] makes use of a Rao-Blackwellized Particle Filter, achieving promising results in cluttered environments. Lin et al. [16] further optimize the framework of [4] following the idea of Fisher Discriminant Analysis (FDA). The introduction of a second subspace makes the method more discriminative through the utilization of background appearance. Meanwhile, Ho et al. [11] replace the traditional L2 reconstruction error norm with a uniform L2 reconstruction error norm and achieve promising experimental results. Recently, Zhang et al. [12] utilize the framework of Graph Embedding and propose a new discriminative subspace representation. Besides, the Log-Euclidean Riemannian subspace [14] and tensor subspace [13] have also been brought in to handle appearance variability. Although the theory and experimental results of these methods sound attractive, the overall framework is almost the same as that of [4]. The possible differences lie in the subspace representation and the corresponding R-SVD based updating algorithm. The introduction of the Log-Euclidean and tensor subspaces strengthens the robustness and accuracy of tracking results. However, they also add complexity to the tracking framework, and the tracking speed may suffer from the complicated subspace representations. Accordingly, these disadvantages may limit the wide use of these methods.
Our method, on the other hand, avoids the complexity of elaborate subspace representations and adopts the traditional PCA-based representation. To obtain robust and accurate tracking performance, Robust PCA [5] is utilized in our framework.
Two components are essential in subspace based methods. One is the subspace representation and the other is the algorithm to update the subspace incrementally. Although different subspaces are utilized to model the appearance variability, almost all the above methods update their corresponding subspace based on the R-SVD algorithm. In this paper, however, we adopt a new incremental subspace updating algorithm, which possesses several advantages compared with R-SVD. Furthermore, according to the experiments of previous subspace methods, although occlusion may be handled well in simple situations, the performance deteriorates when the scene becomes complex. On the other hand, if the subspace is updated when occlusion happens, outliers will bias the subspace and probably make the tracking results drift from the target region. In order to cope with these problems, a novel occlusion handling scheme is proposed in this paper. The main idea of this scheme is based on Robust PCA.
The rest of the paper is structured as follows. Section 2 describes our subspace representation. The updating scheme (incremental PCA) and our algorithm framework
are discussed in Section 3. Section 4 gives experimental results of our method and Section 5 concludes this paper.
2 Robust PCA Based Learning
Principal Component Analysis (PCA) is one of the traditional dimension reduction methods and has been widely used in the computer vision community. Since PCA minimizes the L2 reconstruction error, it is also considered one of the most successful reconstructive methods. By projecting a new sample into a pretrained subspace, the reconstruction error can be regarded as a useful cue for deciding whether the new sample is similar to the training set. Hence, the intrinsic nature of PCA makes it practical in the object tracking area. There are numerous works dedicated to exploiting the merits of PCA in object tracking; some of the fundamental and influential ones are [3,4]. Although many works try to further improve the discriminability of subspace based methods, robustness is actually one of the essential problems that currently limits their wide application. According to our experiments, it can easily be found that tracking methods lose their targets not because of lacking discriminative ability but because of lacking robustness, especially in complex situations where occlusion and fast movement of the target object happen. Thus, our method is proposed not to strengthen the discriminability of subspace based methods but to increase the robustness of PCA based methods.
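As a small illustration of this reconstruction-error cue (a sketch in NumPy, not the paper's code; function names are ours), the following fits a rank-k basis by SVD of the centered data and scores a new sample by its squared residual:

```python
import numpy as np

def pca_basis(X, k):
    """Fit a rank-k PCA basis to column-stacked training images X (m x n)."""
    mu = X.mean(axis=1, keepdims=True)
    # SVD of the centered data gives the eigenbasis of the covariance.
    U, S, _ = np.linalg.svd(X - mu, full_matrices=False)
    return U[:, :k], mu.ravel()

def reconstruction_error(x, U, mu):
    """Squared L2 reconstruction error of a new sample x under the subspace."""
    a = U.T @ (x - mu)          # projection coefficients
    residual = (x - mu) - U @ a
    return float(residual @ residual)
```

A sample lying in the training subspace reconstructs almost exactly; any component orthogonal to the subspace shows up directly as reconstruction error, which is what makes this quantity a usable similarity cue.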
2.1 Robust PCA
Let n be the number of images in the training set, each of which has m pixels. The training data set can then be represented by X = [x_1, x_2, ..., x_n], x_i ∈ R^m, where x_i refers to the i-th training image. We will use the notation U ∈ R^(m×k) for the truncated eigenbasis, where k is the number of bases we keep. For traditional PCA, U = [u_1, u_2, ..., u_k], u_i ∈ R^m, is calculated by minimizing the reconstruction error:

E = Σ_{i=1}^{n} ||x_i − Σ_{j=1}^{k} a_{ij} u_j||^2    (1)

where a_{ij} = u_j^T x_i. To solve Eq. 1, either eigendecomposition or SVD (Singular Value Decomposition) can be used. The goal of reconstructive methods is to find a_i once a new sample arrives. In traditional PCA, a_i = U^T x_i. However, when there are outliers or noise in the image x_i, the coefficients a_i may be influenced by the contaminated pixels. Robust PCA, on the other hand, can limit the influence of outliers and noise and achieve robust results.
In the following, a brief discussion of Robust PCA is presented; for details, refer to [5]. To achieve robustness, subsampling is employed in the calculation of the coefficients a. The full process can be viewed
as a hypothesize-and-select paradigm using only subsets of image pixels. There are two major steps: generating hypotheses and selection.
First of all, suppose U has been calculated from a training data set, and let us return to Eq. 1. Since only a subset of the pixels of a new image sample is considered, we need to seek the solution a which minimizes

E(r) = Σ_{i=1}^{q} (x_{r_i} − Σ_{j=1}^{k} a_j u_{j,r_i})^2    (2)

where r = [r_1, r_2, ..., r_q], k < q < m, refers to q points selected from the m pixels of a new image x.
The minimization of Eq. 2 can easily be solved by least squares. Then, in the first step of Robust PCA, several hypotheses are generated, each one referring to a subset of points r. For each hypothesis, in each step of the minimization we obtain one temporary solution of the coefficients a and the corresponding reconstruction error for each point, ξ_i = x_i − Σ_{j=1}^{k} a_j u_{ji}. By trimming the points whose ξ_i are above a threshold, a new solution a can be obtained with the trimmed set of points. This iterative step continues until the number of points in the hypothesis falls below a predefined threshold. In the final hypothesis, a set of compatible points is defined as follows:

D = {j | ξ_j^2 < θ}, where θ = (2/m) Σ_{i=k+1}^{n} λ_i (λ_i is an eigenvalue)    (3)

The cardinality of the compatible point set is denoted s = |D|, which provides useful information for the selection step.

According to this method, several candidate hypotheses for a are generated (each a represents a potential coefficient vector computed from a subset of sample points with Eq. 2). The optimal one is selected to maximize the following function:

c_i = K_1 s_i − K_2 ||ξ||_{D_i}, where ||ξ||_{D_i} = Σ_{j∈D_i} ξ_j^2

where s_i and ||ξ||_{D_i} refer to the number of compatible points and the reconstruction error over the set D_i (D_i refers to the set of pixels of one image from which the coefficients a are computed), and K_1 and K_2 are parameters.
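The hypothesize-and-select procedure above can be sketched as follows. This is an illustrative reading, not the authors' implementation: we trim a fixed fraction of the worst-fitting pixels per iteration instead of thresholding ξ_i directly, and the defaults for n_hypotheses, q, min_points and θ are our assumptions.

```python
import numpy as np

def robust_coefficients(x, U, mu, n_hypotheses=20, q=None, trim=0.1,
                        min_points=None, K1=1.0, K2=1.0, theta=None, rng=None):
    """Hypothesize-and-select estimation of the subspace coefficients a.

    Each hypothesis solves least squares on a random pixel subset and
    iteratively trims the worst-fitting pixels; the hypothesis whose
    coefficients are compatible with the most pixels wins (Sec. 2.1).
    Parameter defaults are illustrative, not taken from the paper.
    """
    m, k = U.shape
    if rng is None:
        rng = np.random.default_rng()
    if q is None:
        q = max(3 * k, k + 1)      # points per hypothesis (assumption)
    if min_points is None:
        min_points = k + 1         # stop trimming at this size (assumption)
    xc = x - mu
    if theta is None:
        theta = np.var(xc)         # stand-in for (2/m) * sum of residual eigenvalues
    best_a, best_score = None, -np.inf
    for _ in range(n_hypotheses):
        idx = rng.choice(m, size=q, replace=False)
        while True:
            a, *_ = np.linalg.lstsq(U[idx], xc[idx], rcond=None)
            if len(idx) <= min_points:
                break
            resid = np.abs(xc[idx] - U[idx] @ a)
            # keep the best-fitting fraction (the paper trims by threshold)
            keep = resid.argsort()[: max(min_points, int(len(idx) * (1 - trim)))]
            if len(keep) == len(idx):
                break
            idx = idx[keep]
        xi2 = (xc - U @ a) ** 2    # per-pixel squared residual, whole image
        D = xi2 < theta            # compatible points (Eq. 3)
        score = K1 * D.sum() - K2 * xi2[D].sum()
        if score > best_score:
            best_a, best_score = a, score
    return best_a
```

With grossly corrupted pixels, plain projection coefficients U^T(x − μ) absorb the outliers, while a hypothesis fitted on a clean pixel subset recovers the true coefficients, which is exactly the effect the selection score rewards.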
2.2 Occlusion Handling Scheme

In this subsection, a carefully designed occlusion handling scheme is discussed. Though the scheme is mainly applied to deal with occlusion, it is useful for some complex situations like out-of-plane rotation and cluttered backgrounds as well.
For the tracked object, there are three possible states in each frame. The first is the normal state, meaning that all conditions are normal and no special treatment is needed. The second state is partial occlusion (POcclusion), which is mainly used to prevent false updating of the subspace when occlusion happens. The last state is full occlusion (FOcclusion), in which we need to increase the particle number and the state variance in order to relocate the target object. Besides, since the target may be fully occluded, we do not update the target position with the new estimate. For simplicity, we denote by t the frame number in the video. Fig. 1 is a simple description of the possible transitions between the three states.

Fig. 1. State Transition Graph
At frame t, the five transitions happen when certain requirements are met:

T1: γ_t > θ_1
T2: γ_t > kθ_1
T3: γ_t > kθ_1
T4: γ_t ≤ kθ_1
T5: δ_t < θ_2

If none of these requirements is met, the target stays in its current state. In the above requirements, θ_1 and θ_2 are thresholds to decide whether the current state is occluded, and k (k > 1) is a coefficient. γ_t and δ_t refer to the reconstruction difference based on the robust coefficients a_t and the summation of these differences. They are calculated as follows:
δ_t = Σ_{j=t−θ_3+1}^{t} α^{t−j} γ_j,    γ_t = ||x_t − U_t a_t||^2    (4)

where θ_3 is the number of frames to consider and α is a forgetting factor (α = 0.9 in our experiments).
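The quantities in Eq. 4 are cheap to maintain online. A small sketch (names ours; it assumes the subspace mean has already been subtracted from x_t, and implements the forgetting-factor window over the last θ_3 frames directly):

```python
import numpy as np
from collections import deque

def gamma(x_centered, U, a):
    """Eq. 4: squared reconstruction difference gamma_t, assuming the
    subspace mean has already been subtracted from the image vector."""
    r = x_centered - U @ a
    return float(r @ r)

class DeltaWindow:
    """Running value of delta_t: forgetting-factor sum of the last
    theta3 reconstruction differences (alpha = 0.9 in the paper)."""
    def __init__(self, theta3, alpha=0.9):
        self.buf = deque(maxlen=theta3)
        self.alpha = alpha

    def push(self, gamma_t):
        self.buf.append(gamma_t)
        n = len(self.buf)
        # delta_t = sum over the window of alpha^(t-j) * gamma_j
        return sum(self.alpha ** (n - 1 - i) * g
                   for i, g in enumerate(self.buf))
```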
Intuitively, we can interpret these five transitions as follows. When no occlusion happens, the robust reconstructed image differs little from the original image, so the requirement of T1 cannot be met. However, once occlusion happens, γ_t will certainly be larger than θ_1 and the target falls into the partial occlusion state, which means the subspace should not be updated due to the noise and outliers in the new image samples. At the same time, we set a higher threshold kθ_1 as the indicator of full occlusion. When full occlusion happens, we keep the target object in its last position until the full occlusion state ends. Besides, in order to relocate the target object when it appears again, the variance of the state variable and the number of particles are increased. Once a state meets the requirement of T4, the variance of the state variable and the number of particles return to their original values. If the requirement of T5 is met, i.e. the latest summation of reconstruction differences is below the threshold θ_2, the target is no longer in an occlusion state and the updating step can be started again.
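The transition logic can be written as a small state machine. Since Fig. 1 is not reproduced here, the exact wiring of T3 is our assumption (we route it from Normal directly to full occlusion, as its condition duplicates T2); everything else follows the listed requirements.

```python
from enum import Enum

class State(Enum):
    NORMAL = "Normal"
    POCCLUSION = "POcclusion"   # partial occlusion: freeze subspace updates
    FOCCLUSION = "FOcclusion"   # full occlusion: widen search, freeze position

def next_state(state, gamma_t, delta_t, theta1, theta2, k):
    """One step of the occlusion state machine (T3's endpoint is assumed)."""
    if state is State.NORMAL:
        if gamma_t > k * theta1:      # T3 (assumed): straight to full occlusion
            return State.FOCCLUSION
        if gamma_t > theta1:          # T1
            return State.POCCLUSION
    elif state is State.POCCLUSION:
        if gamma_t > k * theta1:      # T2
            return State.FOCCLUSION
        if delta_t < theta2:          # T5
            return State.NORMAL
    elif state is State.FOCCLUSION:
        if gamma_t <= k * theta1:     # T4
            return State.POCCLUSION
    return state                      # no requirement met: stay put
```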
Fig. 2. Tracking results based on occlusion handling scheme
We illustrate the state transitions in Fig. 2 with two examples. In the first row, the first and fifth images show the normal state, the second and fourth images the partial occlusion state, and the third image the full occlusion state. There are two subimages in each image, representing the target object in the frame and the target object reconstructed with the robust coefficients. Clearly, updating with the reconstructed target object prevents noise and outliers from biasing the subspace. The second row shows another successful example of our occlusion handling scheme.
3 Proposed Tracking Algorithm
3.1 Overview of the Approach
The framework of our method is similar to [4]. There are two major components: locating the target region and updating the subspace. The visual tracking problem can be formulated as an inference problem based on a Hidden Markov Model, where X_t and I_t refer to the hidden state variable (target region) and the observed variable (video frame) respectively. Let X_t = (x_t, y_t, θ_t, s_t, α_t, φ_t), where x_t, y_t, θ_t, s_t, α_t, φ_t denote x translation, y translation, rotation angle, scale, aspect ratio and skew direction at time t. According to Bayes' theorem, we have:
p(Xt|It) ∝ p(It|Xt) ∫ p(Xt|Xt−1)p(Xt−1|It−1)dXt−1 (5)
Due to the difficulty of directly calculating the posterior probability p(X_t|I_t), stochastic approximation methods like the particle filter [1] are adopted to approximate the probability with a stochastically generated set of weighted samples. The dynamic model p(X_t|X_t−1) is modeled by a Gaussian distribution:
p(Xt|Xt−1) = N(Xt;Xt−1, ψ) (6)
To introduce a probabilistic interpretation, the observation model is based on PPCA [10], which is widely used in subspace based object tracking methods. For simplicity, the observation model in our method is governed by a Gaussian distribution:

p(I_t|X_t) = N(I_t; μ, UU^T + εI) ∝ exp(−||(I_t − μ) − U a_t||^2)    (7)

where I is an identity matrix, ε is the additive Gaussian noise in the observation process, μ is the mean of the training images and a_t is the coefficient vector of X_t output by Robust PCA. For a detailed proof, refer to [8].
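Under these two models, one tracking step reduces to propagating particles with Eq. 6 and scoring them with Eq. 7. A sketch (names ours; we assume a diagonal covariance ψ and work with log-weights for numerical stability):

```python
import numpy as np

def propagate(particles, psi_diag, rng):
    """Dynamic model (Eq. 6): a Gaussian random walk on the six affine
    state parameters; psi_diag holds the per-parameter variances."""
    noise = rng.normal(scale=np.sqrt(psi_diag), size=particles.shape)
    return particles + noise

def log_weight(patch, U, mu, a):
    """Observation model (Eq. 7), up to an additive constant: the negative
    squared reconstruction error of the candidate patch under the robust
    coefficients a."""
    r = (patch - mu) - U @ a
    return -float(r @ r)
```

A candidate patch that the subspace reconstructs well gets a log-weight near zero; poorly reconstructed candidates are penalized quadratically, so the weighted particle set concentrates on regions resembling the learned appearance.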
3.2 Incremental PCA
Once a new target location is estimated, the subspace needs to adapt to the new appearance unless the target is predicted to be occluded by our occlusion handling scheme. Although almost all previous works adopt R-SVD [7] based incremental methods, we use a different incremental scheme, IPCA [6]. The benefits are severalfold. The first is of course computational efficiency. Also, IPCA is more extendable and flexible due to the possibility of integrating spatial and temporal weights. Furthermore, IPCA integrates perfectly with Robust PCA and our occlusion scheme and does not require any training images of the target object before the tracking task starts. The IPCA procedure is given in Algorithm 1. For convenience, we suppose t > k in our algorithm framework; the case t ≤ k is easy to deduce from the algorithm.
Algorithm 1. Incremental PCA
Input: subspace eigenvectors U_t ∈ R^(m×k), eigenvalues D_t ∈ R^k, coefficients A_t ∈ R^(k×t), mean vector μ_t ∈ R^m, new input image x_{t+1} ∈ R^m
Output: U_{t+1}, D_{t+1}, A_{t+1}, μ_{t+1}

1. Get the robust coefficients of x_{t+1} on the current subspace: a = RobustPCA(U_t, D_t, μ_t) with Eq. 2
2. Reconstruct the image: y = U_t a + μ_t
3. Calculate the reconstruction error r ∈ R^m: r = x_{t+1} − y
4. Form the new basis: U′ = [U_t, r/||r||]
5. Form the new coefficient matrix: A′ = [A_t, a; 0^T, ||r||], then discard the oldest column, A′ = A′(:, 2:t+1), if t ≥ θ_4
6. Perform PCA on A′, obtaining the mean μ′′, eigenvectors U′′ and eigenvalues D_{t+1}; discard the last columns of U′′ and denote the result U* = U′′(:, 1:k)
7. A_{t+1} = U*^T (A′ − μ′′1^T), U_{t+1} = U′U*, μ_{t+1} = μ_t + U′μ′′
Since previous target information is well preserved in the sample coefficients A_t, the calculation of the subspace bases does not depend on the storage of the previous samples x_i, i = 1, ..., t. This greatly reduces memory usage. Besides, in order to reduce the impact of the earliest frames and increase the influence of the latest frames, the earliest frame is dropped when the number of stored frame coefficients exceeds a threshold θ_4. Also, the updating sample is not the original one, which may contain noise and outliers; we use the image reconstructed from the Robust PCA coefficients. This is feasible only in the IPCA framework, in which most operations are based on the sample coefficients.
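Algorithm 1 can be sketched in a few lines of NumPy. For a self-contained example we substitute plain projection coefficients for the Robust PCA call in step 1 and let max_history stand in for θ_4; function names are ours.

```python
import numpy as np

def ipca_update(U, A, mu, x_new, k, max_history=None):
    """One step of the incremental PCA update (Algorithm 1), with plain
    projection coefficients in place of the paper's Robust PCA step."""
    a = U.T @ (x_new - mu)                    # step 1 (plain, not robust)
    y = U @ a + mu                            # step 2: reconstruction
    r = x_new - y                             # step 3: residual (orthogonal to U)
    r_norm = np.linalg.norm(r)
    new_col = (r / r_norm)[:, None] if r_norm > 1e-12 \
        else np.zeros((U.shape[0], 1))
    U_aug = np.hstack([U, new_col])           # step 4: augmented basis
    # step 5: lift old coefficients into the augmented basis, append new one
    A_aug = np.vstack([A, np.zeros((1, A.shape[1]))])
    A_aug = np.hstack([A_aug, np.append(a, r_norm)[:, None]])
    if max_history is not None and A_aug.shape[1] > max_history:
        A_aug = A_aug[:, 1:]                  # forget the oldest frame
    # step 6: PCA on the (k+1)-dimensional coefficients
    mu2 = A_aug.mean(axis=1, keepdims=True)
    U2, _, _ = np.linalg.svd(A_aug - mu2, full_matrices=False)
    U_star = U2[:, :k]
    # step 7: rotate basis, coefficients and mean back to image space
    A_next = U_star.T @ (A_aug - mu2)
    U_next = U_aug @ U_star
    mu_next = mu + (U_aug @ mu2).ravel()
    return U_next, A_next, mu_next
```

Note that all heavy operations run on the (k+1)-dimensional coefficient matrix rather than on stored images, which is exactly the memory advantage described above.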
3.3 Summary of Our Tracking Algorithm
The two major components, target location estimation and the online updating scheme, are seamlessly embedded into our algorithm framework together with the occlusion handling scheme to increase robustness. To give a general idea of how our method works, a summary of our tracking algorithm is given in Algorithm 2. The first three steps are similar to traditional particle filter based algorithms except for the introduction of the Robust PCA results in Eq. 7. The final two steps increase the robustness of our algorithm with the help of the occlusion handling scheme.
Algorithm 2. Summary of the Proposed Algorithm
For each frame I_t:

1. Generate the particle set {x_t^(i)}_{i=1:N} with the dynamic model p(X_t|X_t−1) (Eq. 6)
2. Compute the weight of each particle with Eq. 7
3. Find the particle with the largest weight and mark it as x_t^opt
4. Decide the target state according to the occlusion handling scheme and execute the corresponding measures (Section 2.2)
5. If the target stays in the Normal state, update the subspace with IPCA (Algorithm 1)
4 Experimental Results
Numerous videos have been tested with our proposed algorithm. Due to the page limit, only one compelling example is illustrated here (Fig. 3). The first row shows the results of our proposed algorithm. In order to illustrate the state transitions of our method, we draw the particle information in the second row. When full occlusion happens in the fourth and fifth images, the number of particles and the variance of the particle state variables are both increased. Once the target is relocated, these variables return to their normal values, as shown in the sixth image. The third row shows the tracking results of ISL, which fails when occlusion happens. Some quantitative results are also given in the following table, in which the first row shows the average location error (in pixels) of ISL and the second row the result of our method. The video and ground truth files were downloaded from [9].
        faceocc   faceocc2  coke11   sylv     tiger1   tiger2
ISL     42.5432   35.8175   31.1371  16.3283  40.3083  53.6643
Ours    12.0424   19.3204   27.7316  15.8316  34.5523  49.6800
Fig. 3. Tracking results of our robust method (first row) and ISL (third row). The second row shows the particle information of our method. The frame numbers are 106, 115, 117, 119, 126, 129.
5 Conclusion
We have presented a robust incremental subspace based object tracking algorithm whose efficiency and robustness are demonstrated in our experiments. The two major contributions of our method are the occlusion handling scheme and the revised incremental PCA algorithm. With the help of Robust PCA, the occlusion handling scheme contributes greatly to the robustness of our method: it not only handles the occlusion problem successfully but also improves the tracking results in noisy and cluttered scenes. On the other hand, instead of the traditional R-SVD based updating methods, the incremental PCA algorithm gives our method more flexibility and efficiency.
Although the experiments show promising results, there are also several shortcomings to improve. Tracking speed is still a common problem of subspace based tracking algorithms. Besides, our method may fail when the target object undergoes fast out-of-plane movement or large illumination variation. We aim to address these issues in future work.
Acknowledgement
This work was supported by the Open Project Program of the National Labora- tory of Pattern Recognition (NLPR), 863 Program of China (No. 2008AA02Z310) and NSFC (No. 60873133).
References
1. Isard, M., Blake, A.: Condensation: conditional density propagation for visual tracking. International Journal of Computer Vision 1, 5–28 (1998)
2. Avidan, S.: Ensemble tracking. In: Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 494–501 (2005)
3. Black, M.J., Jepson, A.D.: Eigentracking: Robust matching and tracking of articu- lated objects using a view-based representation. International Journal of Computer Vision 26, 63–84 (1998)
4. Lim, J., Ross, D., Lin, R.S., Yang, M.H.: Incremental learning for visual tracking. Advances in Neural Information Processing Systems 1, 793–800 (2004)
5. Leonardis, A., Bischof, H.: Robust recognition using eigenimages. Computer Vision and Image Understanding 78, 99–118 (2000)
6. Skocaj, D., Leonardis, A.: Weighted and robust incremental method for subspace learning. In: International Conference on Computer Vision, vol. 2, pp. 1494–1501 (2003)
7. Levy, A., Lindenbaum, M.: Sequential Karhunen-Loeve Basis Extraction and its Application to Images. IEEE Transactions on Image Processing 9, 1371–1374 (2000)
8. Ross, D.A., Lim, J., Lin, R.-S., Yang, M.-H.: Incremental Learning for Robust Visual Tracking. International Journal of Computer Vision 77, 125–141 (2008)
9. http://vision.ucsd.edu/~bbabenko/project_miltrack.shtml
10. Tipping, M.E., Bishop, C.M.: Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society 61, 611–622 (1999)
11. Ho, J., Lee, K.C., Yang, M.H., Kriegman, D.: Visual Tracking Using Learned Linear Subspaces. In: Computer Vision and Pattern Recognition, pp. 782–789 (2004)
12. Zhang, X., Hu, W., Maybank, S., Li, X.: Graph Based Discriminative Learning for Robust and Efficient Object Tracking. In: International Conference on Computer Vision (2007)
13. Li, X., Hu, W., Zhang, Z., Zhang, X., Luo, G.: Robust Visual Tracking Based on Incremental Tensor Subspace Learning. In: International Conference on Computer Vision (2007)
14. Li, X., Hu, W., Zhang, Z., Zhang, X., Zhu, M., Cheng, J.: Visual Tracking Via Incremental Log-Euclidean Riemannian Subspace Learning. In: CVPR (2008)
15. Khan, Z., Balch, T., Dellaert, F.: A rao-blackwellized particle filter for eigentrack- ing. In: CVPR, pp. 980–986 (2004)
16. Lin, R.-S., Ross, D., Lim, J., Yang, M.-H.: Adaptive discriminative generative model and its applications. In: Advances in Neural Information Processing Systems, pp. 801–808 (2004)