
Vis Comput (2013) 29:421–431
DOI 10.1007/s00371-012-0745-5

ORIGINAL ARTICLE

Recovering shape and motion by a dynamic system for low-rank matrix approximation in L1 norm

Yiguang Liu · Liping Cao · Chunling Liu · Yifei Pu · Hong Cheng

Published online: 3 July 2012
© Springer-Verlag 2012

Abstract To recover the motion and shape matrices from a matrix of tracked feature points on a rigid object under orthography, we can perform a low-rank matrix approximation of the tracking matrix after subtracting the row mean vector of the matrix from each of its columns. To obtain the row mean vector, a rank-4 matrix approximation is usually used first to recover the missing entries; a rank-3 matrix approximation is then used to recover the shape and motion. This two-step procedure is inconvenient. In this paper, we build a cost function which yields the shape matrix, the motion matrix, and the row mean vector at the same time. The function is in L1 norm and is not smooth everywhere. To optimize the function, a continuous-time dynamic system is newly proposed. As time goes on, the product of the shape and rotation matrices becomes closer and closer, in the L1-norm sense, to the tracking matrix with the mean vector subtracted from each of its columns. A parameter is implanted into the system to improve the computational efficiency, and the influence of the parameter on approximation accuracy and computational efficiency is theoretically studied and experimentally confirmed. The experimental results on a large number of synthetic data and a real application of structure from motion demonstrate the effectiveness and efficiency of the proposed method. The proposed system is also applicable to general low-rank matrix approximation in L1 norm, and this is also experimentally demonstrated.

Y. Liu (✉) · Y. Pu
Vision and Image Processing Lab., College of Computer Science, Sichuan University, Chengdu 610065, Sichuan, China
e-mail: [email protected]

L. Cao
Library, Sichuan University, Chengdu 610065, Sichuan, China
e-mail: [email protected]

C. Liu
Department of Ophthalmology, West China Hospital, Sichuan University, Chengdu 610041, Sichuan, China
e-mail: [email protected]

H. Cheng
Pattern Recognition and Machine Intelligence Lab., University of Electronic Science and Technology of China, Chengdu 610054, Sichuan, China
e-mail: [email protected]

Keywords Structure from motion · Low-rank matrix approximation · Dynamic system · L1 norm · Convergence

1 Introduction

The goal of the factorization method for recovering shape and motion from image streams under orthography is to tackle the following optimization problem [1]:

min_{R,S} ‖W ⊙ (Y − RS − U(1)_{1×n})‖  (1)

where ⊙ denotes the Hadamard product, and Y = (y_ij)_{m×n} is the measurement matrix. Each column of Y consists of the x- and y-coordinates of a tracked feature point in all frames, so m is twice the number of frames; each row of Y contains the x- or y-coordinates of all tracked feature points in one frame, so n is the number of tracked points. The ith entry of the vector U ∈ R^{m×1} is the mean value of the ith row of Y, and Y − U(1)_{1×n} is the registered measurement matrix. The matrix R ∈ R^{m×r} is the rotation matrix, whose rows represent the orientations of the horizontal and vertical camera reference axes throughout the stream. Each column of the shape matrix, S, gives the 3D coordinates of one feature point with respect to the centroid of all tracked feature points. The entry w_ij in W ∈ R^{m×n} takes 1 if y_ij is known, and 0 otherwise.
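As a minimal numpy sketch of the quantities just defined (our illustration, not the authors' code), the masked registered residual of (1), whose entrywise L2 or L1 norm gives the cost:

```python
import numpy as np

def masked_residual(W, Y, R, S, U):
    """Residual W o (Y - RS - U(1)_{1xn}) of problem (1).

    W, Y: (m, n) arrays, with W the 0/1 observation mask;
    R: (m, r); S: (r, n); U: length-m row-mean (bias) vector.
    """
    return W * (Y - R @ S - U[:, None])  # U[:, None] broadcasts U(1)_{1xn}

# Entrywise costs on the residual:
# cost_l2 = (masked_residual(W, Y, R, S, U) ** 2).sum()
# cost_l1 = np.abs(masked_residual(W, Y, R, S, U)).sum()
```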

Low-rank matrix approximations, as described in (1), have been used in many computer vision tasks, such as 3D Euclidean reconstruction of nonrigid objects [2], nonrigid structure from motion [3], robust alignment [4], and robust principal component analysis [5]. Combining low-rank approximation with SVM- or subspace-based techniques [6, 7], novel pattern recognition technologies may arise.

In (1), any "entrywise" norm can be used as ‖·‖, which treats an m × n matrix as a vector of size mn. If ‖·‖ is the L2 norm and all measurements in Y are known (thus U is known), the optimization problem (1) can be solved well by Singular Value Decomposition (SVD); however, SVD cannot tackle (1) directly when some y_ij are missing. To deal with this issue, Morita and Kanade [8] proposed the sequential factorization method; Hartley and Schaffalitzky [9] proposed power factorization (PF), which is derived from the power method in matrix computation and the sequential factorization method of [8]; Wang and Wu [2] proposed a constrained PF algorithm. Buchanan and Fitzgibbon [10, 11] introduced the Damped Newton algorithm along with an outstanding survey of many of the methods for least L2-norm matrix factorization. GLRAM (Generalized Low Rank Approximations of Matrices) was first proposed in [12] and revisited in [13]. The Wiberg algorithm, which aims at extracting the principal components when some data are missing, was first proposed more than 30 years ago [14]. Using the Wiberg algorithm to solve low-rank matrix approximations in L2 norm was first introduced by Okatani [15], and it was shown that on many problems the Wiberg method outperforms many of the existing methods. Chen [16] adapted the Wiberg algorithm to heteroscedastic low-rank matrix approximation, and revitalized the Levenberg–Marquardt algorithm for solving low-rank matrix approximation with missing data [17]. Using the nuclear norm or a combination of the nuclear norm and the L1 norm, Lin et al. [18] and Cai et al. [19] addressed matrix completion by virtue of convex optimization.
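For the complete-data L2 case mentioned above, the SVD route is a one-liner; a sketch of the standard Eckart–Young truncation (not tied to any of the cited implementations):

```python
import numpy as np

def rank_r_approx(Y, r):
    """Best rank-r approximation of a fully observed Y in the entrywise
    L2 (Frobenius) norm via truncated SVD; inapplicable once entries of
    Y are missing, which motivates the methods surveyed above."""
    Uo, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (Uo[:, :r] * s[:r]) @ Vt[:r, :]
```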

Taking ‖·‖ in (1) as the L2 norm is justified only when the measurement noise in Y is negligible [3]. With non-negligible noise, the results obtained by low-rank matrix approximation using the L2 norm give no guarantee of statistical optimality, and may be highly biased in computer vision applications such as three-dimensional reconstruction [1, 8]. Besides, the L1 norm is much more robust to outliers than the L2 norm, as revealed by [20]. Thus, low-rank approximation of matrices under the L1 norm has been attracting more and more attention in recent years [5, 20, 21]. Wright, Peng, and Ma [5] introduced robust principal component analysis, where the L1 norm is used to recover corrupted low-rank matrices. Ke and Kanade [20] presented an alternative convex programming method to solve the L1-norm minimization problem formulated from low-rank matrix factorization. Eriksson and van den Hengel [21] experimentally revealed that the alternated convex programming approach often converges to a point that is not a local minimum (typically, the evolution of the L1 cost function stops after only a small number of iterations), and introduced the L1-Wiberg approach, which is the state of the art. In mathematics, a dynamic system consists of a state vector and an evolution rule which describes what future state vector will follow from the current state vector. Dynamic systems have been used in the computer vision community for a long time [22–25]. For example, the structure-from-motion problem can be considered as the state estimation of a dynamical system [26]. Dynamic systems have also been applied successfully to solve eigen-problems of matrices [27, 28].

In low-rank matrix approximation, because the L1-norm cost function is not smooth everywhere and is non-convex, many of the traditional optimization techniques suitable for L2 cost functions are not applicable. To perform low-rank matrix approximation in L1 norm, in this paper we design a dynamic system to tackle the following problem:

min_{R,S,U} ‖W ⊙ (Y − RS − U(1)_{1×n})‖_1  (2)

Compared to the known techniques specially designed for low-rank matrix approximation in L1 norm [20, 21], whose convergence to a local minimum is not theoretically ensured, we prove and experimentally demonstrate that our dynamic system can reach an optimal solution of (2).

The remainder of this paper is organized as follows. In Sect. 2, the dynamic system is proposed together with an analysis of its properties, which reveals that the dynamic system can attain the optimal low-rank matrix approximation. Then, in Sect. 3, we experimentally demonstrate the performance of the calculating model, followed by some concluding remarks in Sect. 4.

2 The dynamic system and its properties

Let C_ij denote a matrix C_ij ∈ R^{(m+n)r×(m+n)r} whose submatrix from rows ir + 1 to (i + 1)r and from columns mr + jr + 1 to mr + (j + 1)r is an identity matrix, and

H_ij ≜ [ C_ij             (0)_{(m+n)r×m}
         (0)_{m×(m+n)r}   (0)_{m×m}    ].

The vectorization of an m × n matrix A, denoted by vec(A), is the mn × 1 column vector obtained by stacking the columns of A on top of one another. For the convenience of presentation, the problem (2) is equivalently reformulated as follows:

min_x V(x),  V(x) ≜ Σ_{i=1}^{m} Σ_{j=1}^{n} |f_ij(x)|,
f_ij(x) ≜ w_ij ( y_ij − x^T H_ij x − l_i^T x ),  (3)
x ≜ [ vec^T(R^T)  vec^T(S)  U^T ]^T

where l_i is an ((m + n)r + m)-dimensional vector whose ((m + n)r + i)th entry is 1 and whose other entries are all 0. The terms x^T H_ij x and l_i^T x in (3) correspond to the (i, j) entry of the product matrix RS and to the (i, 1) entry of U, respectively.

In (3), we can see that the cost function is neither convex nor smooth everywhere with respect to x. Thus, gradient-based techniques are not directly applicable, though they are often used in the computer vision community [22, 29]. It is also impossible to transform (3) into a standard linear or quadratic mathematical programming problem, so the methods applicable to mathematical programming problems are likewise unsuitable for solving (3). In this paper, we address the problem of optimizing (3) by dynamic systems. First we propose a dynamic system for optimizing (3) as follows:

ẋ(t) = Σ_{i=1}^{m} Σ_{j=1}^{n} sgn(f_ij(x(t))) g_ij(x(t)),
g_ij(x(t)) ≜ w_ij ( (H_ij + H_ij^T) x(t) + l_i ),  t > 0;
x(t) = x_0,  t = 0  (4)

where sgn(x) is the signum function: sgn(x) = 1 when x > 0, sgn(x) = 0 when x = 0, and sgn(x) = −1 when x < 0. If (4) is capable of optimizing (3), then with the evolution of (4) from an initial vector x_0, the vector x(t) will go to a state vector making V(x) locally minimal. To theoretically ensure this capability, we prove the following proposition in Appendix A.

Proposition 1 The cost function, V(x(t)), decreases with the evolution of the dynamic system (4) from an initial vector x_0, and finally stops at a local minimum.
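For concreteness, the right-hand side of (4) can be regrouped over the partition x = [vec^T(R^T) vec^T(S) U^T]^T: with E = sgn(W ⊙ (Y − RS − U(1)_{1×n})), collecting the blocks of Σ_ij sgn(f_ij) g_ij gives Ṙ = E S^T, Ṡ = R^T E, U̇ = E(1)_{n×1}. A minimal sketch of this matrix form (our restatement, not the authors' implementation):

```python
import numpy as np

def rhs_eq4(R, S, U, W, Y):
    """Matrix form of the right-hand side of (4).

    With E = sgn(W o (Y - RS - U(1)_{1xn})), summing sgn(f_ij) g_ij over
    i, j and regrouping by the blocks of x = [vec(R^T); vec(S); U] gives
    dR/dt = E S^T, dS/dt = R^T E, dU/dt = E 1 (w_ij^2 = w_ij for 0/1 W)."""
    E = np.sign(W * (Y - R @ S - U[:, None]))
    return E @ S.T, R.T @ E, E.sum(axis=1)
```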

Although (4) ensures the convergence of V(x(t)) to a local minimum in terms of Proposition 1, the issue of computational efficiency has not yet been considered. It is impossible to obtain the analytic solution of (4), and solving (4) must in general resort to numerical computation. In numerical computation, the time step size, δt, can be fixed or variable throughout the calculating procedure. Compared to the fixed-step strategy, the variable-step strategy is time-saving because, for a given level of accuracy, adjusting the time step size dynamically as necessary can reduce the number of steps, whereas the fixed-step strategy must use a single step size meeting the accuracy requirements, which may force the size to be very small. Thus, the variable-step strategy is often preferred, with the time step size varying dynamically based on the local error. In numerically solving (4) with variable step size, the time step size δt at a time t is usually related to the Jacobian matrix of the right-hand side of (4), and δt takes smaller values when greater changes have taken place in the right-hand side. At the zero point, the function sgn(·) is not smooth and has infinite derivative, which makes δt infinitely small at that position. Thus, the non-smooth function sgn(·) may cause the computation time for numerically solving (4) to grow dramatically.

Fig. 1 The hyperbolic tangent function, tanh(σx), is close to sgn(x) when σ is large enough

In real applications, such as recovering shape and motion from image streams [8], 3D Euclidean reconstruction [2], robust alignment [4], and robust principal component analysis [30], the size of the measurement matrix Y is usually large, which makes the dimension of x high. For example, in the Oxford dinosaur data set, which is often used in the vision community [10, 31–33], Y is a 72 × 319 matrix as used in [21], so the dimension of x reaches (m + n)r + m = 1245. The low efficiency due to sgn(·) can leave (4) unsuitable for such large-scale problems. To remove the ill influence of sgn(·) on the calculating efficiency of (4), we use the hyperbolic tangent function, tanh(σx), in place of sgn(x). The function tanh(σx) is smooth, and becomes close to sgn(x) when σ > 0 is large enough. As shown in Fig. 1, tanh(σx) with σ > 35 is very close to sgn(x). Replacing sgn(x) with tanh(σx), from (4) we have

ẋ(t) = Σ_{i=1}^{m} Σ_{j=1}^{n} tanh(σ f_ij(x(t))) g_ij(x(t)),  t > 0;
x(t) = x_0,  t = 0.  (5)

The dynamic system (5) comes from (4), and the parameter σ affects the performance of (5) in optimizing V(x). Intuitively, a larger σ makes (5) closer to (4); a smaller σ has the reverse effect while reducing the calculating time, because tanh(σx) changes more smoothly around the zero point. To show the influence of σ on calculating time directly, we fix Y and W (20 % of whose entries are set to zero) as 50 × 100 matrices, set r = 3, and choose an initial vector x_0 with suitable dimensions. With σ taking values in {0.3, 0.5, 1.0, 1.5, 3.0, 5.0}, we test the calculating time of (5). The experiments are repeated five times, and the relation of calculating time vs. σ is shown in Fig. 2. Figure 2 demonstrates that the calculating time of (5) increases with σ on the whole, though there are cases where the calculating time corresponding to a larger σ is slightly smaller.
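A sketch of how (5) can be integrated with an off-the-shelf variable-step solver (the paper's implementation uses CVODES in Matlab; scipy's solve_ivp stands in here, and the packing of (R, S, U) into x is an arbitrary but consistent choice):

```python
import numpy as np
from scipy.integrate import solve_ivp

def solve_eq5(W, Y, r, sigma=25.0, tf=5.0, x0=None, seed=0):
    """Evolve the smoothed system (5) from x0 up to time tf."""
    m, n = Y.shape
    if x0 is None:
        x0 = np.random.default_rng(seed).standard_normal((m + n) * r + m)

    def unpack(x):  # any fixed packing of (R, S, U) into x works
        return (x[:m * r].reshape(m, r),
                x[m * r:(m + n) * r].reshape(r, n),
                x[(m + n) * r:])

    def rhs(t, x):
        R, S, U = unpack(x)
        E = np.tanh(sigma * W * (Y - R @ S - U[:, None]))  # tanh replaces sgn
        return np.concatenate([(E @ S.T).ravel(),
                               (R.T @ E).ravel(),
                               E.sum(axis=1)])

    sol = solve_ivp(rhs, (0.0, tf), x0)  # variable-step integration
    return unpack(sol.y[:, -1])
```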

Fig. 2 The calculating time of (5) increases with σ in general. Each broken line is associated with a set of W, Y, and x_0

Fig. 3 The influence of σ on the convergence of V(x(t)). With a fixed set of W, Y, and x_0, as σ increases, the time at which V(x(t)) reaches the minimal value decreases accordingly

Although calculating time is saved when using (5) instead of (4), there is after all a difference between (4) and (5), and this difference affects the convergence of V(x(t)). To address the influence of σ in (5) on the convergence of V(x(t)), in Appendix B we prove the following proposition.

Proposition 2 With the increase of σ, the probability with which V(x(t)) decreases under the evolution of (5) increases.

Proposition 2 indicates that increasing σ can enhance the convergence speed of (5), because V(x(t)) will decrease on more occasions. To intuitively show the influence of σ on the convergence of V(x(t)), the convergence tracks of V(x(t)) under different σ are shown in Fig. 3 for a fixed set of 50 × 100 matrices Y and W and an initial x_0. Figure 3 shows that under different σ, V(x(t)) can attain the minimum value under (5); however, as σ increases, the time at which V(x(t)) reaches the minimum decreases accordingly. The calculating time, by contrast, increases accordingly, as shown in Fig. 2. Compared to (4), tanh(σ f_ij(x(t))) in (5) corresponds to sgn(f_ij(x(t))). The larger the value of σ f_ij(x(t)), the closer tanh(σ f_ij(x(t))) is to sgn(f_ij(x(t))), and the stronger the capability of (5) in optimizing V(x(t)). As defined in (3), |f_ij(x(t))| with larger m and n is usually larger than with smaller m and n. Thus, for a given σ, as the size of Y increases, tanh(σ f_ij(x(t))) generally becomes closer to sgn(f_ij(x(t))).

Proposition 2 indicates that there is a possibility that V(x(t)) will increase with the evolution of (5); however, does an instance exist where V(x(t)) keeps on increasing? On this issue we prove the following proposition in Appendix C.

Proposition 3 With the evolution of (5), V(x(t)) does not keep on increasing, and falls on the whole when σ is large enough.

Proposition 3 guarantees that the dynamic system (5) with an appropriate σ decreases V(x(t)) on the whole, and that no instance exists where V(x(t)) keeps on increasing. As shown in Fig. 3, the cost function V(x(t)) decreases on the whole although the decreasing speeds differ for different σ. In Fig. 3, the phenomenon of V(x(t)) increasing with the evolution of (5) does not appear, nor have we seen it in many other simulations. Thus the probability with which V(x(t)) increases under the evolution of (5) seems small from empirical results. This will be further confirmed in the following section. Combining Propositions 2 and 3 tells us that (5) with larger σ is more suitable for optimizing V(x); however, this may consume more calculating time. Thus the value of σ depends on the tradeoff between the capability of optimizing V(x(t)) and the calculating time.

3 Experimental results

In this section, experiments were carried out to evaluate the performance of the proposed approach. The experiments consist of three parts: one uses synthetic data, one is an application instance in computer vision, and the goal of the final one is to illustrate that the proposed method can be used for general L1-norm low-rank matrix approximation. The proposed model (5) accounts for the bias term (corresponding to l_i^T x), a consideration rarely made except in the L2-Wiberg algorithm [15]. Thus we first compare our algorithm against L2-Wiberg with bias terms (see (5) in [15]). The L1-Wiberg algorithm [21] is the state of the art in L1-norm low-rank matrix approximation (in other norms, such as the nuclear norm, it is possibly not the state of the art), and its performance was experimentally revealed to be much better than that of the two algorithms (the alternated linear programming and alternated quadratic programming approaches) proposed in [20]; thus the L1-Wiberg algorithm also joins the comparison experiments. The implementations of L2-Wiberg and L1-Wiberg were provided by the authors of [15] and [21], and can be downloaded from http://www.fractal.is.tohoku.ac.jp/okatani/code/Wiberg.zip and http://cs.adelaide.edu.au/~anders/code/cvpr2010.html, respectively. To compare the efficiency of the proposed method against that of matrix completion methods founded on the nuclear norm, OPTSPACE [34] and NNLS [35] were also employed in the numerical experiments; their implementations are available at http://www.stanford.edu/~raghuram/optspace/code.html and http://www.math.nus.edu.sg/~mattohkc/NNLS.html, respectively. The dynamic system (5) was implemented with CVODES, a solver for stiff and non-stiff ODE systems that runs in Matlab, downloaded from https://computation.llnl.gov/casc/sundials/download/download.html. Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) are suitable for calibrating the accuracy of low-rank matrix approximations in L1 and L2 norm, respectively. Thus, in the experiments both MAE and RMSE are reported in order to perform fair comparisons.
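For reference, a sketch of MAE and RMSE as we read them here, as averages of the absolute and squared residuals over the observed entries (the exact normalization used in the paper is our assumption):

```python
import numpy as np

def mae_rmse(W, Y, R, S, U):
    """MAE and RMSE of the masked residual, matching the L1- and L2-norm
    objectives; normalization by the number of observed entries is an
    assumption."""
    Res = W * (Y - R @ S - U[:, None])
    k = W.sum()
    return np.abs(Res).sum() / k, np.sqrt((Res ** 2).sum() / k)
```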

3.1 Synthetic data

The measurement matrix Y is obtained by retaining the largest r singular values in the "middle matrix" of the SVD of an m × n matrix drawn from a uniform distribution on [−1, 1]. Then 20 % of the entries of Y are randomly chosen to be missing, and the missing entries are indicated by setting the corresponding w_ij to zero. To simulate outliers, uniformly distributed noise on [−5, 5] is added to 10 % of the entries of Y; and to simulate the bias term, such as a_f and b_f introduced in [1], we add to Y a matrix each of whose rows has a constant value. We take m = 15, n = 30, and r = 3 when generating W and Y. Because (5) and L2-Wiberg take the bias term into account, r remains unchanged for both algorithms. The L1-Wiberg algorithm proposed in [21] does not consider the bias term, so for it r takes 4, since the matrix Y produced as above can be seen as a rank-4 matrix when the outliers are not considered.¹
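A sketch of this data generation (the distribution of the constant row values of the bias matrix is not specified above; uniform values are our assumption):

```python
import numpy as np

def synthetic_data(m=15, n=30, r=3, seed=0):
    """W, Y per Sect. 3.1: a rank-r matrix from the truncated SVD of a
    uniform [-1, 1] matrix, 20% missing entries, outliers in [-5, 5] on
    10% of the entries, and a constant-per-row bias matrix."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1, 1, (m, n))
    Uo, s, Vt = np.linalg.svd(A, full_matrices=False)
    Y = (Uo[:, :r] * s[:r]) @ Vt[:r, :]   # keep the r largest singular values
    W = np.ones((m, n))
    W.flat[rng.choice(m * n, int(0.2 * m * n), replace=False)] = 0.0
    out = rng.choice(m * n, int(0.1 * m * n), replace=False)
    Y.flat[out] += rng.uniform(-5, 5, out.size)   # outliers
    Y += rng.uniform(-1, 1, (m, 1))               # row-wise bias (assumed uniform)
    return W, Y
```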

Like [21], the experiment was repeated 100 times, with different Y but the same initialization scheme. Each time, the initial vector for (5) is randomly produced. The parameters σ and the end time of t, denoted t_f, remain the same each time: 25 and 5, respectively. The results and comparisons are shown in Fig. 4. Figure 4(a) discloses that the implementation of L1-Wiberg is evidently the most time-consuming, while L2-Wiberg is the most time-saving. The time values are closely related to the concrete implementations of the compared algorithms, and do not exactly reflect computational complexity; nevertheless, Fig. 4(a) shows that the implementation of (5) has competitive computational efficiency. In Fig. 4(b), we can see that the RMSE of L2-Wiberg is the lowest, which accords with the fact that L2-Wiberg aims at minimizing RMSE. As shown in Fig. 4(c), the MAE of L1-Wiberg and of our algorithm is clearly lower than that of L2-Wiberg. On MAE or RMSE, Figs. 4(b) and 4(c) alone cannot reveal which of our algorithm and L1-Wiberg is better. Because our interest here is to evaluate L1-norm (not L2-norm) low-rank matrix approximation, we computed histogram statistics of the MAE in Fig. 4(d), which show that (5) is close to L1-Wiberg, and (5) seems a little better. In all experiments, V(x(t)) calculated by (5) stays adjacent to the values calculated by L1-Wiberg, and the case where V(x(t)) goes to infinity does not arise. This confirms again that (5) has a good capability of optimizing the L1-norm function V(x(t)).

¹ When neglecting outliers, we have Y = P_{m×3}Q_{3×n} + (1)_{m×1}U_{1×n} = [P_{m×3}, (1)_{m×1}][Q_{3×n}^T, U_{1×n}^T]^T.

To quantitatively and statistically compare the performance of the three algorithms, a paired t-test comparing (5) to the L1- or L2-Wiberg algorithm was performed on the MAE and RMSE of the 100 repeated experiments. In a t-test, the p-value is used to evaluate whether two distributions are close. If the p-value is below the threshold chosen for the statistical significance level (usually 5.00 %), then the null hypothesis is rejected in favor of an alternative hypothesis, which typically states that the two distributions being compared do differ. The mean execution time, mean MAE, and mean RMSE along with the p-values are listed in Table 1. The p-value comparing the MAE of (5) to that of L1-Wiberg attains 11.23 % (larger than 5.00 %), and the average MAE values of the two algorithms are 0.2575 and 0.2618, respectively; hence our algorithm (5) is competitive in optimizing the L1-norm function of low-rank matrix approximation. The other p-values are all close to zero, implying that the RMSE of (5) does differ from that of the other two algorithms. The average execution times listed in Table 1 confirm again that the implementation of (5) is more time-saving than that of L1-Wiberg [21].
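A sketch of this comparison with scipy, assuming per-trial error arrays collected over the 100 runs:

```python
import numpy as np
from scipy.stats import ttest_rel

def paired_p_value(errs_a, errs_b):
    """Two-sided paired t-test p-value on per-trial errors, e.g. the 100
    MAE values of (5) vs. those of L1-Wiberg behind Table 1."""
    return ttest_rel(np.asarray(errs_a), np.asarray(errs_b)).pvalue
```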

Fig. 4 The performance of (5) on L1-norm low-rank matrix approximation is competitive against the L1- and L2-Wiberg algorithms

Table 1 Statistical comparison of the results produced by (5), L1-Wiberg, and L2-Wiberg

                         DS (5)    L1-Wiberg [21]   L2-Wiberg [15]
p-value of MAE           –         11.23 %          0.00 %
p-value of RMSE          –         0.00 %           0.00 %
Average MAE              0.2575    0.2618           0.4057
Average RMSE             0.8738    0.8296           0.6178
Average execution time   23.8 s    111.3 s          0.5 s

Note: The parameter σ takes 25 and the end time t_f takes 5 in (5)

3.2 Structure from motion

Low-rank approximation of matrices has been used successfully in structure from motion [3]. Like [21], we use the image sequence called the Oxford Dinosaur data set (available

at http://www.robots.ox.ac.uk/~vgg/data/data-mview.html) to illustrate this application. The Dinosaur sequence consists of 36 images with resolution 720 × 576, taken of an artificial dinosaur on a turntable; three images of the sequence are shown in Fig. 5. Structure from motion can be fulfilled only when the 2D track of a 3D point contains enough inliers. Like [20], in our experiment we require that a valid 3D point contain at least 5 inliers in its 2D track. The measurement matrix used for the Dinosaur sequence is a 72 × 319 matrix, as used in [15, 21], and is available at http://www.robots.ox.ac.uk/~amb/. The primary tracks of the 319 points are shown in Fig. 6(a), which clearly demonstrates that the track of each point covers only a short interval, and the whole track needs to be recovered by low-rank matrix approximation.
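A sketch of such a track-validity filter, using the observation count in W as a stand-in for the inlier count (our simplification, assuming x/y rows interleave per frame):

```python
import numpy as np

def valid_track_mask(W, min_inliers=5):
    """Columns (feature points) of W whose 2D track is observed in at
    least `min_inliers` frames; frames are counted on the x rows,
    assuming rows of W alternate x and y per frame."""
    frames_observed = (W[0::2, :] > 0).sum(axis=0)
    return frames_observed >= min_inliers
```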

Fig. 5 Some views of the 36-frame dinosaur sequence

Fig. 6 The tracks computed by L2-Wiberg [15] and by (5). All the tracks seem in accordance with the circular movement

Table 2 Comparison of calculating accuracy and efficiency

Method                                              MAE    RMS    Time
OPTSPACE [34]                                       2.71   4.16   9 m 43 s
NNLS [35]                                           0.80   1.60   6 m 22 s
L2-Wiberg [15]                                      0.54   1.27   2 m 12 s
(5) with σ = 3, t_f = 5, initialized by L2-Wiberg   0.51   1.31   30 m 7 s
(5) with σ = 3, t_f = 13, randomly initialized      0.43   1.53   1 h 43 m 7 s
L1-Wiberg [21]                                      NA     NA     NA

Note: 'NA' denotes 'Not Available'; memory overflow fails L1-Wiberg on the 72 × 319 matrix. The iteration numbers of L2-Wiberg, OPTSPACE, and NNLS are 10², 10⁴, and 5 × 10³, respectively

Fig. 7 Shapes recovered by all algorithms

Because OPTSPACE [34], NNLS [35], and L2-Wiberg [15] are all founded on the nuclear or L2 norm and can be solved by convex optimization techniques, we first use these three methods to perform low-rank approximations on the measurement matrix with randomly produced initializations. In the experiments, we found that the convergence performance of OPTSPACE [34] and NNLS [35] seems weaker than that of L2-Wiberg [15]; thus the iteration numbers of OPTSPACE and NNLS take larger values, 10000 and 5000, respectively, against 100 for L2-Wiberg. The experimental results show that the accuracy of L2-Wiberg is clearly higher, and the tracks recovered by this method are shown in Fig. 6(b). Then, with the R, S, and U just calculated by L2-Wiberg as the initialization, we apply L1-Wiberg and (5) to the measurement matrix. When applying L1-Wiberg to the 72 × 319 matrix, ∂A/∂U is a (4mn²r + 6m²n²) × mn²r (i.e., 3.2824 × 10⁹ × 29307168) matrix (see [21] for details), which leads to memory overflow even with the sparse storage organization implemented by the authors of this algorithm (see http://cs.adelaide.edu.au/~anders/code/cvpr2010.html for details). The tracks recovered by (5) are shown in Fig. 6(c). Figure 6(d) shows the tracks recovered by (5) with a random initialization. Comparing Figs. 6(b), 6(c), and 6(d) to Fig. 6(a) suggests that all the recovered tracks are in accordance with the real tracks. Though the tracks look the same, the shapes recovered by the algorithms differ. As shown in Fig. 7, the shapes recovered by all algorithms correctly disclose the outline of the dinosaur toy, though the points recovered for the same 3D point sometimes differ.

The accuracy and efficiency of all algorithms are listed in Table 2. From Table 2 we can see that the three matrix completion algorithms, OPTSPACE, NNLS, and L2-Wiberg, have good efficiency because they can be solved by convex programming techniques. With the given iteration numbers, L2-Wiberg is the most time-saving and has the best accuracy among the three. The calculating time of (5) is related to the condition for terminating the solving procedure of (5); here we simply use the end time t_f to control it. Compared to the implementation of L1-Wiberg, which needs large memory storage, the memory consumption of the implementation of (5) is mild.

3.3 General L1-norm low-rank matrix approximation

If l_i in (3) is taken as 0 for all i, the goal of (5) degrades to finding R and S making ‖W ⊙ (Y − RS)‖_1 minimal. Thus, (5) is applicable to general L1-norm low-rank matrix approximation. To illustrate this, we use (5) to perform L1-norm low-rank approximation on two data sets, Giraffe and Face, which can be downloaded from http://www.robots.ox.ac.uk/~amb/. As described in [17], Giraffe is a 240 × 166 matrix with 12046 missing entries, while Face is a 2596 × 20 matrix with 18218 missing entries. The rank of Giraffe is 6, and that of Face is 4. In Sect. 3.2, it was shown that the convergence performance of L2-Wiberg seems superior to OPTSPACE [34] and NNLS [35]; thus we apply only L1-Wiberg, L2-Wiberg, and our algorithm (5) to the two data sets. The experimental results are listed in Table 3. First, we note that L1-Wiberg and L2-Wiberg fail on the two matrices due to memory overflow. Second, (5) has obtained good low-rank approximation accuracy for the two matrices. From the experimental results on Giraffe and Face, we can see that the proposed method (5) can be used to perform general low-rank matrix approximation.
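A sketch of this bias-free specialization (our restatement of the dynamics with the U-block dropped, not the authors' code):

```python
import numpy as np

def rhs_eq5_no_bias(R, S, W, Y, sigma):
    """Right-hand side of (5) with l_i = 0 for all i, i.e. the dynamics
    for min_{R,S} ||W o (Y - RS)||_1."""
    E = np.tanh(sigma * W * (Y - R @ S))
    return E @ S.T, R.T @ E
```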

Table 3 General L1-norm low-rank matrix approximation on the Giraffe and Face data sets

Data      DS (5)       L1-Wiberg [21]   L2-Wiberg [15]
Giraffe   MAE = 0.25   NA               NA
          RMS = 0.38   NA               NA
Face      MAE = 0.06   NA               NA
          RMS = 0.08   NA               NA

Note: Memory overflow fails both L1-Wiberg and L2-Wiberg on the 240 × 166 and 2596 × 20 matrices

4 Conclusion

To recover the shape and rotation matrices from a tracking matrix with missing entries, we usually need to first find the missing entries and then perform a rank-3 matrix approximation to extract the two low-rank matrices. Motivated by this inconvenience, in this paper we have proposed a continuous-time dynamic system to work out the shape and rotation matrices directly from the tracking matrix. It has been proven that as time goes on the system converges to a state vector, from which two low-rank matrices and a bias vector are extracted. The product of the two low-rank matrices approximates, in L1 norm, the given tracking matrix with the bias vector subtracted from each of its columns. To improve the calculating efficiency, a parameter is implanted into the system. By tuning the parameter, the dynamic system can attain both good efficiency and competitive optimizing capability. The proposed approach is also applicable to general L1-norm low-rank matrix approximation. A large number of experiments on random matrices demonstrate that the L1-norm optimizing capability and calculating efficiency of the proposed algorithm are very promising compared with the L1-Wiberg algorithm; the immediate application to structure from motion implies the effectiveness and mild memory requirements of the proposed algorithm; and the application of the proposed method to general L1-norm low-rank matrix approximation is also experimentally confirmed on two data sets.

Acknowledgements We thank the Editor and Reviewers for the time and effort spent in reviewing this paper. This work was supported by NSFC under Grants 61173182 and 61179071, and by the Applied Basic Research Project (2011JY0124) and the International Cooperation and Exchange Project (2012HH0004) of Sichuan Province.

Appendix A: Proof of Proposition 1

Under (4), the vector x(t) changes with time t, and V(x(t)) changes accordingly. To study the variation of V(x(t)) with time t, the upper right Dini derivative, D⁺, is used, which is a generalization of the ordinary derivative and is applicable to continuous functions [36]. Based on (4), we have

D⁺V(x(t)) 1)= Σ_{i=1}^{m} Σ_{j=1}^{n} sgn(f_ij(x(t))) (∂f_ij(x(t))/∂x(t)) ẋ(t)
          = −Σ_{i=1}^{m} Σ_{j=1}^{n} sgn(f_ij(x(t))) g_ij^T(x(t)) ẋ(t)
          2)= −‖ẋ(t)‖² ≤ 0  (6)

where in 1) we have applied the Dini derivative to differentiate an absolute-value function |g(x(t))|, that is, D⁺|g(x(t))| = sgn(g(x(t))) (d/dt)g(x(t)); and in 2) we have inserted (4).

Equation (6) means that if ‖ẋ(t)‖ ≠ 0, then D⁺V(x(t)) < 0 and V(x(t)) decreases. When ‖ẋ(t)‖ = 0, x(t) keeps the same value at time t, and from (6) we see D⁺V(x(t)) = 0, which implies that V(x(t)) attains a local minimum (not a local maximum, owing to the restriction of (6)). Thus, under the control of (4), if x(t) does not change, V(x(t)) is locally optimized. If ‖ẋ(t)‖ does not converge to zero, V(x(t)) decreases ceaselessly. But V(x(t)) ≥ 0 is bounded below and cannot decrease forever. Thus ‖ẋ(t)‖ finally converges to zero, and accordingly D⁺V(x(t)) goes to 0 in view of (6). So, with the evolution of (4), V(x(t)) goes to a local minimum.

Appendix B: Proof of Proposition 2

To address the evolution of V(x(t)) under (5), we need to study the derivative of V(x(t)) with respect to time t when x(t) evolves under (5). As in the proof of Proposition 1, the Dini derivative is used.

D⁺V(x(t)) = −Σ_{i=1}^{m} Σ_{j=1}^{n} sgn(f_ij(x(t))) g_ij^T(x(t)) ẋ(t)
          1)= −Σ_{i=1}^{m} Σ_{j=1}^{n} sgn(f_ij(x(t))) g_ij^T(x(t)) × Σ_{k=1}^{m} Σ_{l=1}^{n} tanh(σ f_kl(x(t))) g_kl(x(t))  (7)

where we have inserted (5) in 1). To discuss the influence of σ, we first study the following relation for σ > 0 and x ∈ R:

sgn(σx) − tanh(σx) = [sgn(σx)(exp(σx) + exp(−σx)) − exp(σx) + exp(−σx)] / [exp(σx) + exp(−σx)]
                   = sgn(σx) · 2 / (exp(2σ|x|) + 1).  (8)

Equation (8) indicates that sgn(σx) = tanh(σx) when σ = 0 or x = 0; otherwise, tanh(σx) becomes closer and closer to sgn(σx) as σ|x| becomes larger. From (7) we have

D⁺V(x(t)) 1)= −‖Σ_{i=1}^{m} Σ_{j=1}^{n} sgn(f_ij(x(t))) g_ij(x(t))‖²
             − Σ_{i=1}^{m} Σ_{j=1}^{n} sgn(f_ij(x(t))) g_ij^T(x(t)) × Σ_{k=1}^{m} Σ_{l=1}^{n} (tanh(σ f_kl(x(t))) − sgn(σ f_kl(x(t)))) g_kl(x(t))
          2)≤ −‖Σ_{i=1}^{m} Σ_{j=1}^{n} sgn(f_ij(x(t))) g_ij(x(t))‖²
             + |Σ_{i=1}^{m} Σ_{j=1}^{n} sgn(f_ij(x(t))) g_ij^T(x(t)) × Σ_{k=1}^{m} Σ_{l=1}^{n} sgn(f_kl(x(t))) · (2 / (exp(2σ|f_kl(x(t))|) + 1)) g_kl(x(t))|  (9)

where in 1) and 2) we have used sgn(x) = sgn(σx) for σ ≠ 0 and (8), respectively. In terms of whether f_kl(x(t)) = 0 holds, there are two cases: (i) f_kl(x(t)) = 0, which leaves

sgn(f_kl(x(t))) · (2 / (exp(2σ|f_kl(x(t))|) + 1)) g_kl(x(t)) = 0;

and (ii) f_kl(x(t)) ≠ 0, for which 2 / (exp(2σ|f_kl(x(t))|) + 1) decreases as σ increases. Combining the two cases, we see that with the increase of σ, the probability with which D⁺V(x(t)) ≤ 0 holds increases. In particular, when σ goes to infinity, D⁺V(x(t)) ≤ 0 holds definitely, as

Σ_{k=1}^{m} Σ_{l=1}^{n} sgn(f_kl(x(t))) · (2 / (exp(2σ|f_kl(x(t))|) + 1)) g_kl(x(t))

goes to zero in this case.

Appendix C: Proof of Proposition 3

We use reductio ad absurdum to prove Proposition 3. Assume that V(x(t)) keeps on increasing, which means

D⁺V(x(t)) > 0 for t ≥ 0.  (10)

From (3), we know V(x) ≜ Σ_{i=1}^{m} Σ_{j=1}^{n} |f_ij(x)|. The condition that V(x) keeps on increasing means that there is some |f_ij(x)| (with the corresponding w_ij ≠ 0) which keeps on increasing, and so does σ|f_ij(x)|. The condition that σ|f_ij(x)| keeps on increasing makes a time t_h available at which we have

‖Σ_{i=1}^{m} Σ_{j=1}^{n} sgn(f_ij(x(t_h))) g_ij(x(t_h))‖²
 > |Σ_{i=1}^{m} Σ_{j=1}^{n} sgn(f_ij(x(t_h))) g_ij^T(x(t_h)) × Σ_{k=1}^{m} Σ_{l=1}^{n} sgn(f_kl(x(t_h))) · (2 / (exp(2σ|f_kl(x(t_h))|) + 1)) g_kl(x(t_h))|.  (11)

In view of (9), (11) means

D⁺V(x(t_h)) < 0.  (12)

Both (10) and (12) are derived from the assumption that V(x(t)) keeps on increasing, and the contradiction between them implies that the assumption is false. Thus, V(x(t)) does not keep on increasing. When σ is large enough, in view of Proposition 2, a decrease of V(x(t)) occurs more often than an increase; together with the fact just proved that V(x(t)) cannot keep on increasing, we can say that V(x(t)) falls on the whole.

References

1. Tomasi, C., Kanade, T.: Shape and motion from image streams under orthography: a factorization method. Int. J. Comput. Vis. 9, 137–154 (1992)
2. Wang, G., Wu, Q.M.J.: Stratification approach for 3-D Euclidean reconstruction of nonrigid objects from uncalibrated image sequences. IEEE Trans. Syst. Man Cybern., Part B, Cybern. 38, 90–101 (2008)
3. Torresani, L., Hertzmann, A., Bregler, C.: Nonrigid structure-from-motion: estimating shape and motion with hierarchical priors. IEEE Trans. Pattern Anal. Mach. Intell. 30, 878–892 (2008)
4. Peng, Y., Ganesh, A., Wright, J., Ma, Y.: RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 700–763 (2010)
5. Wright, J., Ganesh, A., Rao, S., Peng, Y., Ma, Y.: Robust principal component analysis: exact recovery of corrupted low-rank matrices via convex optimization. In: Bengio, Y., Schuurmans, D., Lafferty, J., Williams, C.K.I., Culotta, A. (eds.) Advances in Neural Information Processing Systems, pp. 2080–2088 (2009)
6. Liu, Y., You, Z., Cao, L.: A novel and quick SVM-based multi-class classifier. Pattern Recognit. 39, 2258–2264 (2006)
7. Liu, Y., Sam Ge, S., Li, C., You, Z.: k-NS: a classifier by the distance to the nearest subspace. IEEE Trans. Neural Netw. 22(8), 1256–1268 (2011)
8. Morita, T., Kanade, T.: A sequential factorization method for recovering shape and motion from image streams. IEEE Trans. Pattern Anal. Mach. Intell. 19, 858–867 (1997)
9. Hartley, R., Schaffalitzky, F.: Power factorization: 3D reconstruction with missing or uncertain data. In: Australia-Japan Advanced Workshop on Computer Vision, pp. 1–9 (2003)
10. Buchanan, A.M., Fitzgibbon, A.W.: Damped Newton algorithms for matrix factorization with missing data. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 2, pp. 316–322 (2005)
11. Morgan, A.B.: Investigation into matrix factorization when elements are unknown. Technical report, Visual Geometry Group, Department of Engineering Science, University of Oxford (2004)
12. Ye, J.: Generalized low rank approximations of matrices. Mach. Learn. 61, 167–191 (2005)
13. Liu, J., Chen, S., Zhou, Z.-H., Tan, X.: Generalized low-rank approximations of matrices revisited. IEEE Trans. Neural Netw. 21, 621–632 (2010)
14. Wiberg, T.: Computation of principal components when data are missing. In: Proc. Second Symp. Computational Statistics, Berlin, pp. 229–236 (1976)
15. Okatani, T., Deguchi, K.: On the Wiberg algorithm for matrix factorization in the presence of missing components. Int. J. Comput. Vis. 72, 329–337 (2007)
16. Chen, P.: Heteroscedastic low-rank matrix approximation by the Wiberg algorithm. IEEE Trans. Signal Process. 56, 1429–1439 (2008)
17. Chen, P.: Optimization algorithms on subspaces: revisiting missing data problem in low-rank matrix. Int. J. Comput. Vis. 80(1), 125–142 (2008)
18. Lin, Z., Chen, M., Wu, L., Ma, Y.: The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices, October (2009)
19. Cai, J.-F., Candès, E.J., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20, 1956–1982 (2010)
20. Ke, Q., Kanade, T.: Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming. In: 2005 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 739–746 (2005)
21. Eriksson, A., van den Hengel, A.: Efficient computation of robust low-rank matrix approximations in the presence of missing data using the L1 norm. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 771–778 (2010)
22. Li, S.Z.: Markov Random Field Modeling in Image Analysis. Springer, Berlin (2001)
23. Zucker, S.: Differential geometry from the Frenet point of view: boundary detection, stereo, texture and color. In: Paragios, N., Chen, Y., Faugeras, O. (eds.) Handbook of Mathematical Models in Computer Vision, pp. 359–373. Springer, Berlin (2006)
24. Liu, Y.: Automatic range image registration in the Markov chain. IEEE Trans. Pattern Anal. Mach. Intell. 32, 12–29 (2010)
25. Weickert, J., Steidl, G., Mrazek, P., Welk, M., Brox, T.: Diffusion filters and wavelets: what can they learn from each other? In: Paragios, N., Chen, Y., Faugeras, O. (eds.) Handbook of Mathematical Models in Computer Vision, pp. 3–17. Springer, Berlin (2006)
26. Stoykova, E., Alatan, A.A., Benzie, P., Grammalidis, N., Malassiotis, S., Ostermann, J., Piekh, S., Sainov, V., Theobalt, C., Thevar, T., Zabulis, X.: 3-D time-varying scene capture technologies: a survey. IEEE Trans. Circuits Syst. Video Technol. 17, 1568–1586 (2007)
27. Liu, Y., You, Z., Cao, L.: A functional neural network computing some eigenvalues and eigenvectors of a special real matrix. Neural Netw. 18, 1293–1300 (2005)
28. Liu, Y., You, Z., Cao, L.: A concise functional neural network computing the largest modulus eigenvalues and their corresponding eigenvectors of a real skew matrix. Theor. Comput. Sci. 367, 273–285 (2006)
29. Orban, G.A., Janssen, P., Vogels, R.: Extracting 3D structure from disparity. Trends Neurosci. 29(8), 466–473 (2006)
30. de la Torre, F., Black, M.J.: Robust principal component analysis for computer vision. In: Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV 2001), vol. 1, pp. 362–369 (2001)
31. El-Melegy, M.T., Al-Ashwal, N.H.: A variational technique for 3D reconstruction from multiple views. In: International Conference on Computer Engineering & Systems (ICCES 07), pp. 38–43 (2007)
32. Zhong, H., Hung, Y.: Multi-stage 3D reconstruction under circular motion. Image Vis. Comput. 25, 1814–1823 (2007)
33. Martinec, D., Pajdla, T.: 3D reconstruction by fitting low-rank matrices with missing data. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 1, pp. 198–205 (2005)
34. Keshavan, R.H., Montanari, A., Oh, S.: Matrix completion from a few entries. IEEE Trans. Inf. Theory 56, 2980–2998 (2010)
35. Toh, K.-C., Yun, S.: An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pac. J. Optim. 6, 615–640 (2010)
36. Khalil, H.K.: Nonlinear Systems, 3rd edn. Prentice Hall, New York (2002)

Yiguang Liu received the M.S. degree in Mechanics in 1998 and the Ph.D. degree in Computer Application in 2004 from Peking University and Sichuan University, respectively. Currently, he is the director of the Vision and Image Processing Lab and a professor with the School of Computer Science and Engineering, Sichuan University. Prior to joining Sichuan University in 2005, he served as a software engineer or director in several companies, such as the Industrial Co., Ltd. of China South Communication. He was promoted to full professor at Sichuan University in 2006. In 2008, he worked as a Research Fellow at the National University of Singapore, and was chosen for the Program for New Century Excellent Talents of the MOE of P.R. China. His research interests include computer vision and image processing, pattern recognition, and computational intelligence.