View
0
Download
0
Category
Preview:
Citation preview
ETNAKent State University and
Johann Radon Institute (RICAM)
Electronic Transactions on Numerical AnalysisVolume 51 pp 469ndash494 2019Copyright ccopy 2019 Kent State UniversityISSN 1068ndash9613DOI 101553etna_vol51s469
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION AND ITSAPPLICATIONS TO SINGULAR VALUE DECOMPOSITIONlowast
YUEHUA FENGdagger JIANWEI XIAODagger AND MING GUDagger
Abstract We present the Flip-Flop Spectrum-Revealing QR (Flip-Flop SRQR) factorization a significantlyfaster and more reliable variant of the QLP factorization of Stewart for low-rank matrix approximations Flip-FlopSRQR uses SRQR factorization to initialize a partial column-pivoted QR factorization and then computes a partialLQ factorization As observed by Stewart in his original QLP work Flip-Flop SRQR tracks the exact singularvalues with ldquoconsiderable fidelityrdquo We develop singular value lower bounds and residual error upper bounds for theFlip-Flop SRQR factorization In situations where singular values of the input matrix decay relatively quickly thelow-rank approximation computed by Flip-Flop SRQR is guaranteed to be as accurate as the truncated SVD Wealso perform a complexity analysis to show that Flip-Flop SRQR is faster than the randomized subspace iteration forapproximating the SVD the standard method used in the Matlab tensor toolbox We additionally compare Flip-FlopSRQR with alternatives on two applications a tensor approximation and a nuclear norm minimization to demonstrateits efficiency and effectiveness
Key words QR factorization randomized algorithm low-rank approximation approximate SVD higher-orderSVD nuclear norm minimization
AMS subject classifications 15A18 15A23 65F99
1 Introduction The singular value decomposition (SVD) of a matrix A isin Rmtimesn isthe factorization of A into the product of three matrices A = UΣV T where the matricesU = (u1 middot middot middot um) isin Rmtimesm and V = (v1 middot middot middot vn) isin Rntimesn are orthogonal singular vectormatrices and Σ isin Rmtimesn is a rectangular diagonal matrix with non-increasing non-negativesingular values σi (1 le i le min (mn)) on the diagonal The SVD has become a criticalanalytic tool in large data analysis and machine learning [1 19 53]
Let Diag (x) denote the diagonal matrix with vector x isin Rn on its diagonal For any1 le k le min (mn) the rank-k truncated SVD of A is defined by
Akdef= (u1 middot middot middot uk) Diag (σ1 middot middot middot σk) (v1 middot middot middot vk)
T
The rank-k truncated SVD turns out to be the best rank-k approximation to A with respect tospectral norm and Frobenius norm as explained by the Eckart-Young-Mirsky Theorem [1826]
However due to the prohibitive costs in computing the rank-k truncated SVD in practicalapplications one typically computes a rank-k approximate SVD which satisfies some tolerancerequirements [16 27 30 42 66] The rank-k approximate SVD has been applied to manyresearch areas including principal component analysis (PCA) [37 58] web search models [38]information retrieval [4 23] and face recognition [52 22]
Among assorted SVD approximation algorithms the pivoted QLP decomposition pro-posed by Stewart [66] is an effective and efficient one The pivoted QLP decomposition isobtained by computing a QR factorization with column pivoting (QRCP) [6 25] of A to get anupper triangular factorR and then computing an LQ factorization ofR to get a lower triangularfactor L Stewartrsquos key numerical observation is that the diagonal elements of L track the
lowastReceived October 14 2018 Accepted November 6 2019 Published online on December 11 2019 Recom-mended by Y Saad The research by M Gu was supported in part by NSF award FRG-1760316daggerSchool of Mathematics Physics and Statistics Shanghai University of Engineering Science China
(fyh1001hotmailcom)DaggerDepartment of Mathematics University of California Berkeley
(jwxiaoberkeleyedu mgumathberkeleyedu)
469
ETNAKent State University and
Johann Radon Institute (RICAM)
470 Y FENG J XIAO AND M GU
singular values of A with ldquoconsiderable fidelityrdquo no matter what the matrix A is The pivotedQLP decomposition is extensively analyzed in Huckaby and Chan [34 35] More recentlyDuersch and Gu developed a much more efficient variant of the pivoted QLP decompositionTUXV and demonstrated its remarkable quality as a low-rank approximation empiricallywithout a rigorous theoretical justification of TUXVrsquos success [17]
In this paper we present Flip-Flop SRQR a slightly different variant of TUXV of Duerschand Gu [17] Like TUXV Flip-Flop SRQR requires the most effort for computing a partial QRfactorization using a truncated randomized QRCP (TRQRCP) and a partial LQ factorizationUnlike TUXV however Flip-Flop SRQR also performs additional computations to ensure aspectrum-revealing QR factorization (SRQR) [75] before the partial LQ factorization
We demonstrate the reliable theoretical quality of this variant as a low-rank approximationand its highly competitiveness with state-of-the-art low-rank approximation methods in realworld applications in both low-rank tensor compression [13 40 60 71] and nuclear normminimization [7 45 49 54 68]
The rest of this paper is organized as follows In Section 2 we introduce the TRQRCPalgorithm the SRQR factorization the low-rank tensor compression and the nuclear normminimization In Section 3 we introduce Flip-Flop SRQR and analyze its computational costsand low-rank approximation properties In Section 4 we present numerical results comparingFlip-Flop SRQR with state-of-the-art low-rank approximation methods In Section 5 wesummarize our main results and draw some conclusions
2 Preliminaries and background
21 Partial QRCP The QR factorization of a matrix A isin Rmtimesn is A = QR with anorthogonal matrix Q isin Rmtimesm and an upper trapezoidal matrix R isin Rmtimesn which can becomputed by the LAPACK [2] routine xGEQRF where x stands for the matrix data type Thestandard QR factorization is not suitable for some practical situations where either the matrixA is rank deficient or only representative columns of A are of interest Usually the QRCPis adequate for the aforementioned situations except a few rare examples such as the Kahanmatrix [26] Given a matrix A isin Rmtimesn the QRCP of matrix A has the form
AΠ = QR
where Π isin Rntimesn is a permutation matrix Q isin Rmtimesm is an orthogonal matrix andR isin Rmtimesn is an upper trapezoidal matrix QRCP can be computed by the LAPACK [2]routines xGEQPF and xGEQP3 where xGEQP3 is a more efficient blocked implementationof xGEQPF For a given target rank k (1 le k le min (mn)) the partial QRCP factorization(Algorithm 4 in the appendix) has a 2times 2 block form
(21) AΠ = Q
[R11 R12
R22
]=[Q1 Q2
] [R11 R12
R22
]
where R11 isin Rktimesk is upper triangular The partial QRCP computes an approximate col-umn subspace of A spanned by the leading k columns in AΠ up to the error term in R22Equivalently (21) yields a low rank approximation
(22) A asymp Q1
[R11 R12
]ΠT
with an approximation quality closely related to the error term in R22The Randomized QRCP (RQRCP) algorithm [17 75] is a more efficient variant of
Algorithm 4 RQRCP generates a Gaussian random matrix Ω isin N (0 1)(b+p)timesm with
b + p m where the entries of Ω are independently sampled from a normal distribution
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 471
Algorithm 1 Truncated Randomized QRCP (TRQRCP)InputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RktimesnPermutation matrix Π isin Rntimesn such that AΠ asymp Q ( 1 k)RAlgorithmGenerate iid Gaussian random matrix Ω isin N (0 1)
(b+p)timesmForm the initial sample matrix B = ΩA and initialize Π = Infor j = 1 b k dob = min (k minus j + 1 b)Do partial QRCP on B ( j n) to obtain b pivotsExchange corresponding columns in A B Π and WT Do QR on A (j m j j + bminus 1) using WY formula without updating the trailingmatrixUpdate B ( j + b n) using the formula (24)
end forQ = Q1Q2 middot middot middotQdkbe R = upper trapezoidal part of the submatrix A (1 k 1 n)
to compress A into B = ΩA with much smaller row dimension In practice b is the blocksize and p is the oversampling size RQRCP repeatedly runs partial QRCP on B to obtain bcolumn pivots applies them to the matrix A and then computes QR without pivoting (QRNP)on A and updates the remaining columns of B RQRCP exits this process when it reachesthe target rank k QRCP and RQRCP choose pivots on A and B respectively RQRCP issignificantly faster than QRCP as B has much smaller row dimension than A It is shownin [75] that RQRCP is as reliable as QRCP up to failure probabilities that decay exponentiallywith respect to the oversampling size p
Since the trailing matrix of A is usually not required for low-rank matrix approximations(see (22)) the TRQRCP (truncated RQRCP) algorithm of [17] re-organizes the computationsin RQRCP to directly compute the approximation (22) without explicitly computing thetrailing matrix R22 For more details both RQRCP and TRQRCP are based on the WYrepresentation of the Householder transformations [5 55 61]
Q = Q1Q2 middot middot middotQk = I minus Y TY T
where T isin Rktimesk is an upper triangular matrix and Y isin Rmtimesk is a trapezoidal matrix
consisting of k consecutive Householder vectors Let WT def= TTY TA then the trailing
matrix update formula becomes QTA = Aminus YWT The main difference between RQRCPand TRQRCP is that RQRCP computes the whole trailing matrix update while TRQRCP onlycomputes the part of the update that affects the approximation (22) More discussions aboutRQRCP and TRQRCP can be found in [17] The main steps of TRQRCP are briefly describedin Algorithm 1 The deduction of the updating formula for B is as follows consider partialQRs on AΠ and BΠ = ΩAΠ
(23) AΠ = Q
[R11 R12
R22
] BΠ = Q
[R11 R12
R22
]
ETNAKent State University and
Johann Radon Institute (RICAM)
472 Y FENG J XIAO AND M GU
Partition Ωdef= Q
TΩQ
def=[Ω1 Ω2
] and therefore (23) implies
Ω
[R11 R12
R22
]=
[R11 R12
R22
]
which leads to the updating formula for B in Algorithm 1
(24)[R12
R22
]larr Ω2R22 =
[R12 minusR11R
minus111 R12
R22
]
With TRQRCP the TUXV algorithm [17 Algorithm 7] (Algorithm 5 in the appendix)computes a low-rank approximation with the QLP factorization at a greatly accelerated speedby computing a partial QR factorization with column pivoting followed with a partial LQfactorization
22 Spectrum-revealing QR factorization Although both RQRCP and TRQRCP arevery effective practical tools for low-rank matrix approximations they are not known toprovide reliable low-rank matrix approximations due to their underlying greediness in thecolumn norm-based pivoting strategy To solve this potential problem of column-basedQR factorization Gu and Eisenstat [28] proposed an efficient way to perform additionalcolumn interchanges to enhance the quality of the leading k columns in AΠ as a basis for theapproximate column subspace More recently a more efficient and effective method SRQRwas introduced and analyzed in [50 75] to compute the low-rank approximation (22) Theconcept of spectrum-revealing introduced in [50 74] emphasizes the utilization of partialQR factorization (22) as a low-rank matrix approximation as opposed to the more traditionalrank-revealing factorization which emphasizes the utility of the partial QR factorization (21)as a tool for numerical rank determination The SRQR algorithm is described in Algorithm 2The SRQR algorithm will always run to completion with a high-quality low-rank matrixapproximation (22) For real data matrices that usually have fast decaying singular-valuespectrum this approximation is often as good as the truncated SVD The SRQR algorithmof [75] explicitly updates the partial QR factorization (21) while swapping columns but theSRQR algorithm can actually avoid any explicit computations on the trailing matrix R22 usingTRQRCP instead of RQRCP to obtain exactly the same partial QR initialization Below weoutline the SRQR algorithm
In (21) let
(25) Rdef=
[R11 a
α
]be the leading (l + 1)times (l + 1) (l ge k) submatrix of R We define
(26) g1def=R2212|α|
and g2def= |α|
∥∥∥RminusT∥∥∥12
where X12 is the largest column 2-norm of X for any given X In [75] the authorsproved approximation quality bounds involving g1 g2 for the low-rank approximation com-puted by RQRCP or TRQRCP RQRCP or TRQRCP will provide a good low-rank ma-
trix approximation if g1 and g2 are O(1) The authors also proved that g1 leradic
1+ε1minusε and
g2 leradic
2(1+ε)
1minusε
(1 +
radic1+ε1minusε
)lminus1
hold with high probability for both RQRCP and TRQRCPwhere 0 lt ε lt 1 is a user-defined parameter which guides the choice of the oversampling size
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 473
p For reasonably choices of ε like ε = 12 g1 is a small constant while g2 can potentially be an
extremely large number which can lead to a poor low-rank approximation quality To avoidthe potential large number of g2 the SRQR algorithm (Algorithm 2) proposed in [75] uses apair-wise swapping strategy to guarantee that g2 is below some user-defined tolerance g gt 1which is usually chosen to be a small number larger than one like 20 In Algorithm 2 weuse an approximate formula instead of using the definition (26) directly to estimate g2 ieg2 asymp |α|radicd
∥∥∥ΩRminusT∥∥∥
12 based on the Johnson-Lindenstrauss Lemma [36]
Algorithm 2 Spectrum-revealing QR Factorization (SRQR)InputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0Integer l ge k Tolerance g gt 1 for g2OutputsOrthogonal matrix Q isin Rmtimesm formed by the first k reflectorsUpper trapezoidal matrix R isin RktimesnPermutation matrix Π isin Rntimesn such that AΠ asymp Q ( 1 k)RAlgorithmCompute QRΠ with RQRCP or TRQRCP to l stepsCompute squared 2-norm of the columns of B( l + 1 n) ri (l + 1 le i le n) where Bis an updated random projection of A computed by RQRCP or TRQRCPApproximate squared 2-norm of the columns of A(l + 1 m l + 1 n) ri = ri(b +p) (l + 1 le i le n)ı = argmaxl+1leilenriSwap ı-th and (l + 1)-st columns of AΠ rOne-step QR factorization of A(l + 1 m l + 1 n)|α| = Rl+1l+1ri = ri minusA(l + 1 i)2 (l + 2 le i le n)Generate a random matrix Ω isin N (0 1)dtimes(l+1) (d l)Compute g2 = |α|
∥∥∥RminusT∥∥∥12asymp |α|radic
d
∥∥∥ΩRminusT∥∥∥
12
while g2 gt g doı = argmax1leilel+1ith column norm of ΩRminusT Swap ı-th and (l + 1)-st columns of A and Π in a Round Robin rotationGivens-rotate R back into upper-trapezoidal formrl+1 = R2
l+1l+1 ri = ri +A(l + 1 i)2 (l + 2 le i le n)ı = argmaxl+1leilenriSwap ı-th and (l + 1)-st columns of AΠ rOne-step QR factorization of A(l + 1 m l + 1 n)|α| = Rl+1l+1ri = ri minusA(l + 1 i)2 (l + 2 le i le n)Generate a random matrix Ω isin N (0 1)dtimes(l+1)(d l) Compute g2 = |α|
∥∥∥RminusT∥∥∥12asymp |α|radic
d
∥∥∥ΩRminusT∥∥∥
12
end while
23 Tensor approximation In this section we review some basic notations and conceptsinvolving tensors A more detailed discussion of the properties and applications of tensors can
ETNAKent State University and
Johann Radon Institute (RICAM)
474 Y FENG J XIAO AND M GU
be found in the review [40] A tensor is a d-dimensional array of numbers denoted by scriptnotation X isin RI1timesmiddotmiddotmiddottimesId with entries given by
xj1jd 1 le j1 le I1 1 le jd le Id
We use the matrix X(n) isin RIntimes(Πj 6=nIj) to denote the nth mode unfolding of the tensorX Since this tensor has d dimensions there are altogether d-possibilities for unfolding Then-mode product of a tensor X isin RI1timesmiddotmiddotmiddottimesId with a matrix U isin RktimesIn results in a tensorY isin RI1timesmiddotmiddotmiddottimesInminus1timesktimesIn+1timesmiddotmiddotmiddottimesId such that
yj1jnminus1jjn+1jd = (X timesn U)j1jnminus1jjn+1jd=
Insumjn=1
xj1jdujjn
Alternatively it can be expressed conveniently in terms of unfolded tensors
Y = X timesn U hArr Y(n) = UX(n)
Decompositions of higher-order tensors have applications in signal processing [12 15 63]numerical linear algebra [13 39 76] computer vision [62 72 67] etc Two particular tensordecompositions can be considered as higher-order extensions of the matrix SVD CANDE-COMPPARAFAC (CP) [10 31] decomposes a tensor as a sum of rank-one tensors andthe Tucker decomposition [70] is a higher-order form of the principal component analy-sis Knowing the definitions of mode products and unfolding of tensors we can introducethe sequentially truncated higher-order SVD (ST-HOSVD) algorithm for producing a rank(k1 kd) approximation to the tensor based on the Tucker decomposition format The ST-HOSVD algorithm [3 71] (Algorithm 6 in the appendix) returns a core tensor G isin Rk1timesmiddotmiddotmiddottimeskdand a set of unitary matrices Uj isin RIjtimeskj for j = 1 d such that
X asymp G times1 U1 middot middot middot timesd Ud
where the right-hand side is called a Tucker decomposition However a straightforwardgeneralization to higher-order (d ge 3) tensors of the matrix Eckart-Young-Mirsky Theorem isnot possible [14]
In ST-HOSVD one key step is to compute an exact or approximate rank-kr SVD ofthe tensor unfolding Well known efficient ways to compute an exact low-rank SVD includeKrylov subspace methods [44] There are also efficient randomized algorithms to find anapproximate low-rank SVD [30] In the Matlab tensorlab toolbox [73] the most efficientmethod MLSVD_RSI is essentially ST-HOSVD with an randomized subspace iteration tofind the approximate SVD of the tensor unfolding
24 Nuclear norm minimization Matrix rank minimization problem appears ubiqui-tously in many fields such as Euclidean embedding [20 46] control [21 51 56] collaborativefiltering [9 57 65] system identification [47 48] etc The regularized nuclear norm mini-mization problem is the following
minXisinRmtimesn
f (X) + τXlowast
where τ gt 0 is a regularization parameter and Xlowastdef=sumqi=1 σi (X) The choice of the
function f (middot) is situational f (X) = M minus X1 in robust principal component analysis(robust PCA) [8] and f (X) = πΩ (M)minus πΩ (X) ||2F in matrix completion [7]
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 475
Many researchers have devoted themselves to solve the above nuclear norm minimizationproblem and plenty of algorithms have been proposed including singular value thresholding(SVT) [7] fixed point continuous (FPC) [49] accelerated proximal gradient (APG) [68]augmented Lagrange multiplier (ALM) [45] The most expensive part of these algorithms liesin the computation of the truncated SVD Inexact augmented Lagrange multiplier (IALM) [45]has been proved to be one of the most accurate and efficient among them We now describeIALM for the robust PCA and matrix completion problems
The robust PCA problem can be formalized as a minimization problem of a sum of nuclearnorm and a scaled matrix l1-norm (sum of matrix entries in absolute value)
(27) min Xlowast + λE1 subject to M = X + E
where M is a measured matrix X has low-rank E is an error matrix and sufficiently sparseand λ is a positive weighting parameter Algorithm 7 in the appendix describes details of theIALM method to solve the robust PCA problem [45]
The matrix completion problem [9 45] can be written in the form
(28) minXisinRmtimesn
Xlowast subject to X + E = M πΩ (E) = 0
where πΩ Rmtimesn rarr Rmtimesn is an orthogonal projection that keeps the entries in Ω unchangedand sets those outside Ω zeros In [45] the authors applied the IALM method to the matrixcompletion problem (Algorithm 8 in the appendix)
3 Flip-Flop SRQR factorization
31 Flip-Flop SRQR factorization In this section we introduce our Flip-Flop SRQRfactorization a slightly different variant of TUXV (Algorithm 5 in the appendix) to computean SVD approximation based on the QLP factorization Given an integer l ge k we run SRQR(the version without computing the trailing matrix) with l steps on A
(31) AΠ = QR = Q
[R11 R12
R22
]
where R11 isin Rltimesl is upper triangular R12 isin Rltimes(nminusl) and R22 isin R(mminusl)times(nminusl) Then werun partial QRNP with l steps on RT
(32) RT =
[RT11
RT12 RT22
]= Q
[R11 R12
R22
]asymp Q1
[R11 R12
]
where Q =[Q1 Q2
]with Q1 isin Rntimesl Therefore combining with the fact that
AΠQ1 = Q[R11 R12
]T
we can approximate the matrix A by
(33) A = QRΠT = Q(RT)T
ΠT asymp Q
[RT11
RT12
]QT1 ΠT = A
(ΠQ1
)(ΠQ1
)T
We denote the rank-k truncated SVD ofAΠQ1 by UkΣkVTk LetUk = Uk Vk = ΠQ1Vk
then using (33) a rank-k approximate SVD of A is obtained
(34) A asymp UkΣkVTk
where Uk isin Rmtimesk Vk isin Rntimesk are column orthonormal and Σk = Diag (σ1 middot middot middot σk) withσirsquos being the leading k singular values of AΠQ1 The Flip-Flop SRQR factorization isoutlined in Algorithm 3
ETNAKent State University and
Johann Radon Institute (RICAM)
476 Y FENG J XIAO AND M GU
Algorithm 3 Flip-Flop Spectrum-Revealing QR FactorizationInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0Integer l ge k Tolerance g gt 1 for g2OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmRun SRQR on A to l steps to obtain (R11 R12)Run QRNP on (R11 R12)
T to obtain Q1 represented by a sequence of Householder vectors
A = AΠQ1[U Σ V ] = svd
(A)
U = U ( 1 k) Σ = Σ (1 k 1 k) V = ΠQ1V ( 1 k)
32 Complexity analysis In this section we perform a complexity analysis of Flip-Flop SRQR Since the approximate SVD only makes sense when the target rank k is smallwe assume that k le l min (mn) The complexity analysis of Flip-Flop SRQR looks asfollows
1 The main cost of doing SRQR with TRQRCP onA is 2mnl+2(b+p)mn+(m+n)l22 The main cost of the QR factorization for [R11 R12]
T and forming Q1 is 2nl2minus 23 l
33 The main cost of computing A = AΠQ1 is 2mnl4 The main cost of computing [Usimsim] = svd
(A)
is O(ml2)5 The main cost of forming Vk is 2nlk
Since k le l min (mn) the complexity of Flip-Flop SRQR is 4mnl + 2(b+ p)mnby omitting the lower-order terms
On the other hand the complexity of the approximate SVD with randomized subspaceiteration (RSISVD) [27 30] is (4 + 4q)mn (k + p) where p is the oversampling size and qis the number of subspace iterations (see the detailed analysis in the appendix) In practice pis chosen to be a small integer for instance 5 in RSISVD In Flip-Flop SRQR l is usuallychosen to be a little bit larger than k eg l = k + 5 We also found that l = k is sufficient interms of the approximation quality in our numerical experiments Therefore we observe thatFlip-Flop SRQR is more efficient than RSISVD for any q gt 0
Since a practical dataset matrix can be too large to fit into the main memory the cost oftransferring the matrix into a memory hierarchy typically predominates the arithmetic costTherefore we also compare the number of data passes between RSISVD and Flip-Flop SRQRRSISVD needs 2q + 2 passes while Flip-Flop SRQR needs 4 passes computing B = ΩAswapping the columns of A realizing a partial QR factorization on A and computing AΠQ1Therefore Flip-Flop SRQR requires less data transfer than RSISVD for any q gt 1
33 Quality analysis of Flip-Flop SRQR This section is devoted to the quality analysisof Flip-Flop SRQR We start with Lemma 31
LEMMA 31 Given any matrix X = [X1X2] with Xi isin Rmtimesni i = 1 2 and n1 +n2 = n
σj (X)2 le σj (X1)
2+ X222 1 le j le min(mn)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 477
Proof Since XXT = X1XT1 + X2X
T2 we obtain the above result using [33 Theo-
rem 3316]
We are now ready to derive bounds for the singular values and the approximation errorof Flip-Flop SRQR We need to emphasize that even if the target rank is k we run Flip-FlopSRQR with an actual target rank l which is a little bit larger than k The difference betweenk and l can create a gap between the singular values of A so that we can obtain a reliablelow-rank approximation
THEOREM 32 Given a matrix A isin Rmtimesn a target rank k an oversampling size p andan actual target rank l ge k the matrices Uk Σk Vk in (34) computed by Flip-Flop SRQRsatisfy
(35) σj(Σk) ge σj(A)
4
radic1 +
2R2242σ4j (Σk)
1 le j le k
and
(36)∥∥Aminus UkΣkV
Tk
∥∥2le σk+1 (A)
4
radic1 + 2
(R222σk+1 (A)
)4
where R22 isin R(mminusl)times(nminusl) is the trailing matrix in (31) Using the properties of SRQR wefurthermore have
(37) σj (Σk) ge σj (A)
4
radic1 + min
(2τ4 τ4 (2 + 4τ4)
(σl+1(A)σj(A)
)4) 1 le j le k
and
(38)∥∥Aminus UkΣkV
Tk
∥∥2le σk+1 (A)
4
radic1 + 2τ4
(σl+1 (A)
σk+1 (A)
)4
where τ and τ defined in (312) have matrix dimension-dependent upper bounds
τ le g1g2
radic(l + 1) (nminus l) and τ le g1g2
radicl (nminus l)
where g1 leradic
1+ε1minusε and g2 le g with high probability The parameters ε gt 0 and g gt 1 are
user-defined
Proof In terms of the singular value bounds observing that
[R11 R12
R22
][R11 R12
R22
]T=
[R11R
T11 + R12R
T12 R12R
T22
R22RT12 R22R
T22
]
ETNAKent State University and
Johann Radon Institute (RICAM)
478 Y FENG J XIAO AND M GU
we apply Lemma 31 twice for any 1 le j le k
σ2j
[R11 R12
R22
][R11 R12
R22
]Tle σ2
j
([R11R
T11 + R12R
T12 R12R
T22
])+∥∥∥[R22R
T12 R22R
T22
]∥∥∥2
2
le σ2j
(R11R
T11 + R12R
T12
)+∥∥∥R12R
T22
∥∥∥2
2+∥∥∥[R22R
T12 R22R
T22
]∥∥∥2
2
le σ2j
(R11R
T11 + R12R
T12
)+ 2
∥∥∥∥∥[R12
R22
]∥∥∥∥∥4
2
= σ2j
(R11R
T11 + R12R
T12
)+ 2 R2242 (39)
The relation (39) can be further rewritten as
σ4j (A) le σ4
j
([R11 R12
])+ 2 R2242 = σ4
j (Σk) + 2 R2242 1 le j le k
which is equivalent to
σj(Σk) ge σj(A)
4
radic1 +
2R2242σ4j (Σk)
For the residual matrix bound we let[R11 R12
]def=[R11 R12
]+[δR11 δR12
]
where[R11 R12
]is the rank-k truncated SVD of
[R11 R12
] Since
(310)∥∥Aminus UkΣkV
Tk
∥∥2
=
∥∥∥∥∥AΠminusQ[R11 R12
0
]TQT
∥∥∥∥∥2
it follows from the orthogonality of singular vectors that[R11 R12
]T [δR11 δR12
]= 0
and therefore[R11 R12
]T [R11 R12
]=[R11 R12
]T [R11 R12
]+[δR11 δR12
]T [δR11 δR12
]
which implies
(311) RT12R12 = RT
12R12 +(δR12
)T (δR12
)
Similarly to the deduction of (39) from (311) we can derive∥∥∥∥[δR11 δR12
R22
]∥∥∥∥4
2
le∥∥[δR11 δR12
]∥∥4
2+ 2
∥∥∥∥[δR12
R22
]∥∥∥∥4
2
le σ4k+1 (A) + 2
∥∥∥∥∥[R12
R22
]∥∥∥∥∥4
2
= σ4k+1 (A) + 2 R2242
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 479
Combining with (310) it now follows that
∥∥Aminus UkΣkVTk
∥∥2
=
∥∥∥∥∥AΠminusQ[R11 R12
0
]TQT
∥∥∥∥∥2
=
∥∥∥∥[δR11 δR12
R22
]∥∥∥∥2
le σk+1 (A)4
radic1 + 2
(R222σk+1 (A)
)4
To obtain an upper bound of R222 in (35) and (36) we follow the analysis of SRQRin [75]
From the analysis of [75 Section IV] Algorithm 2 ensures that g1 leradic
1+ε1minusε and g2 le g
with high probability (the actual probability-guarantee formula can be found in [75 SectionIV]) where g1 and g2 are defined by (26) Here 0 lt ε lt 1 is a user defined parameter toadjust the choice of the oversampling size p used in the TRQRCP initialization part in SRQRg gt 1 is a user-defined parameter in the extra swapping part in SRQR Let
(312) τdef= g1g2
R222R2212
∥∥∥RminusT∥∥∥minus1
12
σl+1 (A)and τ
def= g1g2
R222R2212
∥∥RminusT11
∥∥minus1
12
σk (Σk)
where R is defined in equation (25) Since 1radicnX12 le X2 le
radicnX12 and
σi (X1) le σi (X) 1 le i le min (s t) for any matrix X isin Rmtimesn and submatrix X1 isin Rstimestof X we obtain that
τ = g1g2R222R2212
∥∥∥RminusT∥∥∥minus1
12
σl+1
(R) σl+1
(R)
σl+1 (A)le g1g2
radic(l + 1) (nminus l)
Using the fact that σl([R11 R12
])= σl
(R11
)and σk (Σk) = σk
([R11 R12
])by (32)
and (34)
τ = g1g2R222R2212
∥∥RminusT11
∥∥minus1
12
σl (R11)
σl (R11)
σl([R11 R12
]) σl ([R11 R12
])σk (Σk)
le g1g2
radicl (nminus l)
By the definition of τ
(313) R222 = τ σl+1 (A)
Plugging this into (36) yields (38)By the definition of τ we observe that
(314) R222 le τσk (Σk)
By (35) and (314)(315)
σj (Σk) ge σj (A)
4
radic1 + 2
(R222σj(Σk)
)4ge σj (A)
4
radic1 + 2
(R222σk(Σk)
)4ge σj (A)
4radic
1 + 2τ4 1 le j le k
ETNAKent State University and
Johann Radon Institute (RICAM)
480 Y FENG J XIAO AND M GU
On the other hand using (35)
σ4j (A) le σ4
j (Σk)
(1 + 2
σ4j (A)
σ4j (Σk)
R2242σ4j (A)
)
le σ4j (Σk)
1 + 2
(σ4j (Σk) + 2 R2242
)σ4j (Σk)
R2242σ4j (A)
le σ4
j (Σk)
(1 + 2
(1 + 2
R2242σ4k (Σk)
)R2242σ4j (A)
)
that is
σj (Σk) ge σj (A)
4
radic1 +
(2 + 4
R2242σ4k(Σk)
)R2242σ4j (A)
Plugging (313) and (314) into this above equation yields
(316) σj (Σk) ge σj (A)
4
radic1 + τ4 (2 + 4τ4)
(σl+1(A)σj(A)
)4 1 le j le k
Combining (315) with (316) we arrive at (37)We note that (35) and (36) still hold true if we replace k by lInequality (37) shows that with the definitions (312) of τ and τ Flip-Flop SRQR can
reveal at least a dimension-dependent fraction of all the leading singular values of A andindeed approximate them very accurately in case they decay relatively quickly Moreover (38)shows that Flip-Flop SRQR can compute a rank-k approximation that is up to a factor of4
radic1 + 2τ4
(σl+1(A)σk+1(A)
)4
from optimal In situations where the singular values of A decay
relatively quickly our rank-k approximation is about as accurate as the truncated SVD with achoice of l such that
σl+1 (A)
σk+1 (A)= o (1)
REMARK 33 The ldquohigh probabilityrdquo mentioned in Theorem 32 is dependent on theoversampling size p but not on the block size b We decided not to dive deep into the probabilityexplanation since it will make the proof much longer and more tedious Theoretically thecomputed oversampling size p for certain given high probability would be rather high Forexample to achieve a probability 095 we need to choose p to be at least 500 However inpractice we would never choose such a large p since it will deteriorate the code efficiencyIt turns out that a small p like 5 sim 10 would achieve good approximation results empiricallyThe detailed explanation about what does high probability mean here can be found in [75]In terms of the choice of the block size b it wonrsquot affect the analysis or the error boundsThe choice of b is purely for the performance of the codes Only a suitable b can take fulladvantage of the cache hierarchy and BLAS routines
4 Numerical experiments In this section we demonstrate the effectiveness and effi-ciency of the Flip-Flop SRQR (FFSRQR) algorithm in several numerical experiments Firstly
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 481
TABLE 41Methods for approximate SVD
Method DescriptionLANSVD Approximate SVD using Lanczos bidiagonalization with partial
reorthogonalization [42]Its Fortran and Matlab implementations are from PROPACK [41]
FFSRQR Flip-Flop Spectrum-revealing QR factorizationIts Matlab implementation is using Mex-wrappedBLAS and LAPACK routines [2]
RSISVD Approximate SVD with randomized subspace iteration [30 27]Its Matlab implementation is from tensorlab toolbox [73]
TUXV Algorithm 5 in the appendixLTSVD Linear Time SVD [16]
We use its Matlab implementation by Ma et al [49]
we compare FFSRQR with other approximate SVD algorithms for a matrix approximationSecondly we compare FFSRQR with other methods on tensor approximation problems usingthe tensorlab toolbox [73] Thirdly we compare FFSRQR with other methods for the robustPCA problem and matrix completion problem All experiments are implemented on a laptopwith 29 GHz Intel Core i5 processor and 8 GB of RAM The codes based on LAPACK [2]and PROPACK [41] in Section 41 are in Fortran The codes in Sections 42 43 are in Matlabexcept that FFSRQR is implemented in Fortran and wrapped by mex
41 Approximate truncated SVD In this section we compare FFSRQR with otherapproximate SVD algorithms for low-rank matrix approximation All tested methods are listedin Table 41 Since LTSVD has only a Matlab implementation this method is only consideredin Sections 42 and 43 The test matrices areType 1 A isin Rmtimesn [66] is defined by A = U DV T + 01 middot dss middotE where U isin Rmtimess V isin
Rntimess are column-orthonormal matrices and D isin Rstimess is a diagonal matrix with sgeometrically decreasing diagonal entries from 1 to 10minus3 dss is the last diagonalentry of D E isin Rmtimesn is a random matrix where the entries are independentlysampled from a normal distribution N (0 1) In our numerical experiment we testthree different random matrices The square matrix has a size of 10000times 10000 theshort-fat matrix has a size of 1000times 10000 and the tall-skinny matrix has a size of10000times 1000
Type 2 A isin R4929times4929 is a real data matrix from the University of Florida sparse matrixcollection [11] Its corresponding file name is HBGEMAT11
For a given matrix A isin Rmtimesn the relative SVD approximation error is measured by∥∥Aminus UkΣkVTk
∥∥FAF where Σk contains the approximate top k singular values and
Uk Vk are the corresponding approximate top k singular vectors The parameters used inFFSRQR RSISVD TUXV and LTSVD are listed in Table 42 The additional time requiredto compute g2 is negligible In our experiments g2 always remains modest and never triggerssubsequent SRQR column swaps However computing g2 is still recommended since g2
serves as an insurance policy against potential quality degration by TRQRCP The block size bis chosen to be 32 according to the xGEQP3 LAPACK routine When k is less than 32 weimplement an unblock version of the partial QRCP
Figures 41ndash43 display the run time the relative approximation error and the top 20singular values comparison respectively for four different matrices In terms of speed FFSRQRis always faster than LANSVD and RSISVD Moreover the efficiency difference is more
ETNAKent State University and
Johann Radon Institute (RICAM)
482 Y FENG J XIAO AND M GU
TABLE 42Parameters used in RSISVD FFSRQR TUXV and LTSVD
Method ParameterFFSRQR oversampling size p = 5 block size b = min32 k l = k d = 10 g = 20RSISVD oversampling size p = 5 subspace iteration q = 1TUXV oversampling size p = 5 block size b = min32 k l = kLTSVD probabilities pi = 1n
obvious when k is relatively large FFSRQR is only slightly slower than TUXV In termsof the relative approximation error FFSRQR is comparable to the other three methods Interms of the top singular values approximation Figure 43 displays the top 20 singular valueswhen k = 500 for four different matrices In Figure 43 FFSRQR is superior to TUXV aswe take the absolute values of the main diagonal entries of the upper triangle matrix R as theapproximate singular values for TUXV
100 200 300 400 500
target rank k
100
101
102
103
104
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
10-1
100
101
102
103
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 41 Run time comparison for approximate SVD algorithms
42 Tensor approximation This section illustrates the effectiveness and efficiency ofFFSRQR for computing approximate tensors ST-HOSVD [3 71] is one of the most efficient
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 483
100 200 300 400 500
target rank k
022
024
026
028
03
032
034
036re
lative
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
024
026
028
03
032
034
036
rela
tive
ap
pro
xim
atio
n e
rro
rFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 42 Relative approximation error comparison for approximate SVD algorithms
algorithms to compute a Tucker decomposition of tensors and the most costly part of thisalgorithm is to compute an SVD or approximate SVD of the tensor unfoldings The truncatedSVD and RSISVD are used in the routines MLSVD and MLSVD_RSI respectively in theMatlab tensorlab toolbox [73] Based on this Matlab toolbox we implement ST-HOSVDusing FFSRQR or LTSVD to do the SVD approximation We name these two new routines byMLSVD_FFSRQR and MLSVD_LTSVD respectively and compare these four routines in thenumerical experiment We also have Python codes for this tensor approximation experimentWe donrsquot list the results of the python programs here as they are similar to those for Matlab
421 A sparse tensor example We test a sparse tensor X isin Rntimesntimesn of the followingformat [64 59]
X =
10sumj=1
1000
jxj yj zj +
nsumj=11
1
jxj yj zj
where xj yj zj isin Rn are sparse vectors with nonnegative entries The symbol ldquordquo representsthe vector outer product We compute a rank-(k k k) Tucker decomposition [GU1 U2 U3]using MLSVD MLSVD_FFSRQR MLSVD_RSI and MLSVD_LTSVD respectively Therelative approximation error is measured by X minus XkF XF where Xk = G times1 U1 times2
U2 times3 U3
ETNAKent State University and
Johann Radon Institute (RICAM)
484 Y FENG J XIAO AND M GU
5 10 15 20
index
06
07
08
09
1to
p 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
5 10 15 20
index
06
07
08
09
1
top 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
5 10 15 20
index
06
065
07
075
08
085
09
095
top
20
sin
gu
lar
va
lue
s
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
5 10 15 20
index
101
102
103
top
20
sin
gu
lar
va
lue
sFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 43 Top 20 singular values comparison for approximate SVD algorithms
Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger
422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]
We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3
where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method
MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485
50 100 150 200
target rank k
10-5
10-4
10-3
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
50 100 150 200
target rank k
10-2
10-1
100
101
102
run tim
e (
sec)
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
FIG 44 Run time and relative approximation error comparison on a sparse tensor
TABLE 43Comparison on handwritten digits classification
MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD
Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259
one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one
43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1
431 Robust PCA To solve the robust PCA problem we replace the approximateSVD part in the IALM method [45] by various methods We denote the actual solution tothe robust PCA problem by a matrix pair (Xlowast Elowast) isin Rmtimesn times Rmtimesn The matrix Xlowast isXlowast = XLX
TR with XL isin Rmtimesk XR isin Rntimesk being random matrices whose entries are
independently sampled from a normal distribution The sparse matrix Elowast is a random matrixwhere its non-zero entries are independently sampled from a uniform distribution over theinterval [minus500 500] The input to the IALM algorithm has the form M = Xlowast + Elowast andthe output is denoted by
(X E
) In this numerical experiment we use the same parameter
settings as the IALM code for robust PCA rank k is 01m and the number of non-zero entriesin E is 005m2 We choose the trade-off parameter λ = 1
radicmax (mn) as suggested by
Candes et al [8] The solution quality is measured by the normalized root mean square errorX minusXlowastF XlowastF
Table 44 includes relative error run time the number of non-zero entries in E (E0)the iteration count and the number of non-zero singular values (sv) in X of the IALM
1httpperceptioncslillinoisedumatrix-ranksample_codehtml
ETNAKent State University and
Johann Radon Institute (RICAM)
486 Y FENG J XIAO AND M GU
algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective
TABLE 44Comparison on robust PCA
Size Method Error Time (sec) E0 Iter sv
1000times 1000
IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100
2000times 2000
IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200
4000times 4000
IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400
6000times 6000
IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600
432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices
bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3
The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies
For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by
NMAEdef=
1|Γ|sum
(ij)isinΓ |Mij minusXij |rmax minus rmin
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487
where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5
Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]
The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets
TABLE 45Parameters used in the IALM method on matrix completion
Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384
jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742
movie-latest-small 671 9066 100e+05 52551
5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA
Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments
Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4
Appendix B TUXV algorithmThe details of TUXV algorithm are presented in Algorithm 5
Appendix C Computing Tucker decomposition of higher-order tensorsAlgorithm 6 computes the Tucker decomposition of higher-order tensors ie sequentially
truncated higher-order SVD
ETNAKent State University and
Johann Radon Institute (RICAM)
488 Y FENG J XIAO AND M GU
TABLE 46Comparison on matrix completion
Data set Method Iter Time NMAE sv σmax σmin
jester-1
IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00
jester-2
IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00
jester-3
IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00
jester-all
IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00
movie-100K
IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00
movie-1M
IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00
movie-latest-small
IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00
Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems
respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω
Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an
orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9
Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that
1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489
Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do
Find i = arg maxjleslen
rs Exchange rj and ri columns in A and Π
Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)
end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A
Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)
3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)
2 minus 23 (k + p)
3 (cf [69]) and the cost of forming the first (k + p)
columns in the full Q matrix is m (k + p)2
+ 13 (k + p)
3
Now we count the flops for each i in the for loop
1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)
2 minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)
2+ 1
3 (k + p)3
3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)
2minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)
2+ 1
3 (k + p)3
Together the cost of running the for loop q times is
q
(4mn (k + p) + 3 (m+ n) (k + p)
2 minus 2
3(k + p)
3
)
Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)
2) and the cost of computing U = Q lowast U is 2m (k + p)
2
ETNAKent State University and
Johann Radon Institute (RICAM)
490 Y FENG J XIAO AND M GU
Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV
Tr ie applying UTr to G
end for
Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max
(M2 λminus1MF
) Y0 = MJ (M) E0 = 0
while not converged do(UΣV) = svd
(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = Sλmicrominus1k
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2
REFERENCES
[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106
[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999
[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103
[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595
[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
470 Y FENG J XIAO AND M GU
singular values of A with ldquoconsiderable fidelityrdquo no matter what the matrix A is The pivotedQLP decomposition is extensively analyzed in Huckaby and Chan [34 35] More recentlyDuersch and Gu developed a much more efficient variant of the pivoted QLP decompositionTUXV and demonstrated its remarkable quality as a low-rank approximation empiricallywithout a rigorous theoretical justification of TUXVrsquos success [17]
In this paper we present Flip-Flop SRQR a slightly different variant of TUXV of Duerschand Gu [17] Like TUXV Flip-Flop SRQR requires the most effort for computing a partial QRfactorization using a truncated randomized QRCP (TRQRCP) and a partial LQ factorizationUnlike TUXV however Flip-Flop SRQR also performs additional computations to ensure aspectrum-revealing QR factorization (SRQR) [75] before the partial LQ factorization
We demonstrate the reliable theoretical quality of this variant as a low-rank approximationand its highly competitiveness with state-of-the-art low-rank approximation methods in realworld applications in both low-rank tensor compression [13 40 60 71] and nuclear normminimization [7 45 49 54 68]
The rest of this paper is organized as follows In Section 2 we introduce the TRQRCPalgorithm the SRQR factorization the low-rank tensor compression and the nuclear normminimization In Section 3 we introduce Flip-Flop SRQR and analyze its computational costsand low-rank approximation properties In Section 4 we present numerical results comparingFlip-Flop SRQR with state-of-the-art low-rank approximation methods In Section 5 wesummarize our main results and draw some conclusions
2 Preliminaries and background
21 Partial QRCP The QR factorization of a matrix A isin Rmtimesn is A = QR with anorthogonal matrix Q isin Rmtimesm and an upper trapezoidal matrix R isin Rmtimesn which can becomputed by the LAPACK [2] routine xGEQRF where x stands for the matrix data type Thestandard QR factorization is not suitable for some practical situations where either the matrixA is rank deficient or only representative columns of A are of interest Usually the QRCPis adequate for the aforementioned situations except a few rare examples such as the Kahanmatrix [26] Given a matrix A isin Rmtimesn the QRCP of matrix A has the form
AΠ = QR
where Π isin Rntimesn is a permutation matrix Q isin Rmtimesm is an orthogonal matrix andR isin Rmtimesn is an upper trapezoidal matrix QRCP can be computed by the LAPACK [2]routines xGEQPF and xGEQP3 where xGEQP3 is a more efficient blocked implementationof xGEQPF For a given target rank k (1 le k le min (mn)) the partial QRCP factorization(Algorithm 4 in the appendix) has a 2times 2 block form
(21) AΠ = Q
[R11 R12
R22
]=[Q1 Q2
] [R11 R12
R22
]
where R11 isin Rktimesk is upper triangular The partial QRCP computes an approximate col-umn subspace of A spanned by the leading k columns in AΠ up to the error term in R22Equivalently (21) yields a low rank approximation
(22) A asymp Q1
[R11 R12
]ΠT
with an approximation quality closely related to the error term in R22The Randomized QRCP (RQRCP) algorithm [17 75] is a more efficient variant of
Algorithm 4 RQRCP generates a Gaussian random matrix Ω isin N (0 1)(b+p)timesm with
b + p m where the entries of Ω are independently sampled from a normal distribution
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 471
Algorithm 1 Truncated Randomized QRCP (TRQRCP)InputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RktimesnPermutation matrix Π isin Rntimesn such that AΠ asymp Q ( 1 k)RAlgorithmGenerate iid Gaussian random matrix Ω isin N (0 1)
(b+p)timesmForm the initial sample matrix B = ΩA and initialize Π = Infor j = 1 b k dob = min (k minus j + 1 b)Do partial QRCP on B ( j n) to obtain b pivotsExchange corresponding columns in A B Π and WT Do QR on A (j m j j + bminus 1) using WY formula without updating the trailingmatrixUpdate B ( j + b n) using the formula (24)
end forQ = Q1Q2 middot middot middotQdkbe R = upper trapezoidal part of the submatrix A (1 k 1 n)
to compress A into B = ΩA with much smaller row dimension In practice b is the blocksize and p is the oversampling size RQRCP repeatedly runs partial QRCP on B to obtain bcolumn pivots applies them to the matrix A and then computes QR without pivoting (QRNP)on A and updates the remaining columns of B RQRCP exits this process when it reachesthe target rank k QRCP and RQRCP choose pivots on A and B respectively RQRCP issignificantly faster than QRCP as B has much smaller row dimension than A It is shownin [75] that RQRCP is as reliable as QRCP up to failure probabilities that decay exponentiallywith respect to the oversampling size p
Since the trailing matrix of A is usually not required for low-rank matrix approximations(see (22)) the TRQRCP (truncated RQRCP) algorithm of [17] re-organizes the computationsin RQRCP to directly compute the approximation (22) without explicitly computing thetrailing matrix R22 For more details both RQRCP and TRQRCP are based on the WYrepresentation of the Householder transformations [5 55 61]
Q = Q1Q2 middot middot middotQk = I minus Y TY T
where T isin Rktimesk is an upper triangular matrix and Y isin Rmtimesk is a trapezoidal matrix
consisting of k consecutive Householder vectors Let WT def= TTY TA then the trailing
matrix update formula becomes QTA = Aminus YWT The main difference between RQRCPand TRQRCP is that RQRCP computes the whole trailing matrix update while TRQRCP onlycomputes the part of the update that affects the approximation (22) More discussions aboutRQRCP and TRQRCP can be found in [17] The main steps of TRQRCP are briefly describedin Algorithm 1 The deduction of the updating formula for B is as follows consider partialQRs on AΠ and BΠ = ΩAΠ
(23) AΠ = Q
[R11 R12
R22
] BΠ = Q
[R11 R12
R22
]
ETNAKent State University and
Johann Radon Institute (RICAM)
472 Y FENG J XIAO AND M GU
Partition Ωdef= Q
TΩQ
def=[Ω1 Ω2
] and therefore (23) implies
Ω
[R11 R12
R22
]=
[R11 R12
R22
]
which leads to the updating formula for B in Algorithm 1
(24)[R12
R22
]larr Ω2R22 =
[R12 minusR11R
minus111 R12
R22
]
With TRQRCP the TUXV algorithm [17 Algorithm 7] (Algorithm 5 in the appendix)computes a low-rank approximation with the QLP factorization at a greatly accelerated speedby computing a partial QR factorization with column pivoting followed with a partial LQfactorization
22 Spectrum-revealing QR factorization Although both RQRCP and TRQRCP arevery effective practical tools for low-rank matrix approximations they are not known toprovide reliable low-rank matrix approximations due to their underlying greediness in thecolumn norm-based pivoting strategy To solve this potential problem of column-basedQR factorization Gu and Eisenstat [28] proposed an efficient way to perform additionalcolumn interchanges to enhance the quality of the leading k columns in AΠ as a basis for theapproximate column subspace More recently a more efficient and effective method SRQRwas introduced and analyzed in [50 75] to compute the low-rank approximation (22) Theconcept of spectrum-revealing introduced in [50 74] emphasizes the utilization of partialQR factorization (22) as a low-rank matrix approximation as opposed to the more traditionalrank-revealing factorization which emphasizes the utility of the partial QR factorization (21)as a tool for numerical rank determination The SRQR algorithm is described in Algorithm 2The SRQR algorithm will always run to completion with a high-quality low-rank matrixapproximation (22) For real data matrices that usually have fast decaying singular-valuespectrum this approximation is often as good as the truncated SVD The SRQR algorithmof [75] explicitly updates the partial QR factorization (21) while swapping columns but theSRQR algorithm can actually avoid any explicit computations on the trailing matrix R22 usingTRQRCP instead of RQRCP to obtain exactly the same partial QR initialization Below weoutline the SRQR algorithm
In (21) let
(25) Rdef=
[R11 a
α
]be the leading (l + 1)times (l + 1) (l ge k) submatrix of R We define
(26) g1def=R2212|α|
and g2def= |α|
∥∥∥RminusT∥∥∥12
where X12 is the largest column 2-norm of X for any given X In [75] the authorsproved approximation quality bounds involving g1 g2 for the low-rank approximation com-puted by RQRCP or TRQRCP RQRCP or TRQRCP will provide a good low-rank ma-
trix approximation if g1 and g2 are O(1) The authors also proved that g1 leradic
1+ε1minusε and
g2 leradic
2(1+ε)
1minusε
(1 +
radic1+ε1minusε
)lminus1
hold with high probability for both RQRCP and TRQRCPwhere 0 lt ε lt 1 is a user-defined parameter which guides the choice of the oversampling size
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 473
p For reasonably choices of ε like ε = 12 g1 is a small constant while g2 can potentially be an
extremely large number which can lead to a poor low-rank approximation quality To avoidthe potential large number of g2 the SRQR algorithm (Algorithm 2) proposed in [75] uses apair-wise swapping strategy to guarantee that g2 is below some user-defined tolerance g gt 1which is usually chosen to be a small number larger than one like 20 In Algorithm 2 weuse an approximate formula instead of using the definition (26) directly to estimate g2 ieg2 asymp |α|radicd
∥∥∥ΩRminusT∥∥∥
12 based on the Johnson-Lindenstrauss Lemma [36]
Algorithm 2 Spectrum-revealing QR Factorization (SRQR)InputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0Integer l ge k Tolerance g gt 1 for g2OutputsOrthogonal matrix Q isin Rmtimesm formed by the first k reflectorsUpper trapezoidal matrix R isin RktimesnPermutation matrix Π isin Rntimesn such that AΠ asymp Q ( 1 k)RAlgorithmCompute QRΠ with RQRCP or TRQRCP to l stepsCompute squared 2-norm of the columns of B( l + 1 n) ri (l + 1 le i le n) where Bis an updated random projection of A computed by RQRCP or TRQRCPApproximate squared 2-norm of the columns of A(l + 1 m l + 1 n) ri = ri(b +p) (l + 1 le i le n)ı = argmaxl+1leilenriSwap ı-th and (l + 1)-st columns of AΠ rOne-step QR factorization of A(l + 1 m l + 1 n)|α| = Rl+1l+1ri = ri minusA(l + 1 i)2 (l + 2 le i le n)Generate a random matrix Ω isin N (0 1)dtimes(l+1) (d l)Compute g2 = |α|
∥∥∥RminusT∥∥∥12asymp |α|radic
d
∥∥∥ΩRminusT∥∥∥
12
while g2 gt g doı = argmax1leilel+1ith column norm of ΩRminusT Swap ı-th and (l + 1)-st columns of A and Π in a Round Robin rotationGivens-rotate R back into upper-trapezoidal formrl+1 = R2
l+1l+1 ri = ri +A(l + 1 i)2 (l + 2 le i le n)ı = argmaxl+1leilenriSwap ı-th and (l + 1)-st columns of AΠ rOne-step QR factorization of A(l + 1 m l + 1 n)|α| = Rl+1l+1ri = ri minusA(l + 1 i)2 (l + 2 le i le n)Generate a random matrix Ω isin N (0 1)dtimes(l+1)(d l) Compute g2 = |α|
∥∥∥RminusT∥∥∥12asymp |α|radic
d
∥∥∥ΩRminusT∥∥∥
12
end while
23 Tensor approximation In this section we review some basic notations and conceptsinvolving tensors A more detailed discussion of the properties and applications of tensors can
ETNAKent State University and
Johann Radon Institute (RICAM)
474 Y FENG J XIAO AND M GU
be found in the review [40] A tensor is a d-dimensional array of numbers denoted by scriptnotation X isin RI1timesmiddotmiddotmiddottimesId with entries given by
xj1jd 1 le j1 le I1 1 le jd le Id
We use the matrix X(n) isin RIntimes(Πj 6=nIj) to denote the nth mode unfolding of the tensorX Since this tensor has d dimensions there are altogether d-possibilities for unfolding Then-mode product of a tensor X isin RI1timesmiddotmiddotmiddottimesId with a matrix U isin RktimesIn results in a tensorY isin RI1timesmiddotmiddotmiddottimesInminus1timesktimesIn+1timesmiddotmiddotmiddottimesId such that
yj1jnminus1jjn+1jd = (X timesn U)j1jnminus1jjn+1jd=
Insumjn=1
xj1jdujjn
Alternatively it can be expressed conveniently in terms of unfolded tensors
Y = X timesn U hArr Y(n) = UX(n)
Decompositions of higher-order tensors have applications in signal processing [12 15 63]numerical linear algebra [13 39 76] computer vision [62 72 67] etc Two particular tensordecompositions can be considered as higher-order extensions of the matrix SVD CANDE-COMPPARAFAC (CP) [10 31] decomposes a tensor as a sum of rank-one tensors andthe Tucker decomposition [70] is a higher-order form of the principal component analy-sis Knowing the definitions of mode products and unfolding of tensors we can introducethe sequentially truncated higher-order SVD (ST-HOSVD) algorithm for producing a rank(k1 kd) approximation to the tensor based on the Tucker decomposition format The ST-HOSVD algorithm [3 71] (Algorithm 6 in the appendix) returns a core tensor G isin Rk1timesmiddotmiddotmiddottimeskdand a set of unitary matrices Uj isin RIjtimeskj for j = 1 d such that
X asymp G times1 U1 middot middot middot timesd Ud
where the right-hand side is called a Tucker decomposition However a straightforwardgeneralization to higher-order (d ge 3) tensors of the matrix Eckart-Young-Mirsky Theorem isnot possible [14]
In ST-HOSVD one key step is to compute an exact or approximate rank-kr SVD ofthe tensor unfolding Well known efficient ways to compute an exact low-rank SVD includeKrylov subspace methods [44] There are also efficient randomized algorithms to find anapproximate low-rank SVD [30] In the Matlab tensorlab toolbox [73] the most efficientmethod MLSVD_RSI is essentially ST-HOSVD with an randomized subspace iteration tofind the approximate SVD of the tensor unfolding
24 Nuclear norm minimization Matrix rank minimization problem appears ubiqui-tously in many fields such as Euclidean embedding [20 46] control [21 51 56] collaborativefiltering [9 57 65] system identification [47 48] etc The regularized nuclear norm mini-mization problem is the following
minXisinRmtimesn
f (X) + τXlowast
where τ gt 0 is a regularization parameter and Xlowastdef=sumqi=1 σi (X) The choice of the
function f (middot) is situational f (X) = M minus X1 in robust principal component analysis(robust PCA) [8] and f (X) = πΩ (M)minus πΩ (X) ||2F in matrix completion [7]
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 475
Many researchers have devoted themselves to solve the above nuclear norm minimizationproblem and plenty of algorithms have been proposed including singular value thresholding(SVT) [7] fixed point continuous (FPC) [49] accelerated proximal gradient (APG) [68]augmented Lagrange multiplier (ALM) [45] The most expensive part of these algorithms liesin the computation of the truncated SVD Inexact augmented Lagrange multiplier (IALM) [45]has been proved to be one of the most accurate and efficient among them We now describeIALM for the robust PCA and matrix completion problems
The robust PCA problem can be formalized as a minimization problem of a sum of nuclearnorm and a scaled matrix l1-norm (sum of matrix entries in absolute value)
(27) min Xlowast + λE1 subject to M = X + E
where M is a measured matrix X has low-rank E is an error matrix and sufficiently sparseand λ is a positive weighting parameter Algorithm 7 in the appendix describes details of theIALM method to solve the robust PCA problem [45]
The matrix completion problem [9 45] can be written in the form
(28) minXisinRmtimesn
Xlowast subject to X + E = M πΩ (E) = 0
where πΩ Rmtimesn rarr Rmtimesn is an orthogonal projection that keeps the entries in Ω unchangedand sets those outside Ω zeros In [45] the authors applied the IALM method to the matrixcompletion problem (Algorithm 8 in the appendix)
3 Flip-Flop SRQR factorization
31 Flip-Flop SRQR factorization In this section we introduce our Flip-Flop SRQRfactorization a slightly different variant of TUXV (Algorithm 5 in the appendix) to computean SVD approximation based on the QLP factorization Given an integer l ge k we run SRQR(the version without computing the trailing matrix) with l steps on A
(31) AΠ = QR = Q
[R11 R12
R22
]
where R11 isin Rltimesl is upper triangular R12 isin Rltimes(nminusl) and R22 isin R(mminusl)times(nminusl) Then werun partial QRNP with l steps on RT
(32) RT =
[RT11
RT12 RT22
]= Q
[R11 R12
R22
]asymp Q1
[R11 R12
]
where Q =[Q1 Q2
]with Q1 isin Rntimesl Therefore combining with the fact that
AΠQ1 = Q[R11 R12
]T
we can approximate the matrix A by
(33) A = QRΠT = Q(RT)T
ΠT asymp Q
[RT11
RT12
]QT1 ΠT = A
(ΠQ1
)(ΠQ1
)T
We denote the rank-k truncated SVD ofAΠQ1 by UkΣkVTk LetUk = Uk Vk = ΠQ1Vk
then using (33) a rank-k approximate SVD of A is obtained
(34) A asymp UkΣkVTk
where Uk isin Rmtimesk Vk isin Rntimesk are column orthonormal and Σk = Diag (σ1 middot middot middot σk) withσirsquos being the leading k singular values of AΠQ1 The Flip-Flop SRQR factorization isoutlined in Algorithm 3
ETNAKent State University and
Johann Radon Institute (RICAM)
476 Y FENG J XIAO AND M GU
Algorithm 3 Flip-Flop Spectrum-Revealing QR FactorizationInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0Integer l ge k Tolerance g gt 1 for g2OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmRun SRQR on A to l steps to obtain (R11 R12)Run QRNP on (R11 R12)
T to obtain Q1 represented by a sequence of Householder vectors
A = AΠQ1[U Σ V ] = svd
(A)
U = U ( 1 k) Σ = Σ (1 k 1 k) V = ΠQ1V ( 1 k)
32 Complexity analysis In this section we perform a complexity analysis of Flip-Flop SRQR Since the approximate SVD only makes sense when the target rank k is smallwe assume that k le l min (mn) The complexity analysis of Flip-Flop SRQR looks asfollows
1 The main cost of doing SRQR with TRQRCP onA is 2mnl+2(b+p)mn+(m+n)l22 The main cost of the QR factorization for [R11 R12]
T and forming Q1 is 2nl2minus 23 l
33 The main cost of computing A = AΠQ1 is 2mnl4 The main cost of computing [Usimsim] = svd
(A)
is O(ml2)5 The main cost of forming Vk is 2nlk
Since k le l min (mn) the complexity of Flip-Flop SRQR is 4mnl + 2(b+ p)mnby omitting the lower-order terms
On the other hand the complexity of the approximate SVD with randomized subspaceiteration (RSISVD) [27 30] is (4 + 4q)mn (k + p) where p is the oversampling size and qis the number of subspace iterations (see the detailed analysis in the appendix) In practice pis chosen to be a small integer for instance 5 in RSISVD In Flip-Flop SRQR l is usuallychosen to be a little bit larger than k eg l = k + 5 We also found that l = k is sufficient interms of the approximation quality in our numerical experiments Therefore we observe thatFlip-Flop SRQR is more efficient than RSISVD for any q gt 0
Since a practical dataset matrix can be too large to fit into the main memory the cost oftransferring the matrix into a memory hierarchy typically predominates the arithmetic costTherefore we also compare the number of data passes between RSISVD and Flip-Flop SRQRRSISVD needs 2q + 2 passes while Flip-Flop SRQR needs 4 passes computing B = ΩAswapping the columns of A realizing a partial QR factorization on A and computing AΠQ1Therefore Flip-Flop SRQR requires less data transfer than RSISVD for any q gt 1
33 Quality analysis of Flip-Flop SRQR This section is devoted to the quality analysisof Flip-Flop SRQR We start with Lemma 31
LEMMA 31 Given any matrix X = [X1X2] with Xi isin Rmtimesni i = 1 2 and n1 +n2 = n
σj (X)2 le σj (X1)
2+ X222 1 le j le min(mn)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 477
Proof Since XXT = X1XT1 + X2X
T2 we obtain the above result using [33 Theo-
rem 3316]
We are now ready to derive bounds for the singular values and the approximation errorof Flip-Flop SRQR We need to emphasize that even if the target rank is k we run Flip-FlopSRQR with an actual target rank l which is a little bit larger than k The difference betweenk and l can create a gap between the singular values of A so that we can obtain a reliablelow-rank approximation
THEOREM 32 Given a matrix A isin Rmtimesn a target rank k an oversampling size p andan actual target rank l ge k the matrices Uk Σk Vk in (34) computed by Flip-Flop SRQRsatisfy
(35) σj(Σk) ge σj(A)
4
radic1 +
2R2242σ4j (Σk)
1 le j le k
and
(36)∥∥Aminus UkΣkV
Tk
∥∥2le σk+1 (A)
4
radic1 + 2
(R222σk+1 (A)
)4
where R22 isin R(mminusl)times(nminusl) is the trailing matrix in (31) Using the properties of SRQR wefurthermore have
(37) σj (Σk) ge σj (A)
4
radic1 + min
(2τ4 τ4 (2 + 4τ4)
(σl+1(A)σj(A)
)4) 1 le j le k
and
(38)∥∥Aminus UkΣkV
Tk
∥∥2le σk+1 (A)
4
radic1 + 2τ4
(σl+1 (A)
σk+1 (A)
)4
where τ and τ defined in (312) have matrix dimension-dependent upper bounds
τ le g1g2
radic(l + 1) (nminus l) and τ le g1g2
radicl (nminus l)
where g1 leradic
1+ε1minusε and g2 le g with high probability The parameters ε gt 0 and g gt 1 are
user-defined
Proof In terms of the singular value bounds observing that
[R11 R12
R22
][R11 R12
R22
]T=
[R11R
T11 + R12R
T12 R12R
T22
R22RT12 R22R
T22
]
ETNAKent State University and
Johann Radon Institute (RICAM)
478 Y FENG J XIAO AND M GU
we apply Lemma 31 twice for any 1 le j le k
σ2j
[R11 R12
R22
][R11 R12
R22
]Tle σ2
j
([R11R
T11 + R12R
T12 R12R
T22
])+∥∥∥[R22R
T12 R22R
T22
]∥∥∥2
2
le σ2j
(R11R
T11 + R12R
T12
)+∥∥∥R12R
T22
∥∥∥2
2+∥∥∥[R22R
T12 R22R
T22
]∥∥∥2
2
le σ2j
(R11R
T11 + R12R
T12
)+ 2
∥∥∥∥∥[R12
R22
]∥∥∥∥∥4
2
= σ2j
(R11R
T11 + R12R
T12
)+ 2 R2242 (39)
The relation (39) can be further rewritten as
σ4j (A) le σ4
j
([R11 R12
])+ 2 R2242 = σ4
j (Σk) + 2 R2242 1 le j le k
which is equivalent to
σj(Σk) ge σj(A)
4
radic1 +
2R2242σ4j (Σk)
For the residual matrix bound we let[R11 R12
]def=[R11 R12
]+[δR11 δR12
]
where[R11 R12
]is the rank-k truncated SVD of
[R11 R12
] Since
(310)∥∥Aminus UkΣkV
Tk
∥∥2
=
∥∥∥∥∥AΠminusQ[R11 R12
0
]TQT
∥∥∥∥∥2
it follows from the orthogonality of singular vectors that[R11 R12
]T [δR11 δR12
]= 0
and therefore[R11 R12
]T [R11 R12
]=[R11 R12
]T [R11 R12
]+[δR11 δR12
]T [δR11 δR12
]
which implies
(311) RT12R12 = RT
12R12 +(δR12
)T (δR12
)
Similarly to the deduction of (39) from (311) we can derive∥∥∥∥[δR11 δR12
R22
]∥∥∥∥4
2
le∥∥[δR11 δR12
]∥∥4
2+ 2
∥∥∥∥[δR12
R22
]∥∥∥∥4
2
le σ4k+1 (A) + 2
∥∥∥∥∥[R12
R22
]∥∥∥∥∥4
2
= σ4k+1 (A) + 2 R2242
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 479
Combining with (310) it now follows that
∥∥Aminus UkΣkVTk
∥∥2
=
∥∥∥∥∥AΠminusQ[R11 R12
0
]TQT
∥∥∥∥∥2
=
∥∥∥∥[δR11 δR12
R22
]∥∥∥∥2
le σk+1 (A)4
radic1 + 2
(R222σk+1 (A)
)4
To obtain an upper bound of R222 in (35) and (36) we follow the analysis of SRQRin [75]
From the analysis of [75 Section IV] Algorithm 2 ensures that g1 leradic
1+ε1minusε and g2 le g
with high probability (the actual probability-guarantee formula can be found in [75 SectionIV]) where g1 and g2 are defined by (26) Here 0 lt ε lt 1 is a user defined parameter toadjust the choice of the oversampling size p used in the TRQRCP initialization part in SRQRg gt 1 is a user-defined parameter in the extra swapping part in SRQR Let
(312) τdef= g1g2
R222R2212
∥∥∥RminusT∥∥∥minus1
12
σl+1 (A)and τ
def= g1g2
R222R2212
∥∥RminusT11
∥∥minus1
12
σk (Σk)
where R is defined in equation (25) Since 1radicnX12 le X2 le
radicnX12 and
σi (X1) le σi (X) 1 le i le min (s t) for any matrix X isin Rmtimesn and submatrix X1 isin Rstimestof X we obtain that
τ = g1g2R222R2212
∥∥∥RminusT∥∥∥minus1
12
σl+1
(R) σl+1
(R)
σl+1 (A)le g1g2
radic(l + 1) (nminus l)
Using the fact that σl([R11 R12
])= σl
(R11
)and σk (Σk) = σk
([R11 R12
])by (32)
and (34)
τ = g1g2R222R2212
∥∥RminusT11
∥∥minus1
12
σl (R11)
σl (R11)
σl([R11 R12
]) σl ([R11 R12
])σk (Σk)
le g1g2
radicl (nminus l)
By the definition of τ
(313) R222 = τ σl+1 (A)
Plugging this into (36) yields (38)By the definition of τ we observe that
(314) R222 le τσk (Σk)
By (35) and (314)(315)
σj (Σk) ge σj (A)
4
radic1 + 2
(R222σj(Σk)
)4ge σj (A)
4
radic1 + 2
(R222σk(Σk)
)4ge σj (A)
4radic
1 + 2τ4 1 le j le k
ETNAKent State University and
Johann Radon Institute (RICAM)
480 Y FENG J XIAO AND M GU
On the other hand using (35)
σ4j (A) le σ4
j (Σk)
(1 + 2
σ4j (A)
σ4j (Σk)
R2242σ4j (A)
)
le σ4j (Σk)
1 + 2
(σ4j (Σk) + 2 R2242
)σ4j (Σk)
R2242σ4j (A)
le σ4
j (Σk)
(1 + 2
(1 + 2
R2242σ4k (Σk)
)R2242σ4j (A)
)
that is
σj (Σk) ge σj (A)
4
radic1 +
(2 + 4
R2242σ4k(Σk)
)R2242σ4j (A)
Plugging (313) and (314) into this above equation yields
(316) σj (Σk) ge σj (A)
4
radic1 + τ4 (2 + 4τ4)
(σl+1(A)σj(A)
)4 1 le j le k
Combining (315) with (316) we arrive at (37)We note that (35) and (36) still hold true if we replace k by lInequality (37) shows that with the definitions (312) of τ and τ Flip-Flop SRQR can
reveal at least a dimension-dependent fraction of all the leading singular values of A andindeed approximate them very accurately in case they decay relatively quickly Moreover (38)shows that Flip-Flop SRQR can compute a rank-k approximation that is up to a factor of4
radic1 + 2τ4
(σl+1(A)σk+1(A)
)4
from optimal In situations where the singular values of A decay
relatively quickly our rank-k approximation is about as accurate as the truncated SVD with achoice of l such that
σl+1 (A)
σk+1 (A)= o (1)
REMARK 33 The ldquohigh probabilityrdquo mentioned in Theorem 32 is dependent on theoversampling size p but not on the block size b We decided not to dive deep into the probabilityexplanation since it will make the proof much longer and more tedious Theoretically thecomputed oversampling size p for certain given high probability would be rather high Forexample to achieve a probability 095 we need to choose p to be at least 500 However inpractice we would never choose such a large p since it will deteriorate the code efficiencyIt turns out that a small p like 5 sim 10 would achieve good approximation results empiricallyThe detailed explanation about what does high probability mean here can be found in [75]In terms of the choice of the block size b it wonrsquot affect the analysis or the error boundsThe choice of b is purely for the performance of the codes Only a suitable b can take fulladvantage of the cache hierarchy and BLAS routines
4 Numerical experiments In this section we demonstrate the effectiveness and effi-ciency of the Flip-Flop SRQR (FFSRQR) algorithm in several numerical experiments Firstly
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 481
TABLE 41Methods for approximate SVD
Method DescriptionLANSVD Approximate SVD using Lanczos bidiagonalization with partial
reorthogonalization [42]Its Fortran and Matlab implementations are from PROPACK [41]
FFSRQR Flip-Flop Spectrum-revealing QR factorizationIts Matlab implementation is using Mex-wrappedBLAS and LAPACK routines [2]
RSISVD Approximate SVD with randomized subspace iteration [30 27]Its Matlab implementation is from tensorlab toolbox [73]
TUXV Algorithm 5 in the appendixLTSVD Linear Time SVD [16]
We use its Matlab implementation by Ma et al [49]
we compare FFSRQR with other approximate SVD algorithms for a matrix approximationSecondly we compare FFSRQR with other methods on tensor approximation problems usingthe tensorlab toolbox [73] Thirdly we compare FFSRQR with other methods for the robustPCA problem and matrix completion problem All experiments are implemented on a laptopwith 29 GHz Intel Core i5 processor and 8 GB of RAM The codes based on LAPACK [2]and PROPACK [41] in Section 41 are in Fortran The codes in Sections 42 43 are in Matlabexcept that FFSRQR is implemented in Fortran and wrapped by mex
41 Approximate truncated SVD In this section we compare FFSRQR with otherapproximate SVD algorithms for low-rank matrix approximation All tested methods are listedin Table 41 Since LTSVD has only a Matlab implementation this method is only consideredin Sections 42 and 43 The test matrices areType 1 A isin Rmtimesn [66] is defined by A = U DV T + 01 middot dss middotE where U isin Rmtimess V isin
Rntimess are column-orthonormal matrices and D isin Rstimess is a diagonal matrix with sgeometrically decreasing diagonal entries from 1 to 10minus3 dss is the last diagonalentry of D E isin Rmtimesn is a random matrix where the entries are independentlysampled from a normal distribution N (0 1) In our numerical experiment we testthree different random matrices The square matrix has a size of 10000times 10000 theshort-fat matrix has a size of 1000times 10000 and the tall-skinny matrix has a size of10000times 1000
Type 2 A isin R4929times4929 is a real data matrix from the University of Florida sparse matrixcollection [11] Its corresponding file name is HBGEMAT11
For a given matrix A isin Rmtimesn the relative SVD approximation error is measured by∥∥Aminus UkΣkVTk
∥∥FAF where Σk contains the approximate top k singular values and
Uk Vk are the corresponding approximate top k singular vectors The parameters used inFFSRQR RSISVD TUXV and LTSVD are listed in Table 42 The additional time requiredto compute g2 is negligible In our experiments g2 always remains modest and never triggerssubsequent SRQR column swaps However computing g2 is still recommended since g2
serves as an insurance policy against potential quality degration by TRQRCP The block size bis chosen to be 32 according to the xGEQP3 LAPACK routine When k is less than 32 weimplement an unblock version of the partial QRCP
Figures 41ndash43 display the run time the relative approximation error and the top 20singular values comparison respectively for four different matrices In terms of speed FFSRQRis always faster than LANSVD and RSISVD Moreover the efficiency difference is more
ETNAKent State University and
Johann Radon Institute (RICAM)
482 Y FENG J XIAO AND M GU
TABLE 42Parameters used in RSISVD FFSRQR TUXV and LTSVD
Method ParameterFFSRQR oversampling size p = 5 block size b = min32 k l = k d = 10 g = 20RSISVD oversampling size p = 5 subspace iteration q = 1TUXV oversampling size p = 5 block size b = min32 k l = kLTSVD probabilities pi = 1n
obvious when k is relatively large FFSRQR is only slightly slower than TUXV In termsof the relative approximation error FFSRQR is comparable to the other three methods Interms of the top singular values approximation Figure 43 displays the top 20 singular valueswhen k = 500 for four different matrices In Figure 43 FFSRQR is superior to TUXV aswe take the absolute values of the main diagonal entries of the upper triangle matrix R as theapproximate singular values for TUXV
100 200 300 400 500
target rank k
100
101
102
103
104
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
10-1
100
101
102
103
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 41 Run time comparison for approximate SVD algorithms
42 Tensor approximation This section illustrates the effectiveness and efficiency ofFFSRQR for computing approximate tensors ST-HOSVD [3 71] is one of the most efficient
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 483
100 200 300 400 500
target rank k
022
024
026
028
03
032
034
036re
lative
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
024
026
028
03
032
034
036
rela
tive
ap
pro
xim
atio
n e
rro
rFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 42 Relative approximation error comparison for approximate SVD algorithms
algorithms to compute a Tucker decomposition of tensors and the most costly part of thisalgorithm is to compute an SVD or approximate SVD of the tensor unfoldings The truncatedSVD and RSISVD are used in the routines MLSVD and MLSVD_RSI respectively in theMatlab tensorlab toolbox [73] Based on this Matlab toolbox we implement ST-HOSVDusing FFSRQR or LTSVD to do the SVD approximation We name these two new routines byMLSVD_FFSRQR and MLSVD_LTSVD respectively and compare these four routines in thenumerical experiment We also have Python codes for this tensor approximation experimentWe donrsquot list the results of the python programs here as they are similar to those for Matlab
421 A sparse tensor example We test a sparse tensor X isin Rntimesntimesn of the followingformat [64 59]
X =
10sumj=1
1000
jxj yj zj +
nsumj=11
1
jxj yj zj
where xj yj zj isin Rn are sparse vectors with nonnegative entries The symbol ldquordquo representsthe vector outer product We compute a rank-(k k k) Tucker decomposition [GU1 U2 U3]using MLSVD MLSVD_FFSRQR MLSVD_RSI and MLSVD_LTSVD respectively Therelative approximation error is measured by X minus XkF XF where Xk = G times1 U1 times2
U2 times3 U3
ETNAKent State University and
Johann Radon Institute (RICAM)
484 Y FENG J XIAO AND M GU
5 10 15 20
index
06
07
08
09
1to
p 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
5 10 15 20
index
06
07
08
09
1
top 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
5 10 15 20
index
06
065
07
075
08
085
09
095
top
20
sin
gu
lar
va
lue
s
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
5 10 15 20
index
101
102
103
top
20
sin
gu
lar
va
lue
sFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 43 Top 20 singular values comparison for approximate SVD algorithms
Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger
422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]
We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3
where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method
MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485
50 100 150 200
target rank k
10-5
10-4
10-3
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
50 100 150 200
target rank k
10-2
10-1
100
101
102
run tim
e (
sec)
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
FIG 44 Run time and relative approximation error comparison on a sparse tensor
TABLE 43Comparison on handwritten digits classification
MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD
Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259
one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one
43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1
431 Robust PCA To solve the robust PCA problem we replace the approximateSVD part in the IALM method [45] by various methods We denote the actual solution tothe robust PCA problem by a matrix pair (Xlowast Elowast) isin Rmtimesn times Rmtimesn The matrix Xlowast isXlowast = XLX
TR with XL isin Rmtimesk XR isin Rntimesk being random matrices whose entries are
independently sampled from a normal distribution The sparse matrix Elowast is a random matrixwhere its non-zero entries are independently sampled from a uniform distribution over theinterval [minus500 500] The input to the IALM algorithm has the form M = Xlowast + Elowast andthe output is denoted by
(X E
) In this numerical experiment we use the same parameter
settings as the IALM code for robust PCA rank k is 01m and the number of non-zero entriesin E is 005m2 We choose the trade-off parameter λ = 1
radicmax (mn) as suggested by
Candes et al [8] The solution quality is measured by the normalized root mean square errorX minusXlowastF XlowastF
Table 44 includes relative error run time the number of non-zero entries in E (E0)the iteration count and the number of non-zero singular values (sv) in X of the IALM
1httpperceptioncslillinoisedumatrix-ranksample_codehtml
ETNAKent State University and
Johann Radon Institute (RICAM)
486 Y FENG J XIAO AND M GU
algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective
TABLE 44Comparison on robust PCA
Size Method Error Time (sec) E0 Iter sv
1000times 1000
IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100
2000times 2000
IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200
4000times 4000
IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400
6000times 6000
IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600
432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices
bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3
The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies
For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by
NMAEdef=
1|Γ|sum
(ij)isinΓ |Mij minusXij |rmax minus rmin
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487
where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5
Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]
The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets
TABLE 45Parameters used in the IALM method on matrix completion
Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384
jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742
movie-latest-small 671 9066 100e+05 52551
5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA
Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments
Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4
Appendix B TUXV algorithmThe details of TUXV algorithm are presented in Algorithm 5
Appendix C Computing Tucker decomposition of higher-order tensorsAlgorithm 6 computes the Tucker decomposition of higher-order tensors ie sequentially
truncated higher-order SVD
ETNAKent State University and
Johann Radon Institute (RICAM)
488 Y FENG J XIAO AND M GU
TABLE 46Comparison on matrix completion
Data set Method Iter Time NMAE sv σmax σmin
jester-1
IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00
jester-2
IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00
jester-3
IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00
jester-all
IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00
movie-100K
IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00
movie-1M
IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00
movie-latest-small
IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00
Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems
respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω
Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an
orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9
Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that
1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489
Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do
Find i = arg maxjleslen
rs Exchange rj and ri columns in A and Π
Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)
end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A
Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)
3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)
2 minus 23 (k + p)
3 (cf [69]) and the cost of forming the first (k + p)
columns in the full Q matrix is m (k + p)2
+ 13 (k + p)
3
Now we count the flops for each i in the for loop
1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)
2 minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)
2+ 1
3 (k + p)3
3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)
2minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)
2+ 1
3 (k + p)3
Together the cost of running the for loop q times is
q
(4mn (k + p) + 3 (m+ n) (k + p)
2 minus 2
3(k + p)
3
)
Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)
2) and the cost of computing U = Q lowast U is 2m (k + p)
2
ETNAKent State University and
Johann Radon Institute (RICAM)
490 Y FENG J XIAO AND M GU
Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV
Tr ie applying UTr to G
end for
Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max
(M2 λminus1MF
) Y0 = MJ (M) E0 = 0
while not converged do(UΣV) = svd
(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = Sλmicrominus1k
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2
REFERENCES
[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106
[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999
[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103
[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595
[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 471
Algorithm 1 Truncated Randomized QRCP (TRQRCP)InputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RktimesnPermutation matrix Π isin Rntimesn such that AΠ asymp Q ( 1 k)RAlgorithmGenerate iid Gaussian random matrix Ω isin N (0 1)
(b+p)timesmForm the initial sample matrix B = ΩA and initialize Π = Infor j = 1 b k dob = min (k minus j + 1 b)Do partial QRCP on B ( j n) to obtain b pivotsExchange corresponding columns in A B Π and WT Do QR on A (j m j j + bminus 1) using WY formula without updating the trailingmatrixUpdate B ( j + b n) using the formula (24)
end forQ = Q1Q2 middot middot middotQdkbe R = upper trapezoidal part of the submatrix A (1 k 1 n)
to compress A into B = ΩA with much smaller row dimension In practice b is the blocksize and p is the oversampling size RQRCP repeatedly runs partial QRCP on B to obtain bcolumn pivots applies them to the matrix A and then computes QR without pivoting (QRNP)on A and updates the remaining columns of B RQRCP exits this process when it reachesthe target rank k QRCP and RQRCP choose pivots on A and B respectively RQRCP issignificantly faster than QRCP as B has much smaller row dimension than A It is shownin [75] that RQRCP is as reliable as QRCP up to failure probabilities that decay exponentiallywith respect to the oversampling size p
Since the trailing matrix of A is usually not required for low-rank matrix approximations(see (22)) the TRQRCP (truncated RQRCP) algorithm of [17] re-organizes the computationsin RQRCP to directly compute the approximation (22) without explicitly computing thetrailing matrix R22 For more details both RQRCP and TRQRCP are based on the WYrepresentation of the Householder transformations [5 55 61]
Q = Q1Q2 middot middot middotQk = I minus Y TY T
where T isin Rktimesk is an upper triangular matrix and Y isin Rmtimesk is a trapezoidal matrix
consisting of k consecutive Householder vectors Let WT def= TTY TA then the trailing
matrix update formula becomes QTA = Aminus YWT The main difference between RQRCPand TRQRCP is that RQRCP computes the whole trailing matrix update while TRQRCP onlycomputes the part of the update that affects the approximation (22) More discussions aboutRQRCP and TRQRCP can be found in [17] The main steps of TRQRCP are briefly describedin Algorithm 1 The deduction of the updating formula for B is as follows consider partialQRs on AΠ and BΠ = ΩAΠ
(23) AΠ = Q
[R11 R12
R22
] BΠ = Q
[R11 R12
R22
]
ETNAKent State University and
Johann Radon Institute (RICAM)
472 Y FENG J XIAO AND M GU
Partition Ωdef= Q
TΩQ
def=[Ω1 Ω2
] and therefore (23) implies
Ω
[R11 R12
R22
]=
[R11 R12
R22
]
which leads to the updating formula for B in Algorithm 1
(24)[R12
R22
]larr Ω2R22 =
[R12 minusR11R
minus111 R12
R22
]
With TRQRCP the TUXV algorithm [17 Algorithm 7] (Algorithm 5 in the appendix)computes a low-rank approximation with the QLP factorization at a greatly accelerated speedby computing a partial QR factorization with column pivoting followed with a partial LQfactorization
22 Spectrum-revealing QR factorization Although both RQRCP and TRQRCP arevery effective practical tools for low-rank matrix approximations they are not known toprovide reliable low-rank matrix approximations due to their underlying greediness in thecolumn norm-based pivoting strategy To solve this potential problem of column-basedQR factorization Gu and Eisenstat [28] proposed an efficient way to perform additionalcolumn interchanges to enhance the quality of the leading k columns in AΠ as a basis for theapproximate column subspace More recently a more efficient and effective method SRQRwas introduced and analyzed in [50 75] to compute the low-rank approximation (22) Theconcept of spectrum-revealing introduced in [50 74] emphasizes the utilization of partialQR factorization (22) as a low-rank matrix approximation as opposed to the more traditionalrank-revealing factorization which emphasizes the utility of the partial QR factorization (21)as a tool for numerical rank determination The SRQR algorithm is described in Algorithm 2The SRQR algorithm will always run to completion with a high-quality low-rank matrixapproximation (22) For real data matrices that usually have fast decaying singular-valuespectrum this approximation is often as good as the truncated SVD The SRQR algorithmof [75] explicitly updates the partial QR factorization (21) while swapping columns but theSRQR algorithm can actually avoid any explicit computations on the trailing matrix R22 usingTRQRCP instead of RQRCP to obtain exactly the same partial QR initialization Below weoutline the SRQR algorithm
In (21) let
(25) Rdef=
[R11 a
α
]be the leading (l + 1)times (l + 1) (l ge k) submatrix of R We define
(26) g1def=R2212|α|
and g2def= |α|
∥∥∥RminusT∥∥∥12
where X12 is the largest column 2-norm of X for any given X In [75] the authorsproved approximation quality bounds involving g1 g2 for the low-rank approximation com-puted by RQRCP or TRQRCP RQRCP or TRQRCP will provide a good low-rank ma-
trix approximation if g1 and g2 are O(1) The authors also proved that g1 leradic
1+ε1minusε and
g2 leradic
2(1+ε)
1minusε
(1 +
radic1+ε1minusε
)lminus1
hold with high probability for both RQRCP and TRQRCPwhere 0 lt ε lt 1 is a user-defined parameter which guides the choice of the oversampling size
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 473
p For reasonably choices of ε like ε = 12 g1 is a small constant while g2 can potentially be an
extremely large number which can lead to a poor low-rank approximation quality To avoidthe potential large number of g2 the SRQR algorithm (Algorithm 2) proposed in [75] uses apair-wise swapping strategy to guarantee that g2 is below some user-defined tolerance g gt 1which is usually chosen to be a small number larger than one like 20 In Algorithm 2 weuse an approximate formula instead of using the definition (26) directly to estimate g2 ieg2 asymp |α|radicd
∥∥∥ΩRminusT∥∥∥
12 based on the Johnson-Lindenstrauss Lemma [36]
Algorithm 2 Spectrum-revealing QR Factorization (SRQR)InputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0Integer l ge k Tolerance g gt 1 for g2OutputsOrthogonal matrix Q isin Rmtimesm formed by the first k reflectorsUpper trapezoidal matrix R isin RktimesnPermutation matrix Π isin Rntimesn such that AΠ asymp Q ( 1 k)RAlgorithmCompute QRΠ with RQRCP or TRQRCP to l stepsCompute squared 2-norm of the columns of B( l + 1 n) ri (l + 1 le i le n) where Bis an updated random projection of A computed by RQRCP or TRQRCPApproximate squared 2-norm of the columns of A(l + 1 m l + 1 n) ri = ri(b +p) (l + 1 le i le n)ı = argmaxl+1leilenriSwap ı-th and (l + 1)-st columns of AΠ rOne-step QR factorization of A(l + 1 m l + 1 n)|α| = Rl+1l+1ri = ri minusA(l + 1 i)2 (l + 2 le i le n)Generate a random matrix Ω isin N (0 1)dtimes(l+1) (d l)Compute g2 = |α|
∥∥∥RminusT∥∥∥12asymp |α|radic
d
∥∥∥ΩRminusT∥∥∥
12
while g2 gt g doı = argmax1leilel+1ith column norm of ΩRminusT Swap ı-th and (l + 1)-st columns of A and Π in a Round Robin rotationGivens-rotate R back into upper-trapezoidal formrl+1 = R2
l+1l+1 ri = ri +A(l + 1 i)2 (l + 2 le i le n)ı = argmaxl+1leilenriSwap ı-th and (l + 1)-st columns of AΠ rOne-step QR factorization of A(l + 1 m l + 1 n)|α| = Rl+1l+1ri = ri minusA(l + 1 i)2 (l + 2 le i le n)Generate a random matrix Ω isin N (0 1)dtimes(l+1)(d l) Compute g2 = |α|
∥∥∥RminusT∥∥∥12asymp |α|radic
d
∥∥∥ΩRminusT∥∥∥
12
end while
23 Tensor approximation In this section we review some basic notations and conceptsinvolving tensors A more detailed discussion of the properties and applications of tensors can
ETNAKent State University and
Johann Radon Institute (RICAM)
474 Y FENG J XIAO AND M GU
be found in the review [40] A tensor is a d-dimensional array of numbers denoted by scriptnotation X isin RI1timesmiddotmiddotmiddottimesId with entries given by
xj1jd 1 le j1 le I1 1 le jd le Id
We use the matrix X(n) isin RIntimes(Πj 6=nIj) to denote the nth mode unfolding of the tensorX Since this tensor has d dimensions there are altogether d-possibilities for unfolding Then-mode product of a tensor X isin RI1timesmiddotmiddotmiddottimesId with a matrix U isin RktimesIn results in a tensorY isin RI1timesmiddotmiddotmiddottimesInminus1timesktimesIn+1timesmiddotmiddotmiddottimesId such that
yj1jnminus1jjn+1jd = (X timesn U)j1jnminus1jjn+1jd=
Insumjn=1
xj1jdujjn
Alternatively it can be expressed conveniently in terms of unfolded tensors
Y = X timesn U hArr Y(n) = UX(n)
Decompositions of higher-order tensors have applications in signal processing [12 15 63]numerical linear algebra [13 39 76] computer vision [62 72 67] etc Two particular tensordecompositions can be considered as higher-order extensions of the matrix SVD CANDE-COMPPARAFAC (CP) [10 31] decomposes a tensor as a sum of rank-one tensors andthe Tucker decomposition [70] is a higher-order form of the principal component analy-sis Knowing the definitions of mode products and unfolding of tensors we can introducethe sequentially truncated higher-order SVD (ST-HOSVD) algorithm for producing a rank(k1 kd) approximation to the tensor based on the Tucker decomposition format The ST-HOSVD algorithm [3 71] (Algorithm 6 in the appendix) returns a core tensor G isin Rk1timesmiddotmiddotmiddottimeskdand a set of unitary matrices Uj isin RIjtimeskj for j = 1 d such that
X asymp G times1 U1 middot middot middot timesd Ud
where the right-hand side is called a Tucker decomposition However a straightforwardgeneralization to higher-order (d ge 3) tensors of the matrix Eckart-Young-Mirsky Theorem isnot possible [14]
In ST-HOSVD one key step is to compute an exact or approximate rank-kr SVD ofthe tensor unfolding Well known efficient ways to compute an exact low-rank SVD includeKrylov subspace methods [44] There are also efficient randomized algorithms to find anapproximate low-rank SVD [30] In the Matlab tensorlab toolbox [73] the most efficientmethod MLSVD_RSI is essentially ST-HOSVD with an randomized subspace iteration tofind the approximate SVD of the tensor unfolding
24 Nuclear norm minimization Matrix rank minimization problem appears ubiqui-tously in many fields such as Euclidean embedding [20 46] control [21 51 56] collaborativefiltering [9 57 65] system identification [47 48] etc The regularized nuclear norm mini-mization problem is the following
minXisinRmtimesn
f (X) + τXlowast
where τ gt 0 is a regularization parameter and Xlowastdef=sumqi=1 σi (X) The choice of the
function f (middot) is situational f (X) = M minus X1 in robust principal component analysis(robust PCA) [8] and f (X) = πΩ (M)minus πΩ (X) ||2F in matrix completion [7]
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 475
Many researchers have devoted themselves to solve the above nuclear norm minimizationproblem and plenty of algorithms have been proposed including singular value thresholding(SVT) [7] fixed point continuous (FPC) [49] accelerated proximal gradient (APG) [68]augmented Lagrange multiplier (ALM) [45] The most expensive part of these algorithms liesin the computation of the truncated SVD Inexact augmented Lagrange multiplier (IALM) [45]has been proved to be one of the most accurate and efficient among them We now describeIALM for the robust PCA and matrix completion problems
The robust PCA problem can be formalized as a minimization problem of a sum of nuclearnorm and a scaled matrix l1-norm (sum of matrix entries in absolute value)
(27) min Xlowast + λE1 subject to M = X + E
where M is a measured matrix X has low-rank E is an error matrix and sufficiently sparseand λ is a positive weighting parameter Algorithm 7 in the appendix describes details of theIALM method to solve the robust PCA problem [45]
The matrix completion problem [9 45] can be written in the form
(28) minXisinRmtimesn
Xlowast subject to X + E = M πΩ (E) = 0
where πΩ Rmtimesn rarr Rmtimesn is an orthogonal projection that keeps the entries in Ω unchangedand sets those outside Ω zeros In [45] the authors applied the IALM method to the matrixcompletion problem (Algorithm 8 in the appendix)
3 Flip-Flop SRQR factorization
31 Flip-Flop SRQR factorization In this section we introduce our Flip-Flop SRQRfactorization a slightly different variant of TUXV (Algorithm 5 in the appendix) to computean SVD approximation based on the QLP factorization Given an integer l ge k we run SRQR(the version without computing the trailing matrix) with l steps on A
(31) AΠ = QR = Q
[R11 R12
R22
]
where R11 isin Rltimesl is upper triangular R12 isin Rltimes(nminusl) and R22 isin R(mminusl)times(nminusl) Then werun partial QRNP with l steps on RT
(32) RT =
[RT11
RT12 RT22
]= Q
[R11 R12
R22
]asymp Q1
[R11 R12
]
where Q =[Q1 Q2
]with Q1 isin Rntimesl Therefore combining with the fact that
AΠQ1 = Q[R11 R12
]T
we can approximate the matrix A by
(33) A = QRΠT = Q(RT)T
ΠT asymp Q
[RT11
RT12
]QT1 ΠT = A
(ΠQ1
)(ΠQ1
)T
We denote the rank-k truncated SVD ofAΠQ1 by UkΣkVTk LetUk = Uk Vk = ΠQ1Vk
then using (33) a rank-k approximate SVD of A is obtained
(34) A asymp UkΣkVTk
where Uk isin Rmtimesk Vk isin Rntimesk are column orthonormal and Σk = Diag (σ1 middot middot middot σk) withσirsquos being the leading k singular values of AΠQ1 The Flip-Flop SRQR factorization isoutlined in Algorithm 3
ETNAKent State University and
Johann Radon Institute (RICAM)
476 Y FENG J XIAO AND M GU
Algorithm 3 Flip-Flop Spectrum-Revealing QR FactorizationInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0Integer l ge k Tolerance g gt 1 for g2OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmRun SRQR on A to l steps to obtain (R11 R12)Run QRNP on (R11 R12)
T to obtain Q1 represented by a sequence of Householder vectors
A = AΠQ1[U Σ V ] = svd
(A)
U = U ( 1 k) Σ = Σ (1 k 1 k) V = ΠQ1V ( 1 k)
32 Complexity analysis In this section we perform a complexity analysis of Flip-Flop SRQR Since the approximate SVD only makes sense when the target rank k is smallwe assume that k le l min (mn) The complexity analysis of Flip-Flop SRQR looks asfollows
1 The main cost of doing SRQR with TRQRCP onA is 2mnl+2(b+p)mn+(m+n)l22 The main cost of the QR factorization for [R11 R12]
T and forming Q1 is 2nl2minus 23 l
33 The main cost of computing A = AΠQ1 is 2mnl4 The main cost of computing [Usimsim] = svd
(A)
is O(ml2)5 The main cost of forming Vk is 2nlk
Since k le l min (mn) the complexity of Flip-Flop SRQR is 4mnl + 2(b+ p)mnby omitting the lower-order terms
On the other hand the complexity of the approximate SVD with randomized subspaceiteration (RSISVD) [27 30] is (4 + 4q)mn (k + p) where p is the oversampling size and qis the number of subspace iterations (see the detailed analysis in the appendix) In practice pis chosen to be a small integer for instance 5 in RSISVD In Flip-Flop SRQR l is usuallychosen to be a little bit larger than k eg l = k + 5 We also found that l = k is sufficient interms of the approximation quality in our numerical experiments Therefore we observe thatFlip-Flop SRQR is more efficient than RSISVD for any q gt 0
Since a practical dataset matrix can be too large to fit into the main memory the cost oftransferring the matrix into a memory hierarchy typically predominates the arithmetic costTherefore we also compare the number of data passes between RSISVD and Flip-Flop SRQRRSISVD needs 2q + 2 passes while Flip-Flop SRQR needs 4 passes computing B = ΩAswapping the columns of A realizing a partial QR factorization on A and computing AΠQ1Therefore Flip-Flop SRQR requires less data transfer than RSISVD for any q gt 1
33 Quality analysis of Flip-Flop SRQR This section is devoted to the quality analysisof Flip-Flop SRQR We start with Lemma 31
LEMMA 31 Given any matrix X = [X1X2] with Xi isin Rmtimesni i = 1 2 and n1 +n2 = n
σj (X)2 le σj (X1)
2+ X222 1 le j le min(mn)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 477
Proof Since XXT = X1XT1 + X2X
T2 we obtain the above result using [33 Theo-
rem 3316]
We are now ready to derive bounds for the singular values and the approximation errorof Flip-Flop SRQR We need to emphasize that even if the target rank is k we run Flip-FlopSRQR with an actual target rank l which is a little bit larger than k The difference betweenk and l can create a gap between the singular values of A so that we can obtain a reliablelow-rank approximation
THEOREM 32 Given a matrix A isin Rmtimesn a target rank k an oversampling size p andan actual target rank l ge k the matrices Uk Σk Vk in (34) computed by Flip-Flop SRQRsatisfy
(35) σj(Σk) ge σj(A)
4
radic1 +
2R2242σ4j (Σk)
1 le j le k
and
(36)∥∥Aminus UkΣkV
Tk
∥∥2le σk+1 (A)
4
radic1 + 2
(R222σk+1 (A)
)4
where R22 isin R(mminusl)times(nminusl) is the trailing matrix in (31) Using the properties of SRQR wefurthermore have
(37) σj (Σk) ge σj (A)
4
radic1 + min
(2τ4 τ4 (2 + 4τ4)
(σl+1(A)σj(A)
)4) 1 le j le k
and
(38)∥∥Aminus UkΣkV
Tk
∥∥2le σk+1 (A)
4
radic1 + 2τ4
(σl+1 (A)
σk+1 (A)
)4
where τ and τ defined in (312) have matrix dimension-dependent upper bounds
τ le g1g2
radic(l + 1) (nminus l) and τ le g1g2
radicl (nminus l)
where g1 leradic
1+ε1minusε and g2 le g with high probability The parameters ε gt 0 and g gt 1 are
user-defined
Proof In terms of the singular value bounds observing that
[R11 R12
R22
][R11 R12
R22
]T=
[R11R
T11 + R12R
T12 R12R
T22
R22RT12 R22R
T22
]
ETNAKent State University and
Johann Radon Institute (RICAM)
478 Y FENG J XIAO AND M GU
we apply Lemma 31 twice for any 1 le j le k
σ2j
[R11 R12
R22
][R11 R12
R22
]Tle σ2
j
([R11R
T11 + R12R
T12 R12R
T22
])+∥∥∥[R22R
T12 R22R
T22
]∥∥∥2
2
le σ2j
(R11R
T11 + R12R
T12
)+∥∥∥R12R
T22
∥∥∥2
2+∥∥∥[R22R
T12 R22R
T22
]∥∥∥2
2
le σ2j
(R11R
T11 + R12R
T12
)+ 2
∥∥∥∥∥[R12
R22
]∥∥∥∥∥4
2
= σ2j
(R11R
T11 + R12R
T12
)+ 2 R2242 (39)
The relation (39) can be further rewritten as
σ4j (A) le σ4
j
([R11 R12
])+ 2 R2242 = σ4
j (Σk) + 2 R2242 1 le j le k
which is equivalent to
σj(Σk) ge σj(A)
4
radic1 +
2R2242σ4j (Σk)
For the residual matrix bound we let[R11 R12
]def=[R11 R12
]+[δR11 δR12
]
where[R11 R12
]is the rank-k truncated SVD of
[R11 R12
] Since
(310)∥∥Aminus UkΣkV
Tk
∥∥2
=
∥∥∥∥∥AΠminusQ[R11 R12
0
]TQT
∥∥∥∥∥2
it follows from the orthogonality of singular vectors that[R11 R12
]T [δR11 δR12
]= 0
and therefore[R11 R12
]T [R11 R12
]=[R11 R12
]T [R11 R12
]+[δR11 δR12
]T [δR11 δR12
]
which implies
(311) RT12R12 = RT
12R12 +(δR12
)T (δR12
)
Similarly to the deduction of (39) from (311) we can derive∥∥∥∥[δR11 δR12
R22
]∥∥∥∥4
2
le∥∥[δR11 δR12
]∥∥4
2+ 2
∥∥∥∥[δR12
R22
]∥∥∥∥4
2
le σ4k+1 (A) + 2
∥∥∥∥∥[R12
R22
]∥∥∥∥∥4
2
= σ4k+1 (A) + 2 R2242
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 479
Combining with (310) it now follows that
∥∥Aminus UkΣkVTk
∥∥2
=
∥∥∥∥∥AΠminusQ[R11 R12
0
]TQT
∥∥∥∥∥2
=
∥∥∥∥[δR11 δR12
R22
]∥∥∥∥2
le σk+1 (A)4
radic1 + 2
(R222σk+1 (A)
)4
To obtain an upper bound of R222 in (35) and (36) we follow the analysis of SRQRin [75]
From the analysis of [75 Section IV] Algorithm 2 ensures that g1 leradic
1+ε1minusε and g2 le g
with high probability (the actual probability-guarantee formula can be found in [75 SectionIV]) where g1 and g2 are defined by (26) Here 0 lt ε lt 1 is a user defined parameter toadjust the choice of the oversampling size p used in the TRQRCP initialization part in SRQRg gt 1 is a user-defined parameter in the extra swapping part in SRQR Let
(312) τdef= g1g2
R222R2212
∥∥∥RminusT∥∥∥minus1
12
σl+1 (A)and τ
def= g1g2
R222R2212
∥∥RminusT11
∥∥minus1
12
σk (Σk)
where R is defined in equation (25) Since 1radicnX12 le X2 le
radicnX12 and
σi (X1) le σi (X) 1 le i le min (s t) for any matrix X isin Rmtimesn and submatrix X1 isin Rstimestof X we obtain that
τ = g1g2R222R2212
∥∥∥RminusT∥∥∥minus1
12
σl+1
(R) σl+1
(R)
σl+1 (A)le g1g2
radic(l + 1) (nminus l)
Using the fact that σl([R11 R12
])= σl
(R11
)and σk (Σk) = σk
([R11 R12
])by (32)
and (34)
τ = g1g2R222R2212
∥∥RminusT11
∥∥minus1
12
σl (R11)
σl (R11)
σl([R11 R12
]) σl ([R11 R12
])σk (Σk)
le g1g2
radicl (nminus l)
By the definition of τ
(313) R222 = τ σl+1 (A)
Plugging this into (36) yields (38)By the definition of τ we observe that
(314) R222 le τσk (Σk)
By (35) and (314)(315)
σj (Σk) ge σj (A)
4
radic1 + 2
(R222σj(Σk)
)4ge σj (A)
4
radic1 + 2
(R222σk(Σk)
)4ge σj (A)
4radic
1 + 2τ4 1 le j le k
ETNAKent State University and
Johann Radon Institute (RICAM)
480 Y FENG J XIAO AND M GU
On the other hand using (35)
σ4j (A) le σ4
j (Σk)
(1 + 2
σ4j (A)
σ4j (Σk)
R2242σ4j (A)
)
le σ4j (Σk)
1 + 2
(σ4j (Σk) + 2 R2242
)σ4j (Σk)
R2242σ4j (A)
le σ4
j (Σk)
(1 + 2
(1 + 2
R2242σ4k (Σk)
)R2242σ4j (A)
)
that is
σj (Σk) ge σj (A)
4
radic1 +
(2 + 4
R2242σ4k(Σk)
)R2242σ4j (A)
Plugging (313) and (314) into this above equation yields
(316) σj (Σk) ge σj (A)
4
radic1 + τ4 (2 + 4τ4)
(σl+1(A)σj(A)
)4 1 le j le k
Combining (315) with (316) we arrive at (37)We note that (35) and (36) still hold true if we replace k by lInequality (37) shows that with the definitions (312) of τ and τ Flip-Flop SRQR can
reveal at least a dimension-dependent fraction of all the leading singular values of A andindeed approximate them very accurately in case they decay relatively quickly Moreover (38)shows that Flip-Flop SRQR can compute a rank-k approximation that is up to a factor of4
radic1 + 2τ4
(σl+1(A)σk+1(A)
)4
from optimal In situations where the singular values of A decay
relatively quickly our rank-k approximation is about as accurate as the truncated SVD with achoice of l such that
σl+1 (A)
σk+1 (A)= o (1)
REMARK 33 The ldquohigh probabilityrdquo mentioned in Theorem 32 is dependent on theoversampling size p but not on the block size b We decided not to dive deep into the probabilityexplanation since it will make the proof much longer and more tedious Theoretically thecomputed oversampling size p for certain given high probability would be rather high Forexample to achieve a probability 095 we need to choose p to be at least 500 However inpractice we would never choose such a large p since it will deteriorate the code efficiencyIt turns out that a small p like 5 sim 10 would achieve good approximation results empiricallyThe detailed explanation about what does high probability mean here can be found in [75]In terms of the choice of the block size b it wonrsquot affect the analysis or the error boundsThe choice of b is purely for the performance of the codes Only a suitable b can take fulladvantage of the cache hierarchy and BLAS routines
4 Numerical experiments In this section we demonstrate the effectiveness and effi-ciency of the Flip-Flop SRQR (FFSRQR) algorithm in several numerical experiments Firstly
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 481
TABLE 41Methods for approximate SVD
Method DescriptionLANSVD Approximate SVD using Lanczos bidiagonalization with partial
reorthogonalization [42]Its Fortran and Matlab implementations are from PROPACK [41]
FFSRQR Flip-Flop Spectrum-revealing QR factorizationIts Matlab implementation is using Mex-wrappedBLAS and LAPACK routines [2]
RSISVD Approximate SVD with randomized subspace iteration [30 27]Its Matlab implementation is from tensorlab toolbox [73]
TUXV Algorithm 5 in the appendixLTSVD Linear Time SVD [16]
We use its Matlab implementation by Ma et al [49]
we compare FFSRQR with other approximate SVD algorithms for a matrix approximationSecondly we compare FFSRQR with other methods on tensor approximation problems usingthe tensorlab toolbox [73] Thirdly we compare FFSRQR with other methods for the robustPCA problem and matrix completion problem All experiments are implemented on a laptopwith 29 GHz Intel Core i5 processor and 8 GB of RAM The codes based on LAPACK [2]and PROPACK [41] in Section 41 are in Fortran The codes in Sections 42 43 are in Matlabexcept that FFSRQR is implemented in Fortran and wrapped by mex
41 Approximate truncated SVD In this section we compare FFSRQR with otherapproximate SVD algorithms for low-rank matrix approximation All tested methods are listedin Table 41 Since LTSVD has only a Matlab implementation this method is only consideredin Sections 42 and 43 The test matrices areType 1 A isin Rmtimesn [66] is defined by A = U DV T + 01 middot dss middotE where U isin Rmtimess V isin
Rntimess are column-orthonormal matrices and D isin Rstimess is a diagonal matrix with sgeometrically decreasing diagonal entries from 1 to 10minus3 dss is the last diagonalentry of D E isin Rmtimesn is a random matrix where the entries are independentlysampled from a normal distribution N (0 1) In our numerical experiment we testthree different random matrices The square matrix has a size of 10000times 10000 theshort-fat matrix has a size of 1000times 10000 and the tall-skinny matrix has a size of10000times 1000
Type 2 A isin R4929times4929 is a real data matrix from the University of Florida sparse matrixcollection [11] Its corresponding file name is HBGEMAT11
For a given matrix A isin Rmtimesn the relative SVD approximation error is measured by∥∥Aminus UkΣkVTk
∥∥FAF where Σk contains the approximate top k singular values and
Uk Vk are the corresponding approximate top k singular vectors The parameters used inFFSRQR RSISVD TUXV and LTSVD are listed in Table 42 The additional time requiredto compute g2 is negligible In our experiments g2 always remains modest and never triggerssubsequent SRQR column swaps However computing g2 is still recommended since g2
serves as an insurance policy against potential quality degration by TRQRCP The block size bis chosen to be 32 according to the xGEQP3 LAPACK routine When k is less than 32 weimplement an unblock version of the partial QRCP
Figures 41ndash43 display the run time the relative approximation error and the top 20singular values comparison respectively for four different matrices In terms of speed FFSRQRis always faster than LANSVD and RSISVD Moreover the efficiency difference is more
ETNAKent State University and
Johann Radon Institute (RICAM)
482 Y FENG J XIAO AND M GU
TABLE 42Parameters used in RSISVD FFSRQR TUXV and LTSVD
Method ParameterFFSRQR oversampling size p = 5 block size b = min32 k l = k d = 10 g = 20RSISVD oversampling size p = 5 subspace iteration q = 1TUXV oversampling size p = 5 block size b = min32 k l = kLTSVD probabilities pi = 1n
obvious when k is relatively large FFSRQR is only slightly slower than TUXV In termsof the relative approximation error FFSRQR is comparable to the other three methods Interms of the top singular values approximation Figure 43 displays the top 20 singular valueswhen k = 500 for four different matrices In Figure 43 FFSRQR is superior to TUXV aswe take the absolute values of the main diagonal entries of the upper triangle matrix R as theapproximate singular values for TUXV
100 200 300 400 500
target rank k
100
101
102
103
104
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
10-1
100
101
102
103
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 41 Run time comparison for approximate SVD algorithms
42 Tensor approximation This section illustrates the effectiveness and efficiency ofFFSRQR for computing approximate tensors ST-HOSVD [3 71] is one of the most efficient
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 483
100 200 300 400 500
target rank k
022
024
026
028
03
032
034
036re
lative
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
024
026
028
03
032
034
036
rela
tive
ap
pro
xim
atio
n e
rro
rFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 42 Relative approximation error comparison for approximate SVD algorithms
algorithms to compute a Tucker decomposition of tensors and the most costly part of thisalgorithm is to compute an SVD or approximate SVD of the tensor unfoldings The truncatedSVD and RSISVD are used in the routines MLSVD and MLSVD_RSI respectively in theMatlab tensorlab toolbox [73] Based on this Matlab toolbox we implement ST-HOSVDusing FFSRQR or LTSVD to do the SVD approximation We name these two new routines byMLSVD_FFSRQR and MLSVD_LTSVD respectively and compare these four routines in thenumerical experiment We also have Python codes for this tensor approximation experimentWe donrsquot list the results of the python programs here as they are similar to those for Matlab
421 A sparse tensor example We test a sparse tensor X isin Rntimesntimesn of the followingformat [64 59]
X =
10sumj=1
1000
jxj yj zj +
nsumj=11
1
jxj yj zj
where xj yj zj isin Rn are sparse vectors with nonnegative entries The symbol ldquordquo representsthe vector outer product We compute a rank-(k k k) Tucker decomposition [GU1 U2 U3]using MLSVD MLSVD_FFSRQR MLSVD_RSI and MLSVD_LTSVD respectively Therelative approximation error is measured by X minus XkF XF where Xk = G times1 U1 times2
U2 times3 U3
ETNAKent State University and
Johann Radon Institute (RICAM)
484 Y FENG J XIAO AND M GU
5 10 15 20
index
06
07
08
09
1to
p 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
5 10 15 20
index
06
07
08
09
1
top 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
5 10 15 20
index
06
065
07
075
08
085
09
095
top
20
sin
gu
lar
va
lue
s
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
5 10 15 20
index
101
102
103
top
20
sin
gu
lar
va
lue
sFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 43 Top 20 singular values comparison for approximate SVD algorithms
Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger
422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]
We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3
where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method
MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485
50 100 150 200
target rank k
10-5
10-4
10-3
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
50 100 150 200
target rank k
10-2
10-1
100
101
102
run tim
e (
sec)
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
FIG 44 Run time and relative approximation error comparison on a sparse tensor
TABLE 43Comparison on handwritten digits classification
MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD
Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259
one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one
43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1
431 Robust PCA To solve the robust PCA problem we replace the approximateSVD part in the IALM method [45] by various methods We denote the actual solution tothe robust PCA problem by a matrix pair (Xlowast Elowast) isin Rmtimesn times Rmtimesn The matrix Xlowast isXlowast = XLX
TR with XL isin Rmtimesk XR isin Rntimesk being random matrices whose entries are
independently sampled from a normal distribution The sparse matrix Elowast is a random matrixwhere its non-zero entries are independently sampled from a uniform distribution over theinterval [minus500 500] The input to the IALM algorithm has the form M = Xlowast + Elowast andthe output is denoted by
(X E
) In this numerical experiment we use the same parameter
settings as the IALM code for robust PCA rank k is 01m and the number of non-zero entriesin E is 005m2 We choose the trade-off parameter λ = 1
radicmax (mn) as suggested by
Candes et al [8] The solution quality is measured by the normalized root mean square errorX minusXlowastF XlowastF
Table 44 includes relative error run time the number of non-zero entries in E (E0)the iteration count and the number of non-zero singular values (sv) in X of the IALM
1httpperceptioncslillinoisedumatrix-ranksample_codehtml
ETNAKent State University and
Johann Radon Institute (RICAM)
486 Y FENG J XIAO AND M GU
algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective
TABLE 44Comparison on robust PCA
Size Method Error Time (sec) E0 Iter sv
1000times 1000
IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100
2000times 2000
IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200
4000times 4000
IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400
6000times 6000
IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600
432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices
bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3
The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies
For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by
NMAEdef=
1|Γ|sum
(ij)isinΓ |Mij minusXij |rmax minus rmin
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487
where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5
Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]
The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets
TABLE 45Parameters used in the IALM method on matrix completion
Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384
jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742
movie-latest-small 671 9066 100e+05 52551
5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA
Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments
Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4
Appendix B TUXV algorithmThe details of TUXV algorithm are presented in Algorithm 5
Appendix C Computing Tucker decomposition of higher-order tensorsAlgorithm 6 computes the Tucker decomposition of higher-order tensors ie sequentially
truncated higher-order SVD
ETNAKent State University and
Johann Radon Institute (RICAM)
488 Y FENG J XIAO AND M GU
TABLE 46Comparison on matrix completion
Data set Method Iter Time NMAE sv σmax σmin
jester-1
IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00
jester-2
IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00
jester-3
IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00
jester-all
IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00
movie-100K
IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00
movie-1M
IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00
movie-latest-small
IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00
Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems
respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω
Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an
orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9
Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that
1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489
Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do
Find i = arg maxjleslen
rs Exchange rj and ri columns in A and Π
Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)
end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A
Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)
3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)
2 minus 23 (k + p)
3 (cf [69]) and the cost of forming the first (k + p)
columns in the full Q matrix is m (k + p)2
+ 13 (k + p)
3
Now we count the flops for each i in the for loop
1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)
2 minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)
2+ 1
3 (k + p)3
3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)
2minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)
2+ 1
3 (k + p)3
Together the cost of running the for loop q times is
q
(4mn (k + p) + 3 (m+ n) (k + p)
2 minus 2
3(k + p)
3
)
Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)
2) and the cost of computing U = Q lowast U is 2m (k + p)
2
ETNAKent State University and
Johann Radon Institute (RICAM)
490 Y FENG J XIAO AND M GU
Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV
Tr ie applying UTr to G
end for
Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max
(M2 λminus1MF
) Y0 = MJ (M) E0 = 0
while not converged do(UΣV) = svd
(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = Sλmicrominus1k
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2
REFERENCES
[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106
[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999
[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103
[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595
[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
472 Y FENG J XIAO AND M GU
Partition Ωdef= Q
TΩQ
def=[Ω1 Ω2
] and therefore (23) implies
Ω
[R11 R12
R22
]=
[R11 R12
R22
]
which leads to the updating formula for B in Algorithm 1
(24)[R12
R22
]larr Ω2R22 =
[R12 minusR11R
minus111 R12
R22
]
With TRQRCP the TUXV algorithm [17 Algorithm 7] (Algorithm 5 in the appendix)computes a low-rank approximation with the QLP factorization at a greatly accelerated speedby computing a partial QR factorization with column pivoting followed with a partial LQfactorization
22 Spectrum-revealing QR factorization Although both RQRCP and TRQRCP arevery effective practical tools for low-rank matrix approximations they are not known toprovide reliable low-rank matrix approximations due to their underlying greediness in thecolumn norm-based pivoting strategy To solve this potential problem of column-basedQR factorization Gu and Eisenstat [28] proposed an efficient way to perform additionalcolumn interchanges to enhance the quality of the leading k columns in AΠ as a basis for theapproximate column subspace More recently a more efficient and effective method SRQRwas introduced and analyzed in [50 75] to compute the low-rank approximation (22) Theconcept of spectrum-revealing introduced in [50 74] emphasizes the utilization of partialQR factorization (22) as a low-rank matrix approximation as opposed to the more traditionalrank-revealing factorization which emphasizes the utility of the partial QR factorization (21)as a tool for numerical rank determination The SRQR algorithm is described in Algorithm 2The SRQR algorithm will always run to completion with a high-quality low-rank matrixapproximation (22) For real data matrices that usually have fast decaying singular-valuespectrum this approximation is often as good as the truncated SVD The SRQR algorithmof [75] explicitly updates the partial QR factorization (21) while swapping columns but theSRQR algorithm can actually avoid any explicit computations on the trailing matrix R22 usingTRQRCP instead of RQRCP to obtain exactly the same partial QR initialization Below weoutline the SRQR algorithm
In (21) let
(25) Rdef=
[R11 a
α
]be the leading (l + 1)times (l + 1) (l ge k) submatrix of R We define
(26) g1def=R2212|α|
and g2def= |α|
∥∥∥RminusT∥∥∥12
where X12 is the largest column 2-norm of X for any given X In [75] the authorsproved approximation quality bounds involving g1 g2 for the low-rank approximation com-puted by RQRCP or TRQRCP RQRCP or TRQRCP will provide a good low-rank ma-
trix approximation if g1 and g2 are O(1) The authors also proved that g1 leradic
1+ε1minusε and
g2 leradic
2(1+ε)
1minusε
(1 +
radic1+ε1minusε
)lminus1
hold with high probability for both RQRCP and TRQRCPwhere 0 lt ε lt 1 is a user-defined parameter which guides the choice of the oversampling size
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 473
p For reasonably choices of ε like ε = 12 g1 is a small constant while g2 can potentially be an
extremely large number which can lead to a poor low-rank approximation quality To avoidthe potential large number of g2 the SRQR algorithm (Algorithm 2) proposed in [75] uses apair-wise swapping strategy to guarantee that g2 is below some user-defined tolerance g gt 1which is usually chosen to be a small number larger than one like 20 In Algorithm 2 weuse an approximate formula instead of using the definition (26) directly to estimate g2 ieg2 asymp |α|radicd
∥∥∥ΩRminusT∥∥∥
12 based on the Johnson-Lindenstrauss Lemma [36]
Algorithm 2 Spectrum-revealing QR Factorization (SRQR)InputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0Integer l ge k Tolerance g gt 1 for g2OutputsOrthogonal matrix Q isin Rmtimesm formed by the first k reflectorsUpper trapezoidal matrix R isin RktimesnPermutation matrix Π isin Rntimesn such that AΠ asymp Q ( 1 k)RAlgorithmCompute QRΠ with RQRCP or TRQRCP to l stepsCompute squared 2-norm of the columns of B( l + 1 n) ri (l + 1 le i le n) where Bis an updated random projection of A computed by RQRCP or TRQRCPApproximate squared 2-norm of the columns of A(l + 1 m l + 1 n) ri = ri(b +p) (l + 1 le i le n)ı = argmaxl+1leilenriSwap ı-th and (l + 1)-st columns of AΠ rOne-step QR factorization of A(l + 1 m l + 1 n)|α| = Rl+1l+1ri = ri minusA(l + 1 i)2 (l + 2 le i le n)Generate a random matrix Ω isin N (0 1)dtimes(l+1) (d l)Compute g2 = |α|
∥∥∥RminusT∥∥∥12asymp |α|radic
d
∥∥∥ΩRminusT∥∥∥
12
while g2 gt g doı = argmax1leilel+1ith column norm of ΩRminusT Swap ı-th and (l + 1)-st columns of A and Π in a Round Robin rotationGivens-rotate R back into upper-trapezoidal formrl+1 = R2
l+1l+1 ri = ri +A(l + 1 i)2 (l + 2 le i le n)ı = argmaxl+1leilenriSwap ı-th and (l + 1)-st columns of AΠ rOne-step QR factorization of A(l + 1 m l + 1 n)|α| = Rl+1l+1ri = ri minusA(l + 1 i)2 (l + 2 le i le n)Generate a random matrix Ω isin N (0 1)dtimes(l+1)(d l) Compute g2 = |α|
∥∥∥RminusT∥∥∥12asymp |α|radic
d
∥∥∥ΩRminusT∥∥∥
12
end while
23 Tensor approximation In this section we review some basic notations and conceptsinvolving tensors A more detailed discussion of the properties and applications of tensors can
ETNAKent State University and
Johann Radon Institute (RICAM)
474 Y FENG J XIAO AND M GU
be found in the review [40] A tensor is a d-dimensional array of numbers denoted by scriptnotation X isin RI1timesmiddotmiddotmiddottimesId with entries given by
xj1jd 1 le j1 le I1 1 le jd le Id
We use the matrix X(n) isin RIntimes(Πj 6=nIj) to denote the nth mode unfolding of the tensorX Since this tensor has d dimensions there are altogether d-possibilities for unfolding Then-mode product of a tensor X isin RI1timesmiddotmiddotmiddottimesId with a matrix U isin RktimesIn results in a tensorY isin RI1timesmiddotmiddotmiddottimesInminus1timesktimesIn+1timesmiddotmiddotmiddottimesId such that
yj1jnminus1jjn+1jd = (X timesn U)j1jnminus1jjn+1jd=
Insumjn=1
xj1jdujjn
Alternatively it can be expressed conveniently in terms of unfolded tensors
Y = X timesn U hArr Y(n) = UX(n)
Decompositions of higher-order tensors have applications in signal processing [12 15 63]numerical linear algebra [13 39 76] computer vision [62 72 67] etc Two particular tensordecompositions can be considered as higher-order extensions of the matrix SVD CANDE-COMPPARAFAC (CP) [10 31] decomposes a tensor as a sum of rank-one tensors andthe Tucker decomposition [70] is a higher-order form of the principal component analy-sis Knowing the definitions of mode products and unfolding of tensors we can introducethe sequentially truncated higher-order SVD (ST-HOSVD) algorithm for producing a rank(k1 kd) approximation to the tensor based on the Tucker decomposition format The ST-HOSVD algorithm [3 71] (Algorithm 6 in the appendix) returns a core tensor G isin Rk1timesmiddotmiddotmiddottimeskdand a set of unitary matrices Uj isin RIjtimeskj for j = 1 d such that
X asymp G times1 U1 middot middot middot timesd Ud
where the right-hand side is called a Tucker decomposition However a straightforwardgeneralization to higher-order (d ge 3) tensors of the matrix Eckart-Young-Mirsky Theorem isnot possible [14]
In ST-HOSVD one key step is to compute an exact or approximate rank-kr SVD ofthe tensor unfolding Well known efficient ways to compute an exact low-rank SVD includeKrylov subspace methods [44] There are also efficient randomized algorithms to find anapproximate low-rank SVD [30] In the Matlab tensorlab toolbox [73] the most efficientmethod MLSVD_RSI is essentially ST-HOSVD with an randomized subspace iteration tofind the approximate SVD of the tensor unfolding
24 Nuclear norm minimization Matrix rank minimization problem appears ubiqui-tously in many fields such as Euclidean embedding [20 46] control [21 51 56] collaborativefiltering [9 57 65] system identification [47 48] etc The regularized nuclear norm mini-mization problem is the following
minXisinRmtimesn
f (X) + τXlowast
where τ gt 0 is a regularization parameter and Xlowastdef=sumqi=1 σi (X) The choice of the
function f (middot) is situational f (X) = M minus X1 in robust principal component analysis(robust PCA) [8] and f (X) = πΩ (M)minus πΩ (X) ||2F in matrix completion [7]
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 475
Many researchers have devoted themselves to solve the above nuclear norm minimizationproblem and plenty of algorithms have been proposed including singular value thresholding(SVT) [7] fixed point continuous (FPC) [49] accelerated proximal gradient (APG) [68]augmented Lagrange multiplier (ALM) [45] The most expensive part of these algorithms liesin the computation of the truncated SVD Inexact augmented Lagrange multiplier (IALM) [45]has been proved to be one of the most accurate and efficient among them We now describeIALM for the robust PCA and matrix completion problems
The robust PCA problem can be formalized as a minimization problem of a sum of nuclearnorm and a scaled matrix l1-norm (sum of matrix entries in absolute value)
(27) min Xlowast + λE1 subject to M = X + E
where M is a measured matrix X has low-rank E is an error matrix and sufficiently sparseand λ is a positive weighting parameter Algorithm 7 in the appendix describes details of theIALM method to solve the robust PCA problem [45]
The matrix completion problem [9 45] can be written in the form
(28) minXisinRmtimesn
Xlowast subject to X + E = M πΩ (E) = 0
where πΩ Rmtimesn rarr Rmtimesn is an orthogonal projection that keeps the entries in Ω unchangedand sets those outside Ω zeros In [45] the authors applied the IALM method to the matrixcompletion problem (Algorithm 8 in the appendix)
3 Flip-Flop SRQR factorization
31 Flip-Flop SRQR factorization In this section we introduce our Flip-Flop SRQRfactorization a slightly different variant of TUXV (Algorithm 5 in the appendix) to computean SVD approximation based on the QLP factorization Given an integer l ge k we run SRQR(the version without computing the trailing matrix) with l steps on A
(31) AΠ = QR = Q
[R11 R12
R22
]
where R11 isin Rltimesl is upper triangular R12 isin Rltimes(nminusl) and R22 isin R(mminusl)times(nminusl) Then werun partial QRNP with l steps on RT
(32) RT =
[RT11
RT12 RT22
]= Q
[R11 R12
R22
]asymp Q1
[R11 R12
]
where Q =[Q1 Q2
]with Q1 isin Rntimesl Therefore combining with the fact that
AΠQ1 = Q[R11 R12
]T
we can approximate the matrix A by
(33) A = QRΠT = Q(RT)T
ΠT asymp Q
[RT11
RT12
]QT1 ΠT = A
(ΠQ1
)(ΠQ1
)T
We denote the rank-k truncated SVD ofAΠQ1 by UkΣkVTk LetUk = Uk Vk = ΠQ1Vk
then using (33) a rank-k approximate SVD of A is obtained
(34) A asymp UkΣkVTk
where Uk isin Rmtimesk Vk isin Rntimesk are column orthonormal and Σk = Diag (σ1 middot middot middot σk) withσirsquos being the leading k singular values of AΠQ1 The Flip-Flop SRQR factorization isoutlined in Algorithm 3
ETNAKent State University and
Johann Radon Institute (RICAM)
476 Y FENG J XIAO AND M GU
Algorithm 3 Flip-Flop Spectrum-Revealing QR FactorizationInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0Integer l ge k Tolerance g gt 1 for g2OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmRun SRQR on A to l steps to obtain (R11 R12)Run QRNP on (R11 R12)
T to obtain Q1 represented by a sequence of Householder vectors
A = AΠQ1[U Σ V ] = svd
(A)
U = U ( 1 k) Σ = Σ (1 k 1 k) V = ΠQ1V ( 1 k)
32 Complexity analysis In this section we perform a complexity analysis of Flip-Flop SRQR Since the approximate SVD only makes sense when the target rank k is smallwe assume that k le l min (mn) The complexity analysis of Flip-Flop SRQR looks asfollows
1 The main cost of doing SRQR with TRQRCP onA is 2mnl+2(b+p)mn+(m+n)l22 The main cost of the QR factorization for [R11 R12]
T and forming Q1 is 2nl2minus 23 l
33 The main cost of computing A = AΠQ1 is 2mnl4 The main cost of computing [Usimsim] = svd
(A)
is O(ml2)5 The main cost of forming Vk is 2nlk
Since k le l min (mn) the complexity of Flip-Flop SRQR is 4mnl + 2(b+ p)mnby omitting the lower-order terms
On the other hand the complexity of the approximate SVD with randomized subspaceiteration (RSISVD) [27 30] is (4 + 4q)mn (k + p) where p is the oversampling size and qis the number of subspace iterations (see the detailed analysis in the appendix) In practice pis chosen to be a small integer for instance 5 in RSISVD In Flip-Flop SRQR l is usuallychosen to be a little bit larger than k eg l = k + 5 We also found that l = k is sufficient interms of the approximation quality in our numerical experiments Therefore we observe thatFlip-Flop SRQR is more efficient than RSISVD for any q gt 0
Since a practical dataset matrix can be too large to fit into the main memory the cost oftransferring the matrix into a memory hierarchy typically predominates the arithmetic costTherefore we also compare the number of data passes between RSISVD and Flip-Flop SRQRRSISVD needs 2q + 2 passes while Flip-Flop SRQR needs 4 passes computing B = ΩAswapping the columns of A realizing a partial QR factorization on A and computing AΠQ1Therefore Flip-Flop SRQR requires less data transfer than RSISVD for any q gt 1
33 Quality analysis of Flip-Flop SRQR This section is devoted to the quality analysisof Flip-Flop SRQR We start with Lemma 31
LEMMA 31 Given any matrix X = [X1X2] with Xi isin Rmtimesni i = 1 2 and n1 +n2 = n
σj (X)2 le σj (X1)
2+ X222 1 le j le min(mn)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 477
Proof Since XXT = X1XT1 + X2X
T2 we obtain the above result using [33 Theo-
rem 3316]
We are now ready to derive bounds for the singular values and the approximation errorof Flip-Flop SRQR We need to emphasize that even if the target rank is k we run Flip-FlopSRQR with an actual target rank l which is a little bit larger than k The difference betweenk and l can create a gap between the singular values of A so that we can obtain a reliablelow-rank approximation
THEOREM 32 Given a matrix A isin Rmtimesn a target rank k an oversampling size p andan actual target rank l ge k the matrices Uk Σk Vk in (34) computed by Flip-Flop SRQRsatisfy
(35) σj(Σk) ge σj(A)
4
radic1 +
2R2242σ4j (Σk)
1 le j le k
and
(36)∥∥Aminus UkΣkV
Tk
∥∥2le σk+1 (A)
4
radic1 + 2
(R222σk+1 (A)
)4
where R22 isin R(mminusl)times(nminusl) is the trailing matrix in (31) Using the properties of SRQR wefurthermore have
(37) σj (Σk) ge σj (A)
4
radic1 + min
(2τ4 τ4 (2 + 4τ4)
(σl+1(A)σj(A)
)4) 1 le j le k
and
(38)∥∥Aminus UkΣkV
Tk
∥∥2le σk+1 (A)
4
radic1 + 2τ4
(σl+1 (A)
σk+1 (A)
)4
where τ and τ defined in (312) have matrix dimension-dependent upper bounds
τ le g1g2
radic(l + 1) (nminus l) and τ le g1g2
radicl (nminus l)
where g1 leradic
1+ε1minusε and g2 le g with high probability The parameters ε gt 0 and g gt 1 are
user-defined
Proof In terms of the singular value bounds observing that
[R11 R12
R22
][R11 R12
R22
]T=
[R11R
T11 + R12R
T12 R12R
T22
R22RT12 R22R
T22
]
ETNAKent State University and
Johann Radon Institute (RICAM)
478 Y FENG J XIAO AND M GU
we apply Lemma 31 twice for any 1 le j le k
σ2j
[R11 R12
R22
][R11 R12
R22
]Tle σ2
j
([R11R
T11 + R12R
T12 R12R
T22
])+∥∥∥[R22R
T12 R22R
T22
]∥∥∥2
2
le σ2j
(R11R
T11 + R12R
T12
)+∥∥∥R12R
T22
∥∥∥2
2+∥∥∥[R22R
T12 R22R
T22
]∥∥∥2
2
le σ2j
(R11R
T11 + R12R
T12
)+ 2
∥∥∥∥∥[R12
R22
]∥∥∥∥∥4
2
= σ2j
(R11R
T11 + R12R
T12
)+ 2 R2242 (39)
The relation (39) can be further rewritten as
σ4j (A) le σ4
j
([R11 R12
])+ 2 R2242 = σ4
j (Σk) + 2 R2242 1 le j le k
which is equivalent to
σj(Σk) ge σj(A)
4
radic1 +
2R2242σ4j (Σk)
For the residual matrix bound we let[R11 R12
]def=[R11 R12
]+[δR11 δR12
]
where[R11 R12
]is the rank-k truncated SVD of
[R11 R12
] Since
(310)∥∥Aminus UkΣkV
Tk
∥∥2
=
∥∥∥∥∥AΠminusQ[R11 R12
0
]TQT
∥∥∥∥∥2
it follows from the orthogonality of singular vectors that[R11 R12
]T [δR11 δR12
]= 0
and therefore[R11 R12
]T [R11 R12
]=[R11 R12
]T [R11 R12
]+[δR11 δR12
]T [δR11 δR12
]
which implies
(311) RT12R12 = RT
12R12 +(δR12
)T (δR12
)
Similarly to the deduction of (39) from (311) we can derive∥∥∥∥[δR11 δR12
R22
]∥∥∥∥4
2
le∥∥[δR11 δR12
]∥∥4
2+ 2
∥∥∥∥[δR12
R22
]∥∥∥∥4
2
le σ4k+1 (A) + 2
∥∥∥∥∥[R12
R22
]∥∥∥∥∥4
2
= σ4k+1 (A) + 2 R2242
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 479
Combining with (310) it now follows that
∥∥Aminus UkΣkVTk
∥∥2
=
∥∥∥∥∥AΠminusQ[R11 R12
0
]TQT
∥∥∥∥∥2
=
∥∥∥∥[δR11 δR12
R22
]∥∥∥∥2
le σk+1 (A)4
radic1 + 2
(R222σk+1 (A)
)4
To obtain an upper bound of R222 in (35) and (36) we follow the analysis of SRQRin [75]
From the analysis of [75 Section IV] Algorithm 2 ensures that g1 leradic
1+ε1minusε and g2 le g
with high probability (the actual probability-guarantee formula can be found in [75 SectionIV]) where g1 and g2 are defined by (26) Here 0 lt ε lt 1 is a user defined parameter toadjust the choice of the oversampling size p used in the TRQRCP initialization part in SRQRg gt 1 is a user-defined parameter in the extra swapping part in SRQR Let
(312) τdef= g1g2
R222R2212
∥∥∥RminusT∥∥∥minus1
12
σl+1 (A)and τ
def= g1g2
R222R2212
∥∥RminusT11
∥∥minus1
12
σk (Σk)
where R is defined in equation (25) Since 1radicnX12 le X2 le
radicnX12 and
σi (X1) le σi (X) 1 le i le min (s t) for any matrix X isin Rmtimesn and submatrix X1 isin Rstimestof X we obtain that
τ = g1g2R222R2212
∥∥∥RminusT∥∥∥minus1
12
σl+1
(R) σl+1
(R)
σl+1 (A)le g1g2
radic(l + 1) (nminus l)
Using the fact that σl([R11 R12
])= σl
(R11
)and σk (Σk) = σk
([R11 R12
])by (32)
and (34)
τ = g1g2R222R2212
∥∥RminusT11
∥∥minus1
12
σl (R11)
σl (R11)
σl([R11 R12
]) σl ([R11 R12
])σk (Σk)
le g1g2
radicl (nminus l)
By the definition of τ
(313) R222 = τ σl+1 (A)
Plugging this into (36) yields (38)By the definition of τ we observe that
(314) R222 le τσk (Σk)
By (35) and (314)(315)
σj (Σk) ge σj (A)
4
radic1 + 2
(R222σj(Σk)
)4ge σj (A)
4
radic1 + 2
(R222σk(Σk)
)4ge σj (A)
4radic
1 + 2τ4 1 le j le k
ETNAKent State University and
Johann Radon Institute (RICAM)
480 Y FENG J XIAO AND M GU
On the other hand using (35)
σ4j (A) le σ4
j (Σk)
(1 + 2
σ4j (A)
σ4j (Σk)
R2242σ4j (A)
)
le σ4j (Σk)
1 + 2
(σ4j (Σk) + 2 R2242
)σ4j (Σk)
R2242σ4j (A)
le σ4
j (Σk)
(1 + 2
(1 + 2
R2242σ4k (Σk)
)R2242σ4j (A)
)
that is
σj (Σk) ge σj (A)
4
radic1 +
(2 + 4
R2242σ4k(Σk)
)R2242σ4j (A)
Plugging (313) and (314) into this above equation yields
(316) σj (Σk) ge σj (A)
4
radic1 + τ4 (2 + 4τ4)
(σl+1(A)σj(A)
)4 1 le j le k
Combining (315) with (316) we arrive at (37)We note that (35) and (36) still hold true if we replace k by lInequality (37) shows that with the definitions (312) of τ and τ Flip-Flop SRQR can
reveal at least a dimension-dependent fraction of all the leading singular values of A andindeed approximate them very accurately in case they decay relatively quickly Moreover (38)shows that Flip-Flop SRQR can compute a rank-k approximation that is up to a factor of4
radic1 + 2τ4
(σl+1(A)σk+1(A)
)4
from optimal In situations where the singular values of A decay
relatively quickly our rank-k approximation is about as accurate as the truncated SVD with achoice of l such that
σl+1 (A)
σk+1 (A)= o (1)
REMARK 33 The ldquohigh probabilityrdquo mentioned in Theorem 32 is dependent on theoversampling size p but not on the block size b We decided not to dive deep into the probabilityexplanation since it will make the proof much longer and more tedious Theoretically thecomputed oversampling size p for certain given high probability would be rather high Forexample to achieve a probability 095 we need to choose p to be at least 500 However inpractice we would never choose such a large p since it will deteriorate the code efficiencyIt turns out that a small p like 5 sim 10 would achieve good approximation results empiricallyThe detailed explanation about what does high probability mean here can be found in [75]In terms of the choice of the block size b it wonrsquot affect the analysis or the error boundsThe choice of b is purely for the performance of the codes Only a suitable b can take fulladvantage of the cache hierarchy and BLAS routines
4 Numerical experiments In this section we demonstrate the effectiveness and effi-ciency of the Flip-Flop SRQR (FFSRQR) algorithm in several numerical experiments Firstly
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 481
TABLE 41Methods for approximate SVD
Method DescriptionLANSVD Approximate SVD using Lanczos bidiagonalization with partial
reorthogonalization [42]Its Fortran and Matlab implementations are from PROPACK [41]
FFSRQR Flip-Flop Spectrum-revealing QR factorizationIts Matlab implementation is using Mex-wrappedBLAS and LAPACK routines [2]
RSISVD Approximate SVD with randomized subspace iteration [30 27]Its Matlab implementation is from tensorlab toolbox [73]
TUXV Algorithm 5 in the appendixLTSVD Linear Time SVD [16]
We use its Matlab implementation by Ma et al [49]
we compare FFSRQR with other approximate SVD algorithms for a matrix approximationSecondly we compare FFSRQR with other methods on tensor approximation problems usingthe tensorlab toolbox [73] Thirdly we compare FFSRQR with other methods for the robustPCA problem and matrix completion problem All experiments are implemented on a laptopwith 29 GHz Intel Core i5 processor and 8 GB of RAM The codes based on LAPACK [2]and PROPACK [41] in Section 41 are in Fortran The codes in Sections 42 43 are in Matlabexcept that FFSRQR is implemented in Fortran and wrapped by mex
41 Approximate truncated SVD In this section we compare FFSRQR with otherapproximate SVD algorithms for low-rank matrix approximation All tested methods are listedin Table 41 Since LTSVD has only a Matlab implementation this method is only consideredin Sections 42 and 43 The test matrices areType 1 A isin Rmtimesn [66] is defined by A = U DV T + 01 middot dss middotE where U isin Rmtimess V isin
Rntimess are column-orthonormal matrices and D isin Rstimess is a diagonal matrix with sgeometrically decreasing diagonal entries from 1 to 10minus3 dss is the last diagonalentry of D E isin Rmtimesn is a random matrix where the entries are independentlysampled from a normal distribution N (0 1) In our numerical experiment we testthree different random matrices The square matrix has a size of 10000times 10000 theshort-fat matrix has a size of 1000times 10000 and the tall-skinny matrix has a size of10000times 1000
Type 2 A isin R4929times4929 is a real data matrix from the University of Florida sparse matrixcollection [11] Its corresponding file name is HBGEMAT11
For a given matrix A isin Rmtimesn the relative SVD approximation error is measured by∥∥Aminus UkΣkVTk
∥∥FAF where Σk contains the approximate top k singular values and
Uk Vk are the corresponding approximate top k singular vectors The parameters used inFFSRQR RSISVD TUXV and LTSVD are listed in Table 42 The additional time requiredto compute g2 is negligible In our experiments g2 always remains modest and never triggerssubsequent SRQR column swaps However computing g2 is still recommended since g2
serves as an insurance policy against potential quality degration by TRQRCP The block size bis chosen to be 32 according to the xGEQP3 LAPACK routine When k is less than 32 weimplement an unblock version of the partial QRCP
Figures 41ndash43 display the run time the relative approximation error and the top 20singular values comparison respectively for four different matrices In terms of speed FFSRQRis always faster than LANSVD and RSISVD Moreover the efficiency difference is more
ETNAKent State University and
Johann Radon Institute (RICAM)
482 Y FENG J XIAO AND M GU
TABLE 42Parameters used in RSISVD FFSRQR TUXV and LTSVD
Method ParameterFFSRQR oversampling size p = 5 block size b = min32 k l = k d = 10 g = 20RSISVD oversampling size p = 5 subspace iteration q = 1TUXV oversampling size p = 5 block size b = min32 k l = kLTSVD probabilities pi = 1n
obvious when k is relatively large FFSRQR is only slightly slower than TUXV In termsof the relative approximation error FFSRQR is comparable to the other three methods Interms of the top singular values approximation Figure 43 displays the top 20 singular valueswhen k = 500 for four different matrices In Figure 43 FFSRQR is superior to TUXV aswe take the absolute values of the main diagonal entries of the upper triangle matrix R as theapproximate singular values for TUXV
100 200 300 400 500
target rank k
100
101
102
103
104
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
10-1
100
101
102
103
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 41 Run time comparison for approximate SVD algorithms
42 Tensor approximation This section illustrates the effectiveness and efficiency ofFFSRQR for computing approximate tensors ST-HOSVD [3 71] is one of the most efficient
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 483
100 200 300 400 500
target rank k
022
024
026
028
03
032
034
036re
lative
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
024
026
028
03
032
034
036
rela
tive
ap
pro
xim
atio
n e
rro
rFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 42 Relative approximation error comparison for approximate SVD algorithms
algorithms to compute a Tucker decomposition of tensors and the most costly part of thisalgorithm is to compute an SVD or approximate SVD of the tensor unfoldings The truncatedSVD and RSISVD are used in the routines MLSVD and MLSVD_RSI respectively in theMatlab tensorlab toolbox [73] Based on this Matlab toolbox we implement ST-HOSVDusing FFSRQR or LTSVD to do the SVD approximation We name these two new routines byMLSVD_FFSRQR and MLSVD_LTSVD respectively and compare these four routines in thenumerical experiment We also have Python codes for this tensor approximation experimentWe donrsquot list the results of the python programs here as they are similar to those for Matlab
421 A sparse tensor example We test a sparse tensor X isin Rntimesntimesn of the followingformat [64 59]
X =
10sumj=1
1000
jxj yj zj +
nsumj=11
1
jxj yj zj
where xj yj zj isin Rn are sparse vectors with nonnegative entries The symbol ldquordquo representsthe vector outer product We compute a rank-(k k k) Tucker decomposition [GU1 U2 U3]using MLSVD MLSVD_FFSRQR MLSVD_RSI and MLSVD_LTSVD respectively Therelative approximation error is measured by X minus XkF XF where Xk = G times1 U1 times2
U2 times3 U3
ETNAKent State University and
Johann Radon Institute (RICAM)
484 Y FENG J XIAO AND M GU
5 10 15 20
index
06
07
08
09
1to
p 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
5 10 15 20
index
06
07
08
09
1
top 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
5 10 15 20
index
06
065
07
075
08
085
09
095
top
20
sin
gu
lar
va
lue
s
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
5 10 15 20
index
101
102
103
top
20
sin
gu
lar
va
lue
sFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 43 Top 20 singular values comparison for approximate SVD algorithms
Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger
422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]
We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3
where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method
MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485
50 100 150 200
target rank k
10-5
10-4
10-3
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
50 100 150 200
target rank k
10-2
10-1
100
101
102
run tim
e (
sec)
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
FIG 44 Run time and relative approximation error comparison on a sparse tensor
TABLE 43Comparison on handwritten digits classification
MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD
Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259
one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one
43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1
431 Robust PCA To solve the robust PCA problem we replace the approximateSVD part in the IALM method [45] by various methods We denote the actual solution tothe robust PCA problem by a matrix pair (Xlowast Elowast) isin Rmtimesn times Rmtimesn The matrix Xlowast isXlowast = XLX
TR with XL isin Rmtimesk XR isin Rntimesk being random matrices whose entries are
independently sampled from a normal distribution The sparse matrix Elowast is a random matrixwhere its non-zero entries are independently sampled from a uniform distribution over theinterval [minus500 500] The input to the IALM algorithm has the form M = Xlowast + Elowast andthe output is denoted by
(X E
) In this numerical experiment we use the same parameter
settings as the IALM code for robust PCA rank k is 01m and the number of non-zero entriesin E is 005m2 We choose the trade-off parameter λ = 1
radicmax (mn) as suggested by
Candes et al [8] The solution quality is measured by the normalized root mean square errorX minusXlowastF XlowastF
Table 44 includes relative error run time the number of non-zero entries in E (E0)the iteration count and the number of non-zero singular values (sv) in X of the IALM
1httpperceptioncslillinoisedumatrix-ranksample_codehtml
ETNAKent State University and
Johann Radon Institute (RICAM)
486 Y FENG J XIAO AND M GU
algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective
TABLE 44Comparison on robust PCA
Size Method Error Time (sec) E0 Iter sv
1000times 1000
IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100
2000times 2000
IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200
4000times 4000
IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400
6000times 6000
IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600
432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices
bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3
The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies
For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by
NMAEdef=
1|Γ|sum
(ij)isinΓ |Mij minusXij |rmax minus rmin
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487
where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5
Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]
The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets
TABLE 45Parameters used in the IALM method on matrix completion
Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384
jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742
movie-latest-small 671 9066 100e+05 52551
5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA
Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments
Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4
Appendix B TUXV algorithmThe details of TUXV algorithm are presented in Algorithm 5
Appendix C Computing Tucker decomposition of higher-order tensorsAlgorithm 6 computes the Tucker decomposition of higher-order tensors ie sequentially
truncated higher-order SVD
ETNAKent State University and
Johann Radon Institute (RICAM)
488 Y FENG J XIAO AND M GU
TABLE 46Comparison on matrix completion
Data set Method Iter Time NMAE sv σmax σmin
jester-1
IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00
jester-2
IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00
jester-3
IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00
jester-all
IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00
movie-100K
IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00
movie-1M
IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00
movie-latest-small
IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00
Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems
respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω
Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an
orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9
Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that
1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489
Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do
Find i = arg maxjleslen
rs Exchange rj and ri columns in A and Π
Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)
end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A
Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)
3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)
2 minus 23 (k + p)
3 (cf [69]) and the cost of forming the first (k + p)
columns in the full Q matrix is m (k + p)2
+ 13 (k + p)
3
Now we count the flops for each i in the for loop
1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)
2 minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)
2+ 1
3 (k + p)3
3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)
2minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)
2+ 1
3 (k + p)3
Together the cost of running the for loop q times is
q
(4mn (k + p) + 3 (m+ n) (k + p)
2 minus 2
3(k + p)
3
)
Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)
2) and the cost of computing U = Q lowast U is 2m (k + p)
2
ETNAKent State University and
Johann Radon Institute (RICAM)
490 Y FENG J XIAO AND M GU
Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV
Tr ie applying UTr to G
end for
Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max
(M2 λminus1MF
) Y0 = MJ (M) E0 = 0
while not converged do(UΣV) = svd
(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = Sλmicrominus1k
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2
REFERENCES
[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106
[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999
[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103
[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595
[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 473
p For reasonably choices of ε like ε = 12 g1 is a small constant while g2 can potentially be an
extremely large number which can lead to a poor low-rank approximation quality To avoidthe potential large number of g2 the SRQR algorithm (Algorithm 2) proposed in [75] uses apair-wise swapping strategy to guarantee that g2 is below some user-defined tolerance g gt 1which is usually chosen to be a small number larger than one like 20 In Algorithm 2 weuse an approximate formula instead of using the definition (26) directly to estimate g2 ieg2 asymp |α|radicd
∥∥∥ΩRminusT∥∥∥
12 based on the Johnson-Lindenstrauss Lemma [36]
Algorithm 2 Spectrum-revealing QR Factorization (SRQR)InputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0Integer l ge k Tolerance g gt 1 for g2OutputsOrthogonal matrix Q isin Rmtimesm formed by the first k reflectorsUpper trapezoidal matrix R isin RktimesnPermutation matrix Π isin Rntimesn such that AΠ asymp Q ( 1 k)RAlgorithmCompute QRΠ with RQRCP or TRQRCP to l stepsCompute squared 2-norm of the columns of B( l + 1 n) ri (l + 1 le i le n) where Bis an updated random projection of A computed by RQRCP or TRQRCPApproximate squared 2-norm of the columns of A(l + 1 m l + 1 n) ri = ri(b +p) (l + 1 le i le n)ı = argmaxl+1leilenriSwap ı-th and (l + 1)-st columns of AΠ rOne-step QR factorization of A(l + 1 m l + 1 n)|α| = Rl+1l+1ri = ri minusA(l + 1 i)2 (l + 2 le i le n)Generate a random matrix Ω isin N (0 1)dtimes(l+1) (d l)Compute g2 = |α|
∥∥∥RminusT∥∥∥12asymp |α|radic
d
∥∥∥ΩRminusT∥∥∥
12
while g2 gt g doı = argmax1leilel+1ith column norm of ΩRminusT Swap ı-th and (l + 1)-st columns of A and Π in a Round Robin rotationGivens-rotate R back into upper-trapezoidal formrl+1 = R2
l+1l+1 ri = ri +A(l + 1 i)2 (l + 2 le i le n)ı = argmaxl+1leilenriSwap ı-th and (l + 1)-st columns of AΠ rOne-step QR factorization of A(l + 1 m l + 1 n)|α| = Rl+1l+1ri = ri minusA(l + 1 i)2 (l + 2 le i le n)Generate a random matrix Ω isin N (0 1)dtimes(l+1)(d l) Compute g2 = |α|
∥∥∥RminusT∥∥∥12asymp |α|radic
d
∥∥∥ΩRminusT∥∥∥
12
end while
23 Tensor approximation In this section we review some basic notations and conceptsinvolving tensors A more detailed discussion of the properties and applications of tensors can
ETNAKent State University and
Johann Radon Institute (RICAM)
474 Y FENG J XIAO AND M GU
be found in the review [40] A tensor is a d-dimensional array of numbers denoted by scriptnotation X isin RI1timesmiddotmiddotmiddottimesId with entries given by
xj1jd 1 le j1 le I1 1 le jd le Id
We use the matrix X(n) isin RIntimes(Πj 6=nIj) to denote the nth mode unfolding of the tensorX Since this tensor has d dimensions there are altogether d-possibilities for unfolding Then-mode product of a tensor X isin RI1timesmiddotmiddotmiddottimesId with a matrix U isin RktimesIn results in a tensorY isin RI1timesmiddotmiddotmiddottimesInminus1timesktimesIn+1timesmiddotmiddotmiddottimesId such that
yj1jnminus1jjn+1jd = (X timesn U)j1jnminus1jjn+1jd=
Insumjn=1
xj1jdujjn
Alternatively it can be expressed conveniently in terms of unfolded tensors
Y = X timesn U hArr Y(n) = UX(n)
Decompositions of higher-order tensors have applications in signal processing [12 15 63]numerical linear algebra [13 39 76] computer vision [62 72 67] etc Two particular tensordecompositions can be considered as higher-order extensions of the matrix SVD CANDE-COMPPARAFAC (CP) [10 31] decomposes a tensor as a sum of rank-one tensors andthe Tucker decomposition [70] is a higher-order form of the principal component analy-sis Knowing the definitions of mode products and unfolding of tensors we can introducethe sequentially truncated higher-order SVD (ST-HOSVD) algorithm for producing a rank(k1 kd) approximation to the tensor based on the Tucker decomposition format The ST-HOSVD algorithm [3 71] (Algorithm 6 in the appendix) returns a core tensor G isin Rk1timesmiddotmiddotmiddottimeskdand a set of unitary matrices Uj isin RIjtimeskj for j = 1 d such that
X asymp G times1 U1 middot middot middot timesd Ud
where the right-hand side is called a Tucker decomposition However a straightforwardgeneralization to higher-order (d ge 3) tensors of the matrix Eckart-Young-Mirsky Theorem isnot possible [14]
In ST-HOSVD one key step is to compute an exact or approximate rank-kr SVD ofthe tensor unfolding Well known efficient ways to compute an exact low-rank SVD includeKrylov subspace methods [44] There are also efficient randomized algorithms to find anapproximate low-rank SVD [30] In the Matlab tensorlab toolbox [73] the most efficientmethod MLSVD_RSI is essentially ST-HOSVD with an randomized subspace iteration tofind the approximate SVD of the tensor unfolding
24 Nuclear norm minimization Matrix rank minimization problem appears ubiqui-tously in many fields such as Euclidean embedding [20 46] control [21 51 56] collaborativefiltering [9 57 65] system identification [47 48] etc The regularized nuclear norm mini-mization problem is the following
minXisinRmtimesn
f (X) + τXlowast
where τ gt 0 is a regularization parameter and Xlowastdef=sumqi=1 σi (X) The choice of the
function f (middot) is situational f (X) = M minus X1 in robust principal component analysis(robust PCA) [8] and f (X) = πΩ (M)minus πΩ (X) ||2F in matrix completion [7]
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 475
Many researchers have devoted themselves to solve the above nuclear norm minimizationproblem and plenty of algorithms have been proposed including singular value thresholding(SVT) [7] fixed point continuous (FPC) [49] accelerated proximal gradient (APG) [68]augmented Lagrange multiplier (ALM) [45] The most expensive part of these algorithms liesin the computation of the truncated SVD Inexact augmented Lagrange multiplier (IALM) [45]has been proved to be one of the most accurate and efficient among them We now describeIALM for the robust PCA and matrix completion problems
The robust PCA problem can be formalized as a minimization problem of a sum of nuclearnorm and a scaled matrix l1-norm (sum of matrix entries in absolute value)
(27) min Xlowast + λE1 subject to M = X + E
where M is a measured matrix X has low-rank E is an error matrix and sufficiently sparseand λ is a positive weighting parameter Algorithm 7 in the appendix describes details of theIALM method to solve the robust PCA problem [45]
The matrix completion problem [9 45] can be written in the form
(28) minXisinRmtimesn
Xlowast subject to X + E = M πΩ (E) = 0
where πΩ Rmtimesn rarr Rmtimesn is an orthogonal projection that keeps the entries in Ω unchangedand sets those outside Ω zeros In [45] the authors applied the IALM method to the matrixcompletion problem (Algorithm 8 in the appendix)
3 Flip-Flop SRQR factorization
31 Flip-Flop SRQR factorization In this section we introduce our Flip-Flop SRQRfactorization a slightly different variant of TUXV (Algorithm 5 in the appendix) to computean SVD approximation based on the QLP factorization Given an integer l ge k we run SRQR(the version without computing the trailing matrix) with l steps on A
(31) AΠ = QR = Q
[R11 R12
R22
]
where R11 isin Rltimesl is upper triangular R12 isin Rltimes(nminusl) and R22 isin R(mminusl)times(nminusl) Then werun partial QRNP with l steps on RT
(32) RT =
[RT11
RT12 RT22
]= Q
[R11 R12
R22
]asymp Q1
[R11 R12
]
where Q =[Q1 Q2
]with Q1 isin Rntimesl Therefore combining with the fact that
AΠQ1 = Q[R11 R12
]T
we can approximate the matrix A by
(33) A = QRΠT = Q(RT)T
ΠT asymp Q
[RT11
RT12
]QT1 ΠT = A
(ΠQ1
)(ΠQ1
)T
We denote the rank-k truncated SVD ofAΠQ1 by UkΣkVTk LetUk = Uk Vk = ΠQ1Vk
then using (33) a rank-k approximate SVD of A is obtained
(34) A asymp UkΣkVTk
where Uk isin Rmtimesk Vk isin Rntimesk are column orthonormal and Σk = Diag (σ1 middot middot middot σk) withσirsquos being the leading k singular values of AΠQ1 The Flip-Flop SRQR factorization isoutlined in Algorithm 3
ETNAKent State University and
Johann Radon Institute (RICAM)
476 Y FENG J XIAO AND M GU
Algorithm 3 Flip-Flop Spectrum-Revealing QR FactorizationInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0Integer l ge k Tolerance g gt 1 for g2OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmRun SRQR on A to l steps to obtain (R11 R12)Run QRNP on (R11 R12)
T to obtain Q1 represented by a sequence of Householder vectors
A = AΠQ1[U Σ V ] = svd
(A)
U = U ( 1 k) Σ = Σ (1 k 1 k) V = ΠQ1V ( 1 k)
32 Complexity analysis In this section we perform a complexity analysis of Flip-Flop SRQR Since the approximate SVD only makes sense when the target rank k is smallwe assume that k le l min (mn) The complexity analysis of Flip-Flop SRQR looks asfollows
1 The main cost of doing SRQR with TRQRCP onA is 2mnl+2(b+p)mn+(m+n)l22 The main cost of the QR factorization for [R11 R12]
T and forming Q1 is 2nl2minus 23 l
33 The main cost of computing A = AΠQ1 is 2mnl4 The main cost of computing [Usimsim] = svd
(A)
is O(ml2)5 The main cost of forming Vk is 2nlk
Since k le l min (mn) the complexity of Flip-Flop SRQR is 4mnl + 2(b+ p)mnby omitting the lower-order terms
On the other hand the complexity of the approximate SVD with randomized subspaceiteration (RSISVD) [27 30] is (4 + 4q)mn (k + p) where p is the oversampling size and qis the number of subspace iterations (see the detailed analysis in the appendix) In practice pis chosen to be a small integer for instance 5 in RSISVD In Flip-Flop SRQR l is usuallychosen to be a little bit larger than k eg l = k + 5 We also found that l = k is sufficient interms of the approximation quality in our numerical experiments Therefore we observe thatFlip-Flop SRQR is more efficient than RSISVD for any q gt 0
Since a practical dataset matrix can be too large to fit into the main memory the cost oftransferring the matrix into a memory hierarchy typically predominates the arithmetic costTherefore we also compare the number of data passes between RSISVD and Flip-Flop SRQRRSISVD needs 2q + 2 passes while Flip-Flop SRQR needs 4 passes computing B = ΩAswapping the columns of A realizing a partial QR factorization on A and computing AΠQ1Therefore Flip-Flop SRQR requires less data transfer than RSISVD for any q gt 1
33 Quality analysis of Flip-Flop SRQR This section is devoted to the quality analysisof Flip-Flop SRQR We start with Lemma 31
LEMMA 31 Given any matrix X = [X1X2] with Xi isin Rmtimesni i = 1 2 and n1 +n2 = n
σj (X)2 le σj (X1)
2+ X222 1 le j le min(mn)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 477
Proof Since XXT = X1XT1 + X2X
T2 we obtain the above result using [33 Theo-
rem 3316]
We are now ready to derive bounds for the singular values and the approximation errorof Flip-Flop SRQR We need to emphasize that even if the target rank is k we run Flip-FlopSRQR with an actual target rank l which is a little bit larger than k The difference betweenk and l can create a gap between the singular values of A so that we can obtain a reliablelow-rank approximation
THEOREM 32 Given a matrix A isin Rmtimesn a target rank k an oversampling size p andan actual target rank l ge k the matrices Uk Σk Vk in (34) computed by Flip-Flop SRQRsatisfy
(35) σj(Σk) ge σj(A)
4
radic1 +
2R2242σ4j (Σk)
1 le j le k
and
(36)∥∥Aminus UkΣkV
Tk
∥∥2le σk+1 (A)
4
radic1 + 2
(R222σk+1 (A)
)4
where R22 isin R(mminusl)times(nminusl) is the trailing matrix in (31) Using the properties of SRQR wefurthermore have
(37) σj (Σk) ge σj (A)
4
radic1 + min
(2τ4 τ4 (2 + 4τ4)
(σl+1(A)σj(A)
)4) 1 le j le k
and
(38)∥∥Aminus UkΣkV
Tk
∥∥2le σk+1 (A)
4
radic1 + 2τ4
(σl+1 (A)
σk+1 (A)
)4
where τ and τ defined in (312) have matrix dimension-dependent upper bounds
τ le g1g2
radic(l + 1) (nminus l) and τ le g1g2
radicl (nminus l)
where g1 leradic
1+ε1minusε and g2 le g with high probability The parameters ε gt 0 and g gt 1 are
user-defined
Proof In terms of the singular value bounds observing that
[R11 R12
R22
][R11 R12
R22
]T=
[R11R
T11 + R12R
T12 R12R
T22
R22RT12 R22R
T22
]
ETNAKent State University and
Johann Radon Institute (RICAM)
478 Y FENG J XIAO AND M GU
we apply Lemma 31 twice for any 1 le j le k
σ2j
[R11 R12
R22
][R11 R12
R22
]Tle σ2
j
([R11R
T11 + R12R
T12 R12R
T22
])+∥∥∥[R22R
T12 R22R
T22
]∥∥∥2
2
le σ2j
(R11R
T11 + R12R
T12
)+∥∥∥R12R
T22
∥∥∥2
2+∥∥∥[R22R
T12 R22R
T22
]∥∥∥2
2
le σ2j
(R11R
T11 + R12R
T12
)+ 2
∥∥∥∥∥[R12
R22
]∥∥∥∥∥4
2
= σ2j
(R11R
T11 + R12R
T12
)+ 2 R2242 (39)
The relation (39) can be further rewritten as
σ4j (A) le σ4
j
([R11 R12
])+ 2 R2242 = σ4
j (Σk) + 2 R2242 1 le j le k
which is equivalent to
σj(Σk) ge σj(A)
4
radic1 +
2R2242σ4j (Σk)
For the residual matrix bound we let[R11 R12
]def=[R11 R12
]+[δR11 δR12
]
where[R11 R12
]is the rank-k truncated SVD of
[R11 R12
] Since
(310)∥∥Aminus UkΣkV
Tk
∥∥2
=
∥∥∥∥∥AΠminusQ[R11 R12
0
]TQT
∥∥∥∥∥2
it follows from the orthogonality of singular vectors that[R11 R12
]T [δR11 δR12
]= 0
and therefore[R11 R12
]T [R11 R12
]=[R11 R12
]T [R11 R12
]+[δR11 δR12
]T [δR11 δR12
]
which implies
(311) RT12R12 = RT
12R12 +(δR12
)T (δR12
)
Similarly to the deduction of (39) from (311) we can derive∥∥∥∥[δR11 δR12
R22
]∥∥∥∥4
2
le∥∥[δR11 δR12
]∥∥4
2+ 2
∥∥∥∥[δR12
R22
]∥∥∥∥4
2
le σ4k+1 (A) + 2
∥∥∥∥∥[R12
R22
]∥∥∥∥∥4
2
= σ4k+1 (A) + 2 R2242
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 479
Combining with (310) it now follows that
∥∥Aminus UkΣkVTk
∥∥2
=
∥∥∥∥∥AΠminusQ[R11 R12
0
]TQT
∥∥∥∥∥2
=
∥∥∥∥[δR11 δR12
R22
]∥∥∥∥2
le σk+1 (A)4
radic1 + 2
(R222σk+1 (A)
)4
To obtain an upper bound of R222 in (35) and (36) we follow the analysis of SRQRin [75]
From the analysis of [75 Section IV] Algorithm 2 ensures that g1 leradic
1+ε1minusε and g2 le g
with high probability (the actual probability-guarantee formula can be found in [75 SectionIV]) where g1 and g2 are defined by (26) Here 0 lt ε lt 1 is a user defined parameter toadjust the choice of the oversampling size p used in the TRQRCP initialization part in SRQRg gt 1 is a user-defined parameter in the extra swapping part in SRQR Let
(312) τdef= g1g2
R222R2212
∥∥∥RminusT∥∥∥minus1
12
σl+1 (A)and τ
def= g1g2
R222R2212
∥∥RminusT11
∥∥minus1
12
σk (Σk)
where R is defined in equation (25) Since 1radicnX12 le X2 le
radicnX12 and
σi (X1) le σi (X) 1 le i le min (s t) for any matrix X isin Rmtimesn and submatrix X1 isin Rstimestof X we obtain that
τ = g1g2R222R2212
∥∥∥RminusT∥∥∥minus1
12
σl+1
(R) σl+1
(R)
σl+1 (A)le g1g2
radic(l + 1) (nminus l)
Using the fact that σl([R11 R12
])= σl
(R11
)and σk (Σk) = σk
([R11 R12
])by (32)
and (34)
τ = g1g2R222R2212
∥∥RminusT11
∥∥minus1
12
σl (R11)
σl (R11)
σl([R11 R12
]) σl ([R11 R12
])σk (Σk)
le g1g2
radicl (nminus l)
By the definition of τ
(313) R222 = τ σl+1 (A)
Plugging this into (36) yields (38)By the definition of τ we observe that
(314) R222 le τσk (Σk)
By (35) and (314)(315)
σj (Σk) ge σj (A)
4
radic1 + 2
(R222σj(Σk)
)4ge σj (A)
4
radic1 + 2
(R222σk(Σk)
)4ge σj (A)
4radic
1 + 2τ4 1 le j le k
ETNAKent State University and
Johann Radon Institute (RICAM)
480 Y FENG J XIAO AND M GU
On the other hand using (35)
σ4j (A) le σ4
j (Σk)
(1 + 2
σ4j (A)
σ4j (Σk)
R2242σ4j (A)
)
le σ4j (Σk)
1 + 2
(σ4j (Σk) + 2 R2242
)σ4j (Σk)
R2242σ4j (A)
le σ4
j (Σk)
(1 + 2
(1 + 2
R2242σ4k (Σk)
)R2242σ4j (A)
)
that is
σj (Σk) ge σj (A)
4
radic1 +
(2 + 4
R2242σ4k(Σk)
)R2242σ4j (A)
Plugging (313) and (314) into this above equation yields
(316) σj (Σk) ge σj (A)
4
radic1 + τ4 (2 + 4τ4)
(σl+1(A)σj(A)
)4 1 le j le k
Combining (315) with (316) we arrive at (37)We note that (35) and (36) still hold true if we replace k by lInequality (37) shows that with the definitions (312) of τ and τ Flip-Flop SRQR can
reveal at least a dimension-dependent fraction of all the leading singular values of A andindeed approximate them very accurately in case they decay relatively quickly Moreover (38)shows that Flip-Flop SRQR can compute a rank-k approximation that is up to a factor of4
radic1 + 2τ4
(σl+1(A)σk+1(A)
)4
from optimal In situations where the singular values of A decay
relatively quickly our rank-k approximation is about as accurate as the truncated SVD with achoice of l such that
σl+1 (A)
σk+1 (A)= o (1)
REMARK 33 The ldquohigh probabilityrdquo mentioned in Theorem 32 is dependent on theoversampling size p but not on the block size b We decided not to dive deep into the probabilityexplanation since it will make the proof much longer and more tedious Theoretically thecomputed oversampling size p for certain given high probability would be rather high Forexample to achieve a probability 095 we need to choose p to be at least 500 However inpractice we would never choose such a large p since it will deteriorate the code efficiencyIt turns out that a small p like 5 sim 10 would achieve good approximation results empiricallyThe detailed explanation about what does high probability mean here can be found in [75]In terms of the choice of the block size b it wonrsquot affect the analysis or the error boundsThe choice of b is purely for the performance of the codes Only a suitable b can take fulladvantage of the cache hierarchy and BLAS routines
4 Numerical experiments In this section we demonstrate the effectiveness and effi-ciency of the Flip-Flop SRQR (FFSRQR) algorithm in several numerical experiments Firstly
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 481
TABLE 41Methods for approximate SVD
Method DescriptionLANSVD Approximate SVD using Lanczos bidiagonalization with partial
reorthogonalization [42]Its Fortran and Matlab implementations are from PROPACK [41]
FFSRQR Flip-Flop Spectrum-revealing QR factorizationIts Matlab implementation is using Mex-wrappedBLAS and LAPACK routines [2]
RSISVD Approximate SVD with randomized subspace iteration [30 27]Its Matlab implementation is from tensorlab toolbox [73]
TUXV Algorithm 5 in the appendixLTSVD Linear Time SVD [16]
We use its Matlab implementation by Ma et al [49]
we compare FFSRQR with other approximate SVD algorithms for a matrix approximationSecondly we compare FFSRQR with other methods on tensor approximation problems usingthe tensorlab toolbox [73] Thirdly we compare FFSRQR with other methods for the robustPCA problem and matrix completion problem All experiments are implemented on a laptopwith 29 GHz Intel Core i5 processor and 8 GB of RAM The codes based on LAPACK [2]and PROPACK [41] in Section 41 are in Fortran The codes in Sections 42 43 are in Matlabexcept that FFSRQR is implemented in Fortran and wrapped by mex
41 Approximate truncated SVD In this section we compare FFSRQR with otherapproximate SVD algorithms for low-rank matrix approximation All tested methods are listedin Table 41 Since LTSVD has only a Matlab implementation this method is only consideredin Sections 42 and 43 The test matrices areType 1 A isin Rmtimesn [66] is defined by A = U DV T + 01 middot dss middotE where U isin Rmtimess V isin
Rntimess are column-orthonormal matrices and D isin Rstimess is a diagonal matrix with sgeometrically decreasing diagonal entries from 1 to 10minus3 dss is the last diagonalentry of D E isin Rmtimesn is a random matrix where the entries are independentlysampled from a normal distribution N (0 1) In our numerical experiment we testthree different random matrices The square matrix has a size of 10000times 10000 theshort-fat matrix has a size of 1000times 10000 and the tall-skinny matrix has a size of10000times 1000
Type 2 A isin R4929times4929 is a real data matrix from the University of Florida sparse matrixcollection [11] Its corresponding file name is HBGEMAT11
For a given matrix A isin Rmtimesn the relative SVD approximation error is measured by∥∥Aminus UkΣkVTk
∥∥FAF where Σk contains the approximate top k singular values and
Uk Vk are the corresponding approximate top k singular vectors The parameters used inFFSRQR RSISVD TUXV and LTSVD are listed in Table 42 The additional time requiredto compute g2 is negligible In our experiments g2 always remains modest and never triggerssubsequent SRQR column swaps However computing g2 is still recommended since g2
serves as an insurance policy against potential quality degration by TRQRCP The block size bis chosen to be 32 according to the xGEQP3 LAPACK routine When k is less than 32 weimplement an unblock version of the partial QRCP
Figures 41ndash43 display the run time the relative approximation error and the top 20singular values comparison respectively for four different matrices In terms of speed FFSRQRis always faster than LANSVD and RSISVD Moreover the efficiency difference is more
ETNAKent State University and
Johann Radon Institute (RICAM)
482 Y FENG J XIAO AND M GU
TABLE 42Parameters used in RSISVD FFSRQR TUXV and LTSVD
Method ParameterFFSRQR oversampling size p = 5 block size b = min32 k l = k d = 10 g = 20RSISVD oversampling size p = 5 subspace iteration q = 1TUXV oversampling size p = 5 block size b = min32 k l = kLTSVD probabilities pi = 1n
obvious when k is relatively large FFSRQR is only slightly slower than TUXV In termsof the relative approximation error FFSRQR is comparable to the other three methods Interms of the top singular values approximation Figure 43 displays the top 20 singular valueswhen k = 500 for four different matrices In Figure 43 FFSRQR is superior to TUXV aswe take the absolute values of the main diagonal entries of the upper triangle matrix R as theapproximate singular values for TUXV
100 200 300 400 500
target rank k
100
101
102
103
104
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
10-1
100
101
102
103
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 41 Run time comparison for approximate SVD algorithms
42 Tensor approximation This section illustrates the effectiveness and efficiency ofFFSRQR for computing approximate tensors ST-HOSVD [3 71] is one of the most efficient
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 483
100 200 300 400 500
target rank k
022
024
026
028
03
032
034
036re
lative
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
024
026
028
03
032
034
036
rela
tive
ap
pro
xim
atio
n e
rro
rFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 42 Relative approximation error comparison for approximate SVD algorithms
algorithms to compute a Tucker decomposition of tensors and the most costly part of thisalgorithm is to compute an SVD or approximate SVD of the tensor unfoldings The truncatedSVD and RSISVD are used in the routines MLSVD and MLSVD_RSI respectively in theMatlab tensorlab toolbox [73] Based on this Matlab toolbox we implement ST-HOSVDusing FFSRQR or LTSVD to do the SVD approximation We name these two new routines byMLSVD_FFSRQR and MLSVD_LTSVD respectively and compare these four routines in thenumerical experiment We also have Python codes for this tensor approximation experimentWe donrsquot list the results of the python programs here as they are similar to those for Matlab
421 A sparse tensor example We test a sparse tensor X isin Rntimesntimesn of the followingformat [64 59]
X =
10sumj=1
1000
jxj yj zj +
nsumj=11
1
jxj yj zj
where xj yj zj isin Rn are sparse vectors with nonnegative entries The symbol ldquordquo representsthe vector outer product We compute a rank-(k k k) Tucker decomposition [GU1 U2 U3]using MLSVD MLSVD_FFSRQR MLSVD_RSI and MLSVD_LTSVD respectively Therelative approximation error is measured by X minus XkF XF where Xk = G times1 U1 times2
U2 times3 U3
ETNAKent State University and
Johann Radon Institute (RICAM)
484 Y FENG J XIAO AND M GU
5 10 15 20
index
06
07
08
09
1to
p 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
5 10 15 20
index
06
07
08
09
1
top 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
5 10 15 20
index
06
065
07
075
08
085
09
095
top
20
sin
gu
lar
va
lue
s
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
5 10 15 20
index
101
102
103
top
20
sin
gu
lar
va
lue
sFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 43 Top 20 singular values comparison for approximate SVD algorithms
Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger
422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]
We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3
where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method
MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485
50 100 150 200
target rank k
10-5
10-4
10-3
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
50 100 150 200
target rank k
10-2
10-1
100
101
102
run tim
e (
sec)
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
FIG 44 Run time and relative approximation error comparison on a sparse tensor
TABLE 43Comparison on handwritten digits classification
MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD
Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259
one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one
43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1
431 Robust PCA To solve the robust PCA problem we replace the approximateSVD part in the IALM method [45] by various methods We denote the actual solution tothe robust PCA problem by a matrix pair (Xlowast Elowast) isin Rmtimesn times Rmtimesn The matrix Xlowast isXlowast = XLX
TR with XL isin Rmtimesk XR isin Rntimesk being random matrices whose entries are
independently sampled from a normal distribution The sparse matrix Elowast is a random matrixwhere its non-zero entries are independently sampled from a uniform distribution over theinterval [minus500 500] The input to the IALM algorithm has the form M = Xlowast + Elowast andthe output is denoted by
(X E
) In this numerical experiment we use the same parameter
settings as the IALM code for robust PCA rank k is 01m and the number of non-zero entriesin E is 005m2 We choose the trade-off parameter λ = 1
radicmax (mn) as suggested by
Candes et al [8] The solution quality is measured by the normalized root mean square errorX minusXlowastF XlowastF
Table 44 includes relative error run time the number of non-zero entries in E (E0)the iteration count and the number of non-zero singular values (sv) in X of the IALM
1httpperceptioncslillinoisedumatrix-ranksample_codehtml
ETNAKent State University and
Johann Radon Institute (RICAM)
486 Y FENG J XIAO AND M GU
algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective
TABLE 44Comparison on robust PCA
Size Method Error Time (sec) E0 Iter sv
1000times 1000
IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100
2000times 2000
IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200
4000times 4000
IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400
6000times 6000
IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600
432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices
bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3
The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies
For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by
NMAEdef=
1|Γ|sum
(ij)isinΓ |Mij minusXij |rmax minus rmin
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487
where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5
Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]
The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets
TABLE 45Parameters used in the IALM method on matrix completion
Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384
jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742
movie-latest-small 671 9066 100e+05 52551
5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA
Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments
Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4
Appendix B TUXV algorithmThe details of TUXV algorithm are presented in Algorithm 5
Appendix C Computing Tucker decomposition of higher-order tensorsAlgorithm 6 computes the Tucker decomposition of higher-order tensors ie sequentially
truncated higher-order SVD
ETNAKent State University and
Johann Radon Institute (RICAM)
488 Y FENG J XIAO AND M GU
TABLE 46Comparison on matrix completion
Data set Method Iter Time NMAE sv σmax σmin
jester-1
IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00
jester-2
IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00
jester-3
IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00
jester-all
IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00
movie-100K
IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00
movie-1M
IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00
movie-latest-small
IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00
Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems
respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω
Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an
orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9
Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that
1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489
Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do
Find i = arg maxjleslen
rs Exchange rj and ri columns in A and Π
Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)
end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A
Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)
3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)
2 minus 23 (k + p)
3 (cf [69]) and the cost of forming the first (k + p)
columns in the full Q matrix is m (k + p)2
+ 13 (k + p)
3
Now we count the flops for each i in the for loop
1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)
2 minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)
2+ 1
3 (k + p)3
3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)
2minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)
2+ 1
3 (k + p)3
Together the cost of running the for loop q times is
q
(4mn (k + p) + 3 (m+ n) (k + p)
2 minus 2
3(k + p)
3
)
Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)
2) and the cost of computing U = Q lowast U is 2m (k + p)
2
ETNAKent State University and
Johann Radon Institute (RICAM)
490 Y FENG J XIAO AND M GU
Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV
Tr ie applying UTr to G
end for
Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max
(M2 λminus1MF
) Y0 = MJ (M) E0 = 0
while not converged do(UΣV) = svd
(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = Sλmicrominus1k
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2
REFERENCES
[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106
[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999
[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103
[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595
[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
474 Y FENG J XIAO AND M GU
be found in the review [40] A tensor is a d-dimensional array of numbers denoted by scriptnotation X isin RI1timesmiddotmiddotmiddottimesId with entries given by
xj1jd 1 le j1 le I1 1 le jd le Id
We use the matrix X(n) isin RIntimes(Πj 6=nIj) to denote the nth mode unfolding of the tensorX Since this tensor has d dimensions there are altogether d-possibilities for unfolding Then-mode product of a tensor X isin RI1timesmiddotmiddotmiddottimesId with a matrix U isin RktimesIn results in a tensorY isin RI1timesmiddotmiddotmiddottimesInminus1timesktimesIn+1timesmiddotmiddotmiddottimesId such that
yj1jnminus1jjn+1jd = (X timesn U)j1jnminus1jjn+1jd=
Insumjn=1
xj1jdujjn
Alternatively it can be expressed conveniently in terms of unfolded tensors
Y = X timesn U hArr Y(n) = UX(n)
Decompositions of higher-order tensors have applications in signal processing [12 15 63]numerical linear algebra [13 39 76] computer vision [62 72 67] etc Two particular tensordecompositions can be considered as higher-order extensions of the matrix SVD CANDE-COMPPARAFAC (CP) [10 31] decomposes a tensor as a sum of rank-one tensors andthe Tucker decomposition [70] is a higher-order form of the principal component analy-sis Knowing the definitions of mode products and unfolding of tensors we can introducethe sequentially truncated higher-order SVD (ST-HOSVD) algorithm for producing a rank(k1 kd) approximation to the tensor based on the Tucker decomposition format The ST-HOSVD algorithm [3 71] (Algorithm 6 in the appendix) returns a core tensor G isin Rk1timesmiddotmiddotmiddottimeskdand a set of unitary matrices Uj isin RIjtimeskj for j = 1 d such that
X asymp G times1 U1 middot middot middot timesd Ud
where the right-hand side is called a Tucker decomposition However a straightforwardgeneralization to higher-order (d ge 3) tensors of the matrix Eckart-Young-Mirsky Theorem isnot possible [14]
In ST-HOSVD one key step is to compute an exact or approximate rank-kr SVD ofthe tensor unfolding Well known efficient ways to compute an exact low-rank SVD includeKrylov subspace methods [44] There are also efficient randomized algorithms to find anapproximate low-rank SVD [30] In the Matlab tensorlab toolbox [73] the most efficientmethod MLSVD_RSI is essentially ST-HOSVD with an randomized subspace iteration tofind the approximate SVD of the tensor unfolding
24 Nuclear norm minimization Matrix rank minimization problem appears ubiqui-tously in many fields such as Euclidean embedding [20 46] control [21 51 56] collaborativefiltering [9 57 65] system identification [47 48] etc The regularized nuclear norm mini-mization problem is the following
minXisinRmtimesn
f (X) + τXlowast
where τ gt 0 is a regularization parameter and Xlowastdef=sumqi=1 σi (X) The choice of the
function f (middot) is situational f (X) = M minus X1 in robust principal component analysis(robust PCA) [8] and f (X) = πΩ (M)minus πΩ (X) ||2F in matrix completion [7]
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 475
Many researchers have devoted themselves to solve the above nuclear norm minimizationproblem and plenty of algorithms have been proposed including singular value thresholding(SVT) [7] fixed point continuous (FPC) [49] accelerated proximal gradient (APG) [68]augmented Lagrange multiplier (ALM) [45] The most expensive part of these algorithms liesin the computation of the truncated SVD Inexact augmented Lagrange multiplier (IALM) [45]has been proved to be one of the most accurate and efficient among them We now describeIALM for the robust PCA and matrix completion problems
The robust PCA problem can be formalized as a minimization problem of a sum of nuclearnorm and a scaled matrix l1-norm (sum of matrix entries in absolute value)
(27) min Xlowast + λE1 subject to M = X + E
where M is a measured matrix X has low-rank E is an error matrix and sufficiently sparseand λ is a positive weighting parameter Algorithm 7 in the appendix describes details of theIALM method to solve the robust PCA problem [45]
The matrix completion problem [9 45] can be written in the form
(28) minXisinRmtimesn
Xlowast subject to X + E = M πΩ (E) = 0
where πΩ Rmtimesn rarr Rmtimesn is an orthogonal projection that keeps the entries in Ω unchangedand sets those outside Ω zeros In [45] the authors applied the IALM method to the matrixcompletion problem (Algorithm 8 in the appendix)
3 Flip-Flop SRQR factorization
31 Flip-Flop SRQR factorization In this section we introduce our Flip-Flop SRQRfactorization a slightly different variant of TUXV (Algorithm 5 in the appendix) to computean SVD approximation based on the QLP factorization Given an integer l ge k we run SRQR(the version without computing the trailing matrix) with l steps on A
(31) AΠ = QR = Q
[R11 R12
R22
]
where R11 isin Rltimesl is upper triangular R12 isin Rltimes(nminusl) and R22 isin R(mminusl)times(nminusl) Then werun partial QRNP with l steps on RT
(32) RT =
[RT11
RT12 RT22
]= Q
[R11 R12
R22
]asymp Q1
[R11 R12
]
where Q =[Q1 Q2
]with Q1 isin Rntimesl Therefore combining with the fact that
AΠQ1 = Q[R11 R12
]T
we can approximate the matrix A by
(33) A = QRΠT = Q(RT)T
ΠT asymp Q
[RT11
RT12
]QT1 ΠT = A
(ΠQ1
)(ΠQ1
)T
We denote the rank-k truncated SVD ofAΠQ1 by UkΣkVTk LetUk = Uk Vk = ΠQ1Vk
then using (33) a rank-k approximate SVD of A is obtained
(34) A asymp UkΣkVTk
where Uk isin Rmtimesk Vk isin Rntimesk are column orthonormal and Σk = Diag (σ1 middot middot middot σk) withσirsquos being the leading k singular values of AΠQ1 The Flip-Flop SRQR factorization isoutlined in Algorithm 3
ETNAKent State University and
Johann Radon Institute (RICAM)
476 Y FENG J XIAO AND M GU
Algorithm 3 Flip-Flop Spectrum-Revealing QR FactorizationInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0Integer l ge k Tolerance g gt 1 for g2OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmRun SRQR on A to l steps to obtain (R11 R12)Run QRNP on (R11 R12)
T to obtain Q1 represented by a sequence of Householder vectors
A = AΠQ1[U Σ V ] = svd
(A)
U = U ( 1 k) Σ = Σ (1 k 1 k) V = ΠQ1V ( 1 k)
32 Complexity analysis In this section we perform a complexity analysis of Flip-Flop SRQR Since the approximate SVD only makes sense when the target rank k is smallwe assume that k le l min (mn) The complexity analysis of Flip-Flop SRQR looks asfollows
1 The main cost of doing SRQR with TRQRCP onA is 2mnl+2(b+p)mn+(m+n)l22 The main cost of the QR factorization for [R11 R12]
T and forming Q1 is 2nl2minus 23 l
33 The main cost of computing A = AΠQ1 is 2mnl4 The main cost of computing [Usimsim] = svd
(A)
is O(ml2)5 The main cost of forming Vk is 2nlk
Since k le l min (mn) the complexity of Flip-Flop SRQR is 4mnl + 2(b+ p)mnby omitting the lower-order terms
On the other hand the complexity of the approximate SVD with randomized subspaceiteration (RSISVD) [27 30] is (4 + 4q)mn (k + p) where p is the oversampling size and qis the number of subspace iterations (see the detailed analysis in the appendix) In practice pis chosen to be a small integer for instance 5 in RSISVD In Flip-Flop SRQR l is usuallychosen to be a little bit larger than k eg l = k + 5 We also found that l = k is sufficient interms of the approximation quality in our numerical experiments Therefore we observe thatFlip-Flop SRQR is more efficient than RSISVD for any q gt 0
Since a practical dataset matrix can be too large to fit into the main memory the cost oftransferring the matrix into a memory hierarchy typically predominates the arithmetic costTherefore we also compare the number of data passes between RSISVD and Flip-Flop SRQRRSISVD needs 2q + 2 passes while Flip-Flop SRQR needs 4 passes computing B = ΩAswapping the columns of A realizing a partial QR factorization on A and computing AΠQ1Therefore Flip-Flop SRQR requires less data transfer than RSISVD for any q gt 1
33 Quality analysis of Flip-Flop SRQR This section is devoted to the quality analysisof Flip-Flop SRQR We start with Lemma 31
LEMMA 31 Given any matrix X = [X1X2] with Xi isin Rmtimesni i = 1 2 and n1 +n2 = n
σj (X)2 le σj (X1)
2+ X222 1 le j le min(mn)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 477
Proof Since XXT = X1XT1 + X2X
T2 we obtain the above result using [33 Theo-
rem 3316]
We are now ready to derive bounds for the singular values and the approximation errorof Flip-Flop SRQR We need to emphasize that even if the target rank is k we run Flip-FlopSRQR with an actual target rank l which is a little bit larger than k The difference betweenk and l can create a gap between the singular values of A so that we can obtain a reliablelow-rank approximation
THEOREM 32 Given a matrix A isin Rmtimesn a target rank k an oversampling size p andan actual target rank l ge k the matrices Uk Σk Vk in (34) computed by Flip-Flop SRQRsatisfy
(35) σj(Σk) ge σj(A)
4
radic1 +
2R2242σ4j (Σk)
1 le j le k
and
(36)∥∥Aminus UkΣkV
Tk
∥∥2le σk+1 (A)
4
radic1 + 2
(R222σk+1 (A)
)4
where R22 isin R(mminusl)times(nminusl) is the trailing matrix in (31) Using the properties of SRQR wefurthermore have
(37) σj (Σk) ge σj (A)
4
radic1 + min
(2τ4 τ4 (2 + 4τ4)
(σl+1(A)σj(A)
)4) 1 le j le k
and
(38)∥∥Aminus UkΣkV
Tk
∥∥2le σk+1 (A)
4
radic1 + 2τ4
(σl+1 (A)
σk+1 (A)
)4
where τ and τ defined in (312) have matrix dimension-dependent upper bounds
τ le g1g2
radic(l + 1) (nminus l) and τ le g1g2
radicl (nminus l)
where g1 leradic
1+ε1minusε and g2 le g with high probability The parameters ε gt 0 and g gt 1 are
user-defined
Proof In terms of the singular value bounds observing that
[R11 R12
R22
][R11 R12
R22
]T=
[R11R
T11 + R12R
T12 R12R
T22
R22RT12 R22R
T22
]
ETNAKent State University and
Johann Radon Institute (RICAM)
478 Y FENG J XIAO AND M GU
we apply Lemma 31 twice for any 1 le j le k
σ2j
[R11 R12
R22
][R11 R12
R22
]Tle σ2
j
([R11R
T11 + R12R
T12 R12R
T22
])+∥∥∥[R22R
T12 R22R
T22
]∥∥∥2
2
le σ2j
(R11R
T11 + R12R
T12
)+∥∥∥R12R
T22
∥∥∥2
2+∥∥∥[R22R
T12 R22R
T22
]∥∥∥2
2
le σ2j
(R11R
T11 + R12R
T12
)+ 2
∥∥∥∥∥[R12
R22
]∥∥∥∥∥4
2
= σ2j
(R11R
T11 + R12R
T12
)+ 2 R2242 (39)
The relation (39) can be further rewritten as
σ4j (A) le σ4
j
([R11 R12
])+ 2 R2242 = σ4
j (Σk) + 2 R2242 1 le j le k
which is equivalent to
σj(Σk) ge σj(A)
4
radic1 +
2R2242σ4j (Σk)
For the residual matrix bound we let[R11 R12
]def=[R11 R12
]+[δR11 δR12
]
where[R11 R12
]is the rank-k truncated SVD of
[R11 R12
] Since
(310)∥∥Aminus UkΣkV
Tk
∥∥2
=
∥∥∥∥∥AΠminusQ[R11 R12
0
]TQT
∥∥∥∥∥2
it follows from the orthogonality of singular vectors that[R11 R12
]T [δR11 δR12
]= 0
and therefore[R11 R12
]T [R11 R12
]=[R11 R12
]T [R11 R12
]+[δR11 δR12
]T [δR11 δR12
]
which implies
(311) RT12R12 = RT
12R12 +(δR12
)T (δR12
)
Similarly to the deduction of (39) from (311) we can derive∥∥∥∥[δR11 δR12
R22
]∥∥∥∥4
2
le∥∥[δR11 δR12
]∥∥4
2+ 2
∥∥∥∥[δR12
R22
]∥∥∥∥4
2
le σ4k+1 (A) + 2
∥∥∥∥∥[R12
R22
]∥∥∥∥∥4
2
= σ4k+1 (A) + 2 R2242
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 479
Combining with (310) it now follows that
∥∥Aminus UkΣkVTk
∥∥2
=
∥∥∥∥∥AΠminusQ[R11 R12
0
]TQT
∥∥∥∥∥2
=
∥∥∥∥[δR11 δR12
R22
]∥∥∥∥2
le σk+1 (A)4
radic1 + 2
(R222σk+1 (A)
)4
To obtain an upper bound of R222 in (35) and (36) we follow the analysis of SRQRin [75]
From the analysis of [75 Section IV] Algorithm 2 ensures that g1 leradic
1+ε1minusε and g2 le g
with high probability (the actual probability-guarantee formula can be found in [75 SectionIV]) where g1 and g2 are defined by (26) Here 0 lt ε lt 1 is a user defined parameter toadjust the choice of the oversampling size p used in the TRQRCP initialization part in SRQRg gt 1 is a user-defined parameter in the extra swapping part in SRQR Let
(312) τdef= g1g2
R222R2212
∥∥∥RminusT∥∥∥minus1
12
σl+1 (A)and τ
def= g1g2
R222R2212
∥∥RminusT11
∥∥minus1
12
σk (Σk)
where R is defined in equation (25) Since 1radicnX12 le X2 le
radicnX12 and
σi (X1) le σi (X) 1 le i le min (s t) for any matrix X isin Rmtimesn and submatrix X1 isin Rstimestof X we obtain that
τ = g1g2R222R2212
∥∥∥RminusT∥∥∥minus1
12
σl+1
(R) σl+1
(R)
σl+1 (A)le g1g2
radic(l + 1) (nminus l)
Using the fact that σl([R11 R12
])= σl
(R11
)and σk (Σk) = σk
([R11 R12
])by (32)
and (34)
τ = g1g2R222R2212
∥∥RminusT11
∥∥minus1
12
σl (R11)
σl (R11)
σl([R11 R12
]) σl ([R11 R12
])σk (Σk)
le g1g2
radicl (nminus l)
By the definition of τ
(313) R222 = τ σl+1 (A)
Plugging this into (36) yields (38)By the definition of τ we observe that
(314) R222 le τσk (Σk)
By (35) and (314)(315)
σj (Σk) ge σj (A)
4
radic1 + 2
(R222σj(Σk)
)4ge σj (A)
4
radic1 + 2
(R222σk(Σk)
)4ge σj (A)
4radic
1 + 2τ4 1 le j le k
ETNAKent State University and
Johann Radon Institute (RICAM)
480 Y FENG J XIAO AND M GU
On the other hand using (35)
σ4j (A) le σ4
j (Σk)
(1 + 2
σ4j (A)
σ4j (Σk)
R2242σ4j (A)
)
le σ4j (Σk)
1 + 2
(σ4j (Σk) + 2 R2242
)σ4j (Σk)
R2242σ4j (A)
le σ4
j (Σk)
(1 + 2
(1 + 2
R2242σ4k (Σk)
)R2242σ4j (A)
)
that is
σj (Σk) ge σj (A)
4
radic1 +
(2 + 4
R2242σ4k(Σk)
)R2242σ4j (A)
Plugging (313) and (314) into this above equation yields
(316) σj (Σk) ge σj (A)
4
radic1 + τ4 (2 + 4τ4)
(σl+1(A)σj(A)
)4 1 le j le k
Combining (315) with (316) we arrive at (37)We note that (35) and (36) still hold true if we replace k by lInequality (37) shows that with the definitions (312) of τ and τ Flip-Flop SRQR can
reveal at least a dimension-dependent fraction of all the leading singular values of A andindeed approximate them very accurately in case they decay relatively quickly Moreover (38)shows that Flip-Flop SRQR can compute a rank-k approximation that is up to a factor of4
radic1 + 2τ4
(σl+1(A)σk+1(A)
)4
from optimal In situations where the singular values of A decay
relatively quickly our rank-k approximation is about as accurate as the truncated SVD with achoice of l such that
σl+1 (A)
σk+1 (A)= o (1)
REMARK 33 The ldquohigh probabilityrdquo mentioned in Theorem 32 is dependent on theoversampling size p but not on the block size b We decided not to dive deep into the probabilityexplanation since it will make the proof much longer and more tedious Theoretically thecomputed oversampling size p for certain given high probability would be rather high Forexample to achieve a probability 095 we need to choose p to be at least 500 However inpractice we would never choose such a large p since it will deteriorate the code efficiencyIt turns out that a small p like 5 sim 10 would achieve good approximation results empiricallyThe detailed explanation about what does high probability mean here can be found in [75]In terms of the choice of the block size b it wonrsquot affect the analysis or the error boundsThe choice of b is purely for the performance of the codes Only a suitable b can take fulladvantage of the cache hierarchy and BLAS routines
4 Numerical experiments In this section we demonstrate the effectiveness and effi-ciency of the Flip-Flop SRQR (FFSRQR) algorithm in several numerical experiments Firstly
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 481
TABLE 41Methods for approximate SVD
Method DescriptionLANSVD Approximate SVD using Lanczos bidiagonalization with partial
reorthogonalization [42]Its Fortran and Matlab implementations are from PROPACK [41]
FFSRQR Flip-Flop Spectrum-revealing QR factorizationIts Matlab implementation is using Mex-wrappedBLAS and LAPACK routines [2]
RSISVD Approximate SVD with randomized subspace iteration [30 27]Its Matlab implementation is from tensorlab toolbox [73]
TUXV Algorithm 5 in the appendixLTSVD Linear Time SVD [16]
We use its Matlab implementation by Ma et al [49]
we compare FFSRQR with other approximate SVD algorithms for a matrix approximationSecondly we compare FFSRQR with other methods on tensor approximation problems usingthe tensorlab toolbox [73] Thirdly we compare FFSRQR with other methods for the robustPCA problem and matrix completion problem All experiments are implemented on a laptopwith 29 GHz Intel Core i5 processor and 8 GB of RAM The codes based on LAPACK [2]and PROPACK [41] in Section 41 are in Fortran The codes in Sections 42 43 are in Matlabexcept that FFSRQR is implemented in Fortran and wrapped by mex
41 Approximate truncated SVD In this section we compare FFSRQR with otherapproximate SVD algorithms for low-rank matrix approximation All tested methods are listedin Table 41 Since LTSVD has only a Matlab implementation this method is only consideredin Sections 42 and 43 The test matrices areType 1 A isin Rmtimesn [66] is defined by A = U DV T + 01 middot dss middotE where U isin Rmtimess V isin
Rntimess are column-orthonormal matrices and D isin Rstimess is a diagonal matrix with sgeometrically decreasing diagonal entries from 1 to 10minus3 dss is the last diagonalentry of D E isin Rmtimesn is a random matrix where the entries are independentlysampled from a normal distribution N (0 1) In our numerical experiment we testthree different random matrices The square matrix has a size of 10000times 10000 theshort-fat matrix has a size of 1000times 10000 and the tall-skinny matrix has a size of10000times 1000
Type 2 A isin R4929times4929 is a real data matrix from the University of Florida sparse matrixcollection [11] Its corresponding file name is HBGEMAT11
For a given matrix A isin Rmtimesn the relative SVD approximation error is measured by∥∥Aminus UkΣkVTk
∥∥FAF where Σk contains the approximate top k singular values and
Uk Vk are the corresponding approximate top k singular vectors The parameters used inFFSRQR RSISVD TUXV and LTSVD are listed in Table 42 The additional time requiredto compute g2 is negligible In our experiments g2 always remains modest and never triggerssubsequent SRQR column swaps However computing g2 is still recommended since g2
serves as an insurance policy against potential quality degration by TRQRCP The block size bis chosen to be 32 according to the xGEQP3 LAPACK routine When k is less than 32 weimplement an unblock version of the partial QRCP
Figures 41ndash43 display the run time the relative approximation error and the top 20singular values comparison respectively for four different matrices In terms of speed FFSRQRis always faster than LANSVD and RSISVD Moreover the efficiency difference is more
ETNAKent State University and
Johann Radon Institute (RICAM)
482 Y FENG J XIAO AND M GU
TABLE 42Parameters used in RSISVD FFSRQR TUXV and LTSVD
Method ParameterFFSRQR oversampling size p = 5 block size b = min32 k l = k d = 10 g = 20RSISVD oversampling size p = 5 subspace iteration q = 1TUXV oversampling size p = 5 block size b = min32 k l = kLTSVD probabilities pi = 1n
obvious when k is relatively large FFSRQR is only slightly slower than TUXV In termsof the relative approximation error FFSRQR is comparable to the other three methods Interms of the top singular values approximation Figure 43 displays the top 20 singular valueswhen k = 500 for four different matrices In Figure 43 FFSRQR is superior to TUXV aswe take the absolute values of the main diagonal entries of the upper triangle matrix R as theapproximate singular values for TUXV
100 200 300 400 500
target rank k
100
101
102
103
104
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
10-1
100
101
102
103
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 41 Run time comparison for approximate SVD algorithms
42 Tensor approximation This section illustrates the effectiveness and efficiency ofFFSRQR for computing approximate tensors ST-HOSVD [3 71] is one of the most efficient
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 483
100 200 300 400 500
target rank k
022
024
026
028
03
032
034
036re
lative
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
024
026
028
03
032
034
036
rela
tive
ap
pro
xim
atio
n e
rro
rFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 42 Relative approximation error comparison for approximate SVD algorithms
algorithms to compute a Tucker decomposition of tensors and the most costly part of thisalgorithm is to compute an SVD or approximate SVD of the tensor unfoldings The truncatedSVD and RSISVD are used in the routines MLSVD and MLSVD_RSI respectively in theMatlab tensorlab toolbox [73] Based on this Matlab toolbox we implement ST-HOSVDusing FFSRQR or LTSVD to do the SVD approximation We name these two new routines byMLSVD_FFSRQR and MLSVD_LTSVD respectively and compare these four routines in thenumerical experiment We also have Python codes for this tensor approximation experimentWe donrsquot list the results of the python programs here as they are similar to those for Matlab
421 A sparse tensor example We test a sparse tensor X isin Rntimesntimesn of the followingformat [64 59]
X =
10sumj=1
1000
jxj yj zj +
nsumj=11
1
jxj yj zj
where xj yj zj isin Rn are sparse vectors with nonnegative entries The symbol ldquordquo representsthe vector outer product We compute a rank-(k k k) Tucker decomposition [GU1 U2 U3]using MLSVD MLSVD_FFSRQR MLSVD_RSI and MLSVD_LTSVD respectively Therelative approximation error is measured by X minus XkF XF where Xk = G times1 U1 times2
U2 times3 U3
ETNAKent State University and
Johann Radon Institute (RICAM)
484 Y FENG J XIAO AND M GU
5 10 15 20
index
06
07
08
09
1to
p 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
5 10 15 20
index
06
07
08
09
1
top 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
5 10 15 20
index
06
065
07
075
08
085
09
095
top
20
sin
gu
lar
va
lue
s
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
5 10 15 20
index
101
102
103
top
20
sin
gu
lar
va
lue
sFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 43 Top 20 singular values comparison for approximate SVD algorithms
Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger
422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]
We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3
where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method
MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485
50 100 150 200
target rank k
10-5
10-4
10-3
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
50 100 150 200
target rank k
10-2
10-1
100
101
102
run tim
e (
sec)
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
FIG 44 Run time and relative approximation error comparison on a sparse tensor
TABLE 43Comparison on handwritten digits classification
MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD
Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259
one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one
43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1
431 Robust PCA To solve the robust PCA problem we replace the approximateSVD part in the IALM method [45] by various methods We denote the actual solution tothe robust PCA problem by a matrix pair (Xlowast Elowast) isin Rmtimesn times Rmtimesn The matrix Xlowast isXlowast = XLX
TR with XL isin Rmtimesk XR isin Rntimesk being random matrices whose entries are
independently sampled from a normal distribution The sparse matrix Elowast is a random matrixwhere its non-zero entries are independently sampled from a uniform distribution over theinterval [minus500 500] The input to the IALM algorithm has the form M = Xlowast + Elowast andthe output is denoted by
(X E
) In this numerical experiment we use the same parameter
settings as the IALM code for robust PCA rank k is 01m and the number of non-zero entriesin E is 005m2 We choose the trade-off parameter λ = 1
radicmax (mn) as suggested by
Candes et al [8] The solution quality is measured by the normalized root mean square errorX minusXlowastF XlowastF
Table 44 includes relative error run time the number of non-zero entries in E (E0)the iteration count and the number of non-zero singular values (sv) in X of the IALM
1httpperceptioncslillinoisedumatrix-ranksample_codehtml
ETNAKent State University and
Johann Radon Institute (RICAM)
486 Y FENG J XIAO AND M GU
algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective
TABLE 44Comparison on robust PCA
Size Method Error Time (sec) E0 Iter sv
1000times 1000
IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100
2000times 2000
IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200
4000times 4000
IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400
6000times 6000
IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600
432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices
bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3
The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies
For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by
NMAEdef=
1|Γ|sum
(ij)isinΓ |Mij minusXij |rmax minus rmin
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487
where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5
Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]
The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets
TABLE 45Parameters used in the IALM method on matrix completion
Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384
jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742
movie-latest-small 671 9066 100e+05 52551
5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA
Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments
Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4
Appendix B TUXV algorithmThe details of TUXV algorithm are presented in Algorithm 5
Appendix C Computing Tucker decomposition of higher-order tensorsAlgorithm 6 computes the Tucker decomposition of higher-order tensors ie sequentially
truncated higher-order SVD
ETNAKent State University and
Johann Radon Institute (RICAM)
488 Y FENG J XIAO AND M GU
TABLE 46Comparison on matrix completion
Data set Method Iter Time NMAE sv σmax σmin
jester-1
IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00
jester-2
IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00
jester-3
IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00
jester-all
IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00
movie-100K
IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00
movie-1M
IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00
movie-latest-small
IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00
Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems
respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω
Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an
orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9
Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that
1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489
Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do
Find i = arg maxjleslen
rs Exchange rj and ri columns in A and Π
Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)
end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A
Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)
3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)
2 minus 23 (k + p)
3 (cf [69]) and the cost of forming the first (k + p)
columns in the full Q matrix is m (k + p)2
+ 13 (k + p)
3
Now we count the flops for each i in the for loop
1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)
2 minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)
2+ 1
3 (k + p)3
3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)
2minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)
2+ 1
3 (k + p)3
Together the cost of running the for loop q times is
q
(4mn (k + p) + 3 (m+ n) (k + p)
2 minus 2
3(k + p)
3
)
Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)
2) and the cost of computing U = Q lowast U is 2m (k + p)
2
ETNAKent State University and
Johann Radon Institute (RICAM)
490 Y FENG J XIAO AND M GU
Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV
Tr ie applying UTr to G
end for
Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max
(M2 λminus1MF
) Y0 = MJ (M) E0 = 0
while not converged do(UΣV) = svd
(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = Sλmicrominus1k
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2
REFERENCES
[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106
[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999
[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103
[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595
[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 475
Many researchers have devoted themselves to solve the above nuclear norm minimizationproblem and plenty of algorithms have been proposed including singular value thresholding(SVT) [7] fixed point continuous (FPC) [49] accelerated proximal gradient (APG) [68]augmented Lagrange multiplier (ALM) [45] The most expensive part of these algorithms liesin the computation of the truncated SVD Inexact augmented Lagrange multiplier (IALM) [45]has been proved to be one of the most accurate and efficient among them We now describeIALM for the robust PCA and matrix completion problems
The robust PCA problem can be formalized as a minimization problem of a sum of nuclearnorm and a scaled matrix l1-norm (sum of matrix entries in absolute value)
(27) min Xlowast + λE1 subject to M = X + E
where M is a measured matrix X has low-rank E is an error matrix and sufficiently sparseand λ is a positive weighting parameter Algorithm 7 in the appendix describes details of theIALM method to solve the robust PCA problem [45]
The matrix completion problem [9 45] can be written in the form
(28) minXisinRmtimesn
Xlowast subject to X + E = M πΩ (E) = 0
where πΩ Rmtimesn rarr Rmtimesn is an orthogonal projection that keeps the entries in Ω unchangedand sets those outside Ω zeros In [45] the authors applied the IALM method to the matrixcompletion problem (Algorithm 8 in the appendix)
3 Flip-Flop SRQR factorization
31 Flip-Flop SRQR factorization In this section we introduce our Flip-Flop SRQRfactorization a slightly different variant of TUXV (Algorithm 5 in the appendix) to computean SVD approximation based on the QLP factorization Given an integer l ge k we run SRQR(the version without computing the trailing matrix) with l steps on A
(31) AΠ = QR = Q
[R11 R12
R22
]
where R11 isin Rltimesl is upper triangular R12 isin Rltimes(nminusl) and R22 isin R(mminusl)times(nminusl) Then werun partial QRNP with l steps on RT
(32) RT =
[RT11
RT12 RT22
]= Q
[R11 R12
R22
]asymp Q1
[R11 R12
]
where Q =[Q1 Q2
]with Q1 isin Rntimesl Therefore combining with the fact that
AΠQ1 = Q[R11 R12
]T
we can approximate the matrix A by
(33) A = QRΠT = Q(RT)T
ΠT asymp Q
[RT11
RT12
]QT1 ΠT = A
(ΠQ1
)(ΠQ1
)T
We denote the rank-k truncated SVD ofAΠQ1 by UkΣkVTk LetUk = Uk Vk = ΠQ1Vk
then using (33) a rank-k approximate SVD of A is obtained
(34) A asymp UkΣkVTk
where Uk isin Rmtimesk Vk isin Rntimesk are column orthonormal and Σk = Diag (σ1 middot middot middot σk) withσirsquos being the leading k singular values of AΠQ1 The Flip-Flop SRQR factorization isoutlined in Algorithm 3
ETNAKent State University and
Johann Radon Institute (RICAM)
476 Y FENG J XIAO AND M GU
Algorithm 3 Flip-Flop Spectrum-Revealing QR FactorizationInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0Integer l ge k Tolerance g gt 1 for g2OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmRun SRQR on A to l steps to obtain (R11 R12)Run QRNP on (R11 R12)
T to obtain Q1 represented by a sequence of Householder vectors
A = AΠQ1[U Σ V ] = svd
(A)
U = U ( 1 k) Σ = Σ (1 k 1 k) V = ΠQ1V ( 1 k)
32 Complexity analysis In this section we perform a complexity analysis of Flip-Flop SRQR Since the approximate SVD only makes sense when the target rank k is smallwe assume that k le l min (mn) The complexity analysis of Flip-Flop SRQR looks asfollows
1 The main cost of doing SRQR with TRQRCP onA is 2mnl+2(b+p)mn+(m+n)l22 The main cost of the QR factorization for [R11 R12]
T and forming Q1 is 2nl2minus 23 l
33 The main cost of computing A = AΠQ1 is 2mnl4 The main cost of computing [Usimsim] = svd
(A)
is O(ml2)5 The main cost of forming Vk is 2nlk
Since k le l min (mn) the complexity of Flip-Flop SRQR is 4mnl + 2(b+ p)mnby omitting the lower-order terms
On the other hand the complexity of the approximate SVD with randomized subspaceiteration (RSISVD) [27 30] is (4 + 4q)mn (k + p) where p is the oversampling size and qis the number of subspace iterations (see the detailed analysis in the appendix) In practice pis chosen to be a small integer for instance 5 in RSISVD In Flip-Flop SRQR l is usuallychosen to be a little bit larger than k eg l = k + 5 We also found that l = k is sufficient interms of the approximation quality in our numerical experiments Therefore we observe thatFlip-Flop SRQR is more efficient than RSISVD for any q gt 0
Since a practical dataset matrix can be too large to fit into the main memory the cost oftransferring the matrix into a memory hierarchy typically predominates the arithmetic costTherefore we also compare the number of data passes between RSISVD and Flip-Flop SRQRRSISVD needs 2q + 2 passes while Flip-Flop SRQR needs 4 passes computing B = ΩAswapping the columns of A realizing a partial QR factorization on A and computing AΠQ1Therefore Flip-Flop SRQR requires less data transfer than RSISVD for any q gt 1
33 Quality analysis of Flip-Flop SRQR This section is devoted to the quality analysisof Flip-Flop SRQR We start with Lemma 31
LEMMA 31 Given any matrix X = [X1X2] with Xi isin Rmtimesni i = 1 2 and n1 +n2 = n
σj (X)2 le σj (X1)
2+ X222 1 le j le min(mn)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 477
Proof Since XXT = X1XT1 + X2X
T2 we obtain the above result using [33 Theo-
rem 3316]
We are now ready to derive bounds for the singular values and the approximation errorof Flip-Flop SRQR We need to emphasize that even if the target rank is k we run Flip-FlopSRQR with an actual target rank l which is a little bit larger than k The difference betweenk and l can create a gap between the singular values of A so that we can obtain a reliablelow-rank approximation
THEOREM 32 Given a matrix A isin Rmtimesn a target rank k an oversampling size p andan actual target rank l ge k the matrices Uk Σk Vk in (34) computed by Flip-Flop SRQRsatisfy
(35) σj(Σk) ge σj(A)
4
radic1 +
2R2242σ4j (Σk)
1 le j le k
and
(36)∥∥Aminus UkΣkV
Tk
∥∥2le σk+1 (A)
4
radic1 + 2
(R222σk+1 (A)
)4
where R22 isin R(mminusl)times(nminusl) is the trailing matrix in (31) Using the properties of SRQR wefurthermore have
(37) σj (Σk) ge σj (A)
4
radic1 + min
(2τ4 τ4 (2 + 4τ4)
(σl+1(A)σj(A)
)4) 1 le j le k
and
(38)∥∥Aminus UkΣkV
Tk
∥∥2le σk+1 (A)
4
radic1 + 2τ4
(σl+1 (A)
σk+1 (A)
)4
where τ and τ defined in (312) have matrix dimension-dependent upper bounds
τ le g1g2
radic(l + 1) (nminus l) and τ le g1g2
radicl (nminus l)
where g1 leradic
1+ε1minusε and g2 le g with high probability The parameters ε gt 0 and g gt 1 are
user-defined
Proof In terms of the singular value bounds observing that
[R11 R12
R22
][R11 R12
R22
]T=
[R11R
T11 + R12R
T12 R12R
T22
R22RT12 R22R
T22
]
ETNAKent State University and
Johann Radon Institute (RICAM)
478 Y FENG J XIAO AND M GU
we apply Lemma 31 twice for any 1 le j le k
σ2j
[R11 R12
R22
][R11 R12
R22
]Tle σ2
j
([R11R
T11 + R12R
T12 R12R
T22
])+∥∥∥[R22R
T12 R22R
T22
]∥∥∥2
2
le σ2j
(R11R
T11 + R12R
T12
)+∥∥∥R12R
T22
∥∥∥2
2+∥∥∥[R22R
T12 R22R
T22
]∥∥∥2
2
le σ2j
(R11R
T11 + R12R
T12
)+ 2
∥∥∥∥∥[R12
R22
]∥∥∥∥∥4
2
= σ2j
(R11R
T11 + R12R
T12
)+ 2 R2242 (39)
The relation (39) can be further rewritten as
σ4j (A) le σ4
j
([R11 R12
])+ 2 R2242 = σ4
j (Σk) + 2 R2242 1 le j le k
which is equivalent to
σj(Σk) ge σj(A)
4
radic1 +
2R2242σ4j (Σk)
For the residual matrix bound we let[R11 R12
]def=[R11 R12
]+[δR11 δR12
]
where[R11 R12
]is the rank-k truncated SVD of
[R11 R12
] Since
(310)∥∥Aminus UkΣkV
Tk
∥∥2
=
∥∥∥∥∥AΠminusQ[R11 R12
0
]TQT
∥∥∥∥∥2
it follows from the orthogonality of singular vectors that[R11 R12
]T [δR11 δR12
]= 0
and therefore[R11 R12
]T [R11 R12
]=[R11 R12
]T [R11 R12
]+[δR11 δR12
]T [δR11 δR12
]
which implies
(311) RT12R12 = RT
12R12 +(δR12
)T (δR12
)
Similarly to the deduction of (39) from (311) we can derive∥∥∥∥[δR11 δR12
R22
]∥∥∥∥4
2
le∥∥[δR11 δR12
]∥∥4
2+ 2
∥∥∥∥[δR12
R22
]∥∥∥∥4
2
le σ4k+1 (A) + 2
∥∥∥∥∥[R12
R22
]∥∥∥∥∥4
2
= σ4k+1 (A) + 2 R2242
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 479
Combining with (310) it now follows that
∥∥Aminus UkΣkVTk
∥∥2
=
∥∥∥∥∥AΠminusQ[R11 R12
0
]TQT
∥∥∥∥∥2
=
∥∥∥∥[δR11 δR12
R22
]∥∥∥∥2
le σk+1 (A)4
radic1 + 2
(R222σk+1 (A)
)4
To obtain an upper bound of R222 in (35) and (36) we follow the analysis of SRQRin [75]
From the analysis of [75 Section IV] Algorithm 2 ensures that g1 leradic
1+ε1minusε and g2 le g
with high probability (the actual probability-guarantee formula can be found in [75 SectionIV]) where g1 and g2 are defined by (26) Here 0 lt ε lt 1 is a user defined parameter toadjust the choice of the oversampling size p used in the TRQRCP initialization part in SRQRg gt 1 is a user-defined parameter in the extra swapping part in SRQR Let
(312) τdef= g1g2
R222R2212
∥∥∥RminusT∥∥∥minus1
12
σl+1 (A)and τ
def= g1g2
R222R2212
∥∥RminusT11
∥∥minus1
12
σk (Σk)
where R is defined in equation (25) Since 1radicnX12 le X2 le
radicnX12 and
σi (X1) le σi (X) 1 le i le min (s t) for any matrix X isin Rmtimesn and submatrix X1 isin Rstimestof X we obtain that
τ = g1g2R222R2212
∥∥∥RminusT∥∥∥minus1
12
σl+1
(R) σl+1
(R)
σl+1 (A)le g1g2
radic(l + 1) (nminus l)
Using the fact that σl([R11 R12
])= σl
(R11
)and σk (Σk) = σk
([R11 R12
])by (32)
and (34)
τ = g1g2R222R2212
∥∥RminusT11
∥∥minus1
12
σl (R11)
σl (R11)
σl([R11 R12
]) σl ([R11 R12
])σk (Σk)
le g1g2
radicl (nminus l)
By the definition of τ
(313) R222 = τ σl+1 (A)
Plugging this into (36) yields (38)By the definition of τ we observe that
(314) R222 le τσk (Σk)
By (35) and (314)(315)
σj (Σk) ge σj (A)
4
radic1 + 2
(R222σj(Σk)
)4ge σj (A)
4
radic1 + 2
(R222σk(Σk)
)4ge σj (A)
4radic
1 + 2τ4 1 le j le k
ETNAKent State University and
Johann Radon Institute (RICAM)
480 Y FENG J XIAO AND M GU
On the other hand using (35)
σ4j (A) le σ4
j (Σk)
(1 + 2
σ4j (A)
σ4j (Σk)
R2242σ4j (A)
)
le σ4j (Σk)
1 + 2
(σ4j (Σk) + 2 R2242
)σ4j (Σk)
R2242σ4j (A)
le σ4
j (Σk)
(1 + 2
(1 + 2
R2242σ4k (Σk)
)R2242σ4j (A)
)
that is
σj (Σk) ge σj (A)
4
radic1 +
(2 + 4
R2242σ4k(Σk)
)R2242σ4j (A)
Plugging (313) and (314) into this above equation yields
(316) σj (Σk) ge σj (A)
4
radic1 + τ4 (2 + 4τ4)
(σl+1(A)σj(A)
)4 1 le j le k
Combining (315) with (316) we arrive at (37)We note that (35) and (36) still hold true if we replace k by lInequality (37) shows that with the definitions (312) of τ and τ Flip-Flop SRQR can
reveal at least a dimension-dependent fraction of all the leading singular values of A andindeed approximate them very accurately in case they decay relatively quickly Moreover (38)shows that Flip-Flop SRQR can compute a rank-k approximation that is up to a factor of4
radic1 + 2τ4
(σl+1(A)σk+1(A)
)4
from optimal In situations where the singular values of A decay
relatively quickly our rank-k approximation is about as accurate as the truncated SVD with achoice of l such that
σl+1 (A)
σk+1 (A)= o (1)
REMARK 33 The ldquohigh probabilityrdquo mentioned in Theorem 32 is dependent on theoversampling size p but not on the block size b We decided not to dive deep into the probabilityexplanation since it will make the proof much longer and more tedious Theoretically thecomputed oversampling size p for certain given high probability would be rather high Forexample to achieve a probability 095 we need to choose p to be at least 500 However inpractice we would never choose such a large p since it will deteriorate the code efficiencyIt turns out that a small p like 5 sim 10 would achieve good approximation results empiricallyThe detailed explanation about what does high probability mean here can be found in [75]In terms of the choice of the block size b it wonrsquot affect the analysis or the error boundsThe choice of b is purely for the performance of the codes Only a suitable b can take fulladvantage of the cache hierarchy and BLAS routines
4 Numerical experiments In this section we demonstrate the effectiveness and effi-ciency of the Flip-Flop SRQR (FFSRQR) algorithm in several numerical experiments Firstly
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 481
TABLE 41Methods for approximate SVD
Method DescriptionLANSVD Approximate SVD using Lanczos bidiagonalization with partial
reorthogonalization [42]Its Fortran and Matlab implementations are from PROPACK [41]
FFSRQR Flip-Flop Spectrum-revealing QR factorizationIts Matlab implementation is using Mex-wrappedBLAS and LAPACK routines [2]
RSISVD Approximate SVD with randomized subspace iteration [30 27]Its Matlab implementation is from tensorlab toolbox [73]
TUXV Algorithm 5 in the appendixLTSVD Linear Time SVD [16]
We use its Matlab implementation by Ma et al [49]
we compare FFSRQR with other approximate SVD algorithms for a matrix approximationSecondly we compare FFSRQR with other methods on tensor approximation problems usingthe tensorlab toolbox [73] Thirdly we compare FFSRQR with other methods for the robustPCA problem and matrix completion problem All experiments are implemented on a laptopwith 29 GHz Intel Core i5 processor and 8 GB of RAM The codes based on LAPACK [2]and PROPACK [41] in Section 41 are in Fortran The codes in Sections 42 43 are in Matlabexcept that FFSRQR is implemented in Fortran and wrapped by mex
41 Approximate truncated SVD In this section we compare FFSRQR with otherapproximate SVD algorithms for low-rank matrix approximation All tested methods are listedin Table 41 Since LTSVD has only a Matlab implementation this method is only consideredin Sections 42 and 43 The test matrices areType 1 A isin Rmtimesn [66] is defined by A = U DV T + 01 middot dss middotE where U isin Rmtimess V isin
Rntimess are column-orthonormal matrices and D isin Rstimess is a diagonal matrix with sgeometrically decreasing diagonal entries from 1 to 10minus3 dss is the last diagonalentry of D E isin Rmtimesn is a random matrix where the entries are independentlysampled from a normal distribution N (0 1) In our numerical experiment we testthree different random matrices The square matrix has a size of 10000times 10000 theshort-fat matrix has a size of 1000times 10000 and the tall-skinny matrix has a size of10000times 1000
Type 2 A isin R4929times4929 is a real data matrix from the University of Florida sparse matrixcollection [11] Its corresponding file name is HBGEMAT11
For a given matrix A isin Rmtimesn the relative SVD approximation error is measured by∥∥Aminus UkΣkVTk
∥∥FAF where Σk contains the approximate top k singular values and
Uk Vk are the corresponding approximate top k singular vectors The parameters used inFFSRQR RSISVD TUXV and LTSVD are listed in Table 42 The additional time requiredto compute g2 is negligible In our experiments g2 always remains modest and never triggerssubsequent SRQR column swaps However computing g2 is still recommended since g2
serves as an insurance policy against potential quality degration by TRQRCP The block size bis chosen to be 32 according to the xGEQP3 LAPACK routine When k is less than 32 weimplement an unblock version of the partial QRCP
Figures 41ndash43 display the run time the relative approximation error and the top 20singular values comparison respectively for four different matrices In terms of speed FFSRQRis always faster than LANSVD and RSISVD Moreover the efficiency difference is more
ETNAKent State University and
Johann Radon Institute (RICAM)
482 Y FENG J XIAO AND M GU
TABLE 42Parameters used in RSISVD FFSRQR TUXV and LTSVD
Method ParameterFFSRQR oversampling size p = 5 block size b = min32 k l = k d = 10 g = 20RSISVD oversampling size p = 5 subspace iteration q = 1TUXV oversampling size p = 5 block size b = min32 k l = kLTSVD probabilities pi = 1n
obvious when k is relatively large FFSRQR is only slightly slower than TUXV In termsof the relative approximation error FFSRQR is comparable to the other three methods Interms of the top singular values approximation Figure 43 displays the top 20 singular valueswhen k = 500 for four different matrices In Figure 43 FFSRQR is superior to TUXV aswe take the absolute values of the main diagonal entries of the upper triangle matrix R as theapproximate singular values for TUXV
100 200 300 400 500
target rank k
100
101
102
103
104
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
10-1
100
101
102
103
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 41 Run time comparison for approximate SVD algorithms
42 Tensor approximation This section illustrates the effectiveness and efficiency ofFFSRQR for computing approximate tensors ST-HOSVD [3 71] is one of the most efficient
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 483
100 200 300 400 500
target rank k
022
024
026
028
03
032
034
036re
lative
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
024
026
028
03
032
034
036
rela
tive
ap
pro
xim
atio
n e
rro
rFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 42 Relative approximation error comparison for approximate SVD algorithms
algorithms to compute a Tucker decomposition of tensors and the most costly part of thisalgorithm is to compute an SVD or approximate SVD of the tensor unfoldings The truncatedSVD and RSISVD are used in the routines MLSVD and MLSVD_RSI respectively in theMatlab tensorlab toolbox [73] Based on this Matlab toolbox we implement ST-HOSVDusing FFSRQR or LTSVD to do the SVD approximation We name these two new routines byMLSVD_FFSRQR and MLSVD_LTSVD respectively and compare these four routines in thenumerical experiment We also have Python codes for this tensor approximation experimentWe donrsquot list the results of the python programs here as they are similar to those for Matlab
421 A sparse tensor example We test a sparse tensor X isin Rntimesntimesn of the followingformat [64 59]
X =
10sumj=1
1000
jxj yj zj +
nsumj=11
1
jxj yj zj
where xj yj zj isin Rn are sparse vectors with nonnegative entries The symbol ldquordquo representsthe vector outer product We compute a rank-(k k k) Tucker decomposition [GU1 U2 U3]using MLSVD MLSVD_FFSRQR MLSVD_RSI and MLSVD_LTSVD respectively Therelative approximation error is measured by X minus XkF XF where Xk = G times1 U1 times2
U2 times3 U3
ETNAKent State University and
Johann Radon Institute (RICAM)
484 Y FENG J XIAO AND M GU
5 10 15 20
index
06
07
08
09
1to
p 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
5 10 15 20
index
06
07
08
09
1
top 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
5 10 15 20
index
06
065
07
075
08
085
09
095
top
20
sin
gu
lar
va
lue
s
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
5 10 15 20
index
101
102
103
top
20
sin
gu
lar
va
lue
sFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 43 Top 20 singular values comparison for approximate SVD algorithms
Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger
422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]
We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3
where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method
MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485
50 100 150 200
target rank k
10-5
10-4
10-3
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
50 100 150 200
target rank k
10-2
10-1
100
101
102
run tim
e (
sec)
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
FIG 44 Run time and relative approximation error comparison on a sparse tensor
TABLE 43Comparison on handwritten digits classification
MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD
Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259
one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one
43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1
431 Robust PCA To solve the robust PCA problem we replace the approximateSVD part in the IALM method [45] by various methods We denote the actual solution tothe robust PCA problem by a matrix pair (Xlowast Elowast) isin Rmtimesn times Rmtimesn The matrix Xlowast isXlowast = XLX
TR with XL isin Rmtimesk XR isin Rntimesk being random matrices whose entries are
independently sampled from a normal distribution The sparse matrix Elowast is a random matrixwhere its non-zero entries are independently sampled from a uniform distribution over theinterval [minus500 500] The input to the IALM algorithm has the form M = Xlowast + Elowast andthe output is denoted by
(X E
) In this numerical experiment we use the same parameter
settings as the IALM code for robust PCA rank k is 01m and the number of non-zero entriesin E is 005m2 We choose the trade-off parameter λ = 1
radicmax (mn) as suggested by
Candes et al [8] The solution quality is measured by the normalized root mean square errorX minusXlowastF XlowastF
Table 44 includes relative error run time the number of non-zero entries in E (E0)the iteration count and the number of non-zero singular values (sv) in X of the IALM
1httpperceptioncslillinoisedumatrix-ranksample_codehtml
ETNAKent State University and
Johann Radon Institute (RICAM)
486 Y FENG J XIAO AND M GU
algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective
TABLE 44Comparison on robust PCA
Size Method Error Time (sec) E0 Iter sv
1000times 1000
IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100
2000times 2000
IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200
4000times 4000
IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400
6000times 6000
IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600
432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices
bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3
The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies
For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by
NMAEdef=
1|Γ|sum
(ij)isinΓ |Mij minusXij |rmax minus rmin
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487
where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5
Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]
The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets
TABLE 45Parameters used in the IALM method on matrix completion
Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384
jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742
movie-latest-small 671 9066 100e+05 52551
5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA
Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments
Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4
Appendix B TUXV algorithmThe details of TUXV algorithm are presented in Algorithm 5
Appendix C Computing Tucker decomposition of higher-order tensorsAlgorithm 6 computes the Tucker decomposition of higher-order tensors ie sequentially
truncated higher-order SVD
ETNAKent State University and
Johann Radon Institute (RICAM)
488 Y FENG J XIAO AND M GU
TABLE 46Comparison on matrix completion
Data set Method Iter Time NMAE sv σmax σmin
jester-1
IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00
jester-2
IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00
jester-3
IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00
jester-all
IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00
movie-100K
IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00
movie-1M
IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00
movie-latest-small
IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00
Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems
respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω
Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an
orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9
Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that
1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489
Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do
Find i = arg maxjleslen
rs Exchange rj and ri columns in A and Π
Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)
end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A
Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)
3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)
2 minus 23 (k + p)
3 (cf [69]) and the cost of forming the first (k + p)
columns in the full Q matrix is m (k + p)2
+ 13 (k + p)
3
Now we count the flops for each i in the for loop
1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)
2 minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)
2+ 1
3 (k + p)3
3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)
2minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)
2+ 1
3 (k + p)3
Together the cost of running the for loop q times is
q
(4mn (k + p) + 3 (m+ n) (k + p)
2 minus 2
3(k + p)
3
)
Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)
2) and the cost of computing U = Q lowast U is 2m (k + p)
2
ETNAKent State University and
Johann Radon Institute (RICAM)
490 Y FENG J XIAO AND M GU
Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV
Tr ie applying UTr to G
end for
Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max
(M2 λminus1MF
) Y0 = MJ (M) E0 = 0
while not converged do(UΣV) = svd
(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = Sλmicrominus1k
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2
REFERENCES
[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106
[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999
[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103
[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595
[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
476 Y FENG J XIAO AND M GU
Algorithm 3 Flip-Flop Spectrum-Revealing QR FactorizationInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0Integer l ge k Tolerance g gt 1 for g2OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmRun SRQR on A to l steps to obtain (R11 R12)Run QRNP on (R11 R12)
T to obtain Q1 represented by a sequence of Householder vectors
A = AΠQ1[U Σ V ] = svd
(A)
U = U ( 1 k) Σ = Σ (1 k 1 k) V = ΠQ1V ( 1 k)
32 Complexity analysis In this section we perform a complexity analysis of Flip-Flop SRQR Since the approximate SVD only makes sense when the target rank k is smallwe assume that k le l min (mn) The complexity analysis of Flip-Flop SRQR looks asfollows
1 The main cost of doing SRQR with TRQRCP onA is 2mnl+2(b+p)mn+(m+n)l22 The main cost of the QR factorization for [R11 R12]
T and forming Q1 is 2nl2minus 23 l
33 The main cost of computing A = AΠQ1 is 2mnl4 The main cost of computing [Usimsim] = svd
(A)
is O(ml2)5 The main cost of forming Vk is 2nlk
Since k le l min (mn) the complexity of Flip-Flop SRQR is 4mnl + 2(b+ p)mnby omitting the lower-order terms
On the other hand the complexity of the approximate SVD with randomized subspaceiteration (RSISVD) [27 30] is (4 + 4q)mn (k + p) where p is the oversampling size and qis the number of subspace iterations (see the detailed analysis in the appendix) In practice pis chosen to be a small integer for instance 5 in RSISVD In Flip-Flop SRQR l is usuallychosen to be a little bit larger than k eg l = k + 5 We also found that l = k is sufficient interms of the approximation quality in our numerical experiments Therefore we observe thatFlip-Flop SRQR is more efficient than RSISVD for any q gt 0
Since a practical dataset matrix can be too large to fit into the main memory the cost oftransferring the matrix into a memory hierarchy typically predominates the arithmetic costTherefore we also compare the number of data passes between RSISVD and Flip-Flop SRQRRSISVD needs 2q + 2 passes while Flip-Flop SRQR needs 4 passes computing B = ΩAswapping the columns of A realizing a partial QR factorization on A and computing AΠQ1Therefore Flip-Flop SRQR requires less data transfer than RSISVD for any q gt 1
33 Quality analysis of Flip-Flop SRQR This section is devoted to the quality analysisof Flip-Flop SRQR We start with Lemma 31
LEMMA 31 Given any matrix X = [X1X2] with Xi isin Rmtimesni i = 1 2 and n1 +n2 = n
σj (X)2 le σj (X1)
2+ X222 1 le j le min(mn)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 477
Proof Since XXT = X1XT1 + X2X
T2 we obtain the above result using [33 Theo-
rem 3316]
We are now ready to derive bounds for the singular values and the approximation errorof Flip-Flop SRQR We need to emphasize that even if the target rank is k we run Flip-FlopSRQR with an actual target rank l which is a little bit larger than k The difference betweenk and l can create a gap between the singular values of A so that we can obtain a reliablelow-rank approximation
THEOREM 32 Given a matrix A isin Rmtimesn a target rank k an oversampling size p andan actual target rank l ge k the matrices Uk Σk Vk in (34) computed by Flip-Flop SRQRsatisfy
(35) σj(Σk) ge σj(A)
4
radic1 +
2R2242σ4j (Σk)
1 le j le k
and
(36)∥∥Aminus UkΣkV
Tk
∥∥2le σk+1 (A)
4
radic1 + 2
(R222σk+1 (A)
)4
where R22 isin R(mminusl)times(nminusl) is the trailing matrix in (31) Using the properties of SRQR wefurthermore have
(37) σj (Σk) ge σj (A)
4
radic1 + min
(2τ4 τ4 (2 + 4τ4)
(σl+1(A)σj(A)
)4) 1 le j le k
and
(38)∥∥Aminus UkΣkV
Tk
∥∥2le σk+1 (A)
4
radic1 + 2τ4
(σl+1 (A)
σk+1 (A)
)4
where τ and τ defined in (312) have matrix dimension-dependent upper bounds
τ le g1g2
radic(l + 1) (nminus l) and τ le g1g2
radicl (nminus l)
where g1 leradic
1+ε1minusε and g2 le g with high probability The parameters ε gt 0 and g gt 1 are
user-defined
Proof In terms of the singular value bounds observing that
[R11 R12
R22
][R11 R12
R22
]T=
[R11R
T11 + R12R
T12 R12R
T22
R22RT12 R22R
T22
]
ETNAKent State University and
Johann Radon Institute (RICAM)
478 Y FENG J XIAO AND M GU
we apply Lemma 31 twice for any 1 le j le k
σ2j
[R11 R12
R22
][R11 R12
R22
]Tle σ2
j
([R11R
T11 + R12R
T12 R12R
T22
])+∥∥∥[R22R
T12 R22R
T22
]∥∥∥2
2
le σ2j
(R11R
T11 + R12R
T12
)+∥∥∥R12R
T22
∥∥∥2
2+∥∥∥[R22R
T12 R22R
T22
]∥∥∥2
2
le σ2j
(R11R
T11 + R12R
T12
)+ 2
∥∥∥∥∥[R12
R22
]∥∥∥∥∥4
2
= σ2j
(R11R
T11 + R12R
T12
)+ 2 R2242 (39)
The relation (39) can be further rewritten as
σ4j (A) le σ4
j
([R11 R12
])+ 2 R2242 = σ4
j (Σk) + 2 R2242 1 le j le k
which is equivalent to
σj(Σk) ge σj(A)
4
radic1 +
2R2242σ4j (Σk)
For the residual matrix bound we let[R11 R12
]def=[R11 R12
]+[δR11 δR12
]
where[R11 R12
]is the rank-k truncated SVD of
[R11 R12
] Since
(310)∥∥Aminus UkΣkV
Tk
∥∥2
=
∥∥∥∥∥AΠminusQ[R11 R12
0
]TQT
∥∥∥∥∥2
it follows from the orthogonality of singular vectors that[R11 R12
]T [δR11 δR12
]= 0
and therefore[R11 R12
]T [R11 R12
]=[R11 R12
]T [R11 R12
]+[δR11 δR12
]T [δR11 δR12
]
which implies
(311) RT12R12 = RT
12R12 +(δR12
)T (δR12
)
Similarly to the deduction of (39) from (311) we can derive∥∥∥∥[δR11 δR12
R22
]∥∥∥∥4
2
le∥∥[δR11 δR12
]∥∥4
2+ 2
∥∥∥∥[δR12
R22
]∥∥∥∥4
2
le σ4k+1 (A) + 2
∥∥∥∥∥[R12
R22
]∥∥∥∥∥4
2
= σ4k+1 (A) + 2 R2242
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 479
Combining with (310) it now follows that
∥∥Aminus UkΣkVTk
∥∥2
=
∥∥∥∥∥AΠminusQ[R11 R12
0
]TQT
∥∥∥∥∥2
=
∥∥∥∥[δR11 δR12
R22
]∥∥∥∥2
le σk+1 (A)4
radic1 + 2
(R222σk+1 (A)
)4
To obtain an upper bound of R222 in (35) and (36) we follow the analysis of SRQRin [75]
From the analysis of [75 Section IV] Algorithm 2 ensures that g1 leradic
1+ε1minusε and g2 le g
with high probability (the actual probability-guarantee formula can be found in [75 SectionIV]) where g1 and g2 are defined by (26) Here 0 lt ε lt 1 is a user defined parameter toadjust the choice of the oversampling size p used in the TRQRCP initialization part in SRQRg gt 1 is a user-defined parameter in the extra swapping part in SRQR Let
(312) τdef= g1g2
R222R2212
∥∥∥RminusT∥∥∥minus1
12
σl+1 (A)and τ
def= g1g2
R222R2212
∥∥RminusT11
∥∥minus1
12
σk (Σk)
where R is defined in equation (25) Since 1radicnX12 le X2 le
radicnX12 and
σi (X1) le σi (X) 1 le i le min (s t) for any matrix X isin Rmtimesn and submatrix X1 isin Rstimestof X we obtain that
τ = g1g2R222R2212
∥∥∥RminusT∥∥∥minus1
12
σl+1
(R) σl+1
(R)
σl+1 (A)le g1g2
radic(l + 1) (nminus l)
Using the fact that σl([R11 R12
])= σl
(R11
)and σk (Σk) = σk
([R11 R12
])by (32)
and (34)
τ = g1g2R222R2212
∥∥RminusT11
∥∥minus1
12
σl (R11)
σl (R11)
σl([R11 R12
]) σl ([R11 R12
])σk (Σk)
le g1g2
radicl (nminus l)
By the definition of τ
(313) R222 = τ σl+1 (A)
Plugging this into (36) yields (38)By the definition of τ we observe that
(314) R222 le τσk (Σk)
By (35) and (314)(315)
σj (Σk) ge σj (A)
4
radic1 + 2
(R222σj(Σk)
)4ge σj (A)
4
radic1 + 2
(R222σk(Σk)
)4ge σj (A)
4radic
1 + 2τ4 1 le j le k
ETNAKent State University and
Johann Radon Institute (RICAM)
480 Y FENG J XIAO AND M GU
On the other hand using (35)
σ4j (A) le σ4
j (Σk)
(1 + 2
σ4j (A)
σ4j (Σk)
R2242σ4j (A)
)
le σ4j (Σk)
1 + 2
(σ4j (Σk) + 2 R2242
)σ4j (Σk)
R2242σ4j (A)
le σ4
j (Σk)
(1 + 2
(1 + 2
R2242σ4k (Σk)
)R2242σ4j (A)
)
that is
σj (Σk) ge σj (A)
4
radic1 +
(2 + 4
R2242σ4k(Σk)
)R2242σ4j (A)
Plugging (313) and (314) into this above equation yields
(316) σj (Σk) ge σj (A)
4
radic1 + τ4 (2 + 4τ4)
(σl+1(A)σj(A)
)4 1 le j le k
Combining (315) with (316) we arrive at (37)We note that (35) and (36) still hold true if we replace k by lInequality (37) shows that with the definitions (312) of τ and τ Flip-Flop SRQR can
reveal at least a dimension-dependent fraction of all the leading singular values of A andindeed approximate them very accurately in case they decay relatively quickly Moreover (38)shows that Flip-Flop SRQR can compute a rank-k approximation that is up to a factor of4
radic1 + 2τ4
(σl+1(A)σk+1(A)
)4
from optimal In situations where the singular values of A decay
relatively quickly our rank-k approximation is about as accurate as the truncated SVD with achoice of l such that
σl+1 (A)
σk+1 (A)= o (1)
REMARK 33 The ldquohigh probabilityrdquo mentioned in Theorem 32 is dependent on theoversampling size p but not on the block size b We decided not to dive deep into the probabilityexplanation since it will make the proof much longer and more tedious Theoretically thecomputed oversampling size p for certain given high probability would be rather high Forexample to achieve a probability 095 we need to choose p to be at least 500 However inpractice we would never choose such a large p since it will deteriorate the code efficiencyIt turns out that a small p like 5 sim 10 would achieve good approximation results empiricallyThe detailed explanation about what does high probability mean here can be found in [75]In terms of the choice of the block size b it wonrsquot affect the analysis or the error boundsThe choice of b is purely for the performance of the codes Only a suitable b can take fulladvantage of the cache hierarchy and BLAS routines
4 Numerical experiments In this section we demonstrate the effectiveness and effi-ciency of the Flip-Flop SRQR (FFSRQR) algorithm in several numerical experiments Firstly
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 481
TABLE 41Methods for approximate SVD
Method DescriptionLANSVD Approximate SVD using Lanczos bidiagonalization with partial
reorthogonalization [42]Its Fortran and Matlab implementations are from PROPACK [41]
FFSRQR Flip-Flop Spectrum-revealing QR factorizationIts Matlab implementation is using Mex-wrappedBLAS and LAPACK routines [2]
RSISVD Approximate SVD with randomized subspace iteration [30 27]Its Matlab implementation is from tensorlab toolbox [73]
TUXV Algorithm 5 in the appendixLTSVD Linear Time SVD [16]
We use its Matlab implementation by Ma et al [49]
we compare FFSRQR with other approximate SVD algorithms for a matrix approximationSecondly we compare FFSRQR with other methods on tensor approximation problems usingthe tensorlab toolbox [73] Thirdly we compare FFSRQR with other methods for the robustPCA problem and matrix completion problem All experiments are implemented on a laptopwith 29 GHz Intel Core i5 processor and 8 GB of RAM The codes based on LAPACK [2]and PROPACK [41] in Section 41 are in Fortran The codes in Sections 42 43 are in Matlabexcept that FFSRQR is implemented in Fortran and wrapped by mex
41 Approximate truncated SVD In this section we compare FFSRQR with otherapproximate SVD algorithms for low-rank matrix approximation All tested methods are listedin Table 41 Since LTSVD has only a Matlab implementation this method is only consideredin Sections 42 and 43 The test matrices areType 1 A isin Rmtimesn [66] is defined by A = U DV T + 01 middot dss middotE where U isin Rmtimess V isin
Rntimess are column-orthonormal matrices and D isin Rstimess is a diagonal matrix with sgeometrically decreasing diagonal entries from 1 to 10minus3 dss is the last diagonalentry of D E isin Rmtimesn is a random matrix where the entries are independentlysampled from a normal distribution N (0 1) In our numerical experiment we testthree different random matrices The square matrix has a size of 10000times 10000 theshort-fat matrix has a size of 1000times 10000 and the tall-skinny matrix has a size of10000times 1000
Type 2 A isin R4929times4929 is a real data matrix from the University of Florida sparse matrixcollection [11] Its corresponding file name is HBGEMAT11
For a given matrix A isin Rmtimesn the relative SVD approximation error is measured by∥∥Aminus UkΣkVTk
∥∥FAF where Σk contains the approximate top k singular values and
Uk Vk are the corresponding approximate top k singular vectors The parameters used inFFSRQR RSISVD TUXV and LTSVD are listed in Table 42 The additional time requiredto compute g2 is negligible In our experiments g2 always remains modest and never triggerssubsequent SRQR column swaps However computing g2 is still recommended since g2
serves as an insurance policy against potential quality degration by TRQRCP The block size bis chosen to be 32 according to the xGEQP3 LAPACK routine When k is less than 32 weimplement an unblock version of the partial QRCP
Figures 41ndash43 display the run time the relative approximation error and the top 20singular values comparison respectively for four different matrices In terms of speed FFSRQRis always faster than LANSVD and RSISVD Moreover the efficiency difference is more
ETNAKent State University and
Johann Radon Institute (RICAM)
482 Y FENG J XIAO AND M GU
TABLE 42Parameters used in RSISVD FFSRQR TUXV and LTSVD
Method ParameterFFSRQR oversampling size p = 5 block size b = min32 k l = k d = 10 g = 20RSISVD oversampling size p = 5 subspace iteration q = 1TUXV oversampling size p = 5 block size b = min32 k l = kLTSVD probabilities pi = 1n
obvious when k is relatively large FFSRQR is only slightly slower than TUXV In termsof the relative approximation error FFSRQR is comparable to the other three methods Interms of the top singular values approximation Figure 43 displays the top 20 singular valueswhen k = 500 for four different matrices In Figure 43 FFSRQR is superior to TUXV aswe take the absolute values of the main diagonal entries of the upper triangle matrix R as theapproximate singular values for TUXV
100 200 300 400 500
target rank k
100
101
102
103
104
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
10-1
100
101
102
103
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 41 Run time comparison for approximate SVD algorithms
42 Tensor approximation This section illustrates the effectiveness and efficiency ofFFSRQR for computing approximate tensors ST-HOSVD [3 71] is one of the most efficient
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 483
100 200 300 400 500
target rank k
022
024
026
028
03
032
034
036re
lative
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
024
026
028
03
032
034
036
rela
tive
ap
pro
xim
atio
n e
rro
rFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 42 Relative approximation error comparison for approximate SVD algorithms
algorithms to compute a Tucker decomposition of tensors and the most costly part of thisalgorithm is to compute an SVD or approximate SVD of the tensor unfoldings The truncatedSVD and RSISVD are used in the routines MLSVD and MLSVD_RSI respectively in theMatlab tensorlab toolbox [73] Based on this Matlab toolbox we implement ST-HOSVDusing FFSRQR or LTSVD to do the SVD approximation We name these two new routines byMLSVD_FFSRQR and MLSVD_LTSVD respectively and compare these four routines in thenumerical experiment We also have Python codes for this tensor approximation experimentWe donrsquot list the results of the python programs here as they are similar to those for Matlab
421 A sparse tensor example We test a sparse tensor X isin Rntimesntimesn of the followingformat [64 59]
X =
10sumj=1
1000
jxj yj zj +
nsumj=11
1
jxj yj zj
where xj yj zj isin Rn are sparse vectors with nonnegative entries The symbol ldquordquo representsthe vector outer product We compute a rank-(k k k) Tucker decomposition [GU1 U2 U3]using MLSVD MLSVD_FFSRQR MLSVD_RSI and MLSVD_LTSVD respectively Therelative approximation error is measured by X minus XkF XF where Xk = G times1 U1 times2
U2 times3 U3
ETNAKent State University and
Johann Radon Institute (RICAM)
484 Y FENG J XIAO AND M GU
5 10 15 20
index
06
07
08
09
1to
p 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
5 10 15 20
index
06
07
08
09
1
top 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
5 10 15 20
index
06
065
07
075
08
085
09
095
top
20
sin
gu
lar
va
lue
s
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
5 10 15 20
index
101
102
103
top
20
sin
gu
lar
va
lue
sFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 43 Top 20 singular values comparison for approximate SVD algorithms
Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger
422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]
We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3
where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method
MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485
50 100 150 200
target rank k
10-5
10-4
10-3
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
50 100 150 200
target rank k
10-2
10-1
100
101
102
run tim
e (
sec)
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
FIG 44 Run time and relative approximation error comparison on a sparse tensor
TABLE 43Comparison on handwritten digits classification
MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD
Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259
one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one
43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1
431 Robust PCA To solve the robust PCA problem we replace the approximateSVD part in the IALM method [45] by various methods We denote the actual solution tothe robust PCA problem by a matrix pair (Xlowast Elowast) isin Rmtimesn times Rmtimesn The matrix Xlowast isXlowast = XLX
TR with XL isin Rmtimesk XR isin Rntimesk being random matrices whose entries are
independently sampled from a normal distribution The sparse matrix Elowast is a random matrixwhere its non-zero entries are independently sampled from a uniform distribution over theinterval [minus500 500] The input to the IALM algorithm has the form M = Xlowast + Elowast andthe output is denoted by
(X E
) In this numerical experiment we use the same parameter
settings as the IALM code for robust PCA rank k is 01m and the number of non-zero entriesin E is 005m2 We choose the trade-off parameter λ = 1
radicmax (mn) as suggested by
Candes et al [8] The solution quality is measured by the normalized root mean square errorX minusXlowastF XlowastF
Table 44 includes relative error run time the number of non-zero entries in E (E0)the iteration count and the number of non-zero singular values (sv) in X of the IALM
1httpperceptioncslillinoisedumatrix-ranksample_codehtml
ETNAKent State University and
Johann Radon Institute (RICAM)
486 Y FENG J XIAO AND M GU
algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective
TABLE 44Comparison on robust PCA
Size Method Error Time (sec) E0 Iter sv
1000times 1000
IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100
2000times 2000
IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200
4000times 4000
IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400
6000times 6000
IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600
432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices
bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3
The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies
For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by
NMAEdef=
1|Γ|sum
(ij)isinΓ |Mij minusXij |rmax minus rmin
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487
where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5
Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]
The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets
TABLE 45Parameters used in the IALM method on matrix completion
Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384
jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742
movie-latest-small 671 9066 100e+05 52551
5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA
Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments
Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4
Appendix B TUXV algorithmThe details of TUXV algorithm are presented in Algorithm 5
Appendix C Computing Tucker decomposition of higher-order tensorsAlgorithm 6 computes the Tucker decomposition of higher-order tensors ie sequentially
truncated higher-order SVD
ETNAKent State University and
Johann Radon Institute (RICAM)
488 Y FENG J XIAO AND M GU
TABLE 46Comparison on matrix completion
Data set Method Iter Time NMAE sv σmax σmin
jester-1
IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00
jester-2
IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00
jester-3
IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00
jester-all
IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00
movie-100K
IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00
movie-1M
IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00
movie-latest-small
IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00
Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems
respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω
Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an
orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9
Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that
1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489
Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do
Find i = arg maxjleslen
rs Exchange rj and ri columns in A and Π
Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)
end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A
Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)
3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)
2 minus 23 (k + p)
3 (cf [69]) and the cost of forming the first (k + p)
columns in the full Q matrix is m (k + p)2
+ 13 (k + p)
3
Now we count the flops for each i in the for loop
1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)
2 minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)
2+ 1
3 (k + p)3
3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)
2minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)
2+ 1
3 (k + p)3
Together the cost of running the for loop q times is
q
(4mn (k + p) + 3 (m+ n) (k + p)
2 minus 2
3(k + p)
3
)
Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)
2) and the cost of computing U = Q lowast U is 2m (k + p)
2
ETNAKent State University and
Johann Radon Institute (RICAM)
490 Y FENG J XIAO AND M GU
Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV
Tr ie applying UTr to G
end for
Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max
(M2 λminus1MF
) Y0 = MJ (M) E0 = 0
while not converged do(UΣV) = svd
(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = Sλmicrominus1k
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2
REFERENCES
[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106
[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999
[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103
[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595
[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 477
Proof Since XXT = X1XT1 + X2X
T2 we obtain the above result using [33 Theo-
rem 3316]
We are now ready to derive bounds for the singular values and the approximation errorof Flip-Flop SRQR We need to emphasize that even if the target rank is k we run Flip-FlopSRQR with an actual target rank l which is a little bit larger than k The difference betweenk and l can create a gap between the singular values of A so that we can obtain a reliablelow-rank approximation
THEOREM 32 Given a matrix A isin Rmtimesn a target rank k an oversampling size p andan actual target rank l ge k the matrices Uk Σk Vk in (34) computed by Flip-Flop SRQRsatisfy
(35) σj(Σk) ge σj(A)
4
radic1 +
2R2242σ4j (Σk)
1 le j le k
and
(36)∥∥Aminus UkΣkV
Tk
∥∥2le σk+1 (A)
4
radic1 + 2
(R222σk+1 (A)
)4
where R22 isin R(mminusl)times(nminusl) is the trailing matrix in (31) Using the properties of SRQR wefurthermore have
(37) σj (Σk) ge σj (A)
4
radic1 + min
(2τ4 τ4 (2 + 4τ4)
(σl+1(A)σj(A)
)4) 1 le j le k
and
(38)∥∥Aminus UkΣkV
Tk
∥∥2le σk+1 (A)
4
radic1 + 2τ4
(σl+1 (A)
σk+1 (A)
)4
where τ and τ defined in (312) have matrix dimension-dependent upper bounds
τ le g1g2
radic(l + 1) (nminus l) and τ le g1g2
radicl (nminus l)
where g1 leradic
1+ε1minusε and g2 le g with high probability The parameters ε gt 0 and g gt 1 are
user-defined
Proof In terms of the singular value bounds observing that
[R11 R12
R22
][R11 R12
R22
]T=
[R11R
T11 + R12R
T12 R12R
T22
R22RT12 R22R
T22
]
ETNAKent State University and
Johann Radon Institute (RICAM)
478 Y FENG J XIAO AND M GU
we apply Lemma 31 twice for any 1 le j le k
σ2j
[R11 R12
R22
][R11 R12
R22
]Tle σ2
j
([R11R
T11 + R12R
T12 R12R
T22
])+∥∥∥[R22R
T12 R22R
T22
]∥∥∥2
2
le σ2j
(R11R
T11 + R12R
T12
)+∥∥∥R12R
T22
∥∥∥2
2+∥∥∥[R22R
T12 R22R
T22
]∥∥∥2
2
le σ2j
(R11R
T11 + R12R
T12
)+ 2
∥∥∥∥∥[R12
R22
]∥∥∥∥∥4
2
= σ2j
(R11R
T11 + R12R
T12
)+ 2 R2242 (39)
The relation (39) can be further rewritten as
σ4j (A) le σ4
j
([R11 R12
])+ 2 R2242 = σ4
j (Σk) + 2 R2242 1 le j le k
which is equivalent to
σj(Σk) ge σj(A)
4
radic1 +
2R2242σ4j (Σk)
For the residual matrix bound we let[R11 R12
]def=[R11 R12
]+[δR11 δR12
]
where[R11 R12
]is the rank-k truncated SVD of
[R11 R12
] Since
(310)∥∥Aminus UkΣkV
Tk
∥∥2
=
∥∥∥∥∥AΠminusQ[R11 R12
0
]TQT
∥∥∥∥∥2
it follows from the orthogonality of singular vectors that[R11 R12
]T [δR11 δR12
]= 0
and therefore[R11 R12
]T [R11 R12
]=[R11 R12
]T [R11 R12
]+[δR11 δR12
]T [δR11 δR12
]
which implies
(311) RT12R12 = RT
12R12 +(δR12
)T (δR12
)
Similarly to the deduction of (39) from (311) we can derive∥∥∥∥[δR11 δR12
R22
]∥∥∥∥4
2
le∥∥[δR11 δR12
]∥∥4
2+ 2
∥∥∥∥[δR12
R22
]∥∥∥∥4
2
le σ4k+1 (A) + 2
∥∥∥∥∥[R12
R22
]∥∥∥∥∥4
2
= σ4k+1 (A) + 2 R2242
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 479
Combining with (310) it now follows that
∥∥Aminus UkΣkVTk
∥∥2
=
∥∥∥∥∥AΠminusQ[R11 R12
0
]TQT
∥∥∥∥∥2
=
∥∥∥∥[δR11 δR12
R22
]∥∥∥∥2
le σk+1 (A)4
radic1 + 2
(R222σk+1 (A)
)4
To obtain an upper bound of R222 in (35) and (36) we follow the analysis of SRQRin [75]
From the analysis of [75 Section IV] Algorithm 2 ensures that g1 leradic
1+ε1minusε and g2 le g
with high probability (the actual probability-guarantee formula can be found in [75 SectionIV]) where g1 and g2 are defined by (26) Here 0 lt ε lt 1 is a user defined parameter toadjust the choice of the oversampling size p used in the TRQRCP initialization part in SRQRg gt 1 is a user-defined parameter in the extra swapping part in SRQR Let
(312) τdef= g1g2
R222R2212
∥∥∥RminusT∥∥∥minus1
12
σl+1 (A)and τ
def= g1g2
R222R2212
∥∥RminusT11
∥∥minus1
12
σk (Σk)
where R is defined in equation (25) Since 1radicnX12 le X2 le
radicnX12 and
σi (X1) le σi (X) 1 le i le min (s t) for any matrix X isin Rmtimesn and submatrix X1 isin Rstimestof X we obtain that
τ = g1g2R222R2212
∥∥∥RminusT∥∥∥minus1
12
σl+1
(R) σl+1
(R)
σl+1 (A)le g1g2
radic(l + 1) (nminus l)
Using the fact that σl([R11 R12
])= σl
(R11
)and σk (Σk) = σk
([R11 R12
])by (32)
and (34)
τ = g1g2R222R2212
∥∥RminusT11
∥∥minus1
12
σl (R11)
σl (R11)
σl([R11 R12
]) σl ([R11 R12
])σk (Σk)
le g1g2
radicl (nminus l)
By the definition of τ
(313) R222 = τ σl+1 (A)
Plugging this into (36) yields (38)By the definition of τ we observe that
(314) R222 le τσk (Σk)
By (35) and (314)(315)
σj (Σk) ge σj (A)
4
radic1 + 2
(R222σj(Σk)
)4ge σj (A)
4
radic1 + 2
(R222σk(Σk)
)4ge σj (A)
4radic
1 + 2τ4 1 le j le k
ETNAKent State University and
Johann Radon Institute (RICAM)
480 Y FENG J XIAO AND M GU
On the other hand using (35)
σ4j (A) le σ4
j (Σk)
(1 + 2
σ4j (A)
σ4j (Σk)
R2242σ4j (A)
)
le σ4j (Σk)
1 + 2
(σ4j (Σk) + 2 R2242
)σ4j (Σk)
R2242σ4j (A)
le σ4
j (Σk)
(1 + 2
(1 + 2
R2242σ4k (Σk)
)R2242σ4j (A)
)
that is
σj (Σk) ge σj (A)
4
radic1 +
(2 + 4
R2242σ4k(Σk)
)R2242σ4j (A)
Plugging (313) and (314) into this above equation yields
(316) σj (Σk) ge σj (A)
4
radic1 + τ4 (2 + 4τ4)
(σl+1(A)σj(A)
)4 1 le j le k
Combining (315) with (316) we arrive at (37)We note that (35) and (36) still hold true if we replace k by lInequality (37) shows that with the definitions (312) of τ and τ Flip-Flop SRQR can
reveal at least a dimension-dependent fraction of all the leading singular values of A andindeed approximate them very accurately in case they decay relatively quickly Moreover (38)shows that Flip-Flop SRQR can compute a rank-k approximation that is up to a factor of4
radic1 + 2τ4
(σl+1(A)σk+1(A)
)4
from optimal In situations where the singular values of A decay
relatively quickly our rank-k approximation is about as accurate as the truncated SVD with achoice of l such that
σl+1 (A)
σk+1 (A)= o (1)
REMARK 33 The ldquohigh probabilityrdquo mentioned in Theorem 32 is dependent on theoversampling size p but not on the block size b We decided not to dive deep into the probabilityexplanation since it will make the proof much longer and more tedious Theoretically thecomputed oversampling size p for certain given high probability would be rather high Forexample to achieve a probability 095 we need to choose p to be at least 500 However inpractice we would never choose such a large p since it will deteriorate the code efficiencyIt turns out that a small p like 5 sim 10 would achieve good approximation results empiricallyThe detailed explanation about what does high probability mean here can be found in [75]In terms of the choice of the block size b it wonrsquot affect the analysis or the error boundsThe choice of b is purely for the performance of the codes Only a suitable b can take fulladvantage of the cache hierarchy and BLAS routines
4 Numerical experiments In this section we demonstrate the effectiveness and effi-ciency of the Flip-Flop SRQR (FFSRQR) algorithm in several numerical experiments Firstly
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 481
TABLE 41Methods for approximate SVD
Method DescriptionLANSVD Approximate SVD using Lanczos bidiagonalization with partial
reorthogonalization [42]Its Fortran and Matlab implementations are from PROPACK [41]
FFSRQR Flip-Flop Spectrum-revealing QR factorizationIts Matlab implementation is using Mex-wrappedBLAS and LAPACK routines [2]
RSISVD Approximate SVD with randomized subspace iteration [30 27]Its Matlab implementation is from tensorlab toolbox [73]
TUXV Algorithm 5 in the appendixLTSVD Linear Time SVD [16]
We use its Matlab implementation by Ma et al [49]
we compare FFSRQR with other approximate SVD algorithms for a matrix approximationSecondly we compare FFSRQR with other methods on tensor approximation problems usingthe tensorlab toolbox [73] Thirdly we compare FFSRQR with other methods for the robustPCA problem and matrix completion problem All experiments are implemented on a laptopwith 29 GHz Intel Core i5 processor and 8 GB of RAM The codes based on LAPACK [2]and PROPACK [41] in Section 41 are in Fortran The codes in Sections 42 43 are in Matlabexcept that FFSRQR is implemented in Fortran and wrapped by mex
41 Approximate truncated SVD In this section we compare FFSRQR with otherapproximate SVD algorithms for low-rank matrix approximation All tested methods are listedin Table 41 Since LTSVD has only a Matlab implementation this method is only consideredin Sections 42 and 43 The test matrices areType 1 A isin Rmtimesn [66] is defined by A = U DV T + 01 middot dss middotE where U isin Rmtimess V isin
Rntimess are column-orthonormal matrices and D isin Rstimess is a diagonal matrix with sgeometrically decreasing diagonal entries from 1 to 10minus3 dss is the last diagonalentry of D E isin Rmtimesn is a random matrix where the entries are independentlysampled from a normal distribution N (0 1) In our numerical experiment we testthree different random matrices The square matrix has a size of 10000times 10000 theshort-fat matrix has a size of 1000times 10000 and the tall-skinny matrix has a size of10000times 1000
Type 2 A isin R4929times4929 is a real data matrix from the University of Florida sparse matrixcollection [11] Its corresponding file name is HBGEMAT11
For a given matrix A isin Rmtimesn the relative SVD approximation error is measured by∥∥Aminus UkΣkVTk
∥∥FAF where Σk contains the approximate top k singular values and
Uk Vk are the corresponding approximate top k singular vectors The parameters used inFFSRQR RSISVD TUXV and LTSVD are listed in Table 42 The additional time requiredto compute g2 is negligible In our experiments g2 always remains modest and never triggerssubsequent SRQR column swaps However computing g2 is still recommended since g2
serves as an insurance policy against potential quality degration by TRQRCP The block size bis chosen to be 32 according to the xGEQP3 LAPACK routine When k is less than 32 weimplement an unblock version of the partial QRCP
Figures 41ndash43 display the run time the relative approximation error and the top 20singular values comparison respectively for four different matrices In terms of speed FFSRQRis always faster than LANSVD and RSISVD Moreover the efficiency difference is more
ETNAKent State University and
Johann Radon Institute (RICAM)
482 Y FENG J XIAO AND M GU
TABLE 42Parameters used in RSISVD FFSRQR TUXV and LTSVD
Method ParameterFFSRQR oversampling size p = 5 block size b = min32 k l = k d = 10 g = 20RSISVD oversampling size p = 5 subspace iteration q = 1TUXV oversampling size p = 5 block size b = min32 k l = kLTSVD probabilities pi = 1n
obvious when k is relatively large FFSRQR is only slightly slower than TUXV In termsof the relative approximation error FFSRQR is comparable to the other three methods Interms of the top singular values approximation Figure 43 displays the top 20 singular valueswhen k = 500 for four different matrices In Figure 43 FFSRQR is superior to TUXV aswe take the absolute values of the main diagonal entries of the upper triangle matrix R as theapproximate singular values for TUXV
100 200 300 400 500
target rank k
100
101
102
103
104
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
10-1
100
101
102
103
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 41 Run time comparison for approximate SVD algorithms
42 Tensor approximation This section illustrates the effectiveness and efficiency ofFFSRQR for computing approximate tensors ST-HOSVD [3 71] is one of the most efficient
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 483
100 200 300 400 500
target rank k
022
024
026
028
03
032
034
036re
lative
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
024
026
028
03
032
034
036
rela
tive
ap
pro
xim
atio
n e
rro
rFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 42 Relative approximation error comparison for approximate SVD algorithms
algorithms to compute a Tucker decomposition of tensors and the most costly part of thisalgorithm is to compute an SVD or approximate SVD of the tensor unfoldings The truncatedSVD and RSISVD are used in the routines MLSVD and MLSVD_RSI respectively in theMatlab tensorlab toolbox [73] Based on this Matlab toolbox we implement ST-HOSVDusing FFSRQR or LTSVD to do the SVD approximation We name these two new routines byMLSVD_FFSRQR and MLSVD_LTSVD respectively and compare these four routines in thenumerical experiment We also have Python codes for this tensor approximation experimentWe donrsquot list the results of the python programs here as they are similar to those for Matlab
421 A sparse tensor example We test a sparse tensor X isin Rntimesntimesn of the followingformat [64 59]
X =
10sumj=1
1000
jxj yj zj +
nsumj=11
1
jxj yj zj
where xj yj zj isin Rn are sparse vectors with nonnegative entries The symbol ldquordquo representsthe vector outer product We compute a rank-(k k k) Tucker decomposition [GU1 U2 U3]using MLSVD MLSVD_FFSRQR MLSVD_RSI and MLSVD_LTSVD respectively Therelative approximation error is measured by X minus XkF XF where Xk = G times1 U1 times2
U2 times3 U3
ETNAKent State University and
Johann Radon Institute (RICAM)
484 Y FENG J XIAO AND M GU
5 10 15 20
index
06
07
08
09
1to
p 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
5 10 15 20
index
06
07
08
09
1
top 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
5 10 15 20
index
06
065
07
075
08
085
09
095
top
20
sin
gu
lar
va
lue
s
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
5 10 15 20
index
101
102
103
top
20
sin
gu
lar
va
lue
sFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 43 Top 20 singular values comparison for approximate SVD algorithms
Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger
422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]
We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3
where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method
MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485
50 100 150 200
target rank k
10-5
10-4
10-3
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
50 100 150 200
target rank k
10-2
10-1
100
101
102
run tim
e (
sec)
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
FIG 44 Run time and relative approximation error comparison on a sparse tensor
TABLE 43Comparison on handwritten digits classification
MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD
Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259
one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one
43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1
431 Robust PCA To solve the robust PCA problem we replace the approximateSVD part in the IALM method [45] by various methods We denote the actual solution tothe robust PCA problem by a matrix pair (Xlowast Elowast) isin Rmtimesn times Rmtimesn The matrix Xlowast isXlowast = XLX
TR with XL isin Rmtimesk XR isin Rntimesk being random matrices whose entries are
independently sampled from a normal distribution The sparse matrix Elowast is a random matrixwhere its non-zero entries are independently sampled from a uniform distribution over theinterval [minus500 500] The input to the IALM algorithm has the form M = Xlowast + Elowast andthe output is denoted by
(X E
) In this numerical experiment we use the same parameter
settings as the IALM code for robust PCA rank k is 01m and the number of non-zero entriesin E is 005m2 We choose the trade-off parameter λ = 1
radicmax (mn) as suggested by
Candes et al [8] The solution quality is measured by the normalized root mean square errorX minusXlowastF XlowastF
Table 44 includes relative error run time the number of non-zero entries in E (E0)the iteration count and the number of non-zero singular values (sv) in X of the IALM
1httpperceptioncslillinoisedumatrix-ranksample_codehtml
ETNAKent State University and
Johann Radon Institute (RICAM)
486 Y FENG J XIAO AND M GU
algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective
TABLE 44Comparison on robust PCA
Size Method Error Time (sec) E0 Iter sv
1000times 1000
IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100
2000times 2000
IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200
4000times 4000
IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400
6000times 6000
IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600
432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices
bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3
The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies
For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by
NMAEdef=
1|Γ|sum
(ij)isinΓ |Mij minusXij |rmax minus rmin
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487
where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5
Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]
The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets
TABLE 45Parameters used in the IALM method on matrix completion
Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384
jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742
movie-latest-small 671 9066 100e+05 52551
5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA
Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments
Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4
Appendix B TUXV algorithmThe details of TUXV algorithm are presented in Algorithm 5
Appendix C Computing Tucker decomposition of higher-order tensorsAlgorithm 6 computes the Tucker decomposition of higher-order tensors ie sequentially
truncated higher-order SVD
ETNAKent State University and
Johann Radon Institute (RICAM)
488 Y FENG J XIAO AND M GU
TABLE 46Comparison on matrix completion
Data set Method Iter Time NMAE sv σmax σmin
jester-1
IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00
jester-2
IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00
jester-3
IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00
jester-all
IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00
movie-100K
IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00
movie-1M
IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00
movie-latest-small
IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00
Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems
respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω
Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an
orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9
Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that
1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489
Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do
Find i = arg maxjleslen
rs Exchange rj and ri columns in A and Π
Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)
end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A
Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)
3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)
2 minus 23 (k + p)
3 (cf [69]) and the cost of forming the first (k + p)
columns in the full Q matrix is m (k + p)2
+ 13 (k + p)
3
Now we count the flops for each i in the for loop
1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)
2 minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)
2+ 1
3 (k + p)3
3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)
2minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)
2+ 1
3 (k + p)3
Together the cost of running the for loop q times is
q
(4mn (k + p) + 3 (m+ n) (k + p)
2 minus 2
3(k + p)
3
)
Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)
2) and the cost of computing U = Q lowast U is 2m (k + p)
2
ETNAKent State University and
Johann Radon Institute (RICAM)
490 Y FENG J XIAO AND M GU
Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV
Tr ie applying UTr to G
end for
Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max
(M2 λminus1MF
) Y0 = MJ (M) E0 = 0
while not converged do(UΣV) = svd
(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = Sλmicrominus1k
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2
REFERENCES
[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106
[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999
[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103
[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595
[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
478 Y FENG J XIAO AND M GU
we apply Lemma 31 twice for any 1 le j le k
σ2j
[R11 R12
R22
][R11 R12
R22
]Tle σ2
j
([R11R
T11 + R12R
T12 R12R
T22
])+∥∥∥[R22R
T12 R22R
T22
]∥∥∥2
2
le σ2j
(R11R
T11 + R12R
T12
)+∥∥∥R12R
T22
∥∥∥2
2+∥∥∥[R22R
T12 R22R
T22
]∥∥∥2
2
le σ2j
(R11R
T11 + R12R
T12
)+ 2
∥∥∥∥∥[R12
R22
]∥∥∥∥∥4
2
= σ2j
(R11R
T11 + R12R
T12
)+ 2 R2242 (39)
The relation (39) can be further rewritten as
σ4j (A) le σ4
j
([R11 R12
])+ 2 R2242 = σ4
j (Σk) + 2 R2242 1 le j le k
which is equivalent to
σj(Σk) ge σj(A)
4
radic1 +
2R2242σ4j (Σk)
For the residual matrix bound we let[R11 R12
]def=[R11 R12
]+[δR11 δR12
]
where[R11 R12
]is the rank-k truncated SVD of
[R11 R12
] Since
(310)∥∥Aminus UkΣkV
Tk
∥∥2
=
∥∥∥∥∥AΠminusQ[R11 R12
0
]TQT
∥∥∥∥∥2
it follows from the orthogonality of singular vectors that[R11 R12
]T [δR11 δR12
]= 0
and therefore[R11 R12
]T [R11 R12
]=[R11 R12
]T [R11 R12
]+[δR11 δR12
]T [δR11 δR12
]
which implies
(311) RT12R12 = RT
12R12 +(δR12
)T (δR12
)
Similarly to the deduction of (39) from (311) we can derive∥∥∥∥[δR11 δR12
R22
]∥∥∥∥4
2
le∥∥[δR11 δR12
]∥∥4
2+ 2
∥∥∥∥[δR12
R22
]∥∥∥∥4
2
le σ4k+1 (A) + 2
∥∥∥∥∥[R12
R22
]∥∥∥∥∥4
2
= σ4k+1 (A) + 2 R2242
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 479
Combining with (310) it now follows that
∥∥Aminus UkΣkVTk
∥∥2
=
∥∥∥∥∥AΠminusQ[R11 R12
0
]TQT
∥∥∥∥∥2
=
∥∥∥∥[δR11 δR12
R22
]∥∥∥∥2
le σk+1 (A)4
radic1 + 2
(R222σk+1 (A)
)4
To obtain an upper bound of R222 in (35) and (36) we follow the analysis of SRQRin [75]
From the analysis of [75 Section IV] Algorithm 2 ensures that g1 leradic
1+ε1minusε and g2 le g
with high probability (the actual probability-guarantee formula can be found in [75 SectionIV]) where g1 and g2 are defined by (26) Here 0 lt ε lt 1 is a user defined parameter toadjust the choice of the oversampling size p used in the TRQRCP initialization part in SRQRg gt 1 is a user-defined parameter in the extra swapping part in SRQR Let
(312) τdef= g1g2
R222R2212
∥∥∥RminusT∥∥∥minus1
12
σl+1 (A)and τ
def= g1g2
R222R2212
∥∥RminusT11
∥∥minus1
12
σk (Σk)
where R is defined in equation (25) Since 1radicnX12 le X2 le
radicnX12 and
σi (X1) le σi (X) 1 le i le min (s t) for any matrix X isin Rmtimesn and submatrix X1 isin Rstimestof X we obtain that
τ = g1g2R222R2212
∥∥∥RminusT∥∥∥minus1
12
σl+1
(R) σl+1
(R)
σl+1 (A)le g1g2
radic(l + 1) (nminus l)
Using the fact that σl([R11 R12
])= σl
(R11
)and σk (Σk) = σk
([R11 R12
])by (32)
and (34)
τ = g1g2R222R2212
∥∥RminusT11
∥∥minus1
12
σl (R11)
σl (R11)
σl([R11 R12
]) σl ([R11 R12
])σk (Σk)
le g1g2
radicl (nminus l)
By the definition of τ
(313) R222 = τ σl+1 (A)
Plugging this into (36) yields (38)By the definition of τ we observe that
(314) R222 le τσk (Σk)
By (35) and (314)(315)
σj (Σk) ge σj (A)
4
radic1 + 2
(R222σj(Σk)
)4ge σj (A)
4
radic1 + 2
(R222σk(Σk)
)4ge σj (A)
4radic
1 + 2τ4 1 le j le k
ETNAKent State University and
Johann Radon Institute (RICAM)
480 Y FENG J XIAO AND M GU
On the other hand using (35)
σ4j (A) le σ4
j (Σk)
(1 + 2
σ4j (A)
σ4j (Σk)
R2242σ4j (A)
)
le σ4j (Σk)
1 + 2
(σ4j (Σk) + 2 R2242
)σ4j (Σk)
R2242σ4j (A)
le σ4
j (Σk)
(1 + 2
(1 + 2
R2242σ4k (Σk)
)R2242σ4j (A)
)
that is
σj (Σk) ge σj (A)
4
radic1 +
(2 + 4
R2242σ4k(Σk)
)R2242σ4j (A)
Plugging (313) and (314) into this above equation yields
(316) σj (Σk) ge σj (A)
4
radic1 + τ4 (2 + 4τ4)
(σl+1(A)σj(A)
)4 1 le j le k
Combining (315) with (316) we arrive at (37)We note that (35) and (36) still hold true if we replace k by lInequality (37) shows that with the definitions (312) of τ and τ Flip-Flop SRQR can
reveal at least a dimension-dependent fraction of all the leading singular values of A andindeed approximate them very accurately in case they decay relatively quickly Moreover (38)shows that Flip-Flop SRQR can compute a rank-k approximation that is up to a factor of4
radic1 + 2τ4
(σl+1(A)σk+1(A)
)4
from optimal In situations where the singular values of A decay
relatively quickly our rank-k approximation is about as accurate as the truncated SVD with achoice of l such that
σl+1 (A)
σk+1 (A)= o (1)
REMARK 33 The ldquohigh probabilityrdquo mentioned in Theorem 32 is dependent on theoversampling size p but not on the block size b We decided not to dive deep into the probabilityexplanation since it will make the proof much longer and more tedious Theoretically thecomputed oversampling size p for certain given high probability would be rather high Forexample to achieve a probability 095 we need to choose p to be at least 500 However inpractice we would never choose such a large p since it will deteriorate the code efficiencyIt turns out that a small p like 5 sim 10 would achieve good approximation results empiricallyThe detailed explanation about what does high probability mean here can be found in [75]In terms of the choice of the block size b it wonrsquot affect the analysis or the error boundsThe choice of b is purely for the performance of the codes Only a suitable b can take fulladvantage of the cache hierarchy and BLAS routines
4 Numerical experiments In this section we demonstrate the effectiveness and effi-ciency of the Flip-Flop SRQR (FFSRQR) algorithm in several numerical experiments Firstly
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 481
TABLE 41Methods for approximate SVD
Method DescriptionLANSVD Approximate SVD using Lanczos bidiagonalization with partial
reorthogonalization [42]Its Fortran and Matlab implementations are from PROPACK [41]
FFSRQR Flip-Flop Spectrum-revealing QR factorizationIts Matlab implementation is using Mex-wrappedBLAS and LAPACK routines [2]
RSISVD Approximate SVD with randomized subspace iteration [30 27]Its Matlab implementation is from tensorlab toolbox [73]
TUXV Algorithm 5 in the appendixLTSVD Linear Time SVD [16]
We use its Matlab implementation by Ma et al [49]
we compare FFSRQR with other approximate SVD algorithms for a matrix approximationSecondly we compare FFSRQR with other methods on tensor approximation problems usingthe tensorlab toolbox [73] Thirdly we compare FFSRQR with other methods for the robustPCA problem and matrix completion problem All experiments are implemented on a laptopwith 29 GHz Intel Core i5 processor and 8 GB of RAM The codes based on LAPACK [2]and PROPACK [41] in Section 41 are in Fortran The codes in Sections 42 43 are in Matlabexcept that FFSRQR is implemented in Fortran and wrapped by mex
41 Approximate truncated SVD In this section we compare FFSRQR with otherapproximate SVD algorithms for low-rank matrix approximation All tested methods are listedin Table 41 Since LTSVD has only a Matlab implementation this method is only consideredin Sections 42 and 43 The test matrices areType 1 A isin Rmtimesn [66] is defined by A = U DV T + 01 middot dss middotE where U isin Rmtimess V isin
Rntimess are column-orthonormal matrices and D isin Rstimess is a diagonal matrix with sgeometrically decreasing diagonal entries from 1 to 10minus3 dss is the last diagonalentry of D E isin Rmtimesn is a random matrix where the entries are independentlysampled from a normal distribution N (0 1) In our numerical experiment we testthree different random matrices The square matrix has a size of 10000times 10000 theshort-fat matrix has a size of 1000times 10000 and the tall-skinny matrix has a size of10000times 1000
Type 2 A isin R4929times4929 is a real data matrix from the University of Florida sparse matrixcollection [11] Its corresponding file name is HBGEMAT11
For a given matrix A isin Rmtimesn the relative SVD approximation error is measured by∥∥Aminus UkΣkVTk
∥∥FAF where Σk contains the approximate top k singular values and
Uk Vk are the corresponding approximate top k singular vectors The parameters used inFFSRQR RSISVD TUXV and LTSVD are listed in Table 42 The additional time requiredto compute g2 is negligible In our experiments g2 always remains modest and never triggerssubsequent SRQR column swaps However computing g2 is still recommended since g2
serves as an insurance policy against potential quality degration by TRQRCP The block size bis chosen to be 32 according to the xGEQP3 LAPACK routine When k is less than 32 weimplement an unblock version of the partial QRCP
Figures 41ndash43 display the run time the relative approximation error and the top 20singular values comparison respectively for four different matrices In terms of speed FFSRQRis always faster than LANSVD and RSISVD Moreover the efficiency difference is more
ETNAKent State University and
Johann Radon Institute (RICAM)
482 Y FENG J XIAO AND M GU
TABLE 42Parameters used in RSISVD FFSRQR TUXV and LTSVD
Method ParameterFFSRQR oversampling size p = 5 block size b = min32 k l = k d = 10 g = 20RSISVD oversampling size p = 5 subspace iteration q = 1TUXV oversampling size p = 5 block size b = min32 k l = kLTSVD probabilities pi = 1n
obvious when k is relatively large FFSRQR is only slightly slower than TUXV In termsof the relative approximation error FFSRQR is comparable to the other three methods Interms of the top singular values approximation Figure 43 displays the top 20 singular valueswhen k = 500 for four different matrices In Figure 43 FFSRQR is superior to TUXV aswe take the absolute values of the main diagonal entries of the upper triangle matrix R as theapproximate singular values for TUXV
100 200 300 400 500
target rank k
100
101
102
103
104
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
10-1
100
101
102
103
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 41 Run time comparison for approximate SVD algorithms
42 Tensor approximation This section illustrates the effectiveness and efficiency ofFFSRQR for computing approximate tensors ST-HOSVD [3 71] is one of the most efficient
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 483
100 200 300 400 500
target rank k
022
024
026
028
03
032
034
036re
lative
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
024
026
028
03
032
034
036
rela
tive
ap
pro
xim
atio
n e
rro
rFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 42 Relative approximation error comparison for approximate SVD algorithms
algorithms to compute a Tucker decomposition of tensors and the most costly part of thisalgorithm is to compute an SVD or approximate SVD of the tensor unfoldings The truncatedSVD and RSISVD are used in the routines MLSVD and MLSVD_RSI respectively in theMatlab tensorlab toolbox [73] Based on this Matlab toolbox we implement ST-HOSVDusing FFSRQR or LTSVD to do the SVD approximation We name these two new routines byMLSVD_FFSRQR and MLSVD_LTSVD respectively and compare these four routines in thenumerical experiment We also have Python codes for this tensor approximation experimentWe donrsquot list the results of the python programs here as they are similar to those for Matlab
421 A sparse tensor example We test a sparse tensor X isin Rntimesntimesn of the followingformat [64 59]
X =
10sumj=1
1000
jxj yj zj +
nsumj=11
1
jxj yj zj
where xj yj zj isin Rn are sparse vectors with nonnegative entries The symbol ldquordquo representsthe vector outer product We compute a rank-(k k k) Tucker decomposition [GU1 U2 U3]using MLSVD MLSVD_FFSRQR MLSVD_RSI and MLSVD_LTSVD respectively Therelative approximation error is measured by X minus XkF XF where Xk = G times1 U1 times2
U2 times3 U3
ETNAKent State University and
Johann Radon Institute (RICAM)
484 Y FENG J XIAO AND M GU
5 10 15 20
index
06
07
08
09
1to
p 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
5 10 15 20
index
06
07
08
09
1
top 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
5 10 15 20
index
06
065
07
075
08
085
09
095
top
20
sin
gu
lar
va
lue
s
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
5 10 15 20
index
101
102
103
top
20
sin
gu
lar
va
lue
sFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 43 Top 20 singular values comparison for approximate SVD algorithms
Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger
422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]
We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3
where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method
MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485
50 100 150 200
target rank k
10-5
10-4
10-3
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
50 100 150 200
target rank k
10-2
10-1
100
101
102
run tim
e (
sec)
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
FIG 44 Run time and relative approximation error comparison on a sparse tensor
TABLE 43Comparison on handwritten digits classification
MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD
Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259
one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one
43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1
431 Robust PCA To solve the robust PCA problem we replace the approximateSVD part in the IALM method [45] by various methods We denote the actual solution tothe robust PCA problem by a matrix pair (Xlowast Elowast) isin Rmtimesn times Rmtimesn The matrix Xlowast isXlowast = XLX
TR with XL isin Rmtimesk XR isin Rntimesk being random matrices whose entries are
independently sampled from a normal distribution The sparse matrix Elowast is a random matrixwhere its non-zero entries are independently sampled from a uniform distribution over theinterval [minus500 500] The input to the IALM algorithm has the form M = Xlowast + Elowast andthe output is denoted by
(X E
) In this numerical experiment we use the same parameter
settings as the IALM code for robust PCA rank k is 01m and the number of non-zero entriesin E is 005m2 We choose the trade-off parameter λ = 1
radicmax (mn) as suggested by
Candes et al [8] The solution quality is measured by the normalized root mean square errorX minusXlowastF XlowastF
Table 44 includes relative error run time the number of non-zero entries in E (E0)the iteration count and the number of non-zero singular values (sv) in X of the IALM
1httpperceptioncslillinoisedumatrix-ranksample_codehtml
ETNAKent State University and
Johann Radon Institute (RICAM)
486 Y FENG J XIAO AND M GU
algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective
TABLE 44Comparison on robust PCA
Size Method Error Time (sec) E0 Iter sv
1000times 1000
IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100
2000times 2000
IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200
4000times 4000
IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400
6000times 6000
IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600
432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices
bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3
The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies
For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by
NMAEdef=
1|Γ|sum
(ij)isinΓ |Mij minusXij |rmax minus rmin
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487
where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5
Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]
The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets
TABLE 45Parameters used in the IALM method on matrix completion
Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384
jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742
movie-latest-small 671 9066 100e+05 52551
5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA
Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments
Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4
Appendix B TUXV algorithmThe details of TUXV algorithm are presented in Algorithm 5
Appendix C Computing Tucker decomposition of higher-order tensorsAlgorithm 6 computes the Tucker decomposition of higher-order tensors ie sequentially
truncated higher-order SVD
ETNAKent State University and
Johann Radon Institute (RICAM)
488 Y FENG J XIAO AND M GU
TABLE 46Comparison on matrix completion
Data set Method Iter Time NMAE sv σmax σmin
jester-1
IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00
jester-2
IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00
jester-3
IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00
jester-all
IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00
movie-100K
IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00
movie-1M
IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00
movie-latest-small
IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00
Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems
respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω
Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an
orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9
Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that
1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489
Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do
Find i = arg maxjleslen
rs Exchange rj and ri columns in A and Π
Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)
end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A
Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)
3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)
2 minus 23 (k + p)
3 (cf [69]) and the cost of forming the first (k + p)
columns in the full Q matrix is m (k + p)2
+ 13 (k + p)
3
Now we count the flops for each i in the for loop
1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)
2 minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)
2+ 1
3 (k + p)3
3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)
2minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)
2+ 1
3 (k + p)3
Together the cost of running the for loop q times is
q
(4mn (k + p) + 3 (m+ n) (k + p)
2 minus 2
3(k + p)
3
)
Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)
2) and the cost of computing U = Q lowast U is 2m (k + p)
2
ETNAKent State University and
Johann Radon Institute (RICAM)
490 Y FENG J XIAO AND M GU
Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV
Tr ie applying UTr to G
end for
Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max
(M2 λminus1MF
) Y0 = MJ (M) E0 = 0
while not converged do(UΣV) = svd
(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = Sλmicrominus1k
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2
REFERENCES
[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106
[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999
[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103
[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595
[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 479
Combining with (310) it now follows that
∥∥Aminus UkΣkVTk
∥∥2
=
∥∥∥∥∥AΠminusQ[R11 R12
0
]TQT
∥∥∥∥∥2
=
∥∥∥∥[δR11 δR12
R22
]∥∥∥∥2
le σk+1 (A)4
radic1 + 2
(R222σk+1 (A)
)4
To obtain an upper bound of R222 in (35) and (36) we follow the analysis of SRQRin [75]
From the analysis of [75 Section IV] Algorithm 2 ensures that g1 leradic
1+ε1minusε and g2 le g
with high probability (the actual probability-guarantee formula can be found in [75 SectionIV]) where g1 and g2 are defined by (26) Here 0 lt ε lt 1 is a user defined parameter toadjust the choice of the oversampling size p used in the TRQRCP initialization part in SRQRg gt 1 is a user-defined parameter in the extra swapping part in SRQR Let
(312) τdef= g1g2
R222R2212
∥∥∥RminusT∥∥∥minus1
12
σl+1 (A)and τ
def= g1g2
R222R2212
∥∥RminusT11
∥∥minus1
12
σk (Σk)
where R is defined in equation (25) Since 1radicnX12 le X2 le
radicnX12 and
σi (X1) le σi (X) 1 le i le min (s t) for any matrix X isin Rmtimesn and submatrix X1 isin Rstimestof X we obtain that
τ = g1g2R222R2212
∥∥∥RminusT∥∥∥minus1
12
σl+1
(R) σl+1
(R)
σl+1 (A)le g1g2
radic(l + 1) (nminus l)
Using the fact that σl([R11 R12
])= σl
(R11
)and σk (Σk) = σk
([R11 R12
])by (32)
and (34)
τ = g1g2R222R2212
∥∥RminusT11
∥∥minus1
12
σl (R11)
σl (R11)
σl([R11 R12
]) σl ([R11 R12
])σk (Σk)
le g1g2
radicl (nminus l)
By the definition of τ
(313) R222 = τ σl+1 (A)
Plugging this into (36) yields (38)By the definition of τ we observe that
(314) R222 le τσk (Σk)
By (35) and (314)(315)
σj (Σk) ge σj (A)
4
radic1 + 2
(R222σj(Σk)
)4ge σj (A)
4
radic1 + 2
(R222σk(Σk)
)4ge σj (A)
4radic
1 + 2τ4 1 le j le k
ETNAKent State University and
Johann Radon Institute (RICAM)
480 Y FENG J XIAO AND M GU
On the other hand using (35)
σ4j (A) le σ4
j (Σk)
(1 + 2
σ4j (A)
σ4j (Σk)
R2242σ4j (A)
)
le σ4j (Σk)
1 + 2
(σ4j (Σk) + 2 R2242
)σ4j (Σk)
R2242σ4j (A)
le σ4
j (Σk)
(1 + 2
(1 + 2
R2242σ4k (Σk)
)R2242σ4j (A)
)
that is
σj (Σk) ge σj (A)
4
radic1 +
(2 + 4
R2242σ4k(Σk)
)R2242σ4j (A)
Plugging (313) and (314) into this above equation yields
(316) σj (Σk) ge σj (A)
4
radic1 + τ4 (2 + 4τ4)
(σl+1(A)σj(A)
)4 1 le j le k
Combining (315) with (316) we arrive at (37)We note that (35) and (36) still hold true if we replace k by lInequality (37) shows that with the definitions (312) of τ and τ Flip-Flop SRQR can
reveal at least a dimension-dependent fraction of all the leading singular values of A andindeed approximate them very accurately in case they decay relatively quickly Moreover (38)shows that Flip-Flop SRQR can compute a rank-k approximation that is up to a factor of4
radic1 + 2τ4
(σl+1(A)σk+1(A)
)4
from optimal In situations where the singular values of A decay
relatively quickly our rank-k approximation is about as accurate as the truncated SVD with achoice of l such that
σl+1 (A)
σk+1 (A)= o (1)
REMARK 33 The ldquohigh probabilityrdquo mentioned in Theorem 32 is dependent on theoversampling size p but not on the block size b We decided not to dive deep into the probabilityexplanation since it will make the proof much longer and more tedious Theoretically thecomputed oversampling size p for certain given high probability would be rather high Forexample to achieve a probability 095 we need to choose p to be at least 500 However inpractice we would never choose such a large p since it will deteriorate the code efficiencyIt turns out that a small p like 5 sim 10 would achieve good approximation results empiricallyThe detailed explanation about what does high probability mean here can be found in [75]In terms of the choice of the block size b it wonrsquot affect the analysis or the error boundsThe choice of b is purely for the performance of the codes Only a suitable b can take fulladvantage of the cache hierarchy and BLAS routines
4 Numerical experiments In this section we demonstrate the effectiveness and effi-ciency of the Flip-Flop SRQR (FFSRQR) algorithm in several numerical experiments Firstly
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 481
TABLE 41Methods for approximate SVD
Method DescriptionLANSVD Approximate SVD using Lanczos bidiagonalization with partial
reorthogonalization [42]Its Fortran and Matlab implementations are from PROPACK [41]
FFSRQR Flip-Flop Spectrum-revealing QR factorizationIts Matlab implementation is using Mex-wrappedBLAS and LAPACK routines [2]
RSISVD Approximate SVD with randomized subspace iteration [30 27]Its Matlab implementation is from tensorlab toolbox [73]
TUXV Algorithm 5 in the appendixLTSVD Linear Time SVD [16]
We use its Matlab implementation by Ma et al [49]
we compare FFSRQR with other approximate SVD algorithms for a matrix approximationSecondly we compare FFSRQR with other methods on tensor approximation problems usingthe tensorlab toolbox [73] Thirdly we compare FFSRQR with other methods for the robustPCA problem and matrix completion problem All experiments are implemented on a laptopwith 29 GHz Intel Core i5 processor and 8 GB of RAM The codes based on LAPACK [2]and PROPACK [41] in Section 41 are in Fortran The codes in Sections 42 43 are in Matlabexcept that FFSRQR is implemented in Fortran and wrapped by mex
41 Approximate truncated SVD In this section we compare FFSRQR with otherapproximate SVD algorithms for low-rank matrix approximation All tested methods are listedin Table 41 Since LTSVD has only a Matlab implementation this method is only consideredin Sections 42 and 43 The test matrices areType 1 A isin Rmtimesn [66] is defined by A = U DV T + 01 middot dss middotE where U isin Rmtimess V isin
Rntimess are column-orthonormal matrices and D isin Rstimess is a diagonal matrix with sgeometrically decreasing diagonal entries from 1 to 10minus3 dss is the last diagonalentry of D E isin Rmtimesn is a random matrix where the entries are independentlysampled from a normal distribution N (0 1) In our numerical experiment we testthree different random matrices The square matrix has a size of 10000times 10000 theshort-fat matrix has a size of 1000times 10000 and the tall-skinny matrix has a size of10000times 1000
Type 2 A isin R4929times4929 is a real data matrix from the University of Florida sparse matrixcollection [11] Its corresponding file name is HBGEMAT11
For a given matrix A isin Rmtimesn the relative SVD approximation error is measured by∥∥Aminus UkΣkVTk
∥∥FAF where Σk contains the approximate top k singular values and
Uk Vk are the corresponding approximate top k singular vectors The parameters used inFFSRQR RSISVD TUXV and LTSVD are listed in Table 42 The additional time requiredto compute g2 is negligible In our experiments g2 always remains modest and never triggerssubsequent SRQR column swaps However computing g2 is still recommended since g2
serves as an insurance policy against potential quality degration by TRQRCP The block size bis chosen to be 32 according to the xGEQP3 LAPACK routine When k is less than 32 weimplement an unblock version of the partial QRCP
Figures 41ndash43 display the run time the relative approximation error and the top 20singular values comparison respectively for four different matrices In terms of speed FFSRQRis always faster than LANSVD and RSISVD Moreover the efficiency difference is more
ETNAKent State University and
Johann Radon Institute (RICAM)
482 Y FENG J XIAO AND M GU
TABLE 42Parameters used in RSISVD FFSRQR TUXV and LTSVD
Method ParameterFFSRQR oversampling size p = 5 block size b = min32 k l = k d = 10 g = 20RSISVD oversampling size p = 5 subspace iteration q = 1TUXV oversampling size p = 5 block size b = min32 k l = kLTSVD probabilities pi = 1n
obvious when k is relatively large FFSRQR is only slightly slower than TUXV In termsof the relative approximation error FFSRQR is comparable to the other three methods Interms of the top singular values approximation Figure 43 displays the top 20 singular valueswhen k = 500 for four different matrices In Figure 43 FFSRQR is superior to TUXV aswe take the absolute values of the main diagonal entries of the upper triangle matrix R as theapproximate singular values for TUXV
100 200 300 400 500
target rank k
100
101
102
103
104
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
10-1
100
101
102
103
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 41 Run time comparison for approximate SVD algorithms
42 Tensor approximation This section illustrates the effectiveness and efficiency ofFFSRQR for computing approximate tensors ST-HOSVD [3 71] is one of the most efficient
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 483
100 200 300 400 500
target rank k
022
024
026
028
03
032
034
036re
lative
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
024
026
028
03
032
034
036
rela
tive
ap
pro
xim
atio
n e
rro
rFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 42 Relative approximation error comparison for approximate SVD algorithms
algorithms to compute a Tucker decomposition of tensors and the most costly part of thisalgorithm is to compute an SVD or approximate SVD of the tensor unfoldings The truncatedSVD and RSISVD are used in the routines MLSVD and MLSVD_RSI respectively in theMatlab tensorlab toolbox [73] Based on this Matlab toolbox we implement ST-HOSVDusing FFSRQR or LTSVD to do the SVD approximation We name these two new routines byMLSVD_FFSRQR and MLSVD_LTSVD respectively and compare these four routines in thenumerical experiment We also have Python codes for this tensor approximation experimentWe donrsquot list the results of the python programs here as they are similar to those for Matlab
421 A sparse tensor example We test a sparse tensor X isin Rntimesntimesn of the followingformat [64 59]
X =
10sumj=1
1000
jxj yj zj +
nsumj=11
1
jxj yj zj
where xj yj zj isin Rn are sparse vectors with nonnegative entries The symbol ldquordquo representsthe vector outer product We compute a rank-(k k k) Tucker decomposition [GU1 U2 U3]using MLSVD MLSVD_FFSRQR MLSVD_RSI and MLSVD_LTSVD respectively Therelative approximation error is measured by X minus XkF XF where Xk = G times1 U1 times2
U2 times3 U3
ETNAKent State University and
Johann Radon Institute (RICAM)
484 Y FENG J XIAO AND M GU
5 10 15 20
index
06
07
08
09
1to
p 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
5 10 15 20
index
06
07
08
09
1
top 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
5 10 15 20
index
06
065
07
075
08
085
09
095
top
20
sin
gu
lar
va
lue
s
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
5 10 15 20
index
101
102
103
top
20
sin
gu
lar
va
lue
sFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 43 Top 20 singular values comparison for approximate SVD algorithms
Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger
422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]
We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3
where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method
MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485
50 100 150 200
target rank k
10-5
10-4
10-3
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
50 100 150 200
target rank k
10-2
10-1
100
101
102
run tim
e (
sec)
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
FIG 44 Run time and relative approximation error comparison on a sparse tensor
TABLE 43Comparison on handwritten digits classification
MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD
Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259
one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one
43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1
431 Robust PCA To solve the robust PCA problem we replace the approximateSVD part in the IALM method [45] by various methods We denote the actual solution tothe robust PCA problem by a matrix pair (Xlowast Elowast) isin Rmtimesn times Rmtimesn The matrix Xlowast isXlowast = XLX
TR with XL isin Rmtimesk XR isin Rntimesk being random matrices whose entries are
independently sampled from a normal distribution The sparse matrix Elowast is a random matrixwhere its non-zero entries are independently sampled from a uniform distribution over theinterval [minus500 500] The input to the IALM algorithm has the form M = Xlowast + Elowast andthe output is denoted by
(X E
) In this numerical experiment we use the same parameter
settings as the IALM code for robust PCA rank k is 01m and the number of non-zero entriesin E is 005m2 We choose the trade-off parameter λ = 1
radicmax (mn) as suggested by
Candes et al [8] The solution quality is measured by the normalized root mean square errorX minusXlowastF XlowastF
Table 44 includes relative error run time the number of non-zero entries in E (E0)the iteration count and the number of non-zero singular values (sv) in X of the IALM
1httpperceptioncslillinoisedumatrix-ranksample_codehtml
ETNAKent State University and
Johann Radon Institute (RICAM)
486 Y FENG J XIAO AND M GU
algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective
TABLE 44Comparison on robust PCA
Size Method Error Time (sec) E0 Iter sv
1000times 1000
IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100
2000times 2000
IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200
4000times 4000
IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400
6000times 6000
IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600
432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices
bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3
The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies
For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by
NMAEdef=
1|Γ|sum
(ij)isinΓ |Mij minusXij |rmax minus rmin
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487
where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5
Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]
The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets
TABLE 45Parameters used in the IALM method on matrix completion
Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384
jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742
movie-latest-small 671 9066 100e+05 52551
5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA
Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments
Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4
Appendix B TUXV algorithmThe details of TUXV algorithm are presented in Algorithm 5
Appendix C Computing Tucker decomposition of higher-order tensorsAlgorithm 6 computes the Tucker decomposition of higher-order tensors ie sequentially
truncated higher-order SVD
ETNAKent State University and
Johann Radon Institute (RICAM)
488 Y FENG J XIAO AND M GU
TABLE 46Comparison on matrix completion
Data set Method Iter Time NMAE sv σmax σmin
jester-1
IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00
jester-2
IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00
jester-3
IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00
jester-all
IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00
movie-100K
IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00
movie-1M
IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00
movie-latest-small
IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00
Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems
respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω
Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an
orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9
Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that
1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489
Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do
Find i = arg maxjleslen
rs Exchange rj and ri columns in A and Π
Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)
end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A
Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)
3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)
2 minus 23 (k + p)
3 (cf [69]) and the cost of forming the first (k + p)
columns in the full Q matrix is m (k + p)2
+ 13 (k + p)
3
Now we count the flops for each i in the for loop
1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)
2 minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)
2+ 1
3 (k + p)3
3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)
2minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)
2+ 1
3 (k + p)3
Together the cost of running the for loop q times is
q
(4mn (k + p) + 3 (m+ n) (k + p)
2 minus 2
3(k + p)
3
)
Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)
2) and the cost of computing U = Q lowast U is 2m (k + p)
2
ETNAKent State University and
Johann Radon Institute (RICAM)
490 Y FENG J XIAO AND M GU
Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV
Tr ie applying UTr to G
end for
Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max
(M2 λminus1MF
) Y0 = MJ (M) E0 = 0
while not converged do(UΣV) = svd
(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = Sλmicrominus1k
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2
REFERENCES
[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106
[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999
[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103
[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595
[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
480 Y FENG J XIAO AND M GU
On the other hand using (35)
σ4j (A) le σ4
j (Σk)
(1 + 2
σ4j (A)
σ4j (Σk)
R2242σ4j (A)
)
le σ4j (Σk)
1 + 2
(σ4j (Σk) + 2 R2242
)σ4j (Σk)
R2242σ4j (A)
le σ4
j (Σk)
(1 + 2
(1 + 2
R2242σ4k (Σk)
)R2242σ4j (A)
)
that is
σj (Σk) ge σj (A)
4
radic1 +
(2 + 4
R2242σ4k(Σk)
)R2242σ4j (A)
Plugging (313) and (314) into this above equation yields
(316) σj (Σk) ge σj (A)
4
radic1 + τ4 (2 + 4τ4)
(σl+1(A)σj(A)
)4 1 le j le k
Combining (315) with (316) we arrive at (37)We note that (35) and (36) still hold true if we replace k by lInequality (37) shows that with the definitions (312) of τ and τ Flip-Flop SRQR can
reveal at least a dimension-dependent fraction of all the leading singular values of A andindeed approximate them very accurately in case they decay relatively quickly Moreover (38)shows that Flip-Flop SRQR can compute a rank-k approximation that is up to a factor of4
radic1 + 2τ4
(σl+1(A)σk+1(A)
)4
from optimal In situations where the singular values of A decay
relatively quickly our rank-k approximation is about as accurate as the truncated SVD with achoice of l such that
σl+1 (A)
σk+1 (A)= o (1)
REMARK 33 The ldquohigh probabilityrdquo mentioned in Theorem 32 is dependent on theoversampling size p but not on the block size b We decided not to dive deep into the probabilityexplanation since it will make the proof much longer and more tedious Theoretically thecomputed oversampling size p for certain given high probability would be rather high Forexample to achieve a probability 095 we need to choose p to be at least 500 However inpractice we would never choose such a large p since it will deteriorate the code efficiencyIt turns out that a small p like 5 sim 10 would achieve good approximation results empiricallyThe detailed explanation about what does high probability mean here can be found in [75]In terms of the choice of the block size b it wonrsquot affect the analysis or the error boundsThe choice of b is purely for the performance of the codes Only a suitable b can take fulladvantage of the cache hierarchy and BLAS routines
4 Numerical experiments In this section we demonstrate the effectiveness and effi-ciency of the Flip-Flop SRQR (FFSRQR) algorithm in several numerical experiments Firstly
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 481
TABLE 41Methods for approximate SVD
Method DescriptionLANSVD Approximate SVD using Lanczos bidiagonalization with partial
reorthogonalization [42]Its Fortran and Matlab implementations are from PROPACK [41]
FFSRQR Flip-Flop Spectrum-revealing QR factorizationIts Matlab implementation is using Mex-wrappedBLAS and LAPACK routines [2]
RSISVD Approximate SVD with randomized subspace iteration [30 27]Its Matlab implementation is from tensorlab toolbox [73]
TUXV Algorithm 5 in the appendixLTSVD Linear Time SVD [16]
We use its Matlab implementation by Ma et al [49]
we compare FFSRQR with other approximate SVD algorithms for a matrix approximationSecondly we compare FFSRQR with other methods on tensor approximation problems usingthe tensorlab toolbox [73] Thirdly we compare FFSRQR with other methods for the robustPCA problem and matrix completion problem All experiments are implemented on a laptopwith 29 GHz Intel Core i5 processor and 8 GB of RAM The codes based on LAPACK [2]and PROPACK [41] in Section 41 are in Fortran The codes in Sections 42 43 are in Matlabexcept that FFSRQR is implemented in Fortran and wrapped by mex
41 Approximate truncated SVD In this section we compare FFSRQR with otherapproximate SVD algorithms for low-rank matrix approximation All tested methods are listedin Table 41 Since LTSVD has only a Matlab implementation this method is only consideredin Sections 42 and 43 The test matrices areType 1 A isin Rmtimesn [66] is defined by A = U DV T + 01 middot dss middotE where U isin Rmtimess V isin
Rntimess are column-orthonormal matrices and D isin Rstimess is a diagonal matrix with sgeometrically decreasing diagonal entries from 1 to 10minus3 dss is the last diagonalentry of D E isin Rmtimesn is a random matrix where the entries are independentlysampled from a normal distribution N (0 1) In our numerical experiment we testthree different random matrices The square matrix has a size of 10000times 10000 theshort-fat matrix has a size of 1000times 10000 and the tall-skinny matrix has a size of10000times 1000
Type 2 A isin R4929times4929 is a real data matrix from the University of Florida sparse matrixcollection [11] Its corresponding file name is HBGEMAT11
For a given matrix A isin Rmtimesn the relative SVD approximation error is measured by∥∥Aminus UkΣkVTk
∥∥FAF where Σk contains the approximate top k singular values and
Uk Vk are the corresponding approximate top k singular vectors The parameters used inFFSRQR RSISVD TUXV and LTSVD are listed in Table 42 The additional time requiredto compute g2 is negligible In our experiments g2 always remains modest and never triggerssubsequent SRQR column swaps However computing g2 is still recommended since g2
serves as an insurance policy against potential quality degration by TRQRCP The block size bis chosen to be 32 according to the xGEQP3 LAPACK routine When k is less than 32 weimplement an unblock version of the partial QRCP
Figures 41ndash43 display the run time the relative approximation error and the top 20singular values comparison respectively for four different matrices In terms of speed FFSRQRis always faster than LANSVD and RSISVD Moreover the efficiency difference is more
ETNAKent State University and
Johann Radon Institute (RICAM)
482 Y FENG J XIAO AND M GU
TABLE 42Parameters used in RSISVD FFSRQR TUXV and LTSVD
Method ParameterFFSRQR oversampling size p = 5 block size b = min32 k l = k d = 10 g = 20RSISVD oversampling size p = 5 subspace iteration q = 1TUXV oversampling size p = 5 block size b = min32 k l = kLTSVD probabilities pi = 1n
obvious when k is relatively large FFSRQR is only slightly slower than TUXV In termsof the relative approximation error FFSRQR is comparable to the other three methods Interms of the top singular values approximation Figure 43 displays the top 20 singular valueswhen k = 500 for four different matrices In Figure 43 FFSRQR is superior to TUXV aswe take the absolute values of the main diagonal entries of the upper triangle matrix R as theapproximate singular values for TUXV
100 200 300 400 500
target rank k
100
101
102
103
104
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
10-1
100
101
102
103
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 41 Run time comparison for approximate SVD algorithms
42 Tensor approximation This section illustrates the effectiveness and efficiency ofFFSRQR for computing approximate tensors ST-HOSVD [3 71] is one of the most efficient
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 483
100 200 300 400 500
target rank k
022
024
026
028
03
032
034
036re
lative
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
024
026
028
03
032
034
036
rela
tive
ap
pro
xim
atio
n e
rro
rFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 42 Relative approximation error comparison for approximate SVD algorithms
algorithms to compute a Tucker decomposition of tensors and the most costly part of thisalgorithm is to compute an SVD or approximate SVD of the tensor unfoldings The truncatedSVD and RSISVD are used in the routines MLSVD and MLSVD_RSI respectively in theMatlab tensorlab toolbox [73] Based on this Matlab toolbox we implement ST-HOSVDusing FFSRQR or LTSVD to do the SVD approximation We name these two new routines byMLSVD_FFSRQR and MLSVD_LTSVD respectively and compare these four routines in thenumerical experiment We also have Python codes for this tensor approximation experimentWe donrsquot list the results of the python programs here as they are similar to those for Matlab
421 A sparse tensor example We test a sparse tensor X isin Rntimesntimesn of the followingformat [64 59]
X =
10sumj=1
1000
jxj yj zj +
nsumj=11
1
jxj yj zj
where xj yj zj isin Rn are sparse vectors with nonnegative entries The symbol ldquordquo representsthe vector outer product We compute a rank-(k k k) Tucker decomposition [GU1 U2 U3]using MLSVD MLSVD_FFSRQR MLSVD_RSI and MLSVD_LTSVD respectively Therelative approximation error is measured by X minus XkF XF where Xk = G times1 U1 times2
U2 times3 U3
ETNAKent State University and
Johann Radon Institute (RICAM)
484 Y FENG J XIAO AND M GU
5 10 15 20
index
06
07
08
09
1to
p 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
5 10 15 20
index
06
07
08
09
1
top 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
5 10 15 20
index
06
065
07
075
08
085
09
095
top
20
sin
gu
lar
va
lue
s
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
5 10 15 20
index
101
102
103
top
20
sin
gu
lar
va
lue
sFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 43 Top 20 singular values comparison for approximate SVD algorithms
Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger
422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]
We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3
where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method
MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485
50 100 150 200
target rank k
10-5
10-4
10-3
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
50 100 150 200
target rank k
10-2
10-1
100
101
102
run tim
e (
sec)
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
FIG 44 Run time and relative approximation error comparison on a sparse tensor
TABLE 43Comparison on handwritten digits classification
MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD
Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259
one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one
43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1
431 Robust PCA To solve the robust PCA problem we replace the approximateSVD part in the IALM method [45] by various methods We denote the actual solution tothe robust PCA problem by a matrix pair (Xlowast Elowast) isin Rmtimesn times Rmtimesn The matrix Xlowast isXlowast = XLX
TR with XL isin Rmtimesk XR isin Rntimesk being random matrices whose entries are
independently sampled from a normal distribution The sparse matrix Elowast is a random matrixwhere its non-zero entries are independently sampled from a uniform distribution over theinterval [minus500 500] The input to the IALM algorithm has the form M = Xlowast + Elowast andthe output is denoted by
(X E
) In this numerical experiment we use the same parameter
settings as the IALM code for robust PCA rank k is 01m and the number of non-zero entriesin E is 005m2 We choose the trade-off parameter λ = 1
radicmax (mn) as suggested by
Candes et al [8] The solution quality is measured by the normalized root mean square errorX minusXlowastF XlowastF
Table 44 includes relative error run time the number of non-zero entries in E (E0)the iteration count and the number of non-zero singular values (sv) in X of the IALM
1httpperceptioncslillinoisedumatrix-ranksample_codehtml
ETNAKent State University and
Johann Radon Institute (RICAM)
486 Y FENG J XIAO AND M GU
algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective
TABLE 44Comparison on robust PCA
Size Method Error Time (sec) E0 Iter sv
1000times 1000
IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100
2000times 2000
IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200
4000times 4000
IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400
6000times 6000
IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600
432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices
bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3
The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies
For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by
NMAEdef=
1|Γ|sum
(ij)isinΓ |Mij minusXij |rmax minus rmin
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487
where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5
Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]
The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets
TABLE 45Parameters used in the IALM method on matrix completion
Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384
jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742
movie-latest-small 671 9066 100e+05 52551
5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA
Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments
Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4
Appendix B TUXV algorithmThe details of TUXV algorithm are presented in Algorithm 5
Appendix C Computing Tucker decomposition of higher-order tensorsAlgorithm 6 computes the Tucker decomposition of higher-order tensors ie sequentially
truncated higher-order SVD
ETNAKent State University and
Johann Radon Institute (RICAM)
488 Y FENG J XIAO AND M GU
TABLE 46Comparison on matrix completion
Data set Method Iter Time NMAE sv σmax σmin
jester-1
IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00
jester-2
IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00
jester-3
IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00
jester-all
IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00
movie-100K
IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00
movie-1M
IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00
movie-latest-small
IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00
Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems
respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω
Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an
orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9
Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that
1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489
Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do
Find i = arg maxjleslen
rs Exchange rj and ri columns in A and Π
Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)
end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A
Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)
3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)
2 minus 23 (k + p)
3 (cf [69]) and the cost of forming the first (k + p)
columns in the full Q matrix is m (k + p)2
+ 13 (k + p)
3
Now we count the flops for each i in the for loop
1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)
2 minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)
2+ 1
3 (k + p)3
3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)
2minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)
2+ 1
3 (k + p)3
Together the cost of running the for loop q times is
q
(4mn (k + p) + 3 (m+ n) (k + p)
2 minus 2
3(k + p)
3
)
Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)
2) and the cost of computing U = Q lowast U is 2m (k + p)
2
ETNAKent State University and
Johann Radon Institute (RICAM)
490 Y FENG J XIAO AND M GU
Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV
Tr ie applying UTr to G
end for
Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max
(M2 λminus1MF
) Y0 = MJ (M) E0 = 0
while not converged do(UΣV) = svd
(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = Sλmicrominus1k
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2
REFERENCES
[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106
[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999
[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103
[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595
[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 481
TABLE 41Methods for approximate SVD
Method DescriptionLANSVD Approximate SVD using Lanczos bidiagonalization with partial
reorthogonalization [42]Its Fortran and Matlab implementations are from PROPACK [41]
FFSRQR Flip-Flop Spectrum-revealing QR factorizationIts Matlab implementation is using Mex-wrappedBLAS and LAPACK routines [2]
RSISVD Approximate SVD with randomized subspace iteration [30 27]Its Matlab implementation is from tensorlab toolbox [73]
TUXV Algorithm 5 in the appendixLTSVD Linear Time SVD [16]
We use its Matlab implementation by Ma et al [49]
we compare FFSRQR with other approximate SVD algorithms for a matrix approximationSecondly we compare FFSRQR with other methods on tensor approximation problems usingthe tensorlab toolbox [73] Thirdly we compare FFSRQR with other methods for the robustPCA problem and matrix completion problem All experiments are implemented on a laptopwith 29 GHz Intel Core i5 processor and 8 GB of RAM The codes based on LAPACK [2]and PROPACK [41] in Section 41 are in Fortran The codes in Sections 42 43 are in Matlabexcept that FFSRQR is implemented in Fortran and wrapped by mex
41 Approximate truncated SVD In this section we compare FFSRQR with otherapproximate SVD algorithms for low-rank matrix approximation All tested methods are listedin Table 41 Since LTSVD has only a Matlab implementation this method is only consideredin Sections 42 and 43 The test matrices areType 1 A isin Rmtimesn [66] is defined by A = U DV T + 01 middot dss middotE where U isin Rmtimess V isin
Rntimess are column-orthonormal matrices and D isin Rstimess is a diagonal matrix with sgeometrically decreasing diagonal entries from 1 to 10minus3 dss is the last diagonalentry of D E isin Rmtimesn is a random matrix where the entries are independentlysampled from a normal distribution N (0 1) In our numerical experiment we testthree different random matrices The square matrix has a size of 10000times 10000 theshort-fat matrix has a size of 1000times 10000 and the tall-skinny matrix has a size of10000times 1000
Type 2 A isin R4929times4929 is a real data matrix from the University of Florida sparse matrixcollection [11] Its corresponding file name is HBGEMAT11
For a given matrix A isin Rmtimesn the relative SVD approximation error is measured by∥∥Aminus UkΣkVTk
∥∥FAF where Σk contains the approximate top k singular values and
Uk Vk are the corresponding approximate top k singular vectors The parameters used inFFSRQR RSISVD TUXV and LTSVD are listed in Table 42 The additional time requiredto compute g2 is negligible In our experiments g2 always remains modest and never triggerssubsequent SRQR column swaps However computing g2 is still recommended since g2
serves as an insurance policy against potential quality degration by TRQRCP The block size bis chosen to be 32 according to the xGEQP3 LAPACK routine When k is less than 32 weimplement an unblock version of the partial QRCP
Figures 41ndash43 display the run time the relative approximation error and the top 20singular values comparison respectively for four different matrices In terms of speed FFSRQRis always faster than LANSVD and RSISVD Moreover the efficiency difference is more
ETNAKent State University and
Johann Radon Institute (RICAM)
482 Y FENG J XIAO AND M GU
TABLE 42Parameters used in RSISVD FFSRQR TUXV and LTSVD
Method ParameterFFSRQR oversampling size p = 5 block size b = min32 k l = k d = 10 g = 20RSISVD oversampling size p = 5 subspace iteration q = 1TUXV oversampling size p = 5 block size b = min32 k l = kLTSVD probabilities pi = 1n
obvious when k is relatively large FFSRQR is only slightly slower than TUXV In termsof the relative approximation error FFSRQR is comparable to the other three methods Interms of the top singular values approximation Figure 43 displays the top 20 singular valueswhen k = 500 for four different matrices In Figure 43 FFSRQR is superior to TUXV aswe take the absolute values of the main diagonal entries of the upper triangle matrix R as theapproximate singular values for TUXV
100 200 300 400 500
target rank k
100
101
102
103
104
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
10-1
100
101
102
103
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 41 Run time comparison for approximate SVD algorithms
42 Tensor approximation This section illustrates the effectiveness and efficiency ofFFSRQR for computing approximate tensors ST-HOSVD [3 71] is one of the most efficient
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 483
100 200 300 400 500
target rank k
022
024
026
028
03
032
034
036re
lative
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
024
026
028
03
032
034
036
rela
tive
ap
pro
xim
atio
n e
rro
rFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 42 Relative approximation error comparison for approximate SVD algorithms
algorithms to compute a Tucker decomposition of tensors and the most costly part of thisalgorithm is to compute an SVD or approximate SVD of the tensor unfoldings The truncatedSVD and RSISVD are used in the routines MLSVD and MLSVD_RSI respectively in theMatlab tensorlab toolbox [73] Based on this Matlab toolbox we implement ST-HOSVDusing FFSRQR or LTSVD to do the SVD approximation We name these two new routines byMLSVD_FFSRQR and MLSVD_LTSVD respectively and compare these four routines in thenumerical experiment We also have Python codes for this tensor approximation experimentWe donrsquot list the results of the python programs here as they are similar to those for Matlab
421 A sparse tensor example We test a sparse tensor X isin Rntimesntimesn of the followingformat [64 59]
X =
10sumj=1
1000
jxj yj zj +
nsumj=11
1
jxj yj zj
where xj yj zj isin Rn are sparse vectors with nonnegative entries The symbol ldquordquo representsthe vector outer product We compute a rank-(k k k) Tucker decomposition [GU1 U2 U3]using MLSVD MLSVD_FFSRQR MLSVD_RSI and MLSVD_LTSVD respectively Therelative approximation error is measured by X minus XkF XF where Xk = G times1 U1 times2
U2 times3 U3
ETNAKent State University and
Johann Radon Institute (RICAM)
484 Y FENG J XIAO AND M GU
5 10 15 20
index
06
07
08
09
1to
p 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
5 10 15 20
index
06
07
08
09
1
top 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
5 10 15 20
index
06
065
07
075
08
085
09
095
top
20
sin
gu
lar
va
lue
s
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
5 10 15 20
index
101
102
103
top
20
sin
gu
lar
va
lue
sFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 43 Top 20 singular values comparison for approximate SVD algorithms
Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger
422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]
We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3
where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method
MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485
50 100 150 200
target rank k
10-5
10-4
10-3
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
50 100 150 200
target rank k
10-2
10-1
100
101
102
run tim
e (
sec)
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
FIG 44 Run time and relative approximation error comparison on a sparse tensor
TABLE 43Comparison on handwritten digits classification
MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD
Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259
one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one
43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1
431 Robust PCA To solve the robust PCA problem we replace the approximateSVD part in the IALM method [45] by various methods We denote the actual solution tothe robust PCA problem by a matrix pair (Xlowast Elowast) isin Rmtimesn times Rmtimesn The matrix Xlowast isXlowast = XLX
TR with XL isin Rmtimesk XR isin Rntimesk being random matrices whose entries are
independently sampled from a normal distribution The sparse matrix Elowast is a random matrixwhere its non-zero entries are independently sampled from a uniform distribution over theinterval [minus500 500] The input to the IALM algorithm has the form M = Xlowast + Elowast andthe output is denoted by
(X E
) In this numerical experiment we use the same parameter
settings as the IALM code for robust PCA rank k is 01m and the number of non-zero entriesin E is 005m2 We choose the trade-off parameter λ = 1
radicmax (mn) as suggested by
Candes et al [8] The solution quality is measured by the normalized root mean square errorX minusXlowastF XlowastF
Table 44 includes relative error run time the number of non-zero entries in E (E0)the iteration count and the number of non-zero singular values (sv) in X of the IALM
1httpperceptioncslillinoisedumatrix-ranksample_codehtml
ETNAKent State University and
Johann Radon Institute (RICAM)
486 Y FENG J XIAO AND M GU
algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective
TABLE 44Comparison on robust PCA
Size Method Error Time (sec) E0 Iter sv
1000times 1000
IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100
2000times 2000
IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200
4000times 4000
IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400
6000times 6000
IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600
432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices
bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3
The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies
For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by
NMAEdef=
1|Γ|sum
(ij)isinΓ |Mij minusXij |rmax minus rmin
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487
where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5
Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]
The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets
TABLE 45Parameters used in the IALM method on matrix completion
Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384
jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742
movie-latest-small 671 9066 100e+05 52551
5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA
Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments
Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4
Appendix B TUXV algorithmThe details of TUXV algorithm are presented in Algorithm 5
Appendix C Computing Tucker decomposition of higher-order tensorsAlgorithm 6 computes the Tucker decomposition of higher-order tensors ie sequentially
truncated higher-order SVD
ETNAKent State University and
Johann Radon Institute (RICAM)
488 Y FENG J XIAO AND M GU
TABLE 46Comparison on matrix completion
Data set Method Iter Time NMAE sv σmax σmin
jester-1
IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00
jester-2
IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00
jester-3
IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00
jester-all
IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00
movie-100K
IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00
movie-1M
IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00
movie-latest-small
IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00
Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems
respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω
Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an
orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9
Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that
1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489
Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do
Find i = arg maxjleslen
rs Exchange rj and ri columns in A and Π
Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)
end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A
Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)
3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)
2 minus 23 (k + p)
3 (cf [69]) and the cost of forming the first (k + p)
columns in the full Q matrix is m (k + p)2
+ 13 (k + p)
3
Now we count the flops for each i in the for loop
1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)
2 minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)
2+ 1
3 (k + p)3
3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)
2minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)
2+ 1
3 (k + p)3
Together the cost of running the for loop q times is
q
(4mn (k + p) + 3 (m+ n) (k + p)
2 minus 2
3(k + p)
3
)
Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)
2) and the cost of computing U = Q lowast U is 2m (k + p)
2
ETNAKent State University and
Johann Radon Institute (RICAM)
490 Y FENG J XIAO AND M GU
Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV
Tr ie applying UTr to G
end for
Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max
(M2 λminus1MF
) Y0 = MJ (M) E0 = 0
while not converged do(UΣV) = svd
(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = Sλmicrominus1k
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2
REFERENCES
[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106
[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999
[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103
[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595
[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
482 Y FENG J XIAO AND M GU
TABLE 42Parameters used in RSISVD FFSRQR TUXV and LTSVD
Method ParameterFFSRQR oversampling size p = 5 block size b = min32 k l = k d = 10 g = 20RSISVD oversampling size p = 5 subspace iteration q = 1TUXV oversampling size p = 5 block size b = min32 k l = kLTSVD probabilities pi = 1n
obvious when k is relatively large FFSRQR is only slightly slower than TUXV In termsof the relative approximation error FFSRQR is comparable to the other three methods Interms of the top singular values approximation Figure 43 displays the top 20 singular valueswhen k = 500 for four different matrices In Figure 43 FFSRQR is superior to TUXV aswe take the absolute values of the main diagonal entries of the upper triangle matrix R as theapproximate singular values for TUXV
100 200 300 400 500
target rank k
100
101
102
103
104
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-1
100
101
102
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
10-1
100
101
102
103
run tim
e (
sec)
FFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 41 Run time comparison for approximate SVD algorithms
42 Tensor approximation This section illustrates the effectiveness and efficiency ofFFSRQR for computing approximate tensors ST-HOSVD [3 71] is one of the most efficient
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 483
100 200 300 400 500
target rank k
022
024
026
028
03
032
034
036re
lative
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
024
026
028
03
032
034
036
rela
tive
ap
pro
xim
atio
n e
rro
rFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 42 Relative approximation error comparison for approximate SVD algorithms
algorithms to compute a Tucker decomposition of tensors and the most costly part of thisalgorithm is to compute an SVD or approximate SVD of the tensor unfoldings The truncatedSVD and RSISVD are used in the routines MLSVD and MLSVD_RSI respectively in theMatlab tensorlab toolbox [73] Based on this Matlab toolbox we implement ST-HOSVDusing FFSRQR or LTSVD to do the SVD approximation We name these two new routines byMLSVD_FFSRQR and MLSVD_LTSVD respectively and compare these four routines in thenumerical experiment We also have Python codes for this tensor approximation experimentWe donrsquot list the results of the python programs here as they are similar to those for Matlab
421 A sparse tensor example We test a sparse tensor X isin Rntimesntimesn of the followingformat [64 59]
X =
10sumj=1
1000
jxj yj zj +
nsumj=11
1
jxj yj zj
where xj yj zj isin Rn are sparse vectors with nonnegative entries The symbol ldquordquo representsthe vector outer product We compute a rank-(k k k) Tucker decomposition [GU1 U2 U3]using MLSVD MLSVD_FFSRQR MLSVD_RSI and MLSVD_LTSVD respectively Therelative approximation error is measured by X minus XkF XF where Xk = G times1 U1 times2
U2 times3 U3
ETNAKent State University and
Johann Radon Institute (RICAM)
484 Y FENG J XIAO AND M GU
5 10 15 20
index
06
07
08
09
1to
p 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
5 10 15 20
index
06
07
08
09
1
top 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
5 10 15 20
index
06
065
07
075
08
085
09
095
top
20
sin
gu
lar
va
lue
s
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
5 10 15 20
index
101
102
103
top
20
sin
gu
lar
va
lue
sFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 43 Top 20 singular values comparison for approximate SVD algorithms
Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger
422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]
We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3
where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method
MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485
50 100 150 200
target rank k
10-5
10-4
10-3
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
50 100 150 200
target rank k
10-2
10-1
100
101
102
run tim
e (
sec)
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
FIG 44 Run time and relative approximation error comparison on a sparse tensor
TABLE 43Comparison on handwritten digits classification
MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD
Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259
one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one
43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1
431 Robust PCA To solve the robust PCA problem we replace the approximateSVD part in the IALM method [45] by various methods We denote the actual solution tothe robust PCA problem by a matrix pair (Xlowast Elowast) isin Rmtimesn times Rmtimesn The matrix Xlowast isXlowast = XLX
TR with XL isin Rmtimesk XR isin Rntimesk being random matrices whose entries are
independently sampled from a normal distribution The sparse matrix Elowast is a random matrixwhere its non-zero entries are independently sampled from a uniform distribution over theinterval [minus500 500] The input to the IALM algorithm has the form M = Xlowast + Elowast andthe output is denoted by
(X E
) In this numerical experiment we use the same parameter
settings as the IALM code for robust PCA rank k is 01m and the number of non-zero entriesin E is 005m2 We choose the trade-off parameter λ = 1
radicmax (mn) as suggested by
Candes et al [8] The solution quality is measured by the normalized root mean square errorX minusXlowastF XlowastF
Table 44 includes relative error run time the number of non-zero entries in E (E0)the iteration count and the number of non-zero singular values (sv) in X of the IALM
1httpperceptioncslillinoisedumatrix-ranksample_codehtml
ETNAKent State University and
Johann Radon Institute (RICAM)
486 Y FENG J XIAO AND M GU
algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective
TABLE 44Comparison on robust PCA
Size Method Error Time (sec) E0 Iter sv
1000times 1000
IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100
2000times 2000
IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200
4000times 4000
IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400
6000times 6000
IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600
432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices
bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3
The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies
For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by
NMAEdef=
1|Γ|sum
(ij)isinΓ |Mij minusXij |rmax minus rmin
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487
where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5
Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]
The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets
TABLE 45Parameters used in the IALM method on matrix completion
Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384
jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742
movie-latest-small 671 9066 100e+05 52551
5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA
Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments
Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4
Appendix B TUXV algorithmThe details of TUXV algorithm are presented in Algorithm 5
Appendix C Computing Tucker decomposition of higher-order tensorsAlgorithm 6 computes the Tucker decomposition of higher-order tensors ie sequentially
truncated higher-order SVD
ETNAKent State University and
Johann Radon Institute (RICAM)
488 Y FENG J XIAO AND M GU
TABLE 46Comparison on matrix completion
Data set Method Iter Time NMAE sv σmax σmin
jester-1
IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00
jester-2
IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00
jester-3
IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00
jester-all
IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00
movie-100K
IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00
movie-1M
IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00
movie-latest-small
IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00
Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems
respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω
Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an
orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9
Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that
1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489
Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do
Find i = arg maxjleslen
rs Exchange rj and ri columns in A and Π
Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)
end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A
Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)
3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)
2 minus 23 (k + p)
3 (cf [69]) and the cost of forming the first (k + p)
columns in the full Q matrix is m (k + p)2
+ 13 (k + p)
3
Now we count the flops for each i in the for loop
1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)
2 minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)
2+ 1
3 (k + p)3
3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)
2minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)
2+ 1
3 (k + p)3
Together the cost of running the for loop q times is
q
(4mn (k + p) + 3 (m+ n) (k + p)
2 minus 2
3(k + p)
3
)
Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)
2) and the cost of computing U = Q lowast U is 2m (k + p)
2
ETNAKent State University and
Johann Radon Institute (RICAM)
490 Y FENG J XIAO AND M GU
Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV
Tr ie applying UTr to G
end for
Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max
(M2 λminus1MF
) Y0 = MJ (M) E0 = 0
while not converged do(UΣV) = svd
(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = Sλmicrominus1k
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2
REFERENCES
[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106
[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999
[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103
[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595
[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 483
100 200 300 400 500
target rank k
022
024
026
028
03
032
034
036re
lative
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
100 200 300 400 500
target rank k
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
100 200 300 400 500
target rank k
024
026
028
03
032
034
036
rela
tive
ap
pro
xim
atio
n e
rro
rFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 42 Relative approximation error comparison for approximate SVD algorithms
algorithms to compute a Tucker decomposition of tensors and the most costly part of thisalgorithm is to compute an SVD or approximate SVD of the tensor unfoldings The truncatedSVD and RSISVD are used in the routines MLSVD and MLSVD_RSI respectively in theMatlab tensorlab toolbox [73] Based on this Matlab toolbox we implement ST-HOSVDusing FFSRQR or LTSVD to do the SVD approximation We name these two new routines byMLSVD_FFSRQR and MLSVD_LTSVD respectively and compare these four routines in thenumerical experiment We also have Python codes for this tensor approximation experimentWe donrsquot list the results of the python programs here as they are similar to those for Matlab
421 A sparse tensor example We test a sparse tensor X isin Rntimesntimesn of the followingformat [64 59]
X =
10sumj=1
1000
jxj yj zj +
nsumj=11
1
jxj yj zj
where xj yj zj isin Rn are sparse vectors with nonnegative entries The symbol ldquordquo representsthe vector outer product We compute a rank-(k k k) Tucker decomposition [GU1 U2 U3]using MLSVD MLSVD_FFSRQR MLSVD_RSI and MLSVD_LTSVD respectively Therelative approximation error is measured by X minus XkF XF where Xk = G times1 U1 times2
U2 times3 U3
ETNAKent State University and
Johann Radon Institute (RICAM)
484 Y FENG J XIAO AND M GU
5 10 15 20
index
06
07
08
09
1to
p 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
5 10 15 20
index
06
07
08
09
1
top 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
5 10 15 20
index
06
065
07
075
08
085
09
095
top
20
sin
gu
lar
va
lue
s
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
5 10 15 20
index
101
102
103
top
20
sin
gu
lar
va
lue
sFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 43 Top 20 singular values comparison for approximate SVD algorithms
Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger
422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]
We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3
where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method
MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485
50 100 150 200
target rank k
10-5
10-4
10-3
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
50 100 150 200
target rank k
10-2
10-1
100
101
102
run tim
e (
sec)
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
FIG 44 Run time and relative approximation error comparison on a sparse tensor
TABLE 43Comparison on handwritten digits classification
MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD
Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259
one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one
43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1
431 Robust PCA To solve the robust PCA problem we replace the approximateSVD part in the IALM method [45] by various methods We denote the actual solution tothe robust PCA problem by a matrix pair (Xlowast Elowast) isin Rmtimesn times Rmtimesn The matrix Xlowast isXlowast = XLX
TR with XL isin Rmtimesk XR isin Rntimesk being random matrices whose entries are
independently sampled from a normal distribution The sparse matrix Elowast is a random matrixwhere its non-zero entries are independently sampled from a uniform distribution over theinterval [minus500 500] The input to the IALM algorithm has the form M = Xlowast + Elowast andthe output is denoted by
(X E
) In this numerical experiment we use the same parameter
settings as the IALM code for robust PCA rank k is 01m and the number of non-zero entriesin E is 005m2 We choose the trade-off parameter λ = 1
radicmax (mn) as suggested by
Candes et al [8] The solution quality is measured by the normalized root mean square errorX minusXlowastF XlowastF
Table 44 includes relative error run time the number of non-zero entries in E (E0)the iteration count and the number of non-zero singular values (sv) in X of the IALM
1httpperceptioncslillinoisedumatrix-ranksample_codehtml
ETNAKent State University and
Johann Radon Institute (RICAM)
486 Y FENG J XIAO AND M GU
algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective
TABLE 44Comparison on robust PCA
Size Method Error Time (sec) E0 Iter sv
1000times 1000
IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100
2000times 2000
IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200
4000times 4000
IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400
6000times 6000
IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600
432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices
bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3
The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies
For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by
NMAEdef=
1|Γ|sum
(ij)isinΓ |Mij minusXij |rmax minus rmin
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487
where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5
Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]
The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets
TABLE 45Parameters used in the IALM method on matrix completion
Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384
jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742
movie-latest-small 671 9066 100e+05 52551
5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA
Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments
Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4
Appendix B TUXV algorithmThe details of TUXV algorithm are presented in Algorithm 5
Appendix C Computing Tucker decomposition of higher-order tensorsAlgorithm 6 computes the Tucker decomposition of higher-order tensors ie sequentially
truncated higher-order SVD
ETNAKent State University and
Johann Radon Institute (RICAM)
488 Y FENG J XIAO AND M GU
TABLE 46Comparison on matrix completion
Data set Method Iter Time NMAE sv σmax σmin
jester-1
IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00
jester-2
IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00
jester-3
IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00
jester-all
IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00
movie-100K
IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00
movie-1M
IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00
movie-latest-small
IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00
Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems
respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω
Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an
orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9
Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that
1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489
Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do
Find i = arg maxjleslen
rs Exchange rj and ri columns in A and Π
Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)
end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A
Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)
3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)
2 minus 23 (k + p)
3 (cf [69]) and the cost of forming the first (k + p)
columns in the full Q matrix is m (k + p)2
+ 13 (k + p)
3
Now we count the flops for each i in the for loop
1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)
2 minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)
2+ 1
3 (k + p)3
3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)
2minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)
2+ 1
3 (k + p)3
Together the cost of running the for loop q times is
q
(4mn (k + p) + 3 (m+ n) (k + p)
2 minus 2
3(k + p)
3
)
Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)
2) and the cost of computing U = Q lowast U is 2m (k + p)
2
ETNAKent State University and
Johann Radon Institute (RICAM)
490 Y FENG J XIAO AND M GU
Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV
Tr ie applying UTr to G
end for
Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max
(M2 λminus1MF
) Y0 = MJ (M) E0 = 0
while not converged do(UΣV) = svd
(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = Sλmicrominus1k
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2
REFERENCES
[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106
[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999
[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103
[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595
[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
484 Y FENG J XIAO AND M GU
5 10 15 20
index
06
07
08
09
1to
p 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(a) Type 1 Random square matrix
5 10 15 20
index
06
07
08
09
1
top 2
0 s
ingula
r valu
es
FFSRQR
RSISVD
LANSVD
TUXV
(b) Type 1 Random short-fat matrix
5 10 15 20
index
06
065
07
075
08
085
09
095
top
20
sin
gu
lar
va
lue
s
FFSRQR
RSISVD
LANSVD
TUXV
(c) Type 1 Random tall-skinny matrix
5 10 15 20
index
101
102
103
top
20
sin
gu
lar
va
lue
sFFSRQR
RSISVD
LANSVD
TUXV
(d) Type 2 GEMAT11
FIG 43 Top 20 singular values comparison for approximate SVD algorithms
Figure 44 displays a comparison of the efficiency and accuracy of different methods for a400times 400times 400 sparse tensor approximation problem MLSVD_LTSVD is the fastest but theleast accurate one The other three methods have similar accuracy while MLSVD_FFSRQR isfaster when the target rank k is larger
422 Handwritten digits classification MNIST is a handwritten digits image data setcreated by Yann LeCun [43] Every digit is represented by a 28times28 pixel image Handwrittendigits classification has the objective of training a classification model to classify new unlabeledimages A HOSVD algorithm is proposed by Savas and Elden [60] to classify handwrittendigits To reduce the training time a more efficient ST-HOSVD algorithm is introducedin [71]
We do handwritten digits classification using MNIST which consists of 60 000 trainingimages and 10 000 test images The number of training images in each class is restrictedto 5421 so that the training set is equally distributed over all classes The training set isrepresented by a tensor X of size 786times 5421times 10 The classification relies on Algorithm 2in [60] We use various algorithms to obtain an approximation X asymp G times1 U1 times2 U2 times3 U3
where the core tensor G has size 65times 142times 10The results are summarized in Table 43 In terms of run time we observe that our method
MLSVD_FFSRQR is comparable to MLSVD_RSI while MLSVD is the most expensive
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485
50 100 150 200
target rank k
10-5
10-4
10-3
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
50 100 150 200
target rank k
10-2
10-1
100
101
102
run tim
e (
sec)
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
FIG 44 Run time and relative approximation error comparison on a sparse tensor
TABLE 43Comparison on handwritten digits classification
MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD
Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259
one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one
43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1
431 Robust PCA To solve the robust PCA problem we replace the approximateSVD part in the IALM method [45] by various methods We denote the actual solution tothe robust PCA problem by a matrix pair (Xlowast Elowast) isin Rmtimesn times Rmtimesn The matrix Xlowast isXlowast = XLX
TR with XL isin Rmtimesk XR isin Rntimesk being random matrices whose entries are
independently sampled from a normal distribution The sparse matrix Elowast is a random matrixwhere its non-zero entries are independently sampled from a uniform distribution over theinterval [minus500 500] The input to the IALM algorithm has the form M = Xlowast + Elowast andthe output is denoted by
(X E
) In this numerical experiment we use the same parameter
settings as the IALM code for robust PCA rank k is 01m and the number of non-zero entriesin E is 005m2 We choose the trade-off parameter λ = 1
radicmax (mn) as suggested by
Candes et al [8] The solution quality is measured by the normalized root mean square errorX minusXlowastF XlowastF
Table 44 includes relative error run time the number of non-zero entries in E (E0)the iteration count and the number of non-zero singular values (sv) in X of the IALM
1httpperceptioncslillinoisedumatrix-ranksample_codehtml
ETNAKent State University and
Johann Radon Institute (RICAM)
486 Y FENG J XIAO AND M GU
algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective
TABLE 44Comparison on robust PCA
Size Method Error Time (sec) E0 Iter sv
1000times 1000
IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100
2000times 2000
IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200
4000times 4000
IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400
6000times 6000
IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600
432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices
bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3
The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies
For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by
NMAEdef=
1|Γ|sum
(ij)isinΓ |Mij minusXij |rmax minus rmin
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487
where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5
Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]
The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets
TABLE 45Parameters used in the IALM method on matrix completion
Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384
jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742
movie-latest-small 671 9066 100e+05 52551
5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA
Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments
Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4
Appendix B TUXV algorithmThe details of TUXV algorithm are presented in Algorithm 5
Appendix C Computing Tucker decomposition of higher-order tensorsAlgorithm 6 computes the Tucker decomposition of higher-order tensors ie sequentially
truncated higher-order SVD
ETNAKent State University and
Johann Radon Institute (RICAM)
488 Y FENG J XIAO AND M GU
TABLE 46Comparison on matrix completion
Data set Method Iter Time NMAE sv σmax σmin
jester-1
IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00
jester-2
IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00
jester-3
IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00
jester-all
IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00
movie-100K
IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00
movie-1M
IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00
movie-latest-small
IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00
Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems
respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω
Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an
orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9
Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that
1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489
Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do
Find i = arg maxjleslen
rs Exchange rj and ri columns in A and Π
Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)
end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A
Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)
3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)
2 minus 23 (k + p)
3 (cf [69]) and the cost of forming the first (k + p)
columns in the full Q matrix is m (k + p)2
+ 13 (k + p)
3
Now we count the flops for each i in the for loop
1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)
2 minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)
2+ 1
3 (k + p)3
3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)
2minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)
2+ 1
3 (k + p)3
Together the cost of running the for loop q times is
q
(4mn (k + p) + 3 (m+ n) (k + p)
2 minus 2
3(k + p)
3
)
Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)
2) and the cost of computing U = Q lowast U is 2m (k + p)
2
ETNAKent State University and
Johann Radon Institute (RICAM)
490 Y FENG J XIAO AND M GU
Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV
Tr ie applying UTr to G
end for
Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max
(M2 λminus1MF
) Y0 = MJ (M) E0 = 0
while not converged do(UΣV) = svd
(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = Sλmicrominus1k
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2
REFERENCES
[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106
[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999
[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103
[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595
[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 485
50 100 150 200
target rank k
10-5
10-4
10-3
10-2
10-1
100
rela
tive
ap
pro
xim
atio
n e
rro
r
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
50 100 150 200
target rank k
10-2
10-1
100
101
102
run tim
e (
sec)
MLSVD
MLSVD_FFSRQR
MLSVD_RSI
MLSVD_LTSVD
FIG 44 Run time and relative approximation error comparison on a sparse tensor
TABLE 43Comparison on handwritten digits classification
MLSVD MLSVD_FFSRQR MLSVD_RSI MLSVD_LTSVD
Training Time [sec] 272121 15455 19343 04266Relative Model Error 04099 04273 04247 05162Classification Accur 9519 9498 9505 9259
one and MLSVD_LTSVD is the fastest one In terms of classification quality MLSVDMLSVD_FFSRQR and MLSVD_RSI are comparable while MLSVD_LTSVD is the leastaccurate one
43 Solving nuclear norm minimization problem To show the effectiveness of theFFSRQR algorithm for the nuclear norm minimization problems we investigate two scenariosthe robust PCA (27) and the matrix completion problem (28) The test matrix used for therobust PCA is introduced in [45] and the test matrices used in the matrix completion are tworeal data sets We use the IALM method [45] to solve both problems the IALM code can bedownloaded1
431 Robust PCA To solve the robust PCA problem we replace the approximateSVD part in the IALM method [45] by various methods We denote the actual solution tothe robust PCA problem by a matrix pair (Xlowast Elowast) isin Rmtimesn times Rmtimesn The matrix Xlowast isXlowast = XLX
TR with XL isin Rmtimesk XR isin Rntimesk being random matrices whose entries are
independently sampled from a normal distribution The sparse matrix Elowast is a random matrixwhere its non-zero entries are independently sampled from a uniform distribution over theinterval [minus500 500] The input to the IALM algorithm has the form M = Xlowast + Elowast andthe output is denoted by
(X E
) In this numerical experiment we use the same parameter
settings as the IALM code for robust PCA rank k is 01m and the number of non-zero entriesin E is 005m2 We choose the trade-off parameter λ = 1
radicmax (mn) as suggested by
Candes et al [8] The solution quality is measured by the normalized root mean square errorX minusXlowastF XlowastF
Table 44 includes relative error run time the number of non-zero entries in E (E0)the iteration count and the number of non-zero singular values (sv) in X of the IALM
1httpperceptioncslillinoisedumatrix-ranksample_codehtml
ETNAKent State University and
Johann Radon Institute (RICAM)
486 Y FENG J XIAO AND M GU
algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective
TABLE 44Comparison on robust PCA
Size Method Error Time (sec) E0 Iter sv
1000times 1000
IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100
2000times 2000
IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200
4000times 4000
IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400
6000times 6000
IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600
432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices
bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3
The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies
For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by
NMAEdef=
1|Γ|sum
(ij)isinΓ |Mij minusXij |rmax minus rmin
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487
where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5
Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]
The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets
TABLE 45Parameters used in the IALM method on matrix completion
Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384
jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742
movie-latest-small 671 9066 100e+05 52551
5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA
Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments
Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4
Appendix B TUXV algorithmThe details of TUXV algorithm are presented in Algorithm 5
Appendix C Computing Tucker decomposition of higher-order tensorsAlgorithm 6 computes the Tucker decomposition of higher-order tensors ie sequentially
truncated higher-order SVD
ETNAKent State University and
Johann Radon Institute (RICAM)
488 Y FENG J XIAO AND M GU
TABLE 46Comparison on matrix completion
Data set Method Iter Time NMAE sv σmax σmin
jester-1
IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00
jester-2
IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00
jester-3
IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00
jester-all
IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00
movie-100K
IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00
movie-1M
IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00
movie-latest-small
IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00
Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems
respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω
Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an
orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9
Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that
1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489
Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do
Find i = arg maxjleslen
rs Exchange rj and ri columns in A and Π
Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)
end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A
Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)
3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)
2 minus 23 (k + p)
3 (cf [69]) and the cost of forming the first (k + p)
columns in the full Q matrix is m (k + p)2
+ 13 (k + p)
3
Now we count the flops for each i in the for loop
1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)
2 minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)
2+ 1
3 (k + p)3
3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)
2minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)
2+ 1
3 (k + p)3
Together the cost of running the for loop q times is
q
(4mn (k + p) + 3 (m+ n) (k + p)
2 minus 2
3(k + p)
3
)
Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)
2) and the cost of computing U = Q lowast U is 2m (k + p)
2
ETNAKent State University and
Johann Radon Institute (RICAM)
490 Y FENG J XIAO AND M GU
Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV
Tr ie applying UTr to G
end for
Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max
(M2 λminus1MF
) Y0 = MJ (M) E0 = 0
while not converged do(UΣV) = svd
(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = Sλmicrominus1k
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2
REFERENCES
[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106
[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999
[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103
[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595
[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
486 Y FENG J XIAO AND M GU
algorithm using different approximate SVD methods We observe that IALM_FFSRQR isfaster than all the other three methods while its error is comparable to IALM_LANSVD andIALM_RSISVD IALM_LTSVD is relatively slow and not effective
TABLE 44Comparison on robust PCA
Size Method Error Time (sec) E0 Iter sv
1000times 1000
IALM_LANSVD 333e-07 579e+00 50000 22 100IALM_FFSRQR 279e-07 102e+00 50000 25 100IALM_RSISVD 336e-07 109e+00 49999 22 100IALM_LTSVD 992e-02 311e+00 999715 100 100
2000times 2000
IALM_LANSVD 261e-07 591e+01 199999 22 200IALM_FFSRQR 182e-07 693e+00 199998 25 200IALM_RSISVD 263e-07 738e+00 199996 22 200IALM_LTSVD 842e-02 220e+01 3998937 100 200
4000times 4000
IALM_LANSVD 138e-07 465e+02 799991 23 400IALM_FFSRQR 139e-07 443e+01 800006 26 400IALM_RSISVD 151e-07 504e+01 799990 23 400IALM_LTSVD 894e-02 154e+02 15996623 100 400
6000times 6000
IALM_LANSVD 130e-07 166e+03 1799982 23 600IALM_FFSRQR 102e-07 142e+02 1799993 26 600IALM_RSISVD 144e-07 162e+02 1799985 23 600IALM_LTSVD 858e-02 555e+02 35992605 100 600
432 Matrix completion We solve the matrix completion problems for two real datasets used in [68] the Jester joke data set [24] and the MovieLens data set [32] The Jester jokedata set consists of 41 million ratings for 100 jokes from 73 421 users and can be downloadedfrom the website httpgoldbergberkeleyedujester-data We test withthe following data matrices
bull jester-1 Data from 24 983 users who have rated 36 or more jokesbull jester-2 Data from 23 500 users who have rated 36 or more jokesbull jester-3 Data from 24 938 users who have rated between 15 and 35 jokesbull jester-all The combination of jester-1 jester-2 and jester-3
The MovieLens data set can be downloaded fromhttpsgrouplensorgdatasetsmovielensWe test with the following data matricesbull movie-100K 100 000 ratings of 943 users for 1682 moviesbull movie-1M 1 million ratings of 6040 users for 3900 moviesbull movie-latest-small 100 000 ratings of 700 users for 9000 movies
For each data set we let M be the original data matrix where Mij stands for the ratingof joke (movie) j by user i and Γ be the set of indices where Mij is known The matrixcompletion algorithm quality is measured by the Normalized Mean Absolute Error (NMAE)defined by
NMAEdef=
1|Γ|sum
(ij)isinΓ |Mij minusXij |rmax minus rmin
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487
where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5
Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]
The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets
TABLE 45Parameters used in the IALM method on matrix completion
Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384
jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742
movie-latest-small 671 9066 100e+05 52551
5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA
Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments
Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4
Appendix B TUXV algorithmThe details of TUXV algorithm are presented in Algorithm 5
Appendix C Computing Tucker decomposition of higher-order tensorsAlgorithm 6 computes the Tucker decomposition of higher-order tensors ie sequentially
truncated higher-order SVD
ETNAKent State University and
Johann Radon Institute (RICAM)
488 Y FENG J XIAO AND M GU
TABLE 46Comparison on matrix completion
Data set Method Iter Time NMAE sv σmax σmin
jester-1
IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00
jester-2
IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00
jester-3
IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00
jester-all
IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00
movie-100K
IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00
movie-1M
IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00
movie-latest-small
IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00
Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems
respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω
Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an
orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9
Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that
1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489
Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do
Find i = arg maxjleslen
rs Exchange rj and ri columns in A and Π
Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)
end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A
Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)
3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)
2 minus 23 (k + p)
3 (cf [69]) and the cost of forming the first (k + p)
columns in the full Q matrix is m (k + p)2
+ 13 (k + p)
3
Now we count the flops for each i in the for loop
1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)
2 minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)
2+ 1
3 (k + p)3
3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)
2minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)
2+ 1
3 (k + p)3
Together the cost of running the for loop q times is
q
(4mn (k + p) + 3 (m+ n) (k + p)
2 minus 2
3(k + p)
3
)
Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)
2) and the cost of computing U = Q lowast U is 2m (k + p)
2
ETNAKent State University and
Johann Radon Institute (RICAM)
490 Y FENG J XIAO AND M GU
Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV
Tr ie applying UTr to G
end for
Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max
(M2 λminus1MF
) Y0 = MJ (M) E0 = 0
while not converged do(UΣV) = svd
(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = Sλmicrominus1k
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2
REFERENCES
[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106
[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999
[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103
[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595
[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 487
where Xij is the prediction of the rating of joke (movie) j given by user i and rmin rmax arethe lower and upper bounds of the ratings respectively For the Jester joke data sets we setrmin = minus10 and rmax = 10 For the MovieLens data sets we set rmin = 1 and rmax = 5
Since |Γ| is large we randomly select a subset Ω from Γ and then use Algorithm 8 tosolve the problem (28) We randomly select 10 ratings for each user in the Jester joke datasets while we randomly choose about 50 of the ratings for each user in the MovieLensdata sets Table 45 includes the parameter settings in the algorithms The maximum iterationnumber is 100 in IALM and all other parameters are the same as those used in [45]
The numerical results are included in Table 46 We observe that IALM_FFSRQR achievesalmost the same recoverability as other methods (except IALM_LTSVD) and is slightly fasterthan IALM_RSISVD for these two data sets
TABLE 45Parameters used in the IALM method on matrix completion
Data set m n |Γ| |Ω|jester-1 24983 100 181e+06 249830jester-2 23500 100 171e+06 235000jester-3 24938 100 617e+05 249384
jester-all 73421 100 414e+06 734210movie-100K 943 1682 100e+05 49918movie-1M 6040 3706 100e+06 498742
movie-latest-small 671 9066 100e+05 52551
5 Conclusions We have presented the Flip-Flop SRQR factorization a variant ofthe QLP factorization to compute low-rank matrix approximations The Flip-Flop SRQRalgorithm uses a SRQR factorization to initialize the truncated version of a column-pivotedQR factorization and then forms an LQ factorization For the numerical results presented theerrors in the proposed algorithm were comparable to those obtained from the other state-of-the-art algorithms This new algorithm is cheaper to compute and produces quality low-rankmatrix approximations Furthermore we prove singular value lower bounds and residual errorupper bounds for the Flip-Flop SRQR factorization In situations where singular values ofthe input matrix decay relatively quickly the low-rank approximation computed by Flip-FlopSRQR is guaranteed to be as accurate as the truncated SVD We also perform a complexityanalysis to show that Flip-Flop SRQR is faster than the approximate SVD with randomizedsubspace iteration Future work includes reducing the overhead cost in Flip-Flop SRQR andimplementing the Flip-Flop SRQR algorithm on distributed memory machines for popularapplications such as distributed PCA
Acknowledgments The authors would like to thank Yousef Saad the editor and threeanonymous referees who kindly reviewed the earlier version of this manuscript and providedvaluable suggestions and comments
Appendix A Partial QR factorization with column pivotingThe details of partial QRCP are covered in Algorithm 4
Appendix B TUXV algorithmThe details of TUXV algorithm are presented in Algorithm 5
Appendix C Computing Tucker decomposition of higher-order tensorsAlgorithm 6 computes the Tucker decomposition of higher-order tensors ie sequentially
truncated higher-order SVD
ETNAKent State University and
Johann Radon Institute (RICAM)
488 Y FENG J XIAO AND M GU
TABLE 46Comparison on matrix completion
Data set Method Iter Time NMAE sv σmax σmin
jester-1
IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00
jester-2
IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00
jester-3
IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00
jester-all
IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00
movie-100K
IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00
movie-1M
IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00
movie-latest-small
IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00
Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems
respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω
Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an
orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9
Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that
1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489
Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do
Find i = arg maxjleslen
rs Exchange rj and ri columns in A and Π
Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)
end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A
Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)
3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)
2 minus 23 (k + p)
3 (cf [69]) and the cost of forming the first (k + p)
columns in the full Q matrix is m (k + p)2
+ 13 (k + p)
3
Now we count the flops for each i in the for loop
1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)
2 minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)
2+ 1
3 (k + p)3
3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)
2minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)
2+ 1
3 (k + p)3
Together the cost of running the for loop q times is
q
(4mn (k + p) + 3 (m+ n) (k + p)
2 minus 2
3(k + p)
3
)
Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)
2) and the cost of computing U = Q lowast U is 2m (k + p)
2
ETNAKent State University and
Johann Radon Institute (RICAM)
490 Y FENG J XIAO AND M GU
Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV
Tr ie applying UTr to G
end for
Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max
(M2 λminus1MF
) Y0 = MJ (M) E0 = 0
while not converged do(UΣV) = svd
(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = Sλmicrominus1k
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2
REFERENCES
[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106
[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999
[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103
[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595
[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
488 Y FENG J XIAO AND M GU
TABLE 46Comparison on matrix completion
Data set Method Iter Time NMAE sv σmax σmin
jester-1
IALM-LANSVD 12 706e+00 184e-01 100 214e+03 100e+00IALM-FFSRQR 12 344e+00 169e-01 100 228e+03 100e+00IALM-RSISVD 12 375e+00 189e-01 100 212e+03 100e+00IALM-LTSVD 100 211e+01 174e-01 62 300e+03 100e+00
jester-2
IALM-LANSVD 12 680e+00 185e-01 100 213e+03 100e+00IALM-FFSRQR 12 279e+00 170e-01 100 229e+03 100e+00IALM-RSISVD 12 359e+00 191e-01 100 212e+03 100e+00IALM-LTSVD 100 203e+01 175e-01 58 296e+03 100e+00
jester-3
IALM-LANSVD 12 705e+00 126e-01 99 179e+03 100e+00IALM-FFSRQR 12 303e+00 122e-01 100 171e+03 100e+00IALM-RSISVD 12 385e+00 131e-01 100 178e+03 100e+00IALM-LTSVD 100 212e+01 133e-01 55 250e+03 100e+00
jester-all
IALM-LANSVD 12 239e+01 172e-01 100 356e+03 100e+00IALM-FFSRQR 12 112e+01 162e-01 100 363e+03 100e+00IALM-RSISVD 12 134e+01 182e-01 100 347e+03 100e+00IALM-LTSVD 100 699e+01 168e-01 52 492e+03 100e+00
movie-100K
IALM-LANSVD 29 286e+01 183e-01 285 121e+03 100e+00IALM-FFSRQR 30 455e+00 167e-01 295 153e+03 100e+00IALM-RSISVD 29 482e+00 182e-01 285 129e+03 100e+00IALM-LTSVD 48 142e+01 147e-01 475 191e+03 100e+00
movie-1M
IALM-LANSVD 50 740e+02 158e-01 495 499e+03 100e+00IALM-FFSRQR 53 207e+02 137e-01 525 663e+03 100e+00IALM-RSISVD 50 223e+02 157e-01 495 535e+03 100e+00IALM-LTSVD 100 850e+02 117e-01 995 897e+03 100e+00
movie-latest-small
IALM-LANSVD 31 166e+02 185e-01 305 113e+03 100e+00IALM-FFSRQR 31 196e+01 200e-01 305 142e+03 100e+00IALM-RSISVD 31 285e+01 191e-01 305 120e+03 100e+00IALM-LTSVD 63 402e+01 208e-01 298 179e+03 100e+00
Appendix D IALM for solving two special nuclear norm minimization problemsAlgorithm 7 and Algorithm 8 solve the robust PCA and matrix completion problems
respectively In Algorithm 7 Sω (x) = sgn (x) middot max (|x| minus ω 0) is the soft shrinkageoperator [29] with x isin Rn and ω gt 0 In Algorithm 8 Ω is the complement of Ω
Appendix E Approximate SVD with randomized subspace iterationThe randomized subspace iteration was proposed in [30 Algorithm 44] to compute an
orthonormal matrix whose range approximates the range of A An approximate SVD can becomputed using the aforementioned orthonormal matrix [30 Algorithm 51] A randomizedsubspace iteration is used in routine MLSVD_RSI in Matlab toolbox tensorlab [73] andMLSVD_RSI is by far the most efficient function that we can find in Matlab to compute ST-HOSVD We summarize the approximate SVD with randomized subspace iteration pseudocodein Algorithm 9
Now we perform a complexity analysis for the approximate SVD with randomizedsubspace iteration We first note that
1 The cost of generating a random matrix is negligible2 The cost of computing B = AΩ is 2mn (k + p)
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489
Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do
Find i = arg maxjleslen
rs Exchange rj and ri columns in A and Π
Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)
end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A
Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)
3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)
2 minus 23 (k + p)
3 (cf [69]) and the cost of forming the first (k + p)
columns in the full Q matrix is m (k + p)2
+ 13 (k + p)
3
Now we count the flops for each i in the for loop
1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)
2 minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)
2+ 1
3 (k + p)3
3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)
2minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)
2+ 1
3 (k + p)3
Together the cost of running the for loop q times is
q
(4mn (k + p) + 3 (m+ n) (k + p)
2 minus 2
3(k + p)
3
)
Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)
2) and the cost of computing U = Q lowast U is 2m (k + p)
2
ETNAKent State University and
Johann Radon Institute (RICAM)
490 Y FENG J XIAO AND M GU
Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV
Tr ie applying UTr to G
end for
Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max
(M2 λminus1MF
) Y0 = MJ (M) E0 = 0
while not converged do(UΣV) = svd
(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = Sλmicrominus1k
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2
REFERENCES
[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106
[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999
[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103
[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595
[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 489
Algorithm 4 Partial QRCPInputsMatrix A isin Rmtimesn Target rank kOutputsOrthogonal matrix Q isin RmtimesmUpper trapezoidal matrix R isin RmtimesnPermutation matrix Π isin Rntimesn such that AΠ = QRAlgorithmInitialize Π = In Compute column norms rs = A (1 m s) 2 (1 le s le n)for j = 1 k do
Find i = arg maxjleslen
rs Exchange rj and ri columns in A and Π
Form Householder reflection Qj from A (j m j)Update trailing matrix A (j m j n)larr QTj A (j m j n)Update rs = A (j + 1 m s) 2 (j + 1 le s le n)
end forQ = Q1Q2 middot middot middotQk is the product of all reflections R = upper trapezoidal part of A
Algorithm 5 TUXV AlgorithmInputsMatrix A isin Rmtimesn Target rank k Block size b Oversampling size p ge 0OutputsColumn orthonormal matrices U isin Rmtimesk V isin Rntimesk and upper triangular matrixR isin Rktimesk such that A asymp URV T AlgorithmDo TRQRCP on A to obtain Q isin Rmtimesk R isin Rktimesn and Π isin RntimesnR = RΠT and do LQ factorization ie [VR] = qr(RT 0)Compute Z = AV and do QR factorization ie [UR] = qr(Z 0)
3 In each QR step [Qsim] = qr (B 0) the cost of computing the QR factorization ofB is 2m (k + p)
2 minus 23 (k + p)
3 (cf [69]) and the cost of forming the first (k + p)
columns in the full Q matrix is m (k + p)2
+ 13 (k + p)
3
Now we count the flops for each i in the for loop
1 The cost of computing B = AT lowastQ is 2mn (k + p)2 The cost of computing [Qsim] = qr (B 0) is 2n (k + p)
2 minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is n (k + p)
2+ 1
3 (k + p)3
3 The cost of computing B = A lowastQ is 2mn (k + p)4 The cost of computing [Qsim] = qr (B 0) is 2m (k + p)
2minus 23 (k + p)
3 and the costof forming the first (k + p) columns in the full Q matrix is m (k + p)
2+ 1
3 (k + p)3
Together the cost of running the for loop q times is
q
(4mn (k + p) + 3 (m+ n) (k + p)
2 minus 2
3(k + p)
3
)
Additionally the cost of computing B = QT lowastA is 2mn (k + p) the cost of doing SVD of Bis O(n (k + p)
2) and the cost of computing U = Q lowast U is 2m (k + p)
2
ETNAKent State University and
Johann Radon Institute (RICAM)
490 Y FENG J XIAO AND M GU
Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV
Tr ie applying UTr to G
end for
Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max
(M2 λminus1MF
) Y0 = MJ (M) E0 = 0
while not converged do(UΣV) = svd
(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = Sλmicrominus1k
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2
REFERENCES
[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106
[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999
[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103
[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595
[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
490 Y FENG J XIAO AND M GU
Algorithm 6 ST-HOSVDInputsTensor X isin RI1timesmiddotmiddotmiddottimesId desired rank (k1 kd) and processing order p = (p1 middot middot middot pd)OutputsTucker decomposition [GU1 middot middot middot Ud]AlgorithmDefine tensor G larr X for j = 1 d dor = pj Compute exact or approximate rank kr SVD of the tensor unfolding G(r) asymp UrΣrV Tr Ur larr UrUpdate G(r) larr ΣrV
Tr ie applying UTr to G
end for
Algorithm 7 Robust PCA Using IALMInputsMeasured matrix M isin Rmtimesn positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 J (M) = max
(M2 λminus1MF
) Y0 = MJ (M) E0 = 0
while not converged do(UΣV) = svd
(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = Sλmicrominus1k
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Now assume k + p min (mn) and omit the lower-order terms then we arriveat (4q + 4)mn (k + p) as the complexity of approximate SVD with randomized subspaceiteration In practice q is usually chosen to be an integer between 0 and 2
REFERENCES
[1] O ALTER P O BROWN AND D BOTSTEIN Singular value decomposition for genome-wide expressiondata processing and modeling Proc Natl Acad Sci USA 97 (2000) pp 10101ndash10106
[2] E ANDERSON Z BAI C BISCHOF L S BLACKFORD J DEMMEL J DONGARRA J DU CROZA GREENBAUM S HAMMARLING A MCKENNEY AND D SORENSEN LAPACK Usersrsquo Guide 3rded SIAM Philadelphia 1999
[3] C A ANDERSSON AND R BRO Improving the speed of multi-way algorithms Part I Tucker3 ChemometricsIntell Lab Syst 42 (1998) pp 93ndash103
[4] M W BERRY S T DUMAIS AND G W OrsquoBRIEN Using linear algebra for intelligent information retrievalSIAM Rev 37 (1995) pp 573ndash595
[5] C BISCHOF AND C VAN LOAN The WY representation for products of Householder matrices SIAM J Sci
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 491
Algorithm 8 Matrix Completion Using IALMInputsSampled set Ω sampled entries πΩ (M) positive number λ micro0 micro tolerance tol ρ gt 1OutputsMatrix pair (Xk Ek)Algorithmk = 0 Y0 = 0 E0 = 0while not converged do
(UΣV) = svd(MminusEk + microminus1k Yk
)
Xk+1 = USmicrominus1k
(Σ)V T
Ek+1 = πΩ
(M minusXk+1 + microminus1
k Yk)
Yk+1 = Yk + microk (M minusXk+1 minus Ek+1)Update microk+1 = min (ρmicrok micro)k = k + 1if M minusXk minus EkF MF lt tol then
Breakend if
end while
Statist Comput 8 (1987) pp S2ndashS13[6] P BUSINGER AND G H GOLUB Handbook series linear algebra Linear least squares solutions by House-
holder transformations Numer Math 7 (1965) pp 269ndash276[7] J-F CAI E J CANDEgraveS AND Z SHEN A singular value thresholding algorithm for matrix completion
SIAM J Optim 20 (2010) pp 1956ndash1982[8] E J CANDEgraveS X LI Y MA AND J WRIGHT Robust principal component analysis J ACM 58 (2011)
Art 11 37 pages[9] E J CANDEgraveS AND B RECHT Exact matrix completion via convex optimization Found Comput Math 9
(2009) pp 717ndash772[10] J D CARROLL AND J-J CHANG Analysis of individual differences in multidimensional scaling via an
N-way generalization of ldquoEckart-Youngrdquo decomposition Psychometrika 35 (1970) pp 283ndash319[11] T A DAVIS AND Y HU The University of Florida sparse matrix collection ACM Trans Math Software 38
(2011) pp Art 1 25[12] L DE LATHAUWER AND B DE MOOR From matrix to tensor Multilinear algebra and signal processing in
Mathematics in Signal Processing IV Inst Math Appl Conf Ser Oxford Univ Press Oxford 1998pp 1ndash16
[13] L DE LATHAUWER B DE MOOR AND J VANDEWALLE A multilinear singular value decompositionSIAM J Matrix Anal Appl 21 (2000) pp 1253ndash1278
[14] On the best rank-1 and rank-(R1 R2 middot middot middot RN ) approximation of higher-order tensors SIAM JMatrix Anal Appl 21 (2000) pp 1324ndash1342
[15] L DE LATHAUWER AND J VANDEWALLE Dimensionality reduction in higher-order signal processing andrank-(R1 R2 RN ) reduction in multilinear algebra Linear Algebra Appl 391 (2004) pp 31ndash55
[16] P DRINEAS R KANNAN AND M W MAHONEY Fast Monte Carlo algorithms for matrices II Computinga low-rank approximation to a matrix SIAM J Comput 36 (2006) pp 158ndash183
[17] J A DUERSCH AND M GU Randomized QR with column pivoting SIAM J Sci Comput 39 (2017)pp C263ndashC291
[18] C ECKART AND G YOUNG The approximation of one matrix by another of lower rank Psychometrika 1(1936) pp 211ndash218
[19] L ELDEacuteN Matrix Methods in Data Mining and Pattern Recognition SIAM Philadelphia 2007[20] M FAZEL H HINDI AND S P BOYD Log-det heuristic for matrix rank minimization with applications to
Hankel and Euclidean distance matrices in Proceedings of the 2003 American Control Conference 2003IEEE Conference Proceedings IEEE Piscataway pp 2156ndash2162
[21] M FAZEL T K PONG D SUN AND P TSENG Hankel matrix rank minimization with applications tosystem identification and realization SIAM J Matrix Anal Appl 34 (2013) pp 946ndash977
[22] C FRANK AND E NOumlTH Optimizing eigenfaces by face masks for facial expression recognition in ComputerAnalysis of Images and Patterns N Petkov and M A Westenberg eds vol 2756 of Lecture Notes inComput Sci Springer Berlin 2003 pp 646ndash654
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
492 Y FENG J XIAO AND M GU
Algorithm 9 Approximate SVD with Randomized Subspace IterationInputsMatrix A isin Rmtimesn Target rank k Oversampling size p ge 0 Number of iterations q ge 1OutputsU isin Rmtimesk contains the approximate top k left singular vectors of AΣ isin Rktimesk contains the approximate top k singular values of AV isin Rntimesk contains the approximate top k right singular vectors of AAlgorithmGenerate iid Gaussian matrix Ω isin N (0 1)
ntimes(k+p)Compute B = AΩ[Qsim] = qr (B 0)for i = 1 q doB = AT lowastQ[Qsim] = qr (B 0)B = A lowastQ[Qsim] = qr (B 0)
end forB = QT lowastA[UΣ V ] = svd (B)U = Q lowast U U = U ( 1 k)Σ = Σ (1 k 1 k)V = V ( 1 k)
[23] G W FURNAS S DEERWESTER S T DUMAIS T K LANDAUER R A HARSHMAN L A STREETERAND K E LOCHBAUM Information retrieval using a singular value decomposition model of latentsemantic structure in Proceedings of the 11th Annual International ACM SIGIR Conference on Researchand Development in Information Retrieval Y Chiaramella ed ACM New York 1988 pp 465ndash480
[24] K GOLDBERG T ROEDER D GUPTA AND C PERKINS Eigentaste A constant time collaborative filteringalgorithm Inf Retrieval 4 (2001) pp 133ndash151
[25] G GOLUB Numerical methods for solving linear least squares problems Numer Math 7 (1965) pp 206ndash216[26] G H GOLUB AND C F VAN LOAN Matrix Computations 4th ed Johns Hopkins University Press
Baltimore 2013[27] M GU Subspace iteration randomization and singular value problems SIAM J Sci Comput 37 (2015)
pp A1139ndashA1173[28] M GU AND S C EISENSTAT Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM J Sci Comput 17 (1996) pp 848ndash869[29] E T HALE W YIN AND Y ZHANG Fixed-point continuation for l1-minimization methodology and
convergence SIAM J Optim 19 (2008) pp 1107ndash1130[30] N HALKO P G MARTINSSON AND J A TROPP Finding structure with randomness probabilistic
algorithms for constructing approximate matrix decompositions SIAM Rev 53 (2011) pp 217ndash288[31] R A HARSHMAN Foundations of the PARAFAC procedure Models and conditions for an ldquoexplanatoryrdquo
multimodal factor analysis UCLA Working Papers in Phonetics 16 (1970) pp 1ndash84[32] J L HERLOCKER J A KONSTAN A BORCHERS AND J RIEDL An algorithmic framework for performing
collaborative filtering in Proceedings of the 22nd Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval F Gey M Heaast and R Tong eds ACM NewYork 1999 pp 230ndash237
[33] R A HORN AND C R JOHNSON Topics in Matrix Analysis Cambridge University Press Cambridge 1991[34] D A HUCKABY AND T F CHAN On the convergence of Stewartrsquos QLP algorithm for approximating the
SVD Numer Algorithms 32 (2003) pp 287ndash316[35] Stewartrsquos pivoted QLP decomposition for low-rank matrices Numer Linear Algebra Appl 12 (2005)
pp 153ndash159[36] W B JOHNSON AND J LINDENSTRAUSS Extensions of Lipschitz mappings into a Hilbert space in
Conference in Modern Analysis and Probability (New Haven Conn 1982) R Beals A Beck A Bellowand A Hajian eds vol 26 of Contemp Math Amer Math Soc Providence 1984 pp 189ndash206
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
FLIP-FLOP SPECTRUM-REVEALING QR FACTORIZATION 493
[37] I T JOLLIFFE Principal Component Analysis Springer New York 1986[38] J M KLEINBERG Authoritative sources in a hyperlinked environment J ACM 46 (1999) pp 604ndash632[39] T G KOLDA Orthogonal tensor decompositions SIAM J Matrix Anal Appl 23 (2001) pp 243ndash255[40] T G KOLDA AND B W BADER Tensor decompositions and applications SIAM Rev 51 (2009) pp 455ndash
500[41] R M LARSEN PROPACK-software for large and sparse SVD calculations available at
httpsunstanfordedurmunkPROPACK[42] Lanczos bidiagonalization with partial reorthogonalization Tech Report PB-537 Department of
Computer Science Aarhus University Aarhus Denmark 1998[43] Y LECUN L BOTTOU Y BENGIO AND P HAFFNER Gradient-based learning applied to document
recognition Proc IEEE 86 (1998) pp 2278ndash2324[44] R B LEHOUCQ D C SORENSEN AND C YANG ARPACK Usersrsquo Guide Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods SIAM Philadelphia 1998[45] Z LIN M CHEN AND Y MA The augmented Lagrange multiplier method for exact recovery of corrupted
low-rank matrices Preprint on arXiv httpsarxivorgabs10095055 2010[46] N LINIAL E LONDON AND Y RABINOVICH The geometry of graphs and some of its algorithmic
applications Combinatorica 15 (1995) pp 215ndash245[47] Z LIU A HANSSON AND L VANDENBERGHE Nuclear norm system identification with missing inputs and
outputs Systems Control Lett 62 (2013) pp 605ndash612[48] Z LIU AND L VANDENBERGHE Interior-point method for nuclear norm approximation with application to
system identification SIAM J Matrix Anal Appl 31 (2009) pp 1235ndash1256[49] S MA D GOLDFARB AND L CHEN Fixed point and Bregman iterative methods for matrix rank minimiza-
tion Math Program 128 (2011) pp 321ndash353[50] C B MELGAARD Randomized Pivoting and Spectrum-Revealing Bounds in Numerical Linear Algebra
PhD Thesis University of California Berkeley 2015[51] M MESBAHI AND G P PAPAVASSILOPOULOS On the rank minimization problem over a positive semidefinite
linear matrix inequality IEEE Trans Automat Control 42 (1997) pp 239ndash243[52] N MULLER L MAGAIA AND B M HERBST Singular value decomposition eigenfaces and 3D recon-
structions SIAM Rev 46 (2004) pp 518ndash545[53] M NARWARIA AND W LIN SVD-based quality metric for image and video using machine learning IEEE
Trans Systbdquo Man and Cybernetics Part B (Cybernetics) 42 (2012) pp 347ndash364[54] T-H OH Y MATSUSHITA Y-W TAI AND I SO KWEON Fast randomized singular value thresholding
for nuclear norm minimization in Proceedings of the IEEE Conference on Computer Vision and PatternRecognition 2015 IEEE Conference Proceedings IEEE Los Alamitos 2015 pp 4484ndash4493
[55] G QUINTANA-ORTIacute X SUN AND C H BISCHOF A BLAS-3 version of the QR factorization with columnpivoting SIAM J Sci Comput 19 (1998) pp 1486ndash1494
[56] B RECHT M FAZEL AND P A PARRILO Guaranteed minimum-rank solutions of linear matrix equationsvia nuclear norm minimization SIAM Rev 52 (2010) pp 471ndash501
[57] J D RENNIE AND N SREBRO Fast maximum margin matrix factorization for collaborative prediction inProceedings of the 22nd International Conference on Machine Learning L De Raedt and S Wrobel edsACM Press Neow York 2005 pp 713ndash719
[58] V ROKHLIN A SZLAM AND M TYGERT A randomized algorithm for principal component analysis SIAMJ Matrix Anal Appl 31 (2009) pp 1100ndash1124
[59] A K SAIBABA HOID higher order interpolatory decomposition for tensors based on Tucker representationSIAM J Matrix Anal Appl 37 (2016) pp 1223ndash1249
[60] B SAVAS AND L ELDEacuteN Handwritten digit classification using higher order singular value decompositionPattern Recognition 40 (2007) pp 993ndash1003
[61] R SCHREIBER AND C VAN LOAN A storage-efficient WY representation for products of Householdertransformations SIAM J Sci Statist Comput 10 (1989) pp 53ndash57
[62] A SHASHUA AND T HAZAN Non-negative tensor factorization with applications to statistics and com-puter vision in Proceedings of the 22nd International Conference on Machine Learning S DzeroskiL De Raedt and S Wrobel eds ACM New York 2005 pp 792ndash799
[63] N D SIDIROPOULOS R BRO AND G B GIANNAKIS Parallel factor analysis in sensor array processingIEEE Trans Signal Process 48 (2000) pp 2377ndash2388
[64] D C SORENSEN AND M EMBREE A DEIM induced CUR factorization SIAM J Sci Comput 38 (2016)pp A1454ndashA1482
[65] N SREBRO J RENNIE AND T S JAAKKOLA Maximum-margin matrix factorization in NIPSrsquo04 Proceed-ings of the 17th International Conference on Neural Information Processing Systems L K Saul Y Weissand L Bottou eds MIT Press Cambridge 2004 pp 1329ndash1336
[66] G W STEWART The QLP approximation to the singular value decomposition SIAM J Sci Comput 20(1999) pp 1336ndash1348
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
ETNAKent State University and
Johann Radon Institute (RICAM)
494 Y FENG J XIAO AND M GU
[67] S TAHERI Q QIU AND R CHELLAPPA Structure-preserving sparse decomposition for facial expressionanalysis IEEE Trans Image Process 23 (2014) pp 3590ndash3603
[68] K-C TOH AND S YUN An accelerated proximal gradient algorithm for nuclear norm regularized linearleast squares problems Pac J Optim 6 (2010) pp 615ndash640
[69] L N TREFETHEN AND D BAU III Numerical Linear Algebra SIAM Philadelphia 1997[70] L R TUCKER Some mathematical notes on three-mode factor analysis Psychometrika 31 (1966) pp 279ndash
311[71] N VANNIEUWENHOVEN R VANDEBRIL AND K MEERBERGEN A new truncation strategy for the higher-
order singular value decomposition SIAM J Sci Comput 34 (2012) pp A1027ndashA1052[72] M A O VASILESCU AND D TERZOPOULOS Multilinear analysis of image ensembles Tensorfaces in
European Conference on Computer Vision 2002 A Heyden G Sparr M Nielsen and P Johansen edsLecture Notes in Computer Science vol 2350 Springer Berlin 2002 pp 447ndash460
[73] N VERVLIET O DEBALS L SORBER M V BAREL AND L DE LATHAUWER Tensorlab user guideavailable at httpwwwtensorlabnet
[74] J XIAO AND M GU Spectrum-revealing Cholesky factorization for kernel methods in Proceedings of the 16thIEEE International Conference on Data Mining (ICDM) IEEE Conference Proceedings Los Alamitos2016 pp 1293ndash1298
[75] J XIAO M GU AND J LANGOU Fast parallel randomized QR with column pivoting algorithms for reliablelow-rank matrix approximations in Proceedings of the 24th IEEE International Conference on HighPerformance Computing (HiPC) IEEE Conference Proceedings Los Alamitos 2017 pp 233ndash242
[76] T ZHANG AND G H GOLUB Rank-one approximation to high order tensors SIAM J Matrix Anal Appl23 (2001) pp 534ndash550
Recommended